Addressing DuckDB-Inspired SQL Syntax Requests in SQLite
Analyzing Proposed SQL Syntax Enhancements for SQLite Compatibility
The discussion revolves around adapting specific SQL syntax features from DuckDB to SQLite, with a focus on improving usability during data analysis workflows. The proposed enhancements include:
SELECT * EXCLUDE
: A shorthand to exclude specific columns from a wildcard (*
) selection.SELECT * REPLACE
: A method to modify specific columns while retaining others in a wildcard selection.GROUP BY ALL
: A clause to group results by all non-aggregated columns in theSELECT
list.- Python-Style String Slicing: Using slice notation (e.g.,
[:-3]
) as an alternative tosubstr()
. - Trailing Commas: Allowing commas at the end of lists in
SELECT
,GROUP BY
, orINSERT
statements.
These features aim to reduce boilerplate code, minimize human error, and align SQLite’s syntax with modern programming conventions. However, their adoption in SQLite involves technical, syntactic, and philosophical considerations. Below, we dissect the challenges and explore actionable solutions.
Technical and Design Constraints in SQLite
1. Keyword Conflicts and Reserved Words
SQLite reserves EXCLUDE
for conflict resolution in UPSERT
clauses (e.g., DO UPDATE SET ... EXCLUDED.column
). Introducing SELECT * EXCLUDE
would create ambiguity, as the parser must distinguish between EXCLUDE
as a modifier in SELECT
versus its role in UPSERT
. DuckDB avoids this issue because it does not support EXCLUDED
in the same context.
2. Ambiguity in GROUP BY ALL
Semantics
In DuckDB, GROUP BY ALL
groups results by all non-aggregated columns in the SELECT
clause. However, SQLite already has unique behavior for bare columns in aggregate queries: if a GROUP BY
is omitted, SQLite selects a single row with MIN()
/MAX()
values for bare columns. Adding GROUP BY ALL
could confuse users expecting DuckDB-like behavior, especially if their queries mix aggregates and bare columns.
3. Overlap with Existing Functions
Python-style string slicing (e.g., 'text'[:-3]
) overlaps functionally with SQLite’s substr()
. While slicing offers syntactic sugar, it introduces a non-standard way to manipulate strings, conflicting with SQLite’s philosophy of avoiding redundant syntax.
4. Parsing and Backward Compatibility
Trailing commas are prohibited in SQLite to enforce strict syntax validation. Allowing them would require changes to the parser, risking backward incompatibility with applications that rely on the current behavior.
5. Design Philosophy
SQLite prioritizes simplicity, reliability, and strict adherence to the SQL standard unless there’s a compelling reason to deviate. Features like SELECT * EXCLUDE
or GROUP BY ALL
introduce non-standard syntax, which may conflict with this philosophy unless they solve widespread usability issues.
Workarounds, Implementation Strategies, and Long-Term Fixes
1. Emulating SELECT * EXCLUDE
and SELECT * REPLACE
Problem: Manually listing columns after SELECT *
is error-prone and tedious.
Solution:
- Explicit Column Lists: Use
PRAGMA table_info(table_name)
to dynamically generate column lists and exclude/replace columns programmatically.-- Exclude 'jar_jar_binks' and 'midichlorians' SELECT systems, planets, cities FROM star_wars;
- Views: Create views that exclude or transform columns upfront.
CREATE VIEW star_wars_filtered AS SELECT * EXCLUDE (jar_jar_binks, midichlorians) FROM star_wars;
Long-Term Fix: Propose a new operator like EXCEPT
(used in Redshift) to avoid keyword conflicts:
SELECT * EXCEPT (jar_jar_binks, midichlorians) FROM star_wars;
2. Replicating GROUP BY ALL
Behavior
Problem: Writing all non-aggregated columns in GROUP BY
is repetitive.
Solution:
- Explicit Column Enumeration: Use SQLite’s
GROUP BY
with column positions or names.SELECT systems, planets, cities, SUM(scum) FROM star_wars_locations GROUP BY 1, 2, 3;
- CTEs or Subqueries: Isolate non-aggregated columns in a subquery.
WITH base AS ( SELECT systems, planets, cities, scum FROM star_wars_locations ) SELECT systems, planets, cities, SUM(scum) FROM base GROUP BY systems, planets, cities;
Long-Term Fix: Introduce a GROUP BY NON_AGGREGATED
clause to explicitly reference non-aggregated columns.
3. String Slicing Alternatives
Problem: Python-style slicing is more intuitive than substr()
.
Solution: Use substr()
with arithmetic for negative indices.
-- 'I love you! I know'[:-3]
SELECT substr('I love you! I know', 1, length('I love you! I know') - 3);
Long-Term Fix: Add a slice()
function that accepts start/end indices:
SELECT slice('I love you! I know', -3) as nearly_soloed;
4. Handling Trailing Commas
Problem: Trailing commas break queries and complicate code generation.
Solution: Use code-generation tools that prefix commas:
columns = ["x_wing", "proton_torpedoes"]
query = "SELECT\n " + ",\n ".join(columns)
Long-Term Fix: Propose a parser update to ignore trailing commas in lists, similar to JSON5 or C++11.
5. Syntax Standardization and Community Feedback
Engage the SQLite community via mailing lists or forums to gauge demand for these features. Draft formal specifications for each enhancement, addressing:
- Backward Compatibility: Ensure new syntax does not break existing queries.
- Performance: Verify that parser changes do not degrade efficiency.
- Standard Alignment: Align with SQL standards or widely adopted extensions.
By addressing these challenges through a mix of immediate workarounds and long-term proposals, SQLite can evolve to meet modern usability expectations while preserving its core values of simplicity and reliability. Developers are encouraged to contribute to the SQLite enhancement process by submitting detailed proposals or patches for consideration.