Addressing DuckDB-Inspired SQL Syntax Requests in SQLite


Analyzing Proposed SQL Syntax Enhancements for SQLite Compatibility

The discussion revolves around adapting specific SQL syntax features from DuckDB to SQLite, with a focus on improving usability during data analysis workflows. The proposed enhancements include:

  1. SELECT * EXCLUDE: A shorthand to exclude specific columns from a wildcard (*) selection.
  2. SELECT * REPLACE: A method to modify specific columns while retaining others in a wildcard selection.
  3. GROUP BY ALL: A clause to group results by all non-aggregated columns in the SELECT list.
  4. Python-Style String Slicing: Using slice notation (e.g., [:-3]) as an alternative to substr().
  5. Trailing Commas: Allowing commas at the end of lists in SELECT, GROUP BY, or INSERT statements.

These features aim to reduce boilerplate code, minimize human error, and align SQLite’s syntax with modern programming conventions. However, their adoption in SQLite involves technical, syntactic, and philosophical considerations. Below, we dissect the challenges and explore actionable solutions.


Technical and Design Constraints in SQLite

1. Keyword Conflicts and Reserved Words

SQLite reserves EXCLUDE for conflict resolution in UPSERT clauses (e.g., DO UPDATE SET ... EXCLUDED.column). Introducing SELECT * EXCLUDE would create ambiguity, as the parser must distinguish between EXCLUDE as a modifier in SELECT versus its role in UPSERT. DuckDB avoids this issue because it does not support EXCLUDED in the same context.

2. Ambiguity in GROUP BY ALL Semantics

In DuckDB, GROUP BY ALL groups results by all non-aggregated columns in the SELECT clause. However, SQLite already has unique behavior for bare columns in aggregate queries: if a GROUP BY is omitted, SQLite selects a single row with MIN()/MAX() values for bare columns. Adding GROUP BY ALL could confuse users expecting DuckDB-like behavior, especially if their queries mix aggregates and bare columns.

3. Overlap with Existing Functions

Python-style string slicing (e.g., 'text'[:-3]) overlaps functionally with SQLite’s substr(). While slicing offers syntactic sugar, it introduces a non-standard way to manipulate strings, conflicting with SQLite’s philosophy of avoiding redundant syntax.

4. Parsing and Backward Compatibility

Trailing commas are prohibited in SQLite to enforce strict syntax validation. Allowing them would require changes to the parser, risking backward incompatibility with applications that rely on the current behavior.

5. Design Philosophy

SQLite prioritizes simplicity, reliability, and strict adherence to the SQL standard unless there’s a compelling reason to deviate. Features like SELECT * EXCLUDE or GROUP BY ALL introduce non-standard syntax, which may conflict with this philosophy unless they solve widespread usability issues.


Workarounds, Implementation Strategies, and Long-Term Fixes

1. Emulating SELECT * EXCLUDE and SELECT * REPLACE

Problem: Manually listing columns after SELECT * is error-prone and tedious.
Solution:

  • Explicit Column Lists: Use PRAGMA table_info(table_name) to dynamically generate column lists and exclude/replace columns programmatically.
    -- Exclude 'jar_jar_binks' and 'midichlorians'
    SELECT systems, planets, cities FROM star_wars;
    
  • Views: Create views that exclude or transform columns upfront.
    CREATE VIEW star_wars_filtered AS 
    SELECT * EXCLUDE (jar_jar_binks, midichlorians) FROM star_wars;
    

Long-Term Fix: Propose a new operator like EXCEPT (used in Redshift) to avoid keyword conflicts:

SELECT * EXCEPT (jar_jar_binks, midichlorians) FROM star_wars;

2. Replicating GROUP BY ALL Behavior

Problem: Writing all non-aggregated columns in GROUP BY is repetitive.
Solution:

  • Explicit Column Enumeration: Use SQLite’s GROUP BY with column positions or names.
    SELECT systems, planets, cities, SUM(scum) 
    FROM star_wars_locations 
    GROUP BY 1, 2, 3;
    
  • CTEs or Subqueries: Isolate non-aggregated columns in a subquery.
    WITH base AS (
      SELECT systems, planets, cities, scum FROM star_wars_locations
    )
    SELECT systems, planets, cities, SUM(scum) 
    FROM base 
    GROUP BY systems, planets, cities;
    

Long-Term Fix: Introduce a GROUP BY NON_AGGREGATED clause to explicitly reference non-aggregated columns.

3. String Slicing Alternatives

Problem: Python-style slicing is more intuitive than substr().
Solution: Use substr() with arithmetic for negative indices.

-- 'I love you! I know'[:-3]
SELECT substr('I love you! I know', 1, length('I love you! I know') - 3);

Long-Term Fix: Add a slice() function that accepts start/end indices:

SELECT slice('I love you! I know', -3) as nearly_soloed;

4. Handling Trailing Commas

Problem: Trailing commas break queries and complicate code generation.
Solution: Use code-generation tools that prefix commas:

columns = ["x_wing", "proton_torpedoes"]
query = "SELECT\n  " + ",\n  ".join(columns)

Long-Term Fix: Propose a parser update to ignore trailing commas in lists, similar to JSON5 or C++11.

5. Syntax Standardization and Community Feedback

Engage the SQLite community via mailing lists or forums to gauge demand for these features. Draft formal specifications for each enhancement, addressing:

  • Backward Compatibility: Ensure new syntax does not break existing queries.
  • Performance: Verify that parser changes do not degrade efficiency.
  • Standard Alignment: Align with SQL standards or widely adopted extensions.

By addressing these challenges through a mix of immediate workarounds and long-term proposals, SQLite can evolve to meet modern usability expectations while preserving its core values of simplicity and reliability. Developers are encouraged to contribute to the SQLite enhancement process by submitting detailed proposals or patches for consideration.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *