Unused Column Expression Evaluation in SQLite CTEs: Single vs. Multi-Row Optimization Discrepancies


Issue Overview: Unused Column Expressions Evaluated in Single-Row CTEs but Optimized in Multi-Row Scenarios

The core issue revolves around SQLite’s query optimizer selectively evaluating expressions in Common Table Expressions (CTEs) based on whether the CTE produces a single row of constant values or multiple rows derived from a table. When a CTE defines multiple columns but only a subset of those columns are referenced in the final SELECT statement, SQLite may or may not optimize away unused column expressions depending on the structure of the CTE.

Key Observations:

  1. Multi-Row CTE with Table Source:
    In a CTE that selects from a table with multiple rows (e.g., Y(x) with values 1–4), expressions in unused columns (e.g., echo(x+1), echo(x+2), echo(x+3)) are not evaluated by SQLite. This is due to optimization logic that skips computing values for columns not referenced in the outer query.

  2. Single-Row CTE with Constant Expressions:
    In a CTE that defines a single row using constant expressions (e.g., select 1, echo(1), echo(2), echo(3)), SQLite evaluates all expressions during CTE materialization, even if the outer query does not reference those columns. This results in side effects (e.g., echo function calls) being observable, indicating a lack of optimization.

  3. Version-Specific Behavior:
    The behavior was observed in SQLite 3.37.0 but may persist in other versions. The discrepancy arises from differences in how SQLite’s query planner handles CTEs based on their data sources (table-driven vs. constant-driven).

Technical Context:

  • CTE Materialization: SQLite materializes CTEs in memory when they are referenced multiple times or when the MATERIALIZED keyword is implied. However, even in non-materialized CTEs, the evaluation of expressions can vary depending on the optimizer’s decisions.
  • Constant Folding vs. Deferred Execution: Constant expressions in SQLite are often evaluated at compile time (constant folding), whereas expressions involving table data are evaluated at runtime. This distinction influences whether unused column expressions are pruned during optimization.
  • Loop Constructs: In multi-row CTEs, SQLite processes rows iteratively, allowing the optimizer to skip unused columns per row. Single-row CTEs with constants lack such a loop structure, leading to upfront evaluation of all expressions.

Possible Causes: Query Planner Logic, Expression Liftability, and CTE Materialization Strategies

1. Constant Expression Liftability

SQLite identifies "liftable" expressions—constant or deterministic expressions that can be evaluated once and reused. In the single-row CTE select 1, echo(1), echo(2), echo(3), the optimizer lifts constant expressions out of any implicit loops, evaluating them during the initial CTE setup. This occurs even if the outer query does not reference the lifted columns. In contrast, expressions dependent on table columns (e.g., x+1 in the multi-row CTE) are bound to row iteration and subject to per-row optimization.

2. Absence of Row Processing Loops in Single-Row CTEs

Multi-row CTEs derived from tables inherently involve row processing loops. The optimizer recognizes that each row’s unused columns can be skipped dynamically as the loop progresses. Single-row CTEs with constants are treated as static result sets, materialized in full before the outer query executes. Since there is no loop to optimize, all expressions in the CTE are evaluated during materialization.

3. Query Planner’s Column Pruning Heuristics

SQLite’s query planner employs column pruning to eliminate unnecessary computations. However, this pruning is more effective in contexts where expressions are tied to data sources requiring row-by-row processing. In single-row CTEs with constants, the planner may fail to prune columns because:

  • The entire CTE is materialized as a single unit.
  • Constant expressions are considered “cheap” to compute, reducing the incentive for aggressive pruning.
  • The planner lacks context to infer that outer query column references could allow partial materialization.

4. Side Effects in User-Defined Functions (UDFs)

The echo UDF in the example has observable side effects, exposing the optimization discrepancy. SQLite’s optimizer does not account for side effects when deciding whether to prune expressions. If a UDF is marked as deterministic (e.g., using sqlite3_create_function_v2 with SQLITE_DETERMINISTIC), the planner may apply additional optimizations, but this does not resolve the core issue of single-row CTE evaluation.


Troubleshooting Steps, Solutions & Fixes: Aligning CTE Behavior with Optimization Expectations

Step 1: Diagnose Query Plans with EXPLAIN and EXPLAIN QUERY PLAN

Use SQLite’s EXPLAIN command to compare the bytecode generated for both CTE scenarios. This reveals differences in how expressions are evaluated.

Multi-Row CTE Bytecode Highlights:

  • OpenRead opcodes to scan table Y.
  • Loop structure with Next opcodes advancing row iteration.
  • Absence of opcodes for echo calls in unused columns.

Single-Row CTE Bytecode Highlights:

  • ResultRow opcodes evaluating all constant expressions upfront.
  • Function opcodes invoking echo for all columns, including unused ones.

Example:

EXPLAIN
WITH foo(a,b,c,d) AS (SELECT 1, echo(1), echo(2), echo(3))
SELECT a FROM foo;

The output will show SCAN CONSTANT ROW with calls to echo, confirming upfront evaluation.

Step 2: Rewrite Single-Row CTEs to Mimic Multi-Row Optimization

Force SQLite to process the single-row CTE as a multi-row dataset by introducing a synthetic loop.

Solution: Use a virtual table (e.g., sqlite_sequence) or a VALUES clause with a WHERE condition to create a single-row loop:

WITH foo(a,b,c,d) AS (
  SELECT 1, echo(1), echo(2), echo(3) FROM (SELECT 1) WHERE 1
)
SELECT a FROM foo;

This encourages the optimizer to process the row in a loop-like context, potentially pruning unused columns.

Step 3: Isolate Constant Expressions in Subqueries

Prevent constant lifting by nesting expressions in subqueries that the optimizer cannot trivially materialize.

Example:

WITH foo(a,b,c,d) AS (
  SELECT * FROM (SELECT 1 AS a, echo(1) AS b, echo(2) AS c, echo(3) AS d)
)
SELECT a FROM foo;

The subquery SELECT * FROM (...) may delay evaluation until the outer query references the columns.

Step 4: Use Conditional Logic to Bypass Unused Columns

Employ CASE statements or IIF functions to conditionally evaluate expressions. This defers execution until column references are confirmed.

Example:

WITH foo(a,b,c,d) AS (
  SELECT 
    1 AS a,
    CASE WHEN 0 THEN echo(1) END AS b,
    CASE WHEN 0 THEN echo(2) END AS c,
    CASE WHEN 0 THEN echo(3) END AS d
)
SELECT a FROM foo;

The CASE WHEN 0 ensures echo is never called, as the condition is statically false.

Step 5: Update to a Recent SQLite Version

Newer SQLite versions (e.g., 3.39.0+) include improvements to the query planner’s handling of CTEs and unused columns. Test the behavior in the latest version to see if the optimization discrepancy persists.

Checklist:

  • Verify fixes in SQLite’s changelog related to CTE optimization.
  • Test the example queries with sqlite3_changes() to count side effects.

Step 6: Refactor Queries to Avoid CTEs When Possible

If CTE optimization is unreliable for single-row cases, rewrite the query using derived tables or temporary tables.

Derived Table Example:

SELECT a FROM (
  SELECT 1 AS a, echo(1) AS b, echo(2) AS c, echo(3) AS d
) AS foo;

Derived tables may undergo different optimization rules compared to CTEs.

Step 7: Leverage Deterministic Function Declarations

If UDFs are marked as deterministic (using SQLITE_DETERMINISTIC), SQLite may apply more aggressive optimizations, though this does not directly address single-row CTE evaluation.

Caveat: Side effects in deterministic functions are still observable if the function is called.

Step 8: File a Bug Report or Feature Request

If the behavior is deemed inconsistent or problematic, submit a detailed report to the SQLite team. Include:

  • Minimal reproducible examples.
  • EXPLAIN outputs for both CTE types.
  • Rationale for why single-row CTEs should optimize unused columns.

Reference: SQLite Bug Tracking


By systematically diagnosing query plans, restructuring CTEs, and leveraging SQLite’s optimization hints, developers can align single-row CTE behavior with multi-row efficiency. Understanding the interplay between constant lifting, loop structures, and column pruning is critical for writing performant SQLite queries that avoid unnecessary computations.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *