Query Plan Regression in SQLite 3.39+ Due to CTE Materialization Changes
Root Cause: CTE Materialization Strategy Alterations in SQLite 3.39+
Issue Overview
A complex analytical query involving multiple Common Table Expressions (CTEs) experienced a severe performance regression when executed in SQLite versions 3.39.0 through 3.40.1 compared to version 3.38.5. The query, which calculates a rolling least squares fit over time-series data stored in a summary
table with ~337k rows, shifted from executing in seconds to hours after upgrading. The regression stemmed from changes to how SQLite’s query planner handles CTE materialization and index selection.
The query relies heavily on the summary_ix
index (defined as CREATE UNIQUE INDEX summary_ix ON summary(country, date)
) to efficiently filter rows by country
and date
ranges. In SQLite 3.38.5, the query plan utilized this index directly for range scans and joins. However, in versions 3.39.0 and later, the planner began favoring automatic covering indexes and introduced additional materialization steps for CTEs, bypassing the optimal use of summary_ix
. This resulted in full table scans (SCAN summary
) and inefficient temporary B-tree operations (USE TEMP B-TREE
), drastically increasing execution time.
Key observations from the query plans:
3.38.5 Plan:
- Efficient use of
summary_ix
forSEARCH
operations withcountry=? AND date>?
predicates. - Minimal temporary B-tree usage (only for
ORDER BY
andGROUP BY
in CTEs). - Direct joins between CTEs and base tables using indexed columns.
- Efficient use of
3.40.1 Plan:
- Proliferation of
MATERIALIZE
steps for CTEs (raw
,init
,src
,vals
,sums
,mult
). - Reliance on
AUTOMATIC COVERING INDEX
instead ofsummary_ix
for critical date-range filters. - Increased use of temporary B-trees for intermediate results, slowing down large dataset processing.
- Proliferation of
The regression was traced to changes in SQLite’s cost model for materializing CTEs, introduced in version 3.39.0 alongside support for explicit MATERIALIZED
/NOT MATERIALIZED
hints. These changes inadvertently disabled query flattening optimizations for CTEs referenced multiple times, forcing unnecessary materialization and suboptimal index selection.
Critical Factors Behind the Regression
CTE Materialization Forced by Query Planner
- In SQLite 3.39+, CTEs referenced more than once were implicitly treated as
MATERIALIZED
, preventing query flattening (integrating CTE logic into the main query). This led to redundant computations and loss of index-driven optimizations. - Example: The
raw
CTE (SELECT country, date, ... FROM summary
) was materialized as a temporary table, stripping away thesummary_ix
index and forcing full scans.
- In SQLite 3.39+, CTEs referenced more than once were implicitly treated as
Automatic Covering Index Misapplication
- The planner’s preference for automatic covering indexes over explicit indexes (
summary_ix
) in materialized CTEs caused inefficientSEARCH
operations. Automatic indexes lack the selectivity ofsummary_ix
forcountry
/date
predicates, increasing I/O.
- The planner’s preference for automatic covering indexes over explicit indexes (
Temporary B-Tree Overhead
- Materialized CTEs required temporary B-trees for sorting/grouping (
USE TEMP B-TREE FOR GROUP BY
), which scaled poorly with large datasets.
- Materialized CTEs required temporary B-trees for sorting/grouping (
Cost Model Adjustments
- A tuning parameter (
rSetup
) in SQLite’s cost calculation for materialized objects was adjusted, making the planner overly optimistic about the efficiency of automatic indexes for CTEs.
- A tuning parameter (
Comprehensive Troubleshooting and Resolution Strategies
Step 1: Validate Query Plan Changes
- Action: Compare
EXPLAIN QUERY PLAN
outputs across SQLite versions. - Example:
EXPLAIN QUERY PLAN WITH RECURSIVE ... [full query];
- Outcome: Identify shifts from
SEARCH summary USING INDEX summary_ix
toSCAN summary
orAUTOMATIC COVERING INDEX
.
Step 2: Apply Immediate Workarounds
Force Index Usage:
Modify CTEs to explicitly referencesummary_ix
usingINDEXED BY
clauses.WITH raw AS ( SELECT country, date, total_c AS total, delta_c_7d AS delta FROM summary INDEXED BY summary_ix WHERE ... )
Caution: Overuse of
INDEXED BY
can destabilize future optimizations.Rewrite CTEs as Subqueries:
Replace CTEs with derived tables to encourage query flattening.SELECT ... FROM ( SELECT country, date, ... FROM summary WHERE country=? AND date>? ) AS raw JOIN ...
Benefit: Subqueries are more likely to be flattened, preserving index usage.
Use Temporary Tables:
Materialize critical CTEs manually and index them.CREATE TEMP TABLE raw_temp AS SELECT country, date, ... FROM summary WHERE ...; CREATE INDEX tmp_raw_idx ON raw_temp(country, date);
Trade-off: Adds complexity but guarantees index availability.
Step 3: Apply SQLite Source Code Patch
For versions 3.39.0–3.40.1, apply the patch provided in the discussion to adjust the rSetup
cost heuristic:
--- src/where.c
+++ src/where.c
@@ -3590,17 +3590,17 @@
** of X is smaller for views and subqueries so that the query planner
** will be more aggressive about generating automatic indexes for
** those objects, since there is no opportunity to add schema
** indexes on subqueries and views. */
pNew->rSetup = rLogSize + rSize;
if( !IsView(pTab) && (pTab->tabFlags & TF_Ephemeral)==0 ){
pNew->rSetup += 28;
}else{
- pNew->rSetup -= 10;
+ pNew->rSetup -= 21;
}
ApplyCostMultiplier(pNew->rSetup, pTab->costMult);
if( pNew->rSetup<0 ) pNew->rSetup = 0;
/* TUNING: Each index lookup yields 20 rows in the table. This
** is more than the usual guess of 10 rows, since we have no way
** of knowing how selective the index will ultimately be. It would
** not be unreasonable to make this value much larger. */
pNew->nOut = 43; assert( 43==sqlite3LogEst(20) );
Effect: Restores the planner’s preference for existing indexes over automatic indexes in CTEs.
Step 4: Upgrade to SQLite 3.41+ with CTE Optimization Fixes
The root cause was resolved in SQLite trunk check-in 66f29c403d28630b, which:
- Re-enabled query flattening for CTEs referenced multiple times.
- Revised the heuristic for marking CTEs as materialized.
- Preserved index usage in nested CTE structures.
Verification Steps:
- Test the query with a pre-release snapshot.
- Confirm the query plan reverts to using
summary_ix
forSEARCH
operations. - Validate execution time matches or exceeds 3.38.5 performance.
Step 5: Query-Specific Optimizations
Simplify the
fit
CTE:
Reduce joins by pre-computing values in earlier CTEs.-- Original fit AS ( SELECT inv.country, inv.date, a * xy + b * y, c * xy + d * y FROM inv JOIN mult ON mult.country = inv.country AND mult.date = inv.date JOIN sums ON sums.country = mult.country AND sums.date = mult.date ) -- Optimized fit AS ( SELECT inv.country, inv.date, (inv.a * sums.xy) + (inv.b * sums.y) AS a_coeff, (inv.c * sums.xy) + (inv.d * sums.y) AS b_coeff FROM sums, mult, inv WHERE sums.country = mult.country AND sums.date = mult.date AND mult.country = inv.country AND mult.date = inv.date )
Result: Fewer joins reduce temporary B-tree usage.
Leverage Partial Indexes:
Add filtered indexes forcountry
/date
ranges in thesummary
table.CREATE INDEX summary_ix_active ON summary(country, date) WHERE total_c > 0;
Benefit: Accelerates the
init
CTE’smin(date)
/max(date)
calculation.Normalize Floating-Point Operations:
UseROUND()
or precision constraints to mitigate floating-point discrepancies between versions.SELECT ..., ROUND(nFin/nPrev - 1, 4) AS growth FROM ...
Rationale: Prevents insignificant rounding differences from affecting business logic.
Step 6: Monitor and Profile Future Upgrades
- Benchmark Suite:
Create a performance test harness that runs the query against multiple SQLite versions. - Query Plan Regression Checks:
AutomateEXPLAIN QUERY PLAN
comparisons during upgrades. - Profile with
sqlite3_profile()
:
Use SQLite’s profiling API to measure time spent in materialization vs. index scans.
Final Note: The regression highlights the delicate balance between query planner optimizations and CTE materialization strategies. Developers using complex CTEs in SQLite should:
- Prefer subqueries over CTEs where possible.
- Explicitly index temporary tables derived from CTEs.
- Monitor query plans during upgrades using
EXPLAIN QUERY PLAN
. - Engage with SQLite’s prerelease builds to validate fixes before official releases.
By adhering to these practices, users can mitigate risks associated with query planner changes while leveraging SQLite’s continuous performance improvements.