Performance Degradation in UNION Queries After SQLite 3.40.0 Update
Query Planner Failing to Push Predicates into UNION Subqueries
The core issue revolves around a significant performance regression observed when executing UNION-based queries in SQLite 3.40.0 compared to 3.39.4. The regression manifests as a 4,000x+ slowdown in execution time for queries that retrieve historical changes to specific fields via correlated subqueries. The root cause lies in how the query planner handles predicate propagation (filter conditions like ProjID = 'PR0000020614'
) across UNION operations.
In SQLite 3.39.4, the optimizer successfully "pushes down" the ProjID
filter into both branches of the UNION, allowing efficient index usage (SEARCH o USING INDEX
). This ensures only relevant rows from the Project_List
table are processed. However, 3.40.0 fails to propagate this filter into the UNION subqueries, resulting in full table scans (SCAN o
) across all rows. The problem amplifies exponentially when the query includes multiple UNION branches (e.g., tracking 30+ columns), as each branch performs redundant full scans instead of targeted index lookups.
Key indicators of this issue include:
- Missing index usage in
EXPLAIN QUERY PLAN
outputs for the outerProject_List as o
references. - Full table scans (opcodes
Rewind
/Next
) inEXPLAIN
outputs instead of index seeks (SeekGE
/IdxGT
). - Correlated subqueries executing per row without leveraging indexes to narrow their scope.
This regression is particularly impactful for applications tracking field-level historization, where queries must process large datasets with frequent updates.
Changes in Predicate Pushdown and Flattening Logic Between Versions
The performance degradation stems from optimizer changes introduced in SQLite 3.40.0. Specifically:
Predicate Pushdown Blocked by UNION:
TheWHERE ProjID = ...
clause is no longer pushed into the individual UNION branches. Instead, it’s applied after the UNION completes, forcing all branches to process the entire dataset. This negates index selectivity and increases I/O overhead.Flattening Heuristics Modified:
SQLite’s query planner attempts to "flatten" subqueries into the main query to optimize execution. Changes in 3.40.0’s flattening logic (e.g., due to fixes for correctness bugs like 346a3b12b861ce7b) inadvertently created optimization barriers for UNIONs with correlated subqueries.Collation/Affinity Handling:
New affinity-checking rules in 3.40.0 (1ad41840c5e0fa70) may prevent predicate propagation if column types in the UNION branches don’t strictly match. This can disable index usage for literals or expressions lacking explicit casts.Temporary B-Tree Construction:
TheUNION
operator internally uses a temporary B-Tree to eliminate duplicates. In 3.40.0, this structure may block predicate pushdown, whereasUNION ALL
(which skips deduplication) avoids this overhead and allows better optimization.
Restoring Performance via Manual Predicate Pushdown and Indexed UNION ALL
Step 1: Analyze Query Plans to Confirm Missing Index Usage
Compare EXPLAIN QUERY PLAN
outputs between SQLite versions. In 3.40.0, look for SCAN o
instead of SEARCH o USING INDEX
in UNION branches:
-- 3.40.0 (Problematic Plan)
QUERY PLAN
|--CO-ROUTINE (subquery-4)
| `--COMPOUND QUERY
| |--LEFT-MOST SUBQUERY
| | |--SCAN o -- Full scan instead of SEARCH
| | `--CORRELATED SCALAR SUBQUERY 1
| | `--SEARCH Project_List USING INDEX ...
| `--UNION USING TEMP B-TREE
| |--SCAN o
| `--CORRELATED SCALAR SUBQUERY 3
| `--SEARCH Project_List USING INDEX ...
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY
Step 2: Force Predicate Pushdown with Manual Filter Placement
Move the ProjID
filter into each UNION branch to bypass the optimizer’s limitation:
SELECT ProjID, Updated_By, InsertDate, var, oldv, newv
FROM (
SELECT ProjID, Updated_By, InsertDate, 'Finish_Date' AS var,
(SELECT ... FROM Project_List
WHERE ProjID = o.ProjID AND InsertDate < o.InsertDate) AS oldv,
coalesce(Finish_Date, '') AS newv
FROM Project_List AS o
WHERE ProjID = 'PR0000020614' -- Filter added here
UNION ALL
SELECT ProjID, Updated_By, InsertDate, 'Ann_CapexP' AS var,
(SELECT ... FROM Project_List
WHERE ProjID = o.ProjID AND InsertDate < o.InsertDate) AS oldv,
replace(round(Ann_CapexP), '.0', '') AS newv
FROM Project_List AS o
WHERE ProjID = 'PR0000020614' -- Filter added here
)
WHERE oldv <> newv
ORDER BY InsertDate ASC;
Step 3: Replace UNION with UNION ALL to Avoid Optimization Barriers
UNION
implicitly adds a DISTINCT
step using a temporary B-Tree, which blocks predicate pushdown. Use UNION ALL
and handle deduplication externally if needed:
-- Original (Slow in 3.40.0):
SELECT ... FROM (SELECT ... UNION SELECT ...)
-- Optimized:
SELECT ... FROM (SELECT ... UNION ALL SELECT ...)
Step 4: Utilize INDEXED BY to Force Index Usage (Temporary Workaround)
If manual predicate pushdown isn’t sufficient, force index usage in UNION branches:
SELECT ...
FROM (
SELECT ProjID, ...
FROM Project_List AS o INDEXED BY PL_ProjID_InsertDate_New -- Force index
WHERE ProjID = 'PR0000020614'
UNION ALL
SELECT ProjID, ...
FROM Project_List AS o INDEXED BY PL_ProjID_InsertDate_New
WHERE ProjID = 'PR0000020614'
)
...
Step 5: Upgrade to SQLite 3.41+ for Built-in Optimizer Fixes
The SQLite team addressed this regression in subsequent releases (e.g., adbca3448e2099f0). Verify with EXPLAIN QUERY PLAN
that 3.41+ restores index usage:
-- 3.41.0 (Corrected Plan):
QUERY PLAN
|--CO-ROUTINE (subquery-4)
| `--COMPOUND QUERY
| |--LEFT-MOST SUBQUERY
| | |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?) -- Index restored
| | |--CORRELATED SCALAR SUBQUERY 1
| | | `--SEARCH Project_List USING INDEX ...
| | `--CORRELATED SCALAR SUBQUERY 1
| | `--SEARCH Project_List USING INDEX ...
| `--UNION USING TEMP B-TREE
| |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
| |--CORRELATED SCALAR SUBQUERY 3
| | `--SEARCH Project_List USING INDEX ...
| `--CORRELATED SCALAR SUBQUERY 3
| `--SEARCH Project_List USING INDEX ...
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY
Step 6: Adjust Affinity/Collation in UNION Branches for Strict Schemas
If using STRICT
tables or encountering type mismatches, explicitly cast literals to match column affinities:
SELECT ...
FROM (
SELECT ProjID, CAST('Finish_Date' AS TEXT) AS var, ... -- Explicit affinity
UNION ALL
SELECT ProjID, CAST('Ann_CapexP' AS TEXT) AS var, ...
)
Step 7: Monitor Schema and Index Statistics with ANALYZE
Outdated index statistics can mislead the query planner. Refresh them periodically:
ANALYZE;
ANALYZE sqlite_schema; -- For heavily modified schemas
Final Recommendation
For immediate fixes on SQLite 3.40.0, restructure queries to use UNION ALL
with manual predicate pushdown. For long-term stability, upgrade to SQLite 3.41+ where the optimizer correctly handles these scenarios. Always validate index usage via EXPLAIN QUERY PLAN
after version changes or schema modifications.