Performance Degradation in UNION Queries After SQLite 3.40.0 Update

Query Planner Failing to Push Predicates into UNION Subqueries

The core issue revolves around a significant performance regression observed when executing UNION-based queries in SQLite 3.40.0 compared to 3.39.4. The regression manifests as a 4,000x+ slowdown in execution time for queries that retrieve historical changes to specific fields via correlated subqueries. The root cause lies in how the query planner handles predicate propagation (filter conditions like ProjID = 'PR0000020614') across UNION operations.

In SQLite 3.39.4, the optimizer successfully "pushes down" the ProjID filter into both branches of the UNION, allowing efficient index usage (SEARCH o USING INDEX). This ensures only relevant rows from the Project_List table are processed. However, 3.40.0 fails to propagate this filter into the UNION subqueries, resulting in full table scans (SCAN o) across all rows. The problem amplifies exponentially when the query includes multiple UNION branches (e.g., tracking 30+ columns), as each branch performs redundant full scans instead of targeted index lookups.

Key indicators of this issue include:

  • Missing index usage in EXPLAIN QUERY PLAN outputs for the outer Project_List as o references.
  • Full table scans (opcodes Rewind/Next) in EXPLAIN outputs instead of index seeks (SeekGE/IdxGT).
  • Correlated subqueries executing per row without leveraging indexes to narrow their scope.

This regression is particularly impactful for applications tracking field-level historization, where queries must process large datasets with frequent updates.

Changes in Predicate Pushdown and Flattening Logic Between Versions

The performance degradation stems from optimizer changes introduced in SQLite 3.40.0. Specifically:

  1. Predicate Pushdown Blocked by UNION:
    The WHERE ProjID = ... clause is no longer pushed into the individual UNION branches. Instead, it’s applied after the UNION completes, forcing all branches to process the entire dataset. This negates index selectivity and increases I/O overhead.

  2. Flattening Heuristics Modified:
    SQLite’s query planner attempts to "flatten" subqueries into the main query to optimize execution. Changes in 3.40.0’s flattening logic (e.g., due to fixes for correctness bugs like 346a3b12b861ce7b) inadvertently created optimization barriers for UNIONs with correlated subqueries.

  3. Collation/Affinity Handling:
    New affinity-checking rules in 3.40.0 (1ad41840c5e0fa70) may prevent predicate propagation if column types in the UNION branches don’t strictly match. This can disable index usage for literals or expressions lacking explicit casts.

  4. Temporary B-Tree Construction:
    The UNION operator internally uses a temporary B-Tree to eliminate duplicates. In 3.40.0, this structure may block predicate pushdown, whereas UNION ALL (which skips deduplication) avoids this overhead and allows better optimization.

Restoring Performance via Manual Predicate Pushdown and Indexed UNION ALL

Step 1: Analyze Query Plans to Confirm Missing Index Usage

Compare EXPLAIN QUERY PLAN outputs between SQLite versions. In 3.40.0, look for SCAN o instead of SEARCH o USING INDEX in UNION branches:

-- 3.40.0 (Problematic Plan)
QUERY PLAN
|--CO-ROUTINE (subquery-4)
| `--COMPOUND QUERY
|   |--LEFT-MOST SUBQUERY
|   | |--SCAN o  -- Full scan instead of SEARCH
|   | `--CORRELATED SCALAR SUBQUERY 1
|   |   `--SEARCH Project_List USING INDEX ... 
|   `--UNION USING TEMP B-TREE
|    |--SCAN o
|    `--CORRELATED SCALAR SUBQUERY 3
|      `--SEARCH Project_List USING INDEX ... 
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY

Step 2: Force Predicate Pushdown with Manual Filter Placement

Move the ProjID filter into each UNION branch to bypass the optimizer’s limitation:

SELECT ProjID, Updated_By, InsertDate, var, oldv, newv
FROM (
  SELECT ProjID, Updated_By, InsertDate, 'Finish_Date' AS var,
    (SELECT ... FROM Project_List 
     WHERE ProjID = o.ProjID AND InsertDate < o.InsertDate) AS oldv,
    coalesce(Finish_Date, '') AS newv
  FROM Project_List AS o
  WHERE ProjID = 'PR0000020614'  -- Filter added here
  UNION ALL
  SELECT ProjID, Updated_By, InsertDate, 'Ann_CapexP' AS var,
    (SELECT ... FROM Project_List 
     WHERE ProjID = o.ProjID AND InsertDate < o.InsertDate) AS oldv,
    replace(round(Ann_CapexP), '.0', '') AS newv
  FROM Project_List AS o
  WHERE ProjID = 'PR0000020614'  -- Filter added here
)
WHERE oldv <> newv
ORDER BY InsertDate ASC;

Step 3: Replace UNION with UNION ALL to Avoid Optimization Barriers

UNION implicitly adds a DISTINCT step using a temporary B-Tree, which blocks predicate pushdown. Use UNION ALL and handle deduplication externally if needed:

-- Original (Slow in 3.40.0):
SELECT ... FROM (SELECT ... UNION SELECT ...)  

-- Optimized:
SELECT ... FROM (SELECT ... UNION ALL SELECT ...)  

Step 4: Utilize INDEXED BY to Force Index Usage (Temporary Workaround)

If manual predicate pushdown isn’t sufficient, force index usage in UNION branches:

SELECT ... 
FROM (
  SELECT ProjID, ...
  FROM Project_List AS o INDEXED BY PL_ProjID_InsertDate_New  -- Force index
  WHERE ProjID = 'PR0000020614'
  UNION ALL
  SELECT ProjID, ...
  FROM Project_List AS o INDEXED BY PL_ProjID_InsertDate_New  
  WHERE ProjID = 'PR0000020614'
)
...

Step 5: Upgrade to SQLite 3.41+ for Built-in Optimizer Fixes

The SQLite team addressed this regression in subsequent releases (e.g., adbca3448e2099f0). Verify with EXPLAIN QUERY PLAN that 3.41+ restores index usage:

-- 3.41.0 (Corrected Plan):
QUERY PLAN
|--CO-ROUTINE (subquery-4)
| `--COMPOUND QUERY
|   |--LEFT-MOST SUBQUERY
|   | |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)  -- Index restored
|   | |--CORRELATED SCALAR SUBQUERY 1
|   | | `--SEARCH Project_List USING INDEX ...
|   | `--CORRELATED SCALAR SUBQUERY 1
|   |   `--SEARCH Project_List USING INDEX ...
|   `--UNION USING TEMP B-TREE
|    |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)  
|    |--CORRELATED SCALAR SUBQUERY 3
|    | `--SEARCH Project_List USING INDEX ...
|    `--CORRELATED SCALAR SUBQUERY 3
|      `--SEARCH Project_List USING INDEX ...
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY

Step 6: Adjust Affinity/Collation in UNION Branches for Strict Schemas

If using STRICT tables or encountering type mismatches, explicitly cast literals to match column affinities:

SELECT ... 
FROM (
  SELECT ProjID, CAST('Finish_Date' AS TEXT) AS var, ...  -- Explicit affinity
  UNION ALL
  SELECT ProjID, CAST('Ann_CapexP' AS TEXT) AS var, ...
)

Step 7: Monitor Schema and Index Statistics with ANALYZE

Outdated index statistics can mislead the query planner. Refresh them periodically:

ANALYZE;
ANALYZE sqlite_schema;  -- For heavily modified schemas

Final Recommendation

For immediate fixes on SQLite 3.40.0, restructure queries to use UNION ALL with manual predicate pushdown. For long-term stability, upgrade to SQLite 3.41+ where the optimizer correctly handles these scenarios. Always validate index usage via EXPLAIN QUERY PLAN after version changes or schema modifications.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *