Recursive Query BLOOM Filter Performance Degradation and UNION ALL View Index Utilization Regression
BLOOM Filter Overhead in Recursive CTE Queries and UNION ALL View Index Selection Failure
1. Core Mechanics of Recursive CTE Execution and View-Based Index Utilization
The problem manifests in two distinct but related phases involving SQLite’s query optimizer decisions. First, the recursive Common Table Expression (CTE) with BLOOM filter usage demonstrates severe performance degradation compared to its non-BLOOM-filtered counterpart. Second, after addressing the BLOOM filter issue, a regression occurs where a UNION ALL view fails to utilize available indexes on underlying tables. These phenomena stem from SQLite’s cost-based optimizer making suboptimal decisions about filter application and predicate pushing.
Recursive CTE Execution Dynamics
Recursive CTEs in SQLite operate through a two-phase process:
- Initial Setup Phase: Executes the base query (non-recursive term)
- Recursive Expansion Phase: Iteratively processes the recursive term until no new rows are added
In the provided query:
WITH RECURSIVE transit(x) AS (
SELECT s FROM objs WHERE p=9 AND o=32805
UNION
SELECT objs.s FROM objs, transit WHERE objs.p=9 AND objs.o=transit.x
)
SELECT x FROM transit;
The optimizer generates a BLOOM filter during the recursive step to reduce unnecessary row comparisons. A BLOOM filter acts as a probabilistic data structure that tests whether an element belongs to a set. When applied to objs.o=transit.x
, it attempts to filter out objs
rows early that cannot possibly join with transit.x
values.
UNION ALL View Index Utilization Mechanism
The quads
view combines two tables via UNION ALL
:
CREATE VIEW quads AS
SELECT c,s,p,o,NULL AS d FROM objs
UNION ALL
SELECT c,s,p,o,d FROM datas;
When querying this view with predicates:
SELECT o,d FROM quads WHERE s=303 AND p=9;
SQLite should push down the s=303 AND p=9
predicates to both objs
and datas
tables, leveraging their respective indexes. Effective predicate pushing allows index seeks instead of full table scans. The regression observed after the BLOOM filter fix indicates a breakdown in this optimization pathway.
2. Root Causes of BLOOM Filter Inefficiency and Predicate Pushdown Failure
BLOOM Filter Overhead in Recursive Context
- Filter Construction Cost: BLOOM filters require hashing candidate values from the
transit.x
set. When the recursive step produces a large number of distinctx
values, the filter construction becomes computationally expensive. - False Positive Rate Impact: While BLOOM filters eliminate most non-matching rows, even a 1% false positive rate forces unnecessary row evaluations. In deep recursion trees, these accumulate exponentially.
- Index Coverage Mismatch: The
index_objs_op
on (o,p) provides covering data for the initial setup phase but becomes less effective during recursive expansion iftransit.x
values are sparsely distributed across the index. - Recursion Depth vs. Filter Efficiency: BLOOM filters optimize for large candidate sets, but shallow recursion depths (as seen in the fast 0.001s execution) render them counterproductive. The optimizer’s cost model overestimates the benefit of BLOOM filtering for small intermediate result sets.
UNION ALL View Index Utilization Regression
- View Materialization Thresholds: Recent SQLite versions may alter internal heuristics for view flattening. If the optimizer decides to materialize the
quads
view before applying predicates, index usage becomes impossible. - Predicate Pushdown Blockers: New constraints in the query planner might prevent pushing
s=303 AND p=9
into the view’s constituentUNION ALL
branches. Common causes include:- Type mismatches between view columns and base table columns
- Loss of nullability information through the view
- Overly conservative safety checks in predicate mobility analysis
- Cost Estimation Errors: The optimizer might incorrectly calculate that full scans of
objs
anddatas
are cheaper than index seeks due to:- Outdated table statistics (ANALYZE data)
- Misconfigured SQLITE_STAT4 histogram bins
- Incorrect assumptions about correlation between
s
andp
columns
3. Comprehensive Optimization Strategies for Recursive BLOOM Filters and View Index Restoration
Addressing Recursive CTE BLOOM Filter Overhead
Step 1: Query Plan Analysis and Forced Optimization Control
Pre-fix Workaround: Disable BLOOM filters using undocumented testctrl:
.testctrl optimizations 0x80000 -- CLI only
For application code, recompile SQLite with
-DSQLITE_OMIT_BLOOMFILTER
(not recommended long-term).Post-fix Verification: Confirm the trunk version properly handles BLOOM filter cost-benefit analysis in recursive contexts. Test with:
EXPLAIN QUERY PLAN WITH RECURSIVE transit(x) AS (...);
Ensure the optimized plan uses direct index seeks without BLOOM filters when appropriate.
Step 2: Index Tuning for Recursive Traversal
Composite Index Optimization: Create an index aligning with recursive join patterns:
CREATE INDEX index_objs_po ON objs(p,o,s); -- Covering index
This allows both setup and recursive terms to use the same index, reducing planner ambiguity.
Recursion Depth Monitoring: Instrument the query with termination limits:
WITH RECURSIVE transit(x, depth) AS ( SELECT s, 1 FROM objs WHERE p=9 AND o=32805 UNION SELECT objs.s, depth+1 FROM objs, transit WHERE objs.p=9 AND objs.o=transit.x AND depth < 20 ) SELECT x FROM transit;
Prevents runaway recursion while providing depth context for optimizer decisions.
Step 3: Manual Query Transformation
BREADTH-FIRST vs DEPTH-FIRST Search: Force materialization order using
ORDER BY
:WITH RECURSIVE transit(x) AS (...) SELECT x FROM transit ORDER BY rowid;
Alters row processing order, which can significantly impact BLOOM filter efficacy.
Temporary Materialization: Dump intermediate results to temp tables:
CREATE TEMP TABLE temp_transit AS WITH RECURSIVE transit(x) AS (...);
Breaks the recursion into planner-visible steps, allowing better cost estimation.
Restoring Index Utilization in UNION ALL Views
Step 1: View Definition Analysis
Explicit Column Typing: Ensure view columns match base table types:
CREATE VIEW quads AS SELECT CAST(c AS INTEGER) AS c, CAST(s AS INTEGER) AS s, ... FROM objs UNION ALL ...;
Eliminates type mismatch barriers to predicate pushdown.
Predicate Forwarding: Use
LATERAL
joins (SQLite 3.35+) to force predicate visibility:SELECT q.o, q.d FROM quads q WHERE EXISTS ( SELECT 1 FROM (SELECT * FROM objs WHERE s=303 AND p=9 UNION ALL SELECT * FROM datas WHERE s=303 AND p=9) WHERE rowid=q.rowid );
Step 2: Index Visibility Enhancement
Compound Index Creation: Build indexes covering all filtered columns:
CREATE INDEX index_objs_sp ON objs(s,p,o); CREATE INDEX index_datas_sp ON datas(s,p,o,d);
Allows covering index scans without base table access.
Statistics Regeneration: Force fresh cost calculations:
ANALYZE; ANALYZE sqlite_schema; -- Update internal schema stats
Step 3: Query Planner Overrides
Materialization Bypass: Use
NOT MATERIALIZED
hint (SQLite 3.35+):SELECT o,d FROM quads NOT MATERIALIZED WHERE s=303 AND p=9;
Directs the optimizer to process the view as a subquery rather than materializing.
Subquery Factoring: Manually decompose the view:
SELECT o,d FROM ( SELECT c,s,p,o,NULL AS d FROM objs WHERE s=303 AND p=9 UNION ALL SELECT c,s,p,o,d FROM datas WHERE s=303 AND p=9 );
Explicit predicate placement bypasses view optimization ambiguities.
Cross-Issue Mitigation Techniques
Index Merge Optimization: For complex query patterns, leverage INDEX_MERGE
:
SELECT o,d FROM quads
WHERE s=303 AND p=9
AND (rowid IN (SELECT rowid FROM objs WHERE s=303 AND p=9)
OR rowid IN (SELECT rowid FROM datas WHERE s=303 AND p=9));
Partial Indexing: Create filtered indexes matching common query parameters:
CREATE INDEX index_objs_high_freq ON objs(s,p,o) WHERE s BETWEEN 300 AND 400;
Cost Threshold Adjustments: Modify optimizer parameters via PRAGMA
:
PRAGMA optimizer_cost_limit=1000; -- Default 1000000
PRAGMA optimizer_scan_cost=2000; -- Default 40
PRAGMA optimizer_index_cost=100; -- Default 100
Query Plan Fixation: Use INDEXED BY
clauses to override planner choices:
SELECT o,d FROM quads
WHERE s=303 AND p=9
INDEXED BY index_objs_sp, index_datas_sp;
Long-Term Maintenance Practices
Version-Specific Plan Analysis: Regularly compare
EXPLAIN QUERY PLAN
outputs across SQLite versions to detect optimization regressions early.Index Coverage Monitoring: Use diagnostic queries to identify uncovered access patterns:
SELECT DISTINCT sql FROM sqlite_stat4 WHERE tbl IN ('objs','datas');
Recursive Query Profiling: Instrument CTEs with timing counters:
WITH RECURSIVE transit(x, iter) AS ( SELECT s, 1 FROM objs WHERE p=9 AND o=32805 UNION ALL SELECT objs.s, iter+1 FROM objs, transit WHERE objs.p=9 AND objs.o=transit.x AND iter < 100 ) SELECT x, MAX(iter) FROM transit GROUP BY x;
View Optimization Triggers: Create instead-of triggers to customize view access paths:
CREATE TRIGGER quads_optimize INSTEAD OF SELECT ON quads BEGIN SELECT o,d FROM objs WHERE s=NEW.s AND p=NEW.p UNION ALL SELECT o,d FROM datas WHERE s=NEW.s AND p=NEW.p; END;
Plan Stability Preservation: Archive known-good query plans using
sqlite_plan
table:CREATE TABLE sqlite_plan(query TEXT, plan TEXT); INSERT INTO sqlite_plan VALUES ('SELECT ...', (SELECT PLAN FROM EXPLAIN_QUERY_PLAN(...)));
This comprehensive approach addresses both immediate performance pathologies and establishes guardrails against future optimization regressions. The recursive query optimizations focus on aligning index structures with traversal patterns while the view fixes emphasize predicate mobility and cost model calibration. Together, they restore the performance characteristics expected from SQLite’s sophisticated query planner while accounting for edge-case behaviors in complex schema constructs.