Query Performance Regression with IN Clauses and OP_SeekScan in SQLite 3.35+
Understanding Performance Degradation in Multi-Column IN Queries
OP_SeekScan Optimization and Index Interaction Dynamics
This issue arises when upgrading from SQLite versions prior to 3.35 to newer versions, manifesting as severe query slowdowns (up to 500x) for queries containing multiple IN operators on indexed columns. The regression correlates with the introduction of the OP_SeekScan optimization in SQLite 3.35, designed to handle IN clauses more efficiently but sometimes producing suboptimal execution plans. A typical affected query pattern looks like:
SELECT count(_rowid_)
FROM edge
WHERE node_from IN temp.sub_nodes
AND node_to IN temp.sub_nodes;
The problem becomes acute when:
- Both filtered columns have composite indexes
- The temp.sub_nodes table contains numerous entries
- SQLite’s query planner chooses SeekScan over traditional index seeks
Key characteristics of failing queries:
- Execution plans show "USING ROWID SEARCH ON TABLE" for both IN clauses
- EXPLAIN output reveals OP_SeekScan opcodes in bytecode
- Query runtime grows quadratically with input size
- Removing indexes paradoxically improves performance
Query Planner Optimization Conflicts and Index Cardinality Miscalculations
Three primary factors converge to create this performance regression:
- SeekScan Cost Estimation Flaws
The OP_SeekScan optimization attempts to replace multiple index seeks with scanning a range of index entries, but its P1 parameter (maximum allowed steps) uses miscalculated heuristics. For queries with:
- High correlation between indexed columns
- Large temporary tables in IN clauses
- Composite indexes on both filtered columns
The optimization incorrectly estimates scanning costs, leading to:
-- Bad plan using SeekScan
SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
- Index Selection Conflicts
Composite indexes create multiple access path options that confuse the planner:
CREATE INDEX edge_from_to ON edge(node_from,node_to);
CREATE INDEX edge_to_from ON edge(node_to,node_from);
The presence of covering indexes in both directions creates a "Hesitation Problem" where the planner:
- Can’t decide between index scan directions
- Defaults to worst-case scanning behavior
- Fails to recognize optimal join ordering
- Temporary Table Optimization Barriers
When using temporary tables in IN clauses:
CREATE TEMP TABLE sub_nodes(node_id INTEGER PRIMARY KEY);
SQLite’s query planner:
- Treats temp tables as opaque objects
- Can’t use standard index statistics
- Defaults to conservative row estimates
- Disables cross-table correlation analysis
Query Restructuring Techniques and Planner Guidance Strategies
Step 1: Immediate Workarounds
Force index selection with unary operators:
SELECT count(_rowid_)
FROM edge
WHERE +node_from IN temp.sub_nodes /* Disables SeekScan on first IN */
AND node_to IN temp.sub_nodes;
This modifies operator precedence to:
- Prevent SeekScan on the first IN clause
- Force index lookup on node_to
- Maintain ROWID search on temp.sub_nodes
Step 2: Query Rewriting Approaches
Convert IN clauses to explicit joins:
SELECT count(e._rowid_)
FROM edge e
JOIN sub_nodes n1 ON n1.node_id = e.node_from
JOIN sub_nodes n2 ON n2.node_id = e.node_to;
Execution plan improvements:
|--SCAN sub_nodes AS n1
|--SEARCH edge USING COVERING INDEX edge_from_to (node_from=?)
`--SEARCH sub_nodes AS n2 USING INTEGER PRIMARY KEY (rowid=?)
Step 3: Index Strategy Adjustments
Modify composite indexes to break planner confusion:
DROP INDEX edge_to_from;
CREATE INDEX edge_to_from_partial ON edge(node_to) WHERE node_from IS NOT NULL;
This prevents the planner from considering bidirectional scans while maintaining filtering efficiency.
Step 4: SQLite Configuration Tuning
Disable harmful optimizations at runtime:
PRAGMA optimize;
PRAGMA disable_optimizations=0x20000; /* Disables OP_SeekScan */
Or in application code:
sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x20000);
Step 5: Version-Specific Upgrades
Upgrade to SQLite 3.35.5+ containing the fix:
# Contains critical SeekScan adjustments
wget https://sqlite.org/2021/sqlite-autoconf-3350500.tar.gz
Step 6: Advanced Monitoring
Analyze bytecode-level execution:
EXPLAIN
SELECT count(_rowid_) FROM edge
WHERE node_from IN sub_nodes AND node_to IN sub_nodes;
Look for OP_SeekScan opcodes and compare step counts between versions.
Step 7: Correlation Analysis
Calculate column correlation to inform index design:
SELECT
count(*) AS total,
count(DISTINCT node_from) AS distinct_from,
count(DISTINCT node_to) AS distinct_to,
1.0 * count(*) / (distinct_from * distinct_to) AS correlation
FROM edge;
High correlation values (>0.5) indicate:
- Composite indexes may not help
- SeekScan more likely to misestimate costs
Step 8: Temporary Table Optimization
Materialize temp tables with manual statistics:
ANALYZE temp.sub_nodes;
UPDATE sqlite_stat1 SET stat = '1000000 1' WHERE tbl = 'sub_nodes';
Forces the planner to treat temp tables as large datasets.