Query Performance Regression with IN Clauses and OP_SeekScan in SQLite 3.35+

Understanding Performance Degradation in Multi-Column IN Queries

OP_SeekScan Optimization and Index Interaction Dynamics

This issue arises when upgrading from SQLite versions prior to 3.35 to newer versions, manifesting as severe query slowdowns (up to 500x) for queries containing multiple IN operators on indexed columns. The regression correlates with the introduction of the OP_SeekScan optimization in SQLite 3.35, designed to handle IN clauses more efficiently but sometimes producing suboptimal execution plans. A typical affected query pattern looks like:

SELECT count(_rowid_)
FROM edge
WHERE node_from IN temp.sub_nodes 
  AND node_to IN temp.sub_nodes;

The problem becomes acute when:

Both filtered columns have composite indexes
The temp.sub_nodes table contains numerous entries
SQLite’s query planner chooses SeekScan over traditional index seeks

Key characteristics of failing queries:

Execution plans show "USING ROWID SEARCH ON TABLE" for both IN clauses
EXPLAIN output reveals OP_SeekScan opcodes in bytecode
Query runtime grows quadratically with input size
Removing indexes paradoxically improves performance

Query Planner Optimization Conflicts and Index Cardinality Miscalculations

Three primary factors converge to create this performance regression:

SeekScan Cost Estimation Flaws
The OP_SeekScan optimization attempts to replace multiple index seeks with scanning a range of index entries, but its P1 parameter (maximum allowed steps) uses miscalculated heuristics. For queries with:

High correlation between indexed columns
Large temporary tables in IN clauses
Composite indexes on both filtered columns

The optimization incorrectly estimates scanning costs, leading to:

-- Bad plan using SeekScan
SEARCH edge USING COVERING INDEX edge_to_from (node_to=? AND node_from=?)
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR
USING ROWID SEARCH ON TABLE sub_nodes FOR IN-OPERATOR

Index Selection Conflicts
Composite indexes create multiple access path options that confuse the planner:

CREATE INDEX edge_from_to ON edge(node_from,node_to);
CREATE INDEX edge_to_from ON edge(node_to,node_from);

The presence of covering indexes in both directions creates a "Hesitation Problem" where the planner:

Can’t decide between index scan directions
Defaults to worst-case scanning behavior
Fails to recognize optimal join ordering

Temporary Table Optimization Barriers
When using temporary tables in IN clauses:

CREATE TEMP TABLE sub_nodes(node_id INTEGER PRIMARY KEY);

SQLite’s query planner:

Treats temp tables as opaque objects
Can’t use standard index statistics
Defaults to conservative row estimates
Disables cross-table correlation analysis

Query Restructuring Techniques and Planner Guidance Strategies

Step 1: Immediate Workarounds

Force index selection with unary operators:

SELECT count(_rowid_)
FROM edge
WHERE +node_from IN temp.sub_nodes /* Disables SeekScan on first IN */
  AND node_to IN temp.sub_nodes;

This modifies operator precedence to:

Prevent SeekScan on the first IN clause
Force index lookup on node_to
Maintain ROWID search on temp.sub_nodes

Step 2: Query Rewriting Approaches

Convert IN clauses to explicit joins:

SELECT count(e._rowid_)
FROM edge e
JOIN sub_nodes n1 ON n1.node_id = e.node_from
JOIN sub_nodes n2 ON n2.node_id = e.node_to;

Execution plan improvements:

|--SCAN sub_nodes AS n1
|--SEARCH edge USING COVERING INDEX edge_from_to (node_from=?)
`--SEARCH sub_nodes AS n2 USING INTEGER PRIMARY KEY (rowid=?)

Step 3: Index Strategy Adjustments

Modify composite indexes to break planner confusion:

DROP INDEX edge_to_from;
CREATE INDEX edge_to_from_partial ON edge(node_to) WHERE node_from IS NOT NULL;

This prevents the planner from considering bidirectional scans while maintaining filtering efficiency.

Step 4: SQLite Configuration Tuning

Disable harmful optimizations at runtime:

PRAGMA optimize;
PRAGMA disable_optimizations=0x20000; /* Disables OP_SeekScan */

Or in application code:

sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x20000);

Step 5: Version-Specific Upgrades

Upgrade to SQLite 3.35.5+ containing the fix:

# Contains critical SeekScan adjustments
wget https://sqlite.org/2021/sqlite-autoconf-3350500.tar.gz

Step 6: Advanced Monitoring

Analyze bytecode-level execution:

EXPLAIN
SELECT count(_rowid_) FROM edge
WHERE node_from IN sub_nodes AND node_to IN sub_nodes;

Look for OP_SeekScan opcodes and compare step counts between versions.

Step 7: Correlation Analysis

Calculate column correlation to inform index design:

SELECT 
  count(*) AS total,
  count(DISTINCT node_from) AS distinct_from,
  count(DISTINCT node_to) AS distinct_to,
  1.0 * count(*) / (distinct_from * distinct_to) AS correlation
FROM edge;

High correlation values (>0.5) indicate:

Composite indexes may not help
SeekScan more likely to misestimate costs

Step 8: Temporary Table Optimization

Materialize temp tables with manual statistics:

ANALYZE temp.sub_nodes;
UPDATE sqlite_stat1 SET stat = '1000000 1' WHERE tbl = 'sub_nodes';

Forces the planner to treat temp tables as large datasets.

Query Performance Regression with IN Clauses and OP_SeekScan in SQLite 3.35+

Understanding Performance Degradation in Multi-Column IN Queries

OP_SeekScan Optimization and Index Interaction Dynamics

Query Planner Optimization Conflicts and Index Cardinality Miscalculations

Query Restructuring Techniques and Planner Guidance Strategies

SQLite Concurrent Write Contention: File-Level Locking and Insert Performance Optimization

High CPU Usage and Threading Issues with SQLite WAL Mode

Parallel Index Creation in SQLite Using Begin Concurrent and Schema Manipulation

Optimizing SQLite Performance with Profile-Guided Optimization (PGO)

Reducing SQLite Library Size for Embedded Systems: Optimization Strategies and Compilation Techniques

Optimizing SQLite Table Appending Between Memory and File Databases

Leave a Reply Cancel reply

Understanding Performance Degradation in Multi-Column IN Queries

OP_SeekScan Optimization and Index Interaction Dynamics

Query Planner Optimization Conflicts and Index Cardinality Miscalculations

Query Restructuring Techniques and Planner Guidance Strategies

Related Guides

Leave a Reply Cancel reply