Query Performance Regression in SQLite 3.44.2 Due to Suboptimal Index Selection
Issue Overview: Degraded Query Performance from Index Selection Changes in SQLite 3.44.2
A user reported a severe performance regression after upgrading from SQLite 3.42.0 to 3.44.2. A query that previously executed in seconds began taking 20 minutes. The root cause was traced to a change in the query planner’s index selection strategy. In version 3.42.0, the query used a composite index (attributes_data_parent_id_and_name_index) to efficiently resolve a join condition involving data_parent_id and name. After upgrading to 3.44.2, the query planner switched to using a less optimal single-column index (attributes_name_index), resulting in a full scan of intermediate results and drastic slowdowns.
The user identified a specific optimization removed in commit 796a65fa61373b5d as the catalyst for this regression. Reverting this optimization locally restored the original query plan and performance. The core issue revolves around how the query planner evaluates candidate indexes for joins involving OR conditions and composite predicates. The regression highlights a scenario where newer heuristics in the query planner inadvertently prioritize an index that does not optimally reduce the search space for the given query structure.
The test case provided demonstrates the problem:
- A schema with four indexes on the
attributestable. - A query that joins two subqueries, one filtering by
idand the other byname, with a WHERE clause involvingdata_parent_idanddata_class_idlinked via OR. - In SQLite 3.42.0, the composite index
attributes_data_parent_id_and_name_indexis used for thedbAttrs3subquery, enabling efficient lookups. - In SQLite 3.44.2, the single-column
attributes_name_indexis selected instead, forcing a scan of all rows matchingname = 'asdf'and requiring subsequent filtering fordata_parent_idordata_class_id.
This change in index selection directly impacts execution time because the composite index allows the database to resolve both name and data_parent_id in a single seek, while the single-column index requires separate processing of the OR conditions.
Possible Causes: Why the Query Planner Chooses a Suboptimal Index
1. Changes to the Query Planner’s Cost Model in SQLite 3.44.2
The commit 796a65fa61373b5d removed an optimization related to index selection for queries with OR-connected predicates. This optimization likely influenced how the planner weighed the cost of using composite indexes versus single-column indexes. Without it, the planner may undervalue composite indexes when evaluating OR clauses, even if they cover multiple columns referenced in the query.
The cost model in SQLite’s query planner relies on estimates of disk I/O and CPU operations. Composite indexes can reduce the number of rows processed by resolving multiple predicates at once. However, if the planner’s heuristics assign a higher cost to using composite indexes for certain query patterns (e.g., OR conditions spanning multiple tables), it may fall back to single-column indexes, leading to suboptimal plans.
2. Outdated or Missing Statistical Metadata (ANALYZE Data)
SQLite uses stored statistics in the sqlite_stat1 table to estimate the selectivity of indexes. If these statistics are missing or outdated, the query planner cannot accurately compare the efficiency of candidate indexes. For example:
- The composite index
attributes_data_parent_id_and_name_indexmight have better selectivity (fewer rows per(name, data_parent_id)pair) than the single-columnattributes_name_index. - If the
sqlite_stat1table does not reflect this, the planner might assume both indexes have similar selectivity and choose the one with a lower perceived cost (e.g., smaller index size).
Running the ANALYZE command regenerates these statistics. The absence of up-to-date statistics is a common cause of sudden performance regressions after upgrades, as newer planner versions may rely more heavily on statistical data.
3. OR Condition Handling and Join Reordering
The WHERE clause dbAttrs3.data_parent_id = dbAttrs2.data_parent_id OR dbAttrs3.data_parent_id = dbAttrs2.data_class_id complicates query planning. SQLite must evaluate two potential join conditions and decide whether to process them as separate searches or as a combined scan.
In SQLite 3.42.0, the planner may have split the OR into separate queries and used the composite index to resolve both branches efficiently. In 3.44.2, the planner might instead scan all rows matching name = 'asdf' and then filter them against the OR condition. This approach increases I/O and CPU load, especially if many rows match the name predicate but few satisfy the data_parent_id/data_class_id conditions.
Troubleshooting Steps, Solutions & Fixes
1. Update Statistical Metadata with ANALYZE
Step 1: Run ANALYZE in the SQLite CLI to regenerate table and index statistics:
ANALYZE;
This populates the sqlite_stat1 table with data such as the number of rows per index entry, which the query planner uses to estimate costs.
Step 2: Compare the query plans before and after running ANALYZE using EXPLAIN QUERY PLAN. If the planner switches back to the composite index, the problem was caused by stale statistics.
Step 3: If performance improves but remains suboptimal, inspect the sqlite_stat1 entries for the affected indexes:
SELECT tbl, idx, stat FROM sqlite_stat1 WHERE tbl = 'attributes';
Ensure the stat column reflects the expected distribution (e.g., the composite index should show higher selectivity for (name, data_parent_id) pairs).
Solution: Schedule regular ANALYZE runs after significant data changes or schema modifications. Consider using PRAGMA optimize (SQLite 3.18.0+) to automate statistics updates.
2. Force Index Usage with Index Hints
If ANALYZE does not resolve the issue, override the planner’s index selection using SQLite’s INDEXED BY clause:
SELECT DISTINCT dbAttrs3.id
FROM
(SELECT ... FROM attributes AS dbAttrs1 ...) AS dbAttrs2,
(SELECT * FROM attributes INDEXED BY attributes_data_parent_id_and_name_index WHERE name = 'asdf') AS dbAttrs3
WHERE ...;
Caution: Hardcoding indexes reduces the planner’s flexibility. Validate that the forced index consistently improves performance across different data distributions.
Alternative: Use NOT INDEXED to exclude the problematic single-column index:
SELECT * FROM attributes NOT INDEXED WHERE name = 'asdf';
3. Rewrite the Query to Guide the Planner
Modify the query structure to make the composite index more appealing to the planner:
Option A: Separate the OR into UNION ALL subqueries:
SELECT DISTINCT dbAttrs3.id
FROM
dbAttrs2,
(
SELECT * FROM attributes
WHERE name = 'asdf' AND data_parent_id = dbAttrs2.data_parent_id
UNION ALL
SELECT * FROM attributes
WHERE name = 'asdf' AND data_parent_id = dbAttrs2.data_class_id
) AS dbAttrs3
...;
This forces the planner to use the composite index for both branches of the OR.
Option B: Materialize the dbAttrs2 subquery to avoid repeated scans:
WITH dbAttrs2 AS MATERIALIZED (
SELECT DISTINCT dbAttrs1.id, dbAttrs1.data_class_id, dbAttrs1.data_parent_id, dbAttrs1.name
FROM attributes AS dbAttrs1
WHERE dbAttrs1.id = 1234
)
SELECT DISTINCT dbAttrs3.id
FROM dbAttrs2, attributes AS dbAttrs3
WHERE
(dbAttrs3.data_parent_id = dbAttrs2.data_parent_id OR dbAttrs3.data_parent_id = dbAttrs2.data_class_id)
AND dbAttrs3.name = 'asdf'
ORDER BY dbAttrs3.id;
The MATERIALIZED hint ensures the subquery is computed once, reducing redundant computations.
4. Adjust Index Definitions for Better Coverage
Ensure the composite index includes all columns referenced in the query’s WHERE and JOIN clauses. For example, if the query also filters on data_class_id, consider a new index:
CREATE INDEX attributes_covering_index ON attributes(name, data_parent_id, data_class_id);
Covering indexes allow the planner to resolve queries without accessing the main table, reducing I/O.
5. Downgrade or Patch SQLite
If immediate resolution is critical, revert to SQLite 3.42.0 or apply a local patch to reintroduce the optimization from commit 796a65fa61373b5d.
Long-term: Report the regression to SQLite’s maintainers with a reproducible test case (schema, data sample, and query). Collaborate to refine the query planner’s cost model for OR-heavy queries.
This guide provides a comprehensive pathway to diagnose and resolve query planner regressions in SQLite, emphasizing the interplay between index design, statistical metadata, and query structure.