SQLite Composite Primary Key Index Not Used with Multiple IN Clause Values
Understanding Why SQLite’s Query Planner Avoids Composite Index for Multi-Value IN Clauses
Issue Overview: Composite Index Ignored During Multi-Tuple IN Clause Queries
When querying a SQLite table with a composite primary key (e.g., (id, time)
), developers expect the query planner to leverage the index for efficient lookups. However, in scenarios where an IN
clause contains multiple value tuples (e.g., (id, time) IN (VALUES (1,1), (1,2), (1,3))
), SQLite may unexpectedly perform a full table scan instead of using the index. This behavior contrasts with single-value IN
clauses, where the index is reliably used. The discrepancy arises from how SQLite’s query planner evaluates execution strategies, particularly when balancing the cost of index lookups against sequential scans. Three factors dominate this decision:
- Statistical Misjudgment: The query planner may misestimate the cost of repeated index lookups versus a single table scan.
- Subquery Handling: The structure of the
IN
clause may be interpreted as a subquery, altering optimization pathways. - Data Type Affinity Conflicts: Column type declarations (e.g.,
BIGINT UNSIGNED
vs.INTEGER
) might inadvertently affect index eligibility.
This issue manifests most prominently in large tables (e.g., 10M+ rows), where full scans degrade performance. Developers must understand why SQLite’s optimizer makes these choices and how to override them when necessary.
Root Causes: Why the Composite Index Is Bypassed
1. Cost Estimation Errors Due to Missing or Outdated Statistics
SQLite relies on the sqlite_stat1
table for statistics about table size and index selectivity. If ANALYZE
has not been run after populating the table, the optimizer assumes default statistics. These defaults often underestimate the table’s row count or misrepresent index distribution. For multi-value IN
clauses, the planner may incorrectly calculate that scanning the entire table is cheaper than performing multiple index lookups. This error becomes more pronounced as the number of IN
values increases, as the perceived overhead of repeated index seeks outweighs a single scan.
2. Subquery Materialization and Temporary Table Creation
The IN
clause with multiple tuples is internally rewritten as a LIST SUBQUERY
. SQLite materializes this list into an implicit temporary table. When joining this temporary table with the main table (test
), the optimizer may fail to recognize that the composite index can be used for direct lookups. Instead, it defaults to iterating over all rows of test
and checking for matches in the temporary table—a process resembling a nested loop join without index utilization.
3. Data Type Affinity Mismatch in Indexed Columns
While SQLite is flexible with data types, explicit declarations (e.g., BIGINT UNSIGNED
) can create affinity mismatches. SQLite treats all integer types (INT
, BIGINT
, etc.) as INTEGER
affinity. However, non-standard type names might confuse developer tools or obscure the index’s usability. Though this is less likely to affect the query planner directly, ensuring columns use SQLite’s native type names (e.g., INTEGER
) eliminates ambiguity and guarantees index compatibility.
Resolving the Problem: Forcing Index Usage and Optimizing Queries
Step 1: Update Statistics with ANALYZE
Run ANALYZE
to generate or refresh statistical data:
ANALYZE;
This populates sqlite_stat1
with accurate row counts and index selectivity metrics. After updating statistics, re-examine the query plan using EXPLAIN QUERY PLAN
. If the composite index is still ignored, proceed to query restructuring.
Step 2: Rewrite the Query Using a Common Table Expression (CTE)
Convert the IN
clause’s value list into an explicit CTE. This forces SQLite to treat the list as a concrete temporary table, enabling better join optimizations. Three equivalent formulations achieve this:
Option 1: Explicit JOIN with ON Clause
WITH keys(id, time) AS (VALUES (1,1), (1,2), (1,3))
SELECT t.* FROM keys k
JOIN test t ON t.id = k.id AND t.time = k.time;
Option 2: IN Subquery with CTE
WITH keys(id, time) AS (VALUES (1,1), (1,2), (1,3))
SELECT * FROM test
WHERE (id, time) IN (SELECT id, time FROM keys);
Option 3: NATURAL JOIN
WITH keys(id, time) AS (VALUES (1,1), (1,2), (1,3))
SELECT * FROM test
NATURAL JOIN keys;
All three approaches encourage the optimizer to use the composite index by framing the search as a join between two tables (the main table and the CTE). The NATURAL JOIN
variant is particularly concise but requires exact column name matches.
Step 3: Verify Data Type Affinities
Ensure the table schema uses SQLite’s native type names. Replace BIGINT UNSIGNED
with INTEGER
:
CREATE TABLE test (
id INTEGER NOT NULL,
time INTEGER NOT NULL,
value REAL,
version INTEGER NOT NULL,
PRIMARY KEY (id, time)
) WITHOUT ROWID;
While SQLite ignores the UNSIGNED
modifier, using standard types prevents tooling confusion and ensures index compatibility.
Step 4: Use Index Hints (Advanced)
If the above steps fail, force index usage with INDEXED BY
(use sparingly):
SELECT * FROM test INDEXED BY test_primary_key
WHERE (id, time) IN (VALUES (1,1), (1,2), (1,3));
This bypasses the query planner’s cost estimation. However, it risks suboptimal plans if data distribution changes, so prefer CTE-based rewrites.
Step 5: Upgrade SQLite
Newer SQLite versions (3.39.0+) include optimizations for tuple comparisons in IN
clauses. If possible, upgrade and retest the original query.
By systematically addressing statistical inaccuracies, query structure limitations, and schema ambiguities, developers can ensure SQLite’s query planner leverages composite indexes effectively—even for multi-value IN
clauses.