SQLite Composite Primary Key Index Not Used with Multiple IN Clause Values


Understanding Why SQLite’s Query Planner Avoids Composite Index for Multi-Value IN Clauses


Issue Overview: Composite Index Ignored During Multi-Tuple IN Clause Queries

When querying a SQLite table with a composite primary key (e.g., (id, time)), developers expect the query planner to leverage the index for efficient lookups. However, in scenarios where an IN clause contains multiple value tuples (e.g., (id, time) IN (VALUES (1,1), (1,2), (1,3))), SQLite may unexpectedly perform a full table scan instead of using the index. This behavior contrasts with single-value IN clauses, where the index is reliably used. The discrepancy arises from how SQLite’s query planner evaluates execution strategies, particularly when balancing the cost of index lookups against sequential scans. Three factors dominate this decision:

  1. Statistical Misjudgment: The query planner may misestimate the cost of repeated index lookups versus a single table scan.
  2. Subquery Handling: The structure of the IN clause may be interpreted as a subquery, altering optimization pathways.
  3. Data Type Affinity Conflicts: Column type declarations (e.g., BIGINT UNSIGNED vs. INTEGER) might inadvertently affect index eligibility.

This issue manifests most prominently in large tables (e.g., 10M+ rows), where full scans degrade performance. Developers must understand why SQLite’s optimizer makes these choices and how to override them when necessary.


Root Causes: Why the Composite Index Is Bypassed

1. Cost Estimation Errors Due to Missing or Outdated Statistics

SQLite relies on the sqlite_stat1 table for statistics about table size and index selectivity. If ANALYZE has not been run after populating the table, the optimizer assumes default statistics. These defaults often underestimate the table’s row count or misrepresent index distribution. For multi-value IN clauses, the planner may incorrectly calculate that scanning the entire table is cheaper than performing multiple index lookups. This error becomes more pronounced as the number of IN values increases, as the perceived overhead of repeated index seeks outweighs a single scan.

2. Subquery Materialization and Temporary Table Creation

The IN clause with multiple tuples is internally rewritten as a LIST SUBQUERY. SQLite materializes this list into an implicit temporary table. When joining this temporary table with the main table (test), the optimizer may fail to recognize that the composite index can be used for direct lookups. Instead, it defaults to iterating over all rows of test and checking for matches in the temporary table—a process resembling a nested loop join without index utilization.

3. Data Type Affinity Mismatch in Indexed Columns

While SQLite is flexible with data types, explicit declarations (e.g., BIGINT UNSIGNED) can create affinity mismatches. SQLite treats all integer types (INT, BIGINT, etc.) as INTEGER affinity. However, non-standard type names might confuse developer tools or obscure the index’s usability. Though this is less likely to affect the query planner directly, ensuring columns use SQLite’s native type names (e.g., INTEGER) eliminates ambiguity and guarantees index compatibility.


Resolving the Problem: Forcing Index Usage and Optimizing Queries

Step 1: Update Statistics with ANALYZE

Run ANALYZE to generate or refresh statistical data:

ANALYZE;

This populates sqlite_stat1 with accurate row counts and index selectivity metrics. After updating statistics, re-examine the query plan using EXPLAIN QUERY PLAN. If the composite index is still ignored, proceed to query restructuring.

Step 2: Rewrite the Query Using a Common Table Expression (CTE)

Convert the IN clause’s value list into an explicit CTE. This forces SQLite to treat the list as a concrete temporary table, enabling better join optimizations. Three equivalent formulations achieve this:

Option 1: Explicit JOIN with ON Clause

WITH keys(id, time) AS (VALUES (1,1), (1,2), (1,3))
SELECT t.* FROM keys k
JOIN test t ON t.id = k.id AND t.time = k.time;

Option 2: IN Subquery with CTE

WITH keys(id, time) AS (VALUES (1,1), (1,2), (1,3))
SELECT * FROM test
WHERE (id, time) IN (SELECT id, time FROM keys);

Option 3: NATURAL JOIN

WITH keys(id, time) AS (VALUES (1,1), (1,2), (1,3))
SELECT * FROM test
NATURAL JOIN keys;

All three approaches encourage the optimizer to use the composite index by framing the search as a join between two tables (the main table and the CTE). The NATURAL JOIN variant is particularly concise but requires exact column name matches.

Step 3: Verify Data Type Affinities

Ensure the table schema uses SQLite’s native type names. Replace BIGINT UNSIGNED with INTEGER:

CREATE TABLE test (
  id INTEGER NOT NULL,
  time INTEGER NOT NULL,
  value REAL,
  version INTEGER NOT NULL,
  PRIMARY KEY (id, time)
) WITHOUT ROWID;

While SQLite ignores the UNSIGNED modifier, using standard types prevents tooling confusion and ensures index compatibility.

Step 4: Use Index Hints (Advanced)

If the above steps fail, force index usage with INDEXED BY (use sparingly):

SELECT * FROM test INDEXED BY test_primary_key
WHERE (id, time) IN (VALUES (1,1), (1,2), (1,3));

This bypasses the query planner’s cost estimation. However, it risks suboptimal plans if data distribution changes, so prefer CTE-based rewrites.

Step 5: Upgrade SQLite

Newer SQLite versions (3.39.0+) include optimizations for tuple comparisons in IN clauses. If possible, upgrade and retest the original query.


By systematically addressing statistical inaccuracies, query structure limitations, and schema ambiguities, developers can ensure SQLite’s query planner leverages composite indexes effectively—even for multi-value IN clauses.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *