Optimizing Slow SELECT Queries on Composite Criteria in SQLite
Understanding Slow SELECT Performance on Composite Index Criteria
Issue Overview: Query Performance Degradation with Multiple WHERE Clauses
The core problem revolves around a SQLite query experiencing significant latency (150 ms) when filtering on two columns (`ACCOUNT_ID` and `ENTITY`), despite a filter on `ACCOUNT_ID` alone executing efficiently (2 ms). The table `UNIQUE_ENTITIES` has separate indexes on `ACCOUNT_ID` and `ENTITY`, but the combined query fails to leverage these indexes optimally.
The schema design and indexing strategy are central to this issue. SQLite's query planner selects an execution path that performs a full or partial scan when combining conditions from non-overlapping indexes. The `ENTITY` column's data type (`varchar(512)`) and the cardinality (number of distinct values) of `ENTITY` relative to `ACCOUNT_ID` further complicate index utilization. For example, if `ENTITY` values are highly unique, SQLite may prioritize the `ACCOUNT_ID` index but resort to row-wise filtering for `ENTITY`, leading to inefficiencies.
Additionally, the absence of a composite index forces the database engine to perform index merging (combining results from multiple indexes), which is computationally expensive. The `BIGINT` and `varchar(512)` data types introduce storage overhead: larger column values increase index size, reducing the number of index entries per page and requiring more I/O operations.
The discussion also highlights experimentation with `WITHOUT ROWID` tables, which can optimize storage for tables with composite primary keys. However, improper configuration (e.g., attempting to reference `rowid` in a `WITHOUT ROWID` table) caused confusion, and the lack of updated statistics (via `ANALYZE`) may lead the query planner to make suboptimal decisions.
Possible Causes: Index Fragmentation, Suboptimal Index Design, and Planner Misestimations
- **Isolated Single-Column Indexes:** Separate indexes on `ACCOUNT_ID` and `ENTITY` force SQLite to choose between:
  - Using one index and performing a row-wise filter for the other condition.
  - Merging results from both indexes, which is rarely efficient in SQLite due to the lack of native bitmap index scans.
- **Data Distribution Skew:**
  - High cardinality in `ACCOUNT_ID` (many unique values) allows the index on `ACCOUNT_ID` to quickly narrow down results.
  - Low cardinality in `ENTITY` (many duplicates) makes the `ENTITY` index less selective, increasing the likelihood of full scans.
- **Outdated Statistics:** Without up-to-date statistics (collected via `ANALYZE`), the query planner may misestimate the number of rows matching each condition, leading to poor index selection.
- **Index Column Order:** Composite indexes are order-sensitive. An index on `(ACCOUNT_ID, ENTITY)` is more efficient for queries filtering on both columns than `(ENTITY, ACCOUNT_ID)` if `ACCOUNT_ID` is the higher-cardinality column.
- **Storage Overhead in Indexes:** The `ENTITY` column's variable-length string (up to 512 bytes) increases index size, reducing the number of entries stored per page and increasing I/O.
- **`WITHOUT ROWID` Misconfiguration:** Attempting to reference `rowid` in a `WITHOUT ROWID` table causes errors, as such tables lack the implicit rowid column. This can disrupt queries or application logic expecting `rowid` to exist.
- **Implicit Type Conversions:** Binding parameters with incorrect data types (e.g., sending integers as strings) can prevent index usage.
Troubleshooting Steps, Solutions & Fixes: Index Optimization, Schema Refactoring, and Planner Guidance
Step 1: Analyze Query Execution Plans
Use `EXPLAIN QUERY PLAN` to diagnose how SQLite processes the query:
EXPLAIN QUERY PLAN
SELECT DETAILS_ID FROM UNIQUE_ENTITIES
WHERE ACCOUNT_ID = 112323 AND ENTITY = '21f442-678f97-fgh382-47d828';
Typical output for a suboptimal plan might show `SEARCH UNIQUE_ENTITIES USING INDEX idx1 (ACCOUNT_ID=?)` with no mention of `ENTITY`, indicating that only one index is used and the `ENTITY` condition is checked row by row against every matching entry.
Step 2: Create a Composite Index
Replace the individual indexes with a composite index covering both columns in the `WHERE` clause:
CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
This allows SQLite to perform a single index seek for both conditions. The order of columns matters: place the most selective column (usually `ACCOUNT_ID`) first.
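Rerunning the same diagnostic against a schema that has the composite index shows the difference: the plan reports one seek on `idx_account_entity` covering both equality constraints. A minimal sketch, again using an in-memory database:

```python
import sqlite3

# Same table as in the article, but with the composite index from Step 2.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT DETAILS_ID FROM UNIQUE_ENTITIES
    WHERE ACCOUNT_ID = 112323 AND ENTITY = '21f442-678f97-fgh382-47d828'
""").fetchone()

# The detail column should now report a single seek on the composite
# index, e.g. "... USING INDEX idx_account_entity (ACCOUNT_ID=? AND ENTITY=?)".
print(plan[3])
```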
Step 3: Evaluate WITHOUT ROWID Tables
If the table is frequently queried by `(ACCOUNT_ID, ENTITY)`, consider defining it as a `WITHOUT ROWID` table with a composite primary key:
CREATE TABLE UNIQUE_ENTITIES (
DETAILS_ID BIGINT,
ACCOUNT_ID BIGINT,
ENTITY VARCHAR(512),
PRIMARY KEY (ACCOUNT_ID, ENTITY)
) WITHOUT ROWID;
This stores the table as a clustered index, reducing storage overhead and improving lookup speed for primary key queries.
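The `rowid` confusion mentioned earlier is easy to verify: selecting `rowid` from such a table raises an error because the implicit rowid column genuinely does not exist. A minimal check:

```python
import sqlite3

# Confirm that a WITHOUT ROWID table has no implicit rowid, which
# explains the errors described in the discussion above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512),
        PRIMARY KEY (ACCOUNT_ID, ENTITY)
    ) WITHOUT ROWID;
""")

try:
    con.execute("SELECT rowid FROM UNIQUE_ENTITIES")
    has_rowid = True
except sqlite3.OperationalError:
    # SQLite reports "no such column: rowid" for WITHOUT ROWID tables.
    has_rowid = False

print("implicit rowid present:", has_rowid)
```

Application code that relied on `rowid` (for example, as a cheap surrogate key) must be migrated to the composite primary key before this change.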
Step 4: Update Statistics with ANALYZE
Run `ANALYZE` to collect statistics about table and index data distribution:
ANALYZE;
This helps the query planner make informed decisions about index usage.
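The collected statistics land in the internal `sqlite_stat1` table (one row per analyzed index: table name, index name, and row-count estimates), so you can inspect what the planner will see. A small sketch with a few illustrative rows:

```python
import sqlite3

# After ANALYZE, the gathered statistics are queryable via sqlite_stat1.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
    INSERT INTO UNIQUE_ENTITIES VALUES
        (1, 112323, 'a'), (2, 112323, 'b'), (3, 555555, 'a');
    ANALYZE;
""")

# Each stat row looks like ('UNIQUE_ENTITIES', 'idx_account_entity', '3 2 1'):
# total rows, then average rows matching a prefix of the index columns.
stats = con.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
for row in stats:
    print(row)
```

Rerun `ANALYZE` after bulk loads or major data-distribution changes so these estimates stay representative.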
Step 5: Verify Data Types and Binding
Ensure that the application binds parameters using the correct data types. For example, `ACCOUNT_ID` should be bound as an integer, not a string.
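In Python's `sqlite3`, the driver passes parameter types through unchanged, so binding a Python `int` stores and probes a SQLite INTEGER. A minimal sketch of correctly typed, parameterized access:

```python
import sqlite3

# Parameterized query with correctly typed bindings: the integer stays
# an integer, so the probe value matches the stored type and the
# composite index remains usable.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
    INSERT INTO UNIQUE_ENTITIES VALUES (1, 112323, '21f442-678f97-fgh382-47d828');
""")

# Bind the account id as an int and the entity key as a string.
row = con.execute(
    "SELECT DETAILS_ID FROM UNIQUE_ENTITIES WHERE ACCOUNT_ID = ? AND ENTITY = ?",
    (112323, "21f442-678f97-fgh382-47d828"),
).fetchone()
print(row)
```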
Step 6: Benchmark with Different Index Orders
If query patterns vary, test composite indexes with different column orders. For example, an index on `(ENTITY, ACCOUNT_ID)` might perform better if `ENTITY` is often used alone.
Step 7: Monitor Index Fragmentation
Rebuild indexes periodically if the table undergoes heavy write operations:
REINDEX idx_account_entity;
Step 8: Utilize Covering Indexes
Include `DETAILS_ID` in the composite index to create a covering index:
CREATE INDEX idx_account_entity_details ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY, DETAILS_ID);
This eliminates the need to access the main table, reducing I/O.
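SQLite makes this visible in the query plan: when every referenced column is in the index, `EXPLAIN QUERY PLAN` reports a "COVERING INDEX". A quick verification sketch:

```python
import sqlite3

# With DETAILS_ID appended to the index, the query is answered from the
# index alone and never touches the base table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity_details
        ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY, DETAILS_ID);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT DETAILS_ID FROM UNIQUE_ENTITIES
    WHERE ACCOUNT_ID = 112323 AND ENTITY = '21f442-678f97-fgh382-47d828'
""").fetchone()

# Expect "... USING COVERING INDEX idx_account_entity_details ...".
print(plan[3])
```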
Step 9: Review Application Logic
Ensure that queries are parameterized correctly and avoid unnecessary data retrieval.
Step 10: Consider Data Partitioning
If `ACCOUNT_ID` represents distinct tenants or users, partition the table by `ACCOUNT_ID` using separate tables or attached databases.
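The attached-database variant can be sketched as follows; the per-account schema name `acct_112323` is a hypothetical naming convention, not part of the original discussion. Each account's rows live in their own database, so `ACCOUNT_ID` drops out of the lookup entirely:

```python
import sqlite3

# Illustrative per-account partitioning via ATTACH: one database per
# account, addressed through a schema-name convention.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS acct_112323")
con.executescript("""
    CREATE TABLE acct_112323.UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ENTITY VARCHAR(512) PRIMARY KEY
    );
    INSERT INTO acct_112323.UNIQUE_ENTITIES
        VALUES (1, '21f442-678f97-fgh382-47d828');
""")

# The ACCOUNT_ID filter is implied by the schema name; only ENTITY
# needs to be probed, via its primary-key index.
row = con.execute(
    "SELECT DETAILS_ID FROM acct_112323.UNIQUE_ENTITIES WHERE ENTITY = ?",
    ("21f442-678f97-fgh382-47d828",),
).fetchone()
print(row)
```

Note that SQLite caps the number of simultaneously attached databases (10 by default), so this approach suits a handful of large tenants rather than thousands of small ones.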
By systematically addressing index design, schema configuration, and planner guidance, the latency for composite queries can be reduced to single-digit milliseconds, aligning with the performance observed for single-condition queries.