Optimizing Slow SELECT Queries on Composite Criteria in SQLite
Understanding Slow SELECT Performance on Composite Index Criteria
Issue Overview: Query Performance Degradation with Multiple WHERE Clauses
The core problem revolves around a SQLite query experiencing significant latency (150 ms) when filtering on two columns (`ACCOUNT_ID` and `ENTITY`), despite a filter on `ACCOUNT_ID` alone executing efficiently (2 ms). The table `UNIQUE_ENTITIES` has separate indexes on `ACCOUNT_ID` and `ENTITY`, but the combined query fails to leverage these indexes optimally.
The schema design and indexing strategy are central to this issue. SQLite's query planner selects an execution path that performs a full or partial scan when combining conditions from non-overlapping indexes. The `ENTITY` column's data type (`varchar(512)`) and the cardinality (number of distinct values) of `ENTITY` relative to `ACCOUNT_ID` further complicate index utilization. For example, if `ENTITY` values are highly unique, SQLite may prioritize the `ACCOUNT_ID` index but resort to row-wise filtering for `ENTITY`, leading to inefficiencies.
Additionally, the absence of a composite index forces the database engine to perform index merging (combining results from multiple indexes), which is computationally expensive. The `BIGINT` and `varchar(512)` data types introduce storage overhead: larger column values increase index size, reducing the number of index entries per page and requiring more I/O operations.
The discussion also highlights experimentation with `WITHOUT ROWID` tables, which can optimize storage for tables with composite primary keys. However, improper configuration (e.g., attempting to reference `rowid` in a `WITHOUT ROWID` table) caused confusion, and the lack of updated statistics (via `ANALYZE`) may lead the query planner to make suboptimal decisions.
Possible Causes: Index Fragmentation, Suboptimal Index Design, and Planner Misestimations
- **Isolated Single-Column Indexes:** Separate indexes on `ACCOUNT_ID` and `ENTITY` force SQLite to choose between:
  - Using one index and performing a row-wise filter for the other condition.
  - Merging results from both indexes, which is rarely efficient in SQLite due to the lack of native bitmap index scans.
- **Data Distribution Skew:**
  - High cardinality in `ACCOUNT_ID` (many unique values) allows the index on `ACCOUNT_ID` to quickly narrow down results.
  - Low cardinality in `ENTITY` (many duplicates) makes the `ENTITY` index less selective, increasing the likelihood of full scans.
- **Outdated Statistics:** Without up-to-date statistics (collected via `ANALYZE`), the query planner may misestimate the number of rows matching each condition, leading to poor index selection.
- **Index Column Order:** Composite indexes are order-sensitive. An index on `(ACCOUNT_ID, ENTITY)` is more efficient for queries filtering on both columns than `(ENTITY, ACCOUNT_ID)` if `ACCOUNT_ID` is the higher-cardinality column.
- **Storage Overhead in Indexes:** The `ENTITY` column's variable-length string (up to 512 bytes) increases index size, reducing the number of entries stored per page and increasing I/O.
- **`WITHOUT ROWID` Misconfiguration:** Attempting to reference `rowid` in a `WITHOUT ROWID` table causes errors, as such tables lack the implicit rowid column. This can disrupt queries or application logic expecting `rowid` to exist.
- **Implicit Type Conversions:** Binding parameters with incorrect data types (e.g., sending integers as strings) can prevent index usage.
Troubleshooting Steps, Solutions & Fixes: Index Optimization, Schema Refactoring, and Planner Guidance
Step 1: Analyze Query Execution Plans
Use `EXPLAIN QUERY PLAN` to diagnose how SQLite processes the query:
EXPLAIN QUERY PLAN
SELECT DETAILS_ID FROM UNIQUE_ENTITIES
WHERE ACCOUNT_ID = 112323 AND ENTITY = '21f442-678f97-fgh382-47d828';
Typical output for a suboptimal plan might show `SEARCH UNIQUE_ENTITIES USING INDEX idx1 (ACCOUNT_ID=?)` with no mention of `ENTITY`, indicating that only one index is used and the `ENTITY` condition is checked row by row against every matching entry.
Step 2: Create a Composite Index
Replace the individual indexes with a composite index covering both columns in the `WHERE` clause:
CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
This allows SQLite to perform a single index seek for both conditions. The order of columns matters: place the most selective column (usually `ACCOUNT_ID`) first.
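Rerunning the same diagnostic against a schema that has the composite index shows the difference: the plan reports one seek on `idx_account_entity` covering both equality constraints. A minimal sketch, again using an in-memory database:

```python
import sqlite3

# Same table as in the article, but with the composite index from Step 2.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT DETAILS_ID FROM UNIQUE_ENTITIES
    WHERE ACCOUNT_ID = 112323 AND ENTITY = '21f442-678f97-fgh382-47d828'
""").fetchone()

# The detail column should now report a single seek on the composite
# index, e.g. "... USING INDEX idx_account_entity (ACCOUNT_ID=? AND ENTITY=?)".
print(plan[3])
```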
Step 3: Evaluate WITHOUT ROWID Tables
If the table is frequently queried by `(ACCOUNT_ID, ENTITY)`, consider defining it as a `WITHOUT ROWID` table with a composite primary key:
CREATE TABLE UNIQUE_ENTITIES (
DETAILS_ID BIGINT,
ACCOUNT_ID BIGINT,
ENTITY VARCHAR(512),
PRIMARY KEY (ACCOUNT_ID, ENTITY)
) WITHOUT ROWID;
This stores the table as a clustered index, reducing storage overhead and improving lookup speed for primary key queries.
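The `rowid` confusion mentioned earlier is easy to verify: selecting `rowid` from such a table raises an error because the implicit rowid column genuinely does not exist. A minimal check:

```python
import sqlite3

# Confirm that a WITHOUT ROWID table has no implicit rowid, which
# explains the errors described in the discussion above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512),
        PRIMARY KEY (ACCOUNT_ID, ENTITY)
    ) WITHOUT ROWID;
""")

try:
    con.execute("SELECT rowid FROM UNIQUE_ENTITIES")
    has_rowid = True
except sqlite3.OperationalError:
    # SQLite reports "no such column: rowid" for WITHOUT ROWID tables.
    has_rowid = False

print("implicit rowid present:", has_rowid)
```

Application code that relied on `rowid` (for example, as a cheap surrogate key) must be migrated to the composite primary key before this change.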
Step 4: Update Statistics with ANALYZE
Run `ANALYZE` to collect statistics about table and index data distribution:
ANALYZE;
This helps the query planner make informed decisions about index usage.
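The collected statistics land in the internal `sqlite_stat1` table (one row per analyzed index: table name, index name, and row-count estimates), so you can inspect what the planner will see. A small sketch with a few illustrative rows:

```python
import sqlite3

# After ANALYZE, the gathered statistics are queryable via sqlite_stat1.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
    INSERT INTO UNIQUE_ENTITIES VALUES
        (1, 112323, 'a'), (2, 112323, 'b'), (3, 555555, 'a');
    ANALYZE;
""")

# Each stat row looks like ('UNIQUE_ENTITIES', 'idx_account_entity', '3 2 1'):
# total rows, then average rows matching a prefix of the index columns.
stats = con.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
for row in stats:
    print(row)
```

Rerun `ANALYZE` after bulk loads or major data-distribution changes so these estimates stay representative.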
Step 5: Verify Data Types and Binding
Ensure that the application binds parameters using the correct data types. For example, `ACCOUNT_ID` should be bound as an integer, not a string.
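In Python's `sqlite3`, the driver passes parameter types through unchanged, so binding a Python `int` stores and probes a SQLite INTEGER. A minimal sketch of correctly typed, parameterized access:

```python
import sqlite3

# Parameterized query with correctly typed bindings: the integer stays
# an integer, so the probe value matches the stored type and the
# composite index remains usable.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY);
    INSERT INTO UNIQUE_ENTITIES VALUES (1, 112323, '21f442-678f97-fgh382-47d828');
""")

# Bind the account id as an int and the entity key as a string.
row = con.execute(
    "SELECT DETAILS_ID FROM UNIQUE_ENTITIES WHERE ACCOUNT_ID = ? AND ENTITY = ?",
    (112323, "21f442-678f97-fgh382-47d828"),
).fetchone()
print(row)
```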
Step 6: Benchmark with Different Index Orders
If query patterns vary, test composite indexes with different column orders. For example, an index on `(ENTITY, ACCOUNT_ID)` might perform better if `ENTITY` is often used alone.
Step 7: Monitor Index Fragmentation
Rebuild indexes periodically if the table undergoes heavy write operations:
REINDEX idx_account_entity;
Step 8: Utilize Covering Indexes
Include `DETAILS_ID` in the composite index to create a covering index:
CREATE INDEX idx_account_entity_details ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY, DETAILS_ID);
This eliminates the need to access the main table, reducing I/O.
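SQLite makes this visible in the query plan: when every referenced column is in the index, `EXPLAIN QUERY PLAN` reports a "COVERING INDEX". A quick verification sketch:

```python
import sqlite3

# With DETAILS_ID appended to the index, the query is answered from the
# index alone and never touches the base table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ACCOUNT_ID BIGINT,
        ENTITY VARCHAR(512)
    );
    CREATE INDEX idx_account_entity_details
        ON UNIQUE_ENTITIES (ACCOUNT_ID, ENTITY, DETAILS_ID);
""")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT DETAILS_ID FROM UNIQUE_ENTITIES
    WHERE ACCOUNT_ID = 112323 AND ENTITY = '21f442-678f97-fgh382-47d828'
""").fetchone()

# Expect "... USING COVERING INDEX idx_account_entity_details ...".
print(plan[3])
```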
Step 9: Review Application Logic
Ensure that queries are parameterized correctly and avoid unnecessary data retrieval.
Step 10: Consider Data Partitioning
If `ACCOUNT_ID` represents distinct tenants or users, partition the table by `ACCOUNT_ID` using separate tables or attached databases.
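The attached-database variant can be sketched as follows; the per-account schema name `acct_112323` is a hypothetical naming convention, not part of the original discussion. Each account's rows live in their own database, so `ACCOUNT_ID` drops out of the lookup entirely:

```python
import sqlite3

# Illustrative per-account partitioning via ATTACH: one database per
# account, addressed through a schema-name convention.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS acct_112323")
con.executescript("""
    CREATE TABLE acct_112323.UNIQUE_ENTITIES (
        DETAILS_ID BIGINT,
        ENTITY VARCHAR(512) PRIMARY KEY
    );
    INSERT INTO acct_112323.UNIQUE_ENTITIES
        VALUES (1, '21f442-678f97-fgh382-47d828');
""")

# The ACCOUNT_ID filter is implied by the schema name; only ENTITY
# needs to be probed, via its primary-key index.
row = con.execute(
    "SELECT DETAILS_ID FROM acct_112323.UNIQUE_ENTITIES WHERE ENTITY = ?",
    ("21f442-678f97-fgh382-47d828",),
).fetchone()
print(row)
```

Note that SQLite caps the number of simultaneously attached databases (10 by default), so this approach suits a handful of large tenants rather than thousands of small ones.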
By systematically addressing index design, schema configuration, and planner guidance, the latency for composite queries can be reduced to single-digit milliseconds, aligning with the performance observed for single-condition queries.