Optimizing SQLite Query Performance with Compound Indexes on Multiple Columns
Understanding Query Planner Behavior with Multi-Column Indexes in SQLite
Issue Overview: Inefficient Index Selection Despite Wide Compound Indexes
The problem arises when executing AND-connected queries against a table with many queryable columns. A compound index covering all queryable columns (a to z in the example) was created to accelerate these queries. However, SQLite’s query planner (QP) does not automatically select this index unless it is explicitly forced via INDEXED BY. Even after deleting all other indexes and running ANALYZE, the QP still prefers full table scans or suboptimal index choices. This occurs despite the compound index being a covering index (i.e., containing all columns required by the query) and delivering significantly faster results when manually enforced.
Key observations from the scenario include:
- The compound index (idx_treatments_allCols) reduces query execution time from 7 seconds to 0.2 seconds when forced.
- The QP ignores the compound index for queries filtering on columns that are not the first few columns in the index definition.
- ANALYZE statistics do not appear to influence the QP’s decision-making as expected.
This behavior contradicts the expectation that a covering index spanning all queryable columns would be prioritized by the QP. The root cause lies in how SQLite’s query planner evaluates index usability, selectivity, and cost.
Causes of Suboptimal Index Selection in Multi-Column Queries
1. Column Order in Compound Indexes
SQLite’s query planner relies heavily on the leftmost prefix rule when using compound indexes. An index on columns (a, b, c, d, ..., z) can only be efficiently used if the query includes constraints on a contiguous leftmost subset of these columns. For example:
- A query with WHERE a = ? AND b = ? can use the index.
- A query with WHERE c = ? AND d = ? cannot use the index unless the preceding columns (a and b) are also constrained or the index supports skip-scan optimization (which has limitations).
In the provided example, the query filtering on status and checkinTime did not use the compound index because these columns were not the leftmost in the index definition. The QP defaulted to a smaller index on status (if present) or a full table scan.
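To make this concrete, here is a minimal sketch; the simplified schema below is illustrative and is not the original table definition:
-- Simplified, hypothetical schema for illustration only
CREATE TABLE treatments (id INTEGER PRIMARY KEY, a TEXT, b TEXT, status TEXT, checkinTime INTEGER);
CREATE INDEX idx_treatments_allCols ON treatments (a, b, status, checkinTime);
-- Constrains the leftmost columns: the planner can SEARCH the index
EXPLAIN QUERY PLAN SELECT id FROM treatments WHERE a = 'x' AND b = 'y';
-- Skips the leftmost columns: the planner typically falls back to a
-- full table scan (or a skip-scan, if statistics justify it)
EXPLAIN QUERY PLAN SELECT id FROM treatments WHERE status = 'done' AND checkinTime > 0;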
2. Query Cost Estimation Errors
SQLite’s cost-based optimizer uses statistics from ANALYZE to estimate the number of rows filtered by each index. However, these estimates can be inaccurate for compound indexes with many columns:
- False Selectivity Assumptions: The QP assumes that a compound index with many columns has lower selectivity for individual columns, especially if those columns appear later in the index. For example, if status is the 10th column in a 20-column index, the QP may underestimate its filtering power.
- Covering Index Misjudgment: Even when a compound index is covering, the QP might incorrectly estimate that scanning the entire index is slower than scanning the table. This happens when the index is wide (contains many large columns) or when ANALYZE statistics indicate a high number of rows per index entry.
3. Skip-Scan Optimization Limitations
The skip-scan optimization allows SQLite to use an index even when the leftmost columns are not constrained, but it requires:
- Low cardinality (few distinct values) in the leading columns of the index.
- Up-to-date ANALYZE statistics to identify skip-scan opportunities.
In the example, if the columns preceding status and checkinTime in the index had high cardinality, skip-scan would be inefficient, and the QP would avoid it. This explains why forcing the index manually improved performance: for this particular query and data distribution, a skip-scan (or a full scan of the covering index) was still faster than the plan the QP chose on its own, but the QP’s cost model could not predict that in the general case.
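To see whether the planner even considers a skip-scan, inspect the query plan; in recent SQLite versions a skip-scan appears with ANY(...) terms on the skipped columns (the plan line below is indicative, not verbatim output):
EXPLAIN QUERY PLAN SELECT id FROM treatments WHERE status = 'done';
-- A skip-scan plan looks roughly like:
--   SEARCH treatments USING INDEX idx_treatments_allCols (ANY(a) AND ANY(b) AND status=?)
-- A plain "SCAN treatments" means the index was bypassed entirely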
Solutions for Ensuring Optimal Index Usage
Step 1: Reorganize Compound Indexes Based on Query Patterns
Action: Redesign the compound index to prioritize frequently queried columns and column groups. For example, if most queries filter on status and checkinTime, define the index as:
CREATE INDEX idx_treatments_priority ON treatments (status, checkinTime, ...);
Rationale: By placing high-selectivity and frequently used columns first, the index becomes usable for a broader range of queries without relying on skip-scan.
Implementation:
- Analyze query logs to identify common column combinations in WHERE clauses.
- Use SQLite’s EXPLAIN QUERY PLAN to verify index usage after reordering (see the sketch below).
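For example, after creating the reordered index, a quick check (reusing the hypothetical columns from the earlier sketch) might look like:
CREATE INDEX idx_treatments_priority ON treatments (status, checkinTime);
EXPLAIN QUERY PLAN
SELECT id FROM treatments
WHERE status = 'done' AND checkinTime BETWEEN 1000 AND 2000;
-- Expected: SEARCH treatments USING INDEX idx_treatments_priority
--           (status=? AND checkinTime>? AND checkinTime<?)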
Step 2: Validate and Update ANALYZE Statistics
Action: Ensure ANALYZE generates accurate statistics by running:
ANALYZE; -- Collect statistics for all tables and indexes
ANALYZE sqlite_schema; -- Reload statistics into the query planner (SQLite 3.33+)
Rationale: Outdated statistics can cause the QP to misestimate the cost of using a compound index. For example, if ANALYZE was run before populating the table, it would not reflect the true data distribution.
Verification:
SELECT * FROM sqlite_stat1 WHERE tbl = 'treatments';
Check the stat column for the row-count statistics (histograms, if enabled, live in sqlite_stat4). A value like 12345 10 indicates the index covers 12,345 rows, with an average of 10 rows per distinct value of its first column.
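For a compound index, the stat value carries one additional integer per indexed-column prefix. A sketch of what an entry for a hypothetical three-column index might look like:
-- Hypothetical sqlite_stat1 row for a three-column index:
--   tbl  = 'treatments'
--   idx  = 'idx_treatments_priority'
--   stat = '12345 100 10 1'
-- Reading: 12,345 rows total; ~100 rows per distinct status;
-- ~10 rows per distinct (status, checkinTime) pair; the full
-- three-column key is nearly unique (~1 row per distinct triple)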
Step 3: Force Index Usage Strategically
Action: Use INDEXED BY as a temporary workaround for critical queries:
SELECT ... FROM treatments INDEXED BY idx_treatments_allCols WHERE ...;
Rationale: This overrides the QP’s decision and is useful for:
- Queries where the index provides a proven performance benefit.
- Legacy systems where schema changes are not immediately feasible.
Caution: Overusing INDEXED BY can lead to maintenance headaches and suboptimal plans if the data distribution changes. Reserve this for edge cases.
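Also note that INDEXED BY is a requirement rather than a hint: if the named index is missing or cannot be used for the statement, preparation fails with an error instead of silently falling back to another plan:
-- Fails at prepare time if idx_treatments_allCols does not exist
-- or cannot be used for this statement:
SELECT id FROM treatments INDEXED BY idx_treatments_allCols
WHERE status = 'done';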
Step 4: Split Large Tables into Narrower Sub-Tables
Action: Separate frequently queried columns from rarely used ones:
CREATE TABLE treatments_core (
id INTEGER PRIMARY KEY,
status TEXT,
checkinTime INTEGER,
...
);
CREATE TABLE treatments_aux (
id INTEGER PRIMARY KEY,
large_data BLOB,
...
);
Rationale: Narrower tables reduce the cost of full scans and make covering indexes smaller and faster to scan. This is particularly effective if the original table contains BLOB or TEXT columns that inflate row sizes.
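Queries that occasionally need the large payload columns can reassemble a full row with a join on the shared primary key, for example:
SELECT c.id, c.status, c.checkinTime, a.large_data
FROM treatments_core AS c
JOIN treatments_aux AS a ON a.id = c.id
WHERE c.status = 'done';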
Step 5: Use Partial Indexes for High-Selectivity Filters
Action: Create indexes targeting specific values:
CREATE INDEX idx_treatments_status_nov ON treatments(status)
WHERE status = 'sp. nov.';
Rationale: Partial indexes are smaller and faster to scan when queries frequently filter on specific values (e.g., status = 'sp. nov.').
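One caveat: the planner uses a partial index only when it can prove that the query’s WHERE clause implies the index’s WHERE clause, so the query must repeat (or logically imply) the same predicate:
-- Can use idx_treatments_status_nov:
SELECT Count(*) FROM treatments WHERE status = 'sp. nov.';
-- Cannot use it; this predicate does not imply status = 'sp. nov.':
SELECT Count(*) FROM treatments WHERE status LIKE 'sp.%';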
Step 6: Benchmark Index Scans vs. Table Scans
Action: Compare the performance of forced index scans with default QP behavior:
-- With index
SELECT Count(*) FROM treatments INDEXED BY idx_treatments_allCols WHERE ...;
-- Without index
SELECT Count(*) FROM treatments WHERE ...;
Rationale: If the index scan is consistently faster, consider reworking the schema or using INDEXED BY. If not, the compound index may not be worth retaining.
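In the sqlite3 shell, the .timer dot-command prints per-statement run times, which makes this comparison straightforward:
.timer on
SELECT Count(*) FROM treatments INDEXED BY idx_treatments_allCols WHERE status = 'done';
SELECT Count(*) FROM treatments WHERE status = 'done';
-- Run each variant several times; the first run is often slower
-- while pages are still being read into the page cache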
Step 7: Monitor Query Plan Stability
Action: Regularly check query plans after data changes:
EXPLAIN QUERY PLAN SELECT ... FROM treatments WHERE ...;
Rationale: Data growth or shifts in value distributions can render previously optimal indexes ineffective. Proactive monitoring avoids sudden performance regressions.
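Typical signals to watch for in the output (exact wording varies across SQLite versions; the lines below are indicative):
-- Healthy plan: the index still satisfies the query
--   SEARCH treatments USING COVERING INDEX idx_treatments_priority (status=?)
-- Warning signs after data or schema drift:
--   SCAN treatments                -- full table scan, index ignored
--   USE TEMP B-TREE FOR ORDER BY   -- sort no longer served by an index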
Summary of Fixes and Best Practices
- Index Design: Prioritize leftmost columns based on query frequency and selectivity.
- Statistics Maintenance: Run ANALYZE after significant data changes.
- Selective Overrides: Use INDEXED BY sparingly for mission-critical queries.
- Schema Optimization: Split tables and use partial indexes to reduce overhead.
- Continuous Monitoring: Validate query plans and index usage periodically.
By aligning index structures with query patterns and maintaining accurate statistics, SQLite’s query planner can reliably select optimal indexes without manual intervention. For corner cases where the QP’s logic falls short, strategic use of INDEXED BY and schema refactoring ensures consistent performance.