Optimizing Composite Key Seek Performance in SQLite WITHOUT ROWID Tables


Understanding Composite Key Seek Behavior with Partial Index Utilization

Core Challenge: Single-Column Index Seek on Multi-Part Primary Key

The central issue revolves around SQLite’s query planner failing to utilize a full composite primary key for efficient range searches (SeekGE operations) in a WITHOUT ROWID table. The table in question uses a four-column primary key:

PRIMARY KEY(KeyPart1, KeyPart2, KeyPart3, KeyPart4)

WITHOUT ROWID storage ensures all key columns and values are stored directly in the B-tree structure, theoretically enabling index-only access. However, the original query structure resulted in suboptimal execution:

WHERE KeyPart1 > 1 OR (KeyPart1 = 1 AND (... nested conditions ...))

The EXPLAIN QUERY PLAN revealed SQLite performed a SEARCH Test USING PRIMARY KEY (KeyPart1>?), using only the first key column. This forced traversal of all rows where KeyPart1 > 1, effectively triggering a partial table scan despite the composite index.

Performance degradation occurred because most entries shared the same KeyPart1 value. The query planner’s inability to leverage the full composite key for a single SeekGE operation caused excessive disk I/O and CPU cycles, mirroring a full scan’s inefficiency.


Root Causes of Partial Composite Key Utilization

1. Imprecise WHERE Clause Structure for Multi-Column Range Queries

SQLite’s query planner struggles with disjunctive (OR) conditions that split range predicates across multiple key columns. The original query’s nested OR logic:

KeyPart1 > 1 OR 
(KeyPart1 = 1 AND KeyPart2 > x'abcdef' OR 
 (KeyPart2 = x'abcdef' AND KeyPart3 > 2 OR ...))

creates separate search boundaries for each key segment. This forces the planner to evaluate each disjunct as an independent range scan, preventing a unified SeekGE operation.

2. Index Selection Heuristics and Column Skipping

SQLite prioritizes leftmost key columns when optimizing searches. If the first column in a composite key has a range condition (>), subsequent columns are often ignored in index seeks. This heuristic aims to minimize I/O but backfires when most rows share the same KeyPart1, as seen here. The planner assumes KeyPart1 > ? sufficiently narrows the search, unaware of data skew.

3. Temporary B-Tree Overhead from ORDER BY and LIMIT

When queries combine disjunctive conditions with ORDER BY and LIMIT, SQLite may create temporary B-trees to sort results before applying the limit. The second query variant demonstrated this:

QUERY PLAN
`--USE TEMP B-TREE FOR ORDER BY

This adds memory and CPU overhead, negating the benefits of index-only access in WITHOUT ROWID tables.


Solutions for Efficient Composite Key Seek Operations

1. Vector Comparison Syntax for Unified Range Conditions

Replace disjunctive logic with a row value comparison to unify the search condition across all key columns:

SELECT KeyPart1, hex(KeyPart2), KeyPart3, KeyPart4 
FROM Test 
WHERE (KeyPart1, KeyPart2, KeyPart3, KeyPart4) > (1, x'abcdef', 2, 3)
ORDER BY KeyPart1, KeyPart2, KeyPart3, KeyPart4
LIMIT 1;

This syntax explicitly defines a single range boundary for the entire composite key. The EXPLAIN QUERY PLAN now shows:

QUERY PLAN
`--SEARCH Test USING PRIMARY KEY ((KeyPart1,KeyPart2,KeyPart3,KeyPart4)>(?,?,?,?))

Key Advantages:

  • Single SeekGE Operation: The B-tree is traversed starting at the exact composite key boundary, eliminating redundant row checks.
  • No Temporary Sorting: The inherent order of the PRIMARY KEY aligns with the ORDER BY clause, allowing the LIMIT to terminate the scan immediately after finding the first match.

2. Index-Only Query Confirmation in WITHOUT ROWID Tables

Verify that the query uses index-only scanning:

EXPLAIN QUERY PLAN 
SELECT KeyPart1, KeyPart2 FROM Test 
WHERE (KeyPart1, KeyPart2, KeyPart3, KeyPart4) > (?, ?, ?, ?);

The output should include USING PRIMARY KEY without MATERIALIZE or USE TEMP B-TREE. In WITHOUT ROWID tables, all columns are part of the primary key B-tree, ensuring no heap accesses.

3. Mitigating Data Skew with Composite Key Design

If the first key column has low cardinality (e.g., KeyPart1 dominated by a single value), consider reordering the composite key:

PRIMARY KEY(KeyPart2, KeyPart3, KeyPart4, KeyPart1)

This places a higher-cardinality column (e.g., BLOB) first, allowing the query planner to narrow the search faster. Validate with:

ANALYZE;
SELECT sqlite_stat1.stat FROM sqlite_stat1 WHERE tbl = 'Test';

to ensure the revised key order has favorable stats.

4. Forcing Index Usage with Index Hints

If the planner still misbehaves, use INDEXED BY to enforce composite key usage:

SELECT ... FROM Test INDEXED BY sqlite_autoindex_Test_1 
WHERE (KeyPart1, KeyPart2, KeyPart3, KeyPart4) > (?, ?, ?, ?);

Caution: This bypasses the planner’s cost-based decisions and requires manual reevaluation after schema changes.

5. BLOB Handling and Type Affinity Considerations

SQLite treats BLOB values as distinct from other types in comparisons. Ensure the application passes BLOB literals correctly:

-- Correct
WHERE KeyPart2 > x'abcdef'
-- Incorrect (text affinity)
WHERE KeyPart2 > 'abcdef'

Mismatched types can prevent index usage. Use hex() in output formatting without applying it to WHERE clauses.

6. EXPLAIN Output Analysis for Seek Operations

Inspect the bytecode (EXPLAIN) for SeekGE/SeekGT opcodes covering all key columns:

8   SeekGE     2   38  2   4       0  key=r[2..5]

The key=r[2..5] indicates the composite key (registers 2 to 5) is used for the seek. Compare this to partial seeks like key=r[2] (only first column).

7. Benchmarking with Large Datasets

Test with representative data volumes to validate performance:

  1. Insert 100K+ rows with controlled key distributions.
  2. Use sqlite3_test_control(SQLITE_TESTCTRL_BITVEC_TEST) to stress I/O patterns.
  3. Profile with sqlite3_stmt_status() to track page accesses:
int page_reads = sqlite3_stmt_status(stmt, SQLITE_STMTSTATUS_AUTOINDEX, 0);

8. Version-Specific Query Planner Enhancements

Upgrade to SQLite 3.34+ (2020-12-01), which introduced row-value comparisons. Earlier versions require cumbersome workarounds:

WHERE KeyPart1 >= ?1 AND (
  KeyPart1 > ?1 OR 
  (KeyPart2 >= ?2 AND (
    KeyPart2 > ?2 OR 
    (KeyPart3 >= ?3 AND ... )
  ))
)

By restructuring queries to leverage row-value comparisons and aligning WHERE/ORDER BY clauses with composite key order, developers can achieve O(log N) seek performance comparable to ISAM databases. The solution capitalizes on SQLite’s B-tree mechanics while avoiding common pitfalls in disjunctive condition handling.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *