Row Value Index Search Behavior with Rowid vs. WITHOUT ROWID Tables
Issue Overview: Partial Index Usage in Row Value Comparisons
The core problem revolves around unexpected query plan behavior when using row value comparisons (e.g., (a,b,i) > (?,?,?)
) against indexes containing SQLite’s implicit rowid column. Users observe that when querying a standard rowid table with an index containing the rowid alias (e.g., i INTEGER PRIMARY KEY
), the EXPLAIN QUERY PLAN
output shows partial index usage. For example, an index on (a,b,i)
appears to only use (a,b)
for searching, while the same query against a WITHOUT ROWID
table with identical schema uses all three columns. This discrepancy raises questions about index storage formats, query planner logic, and fundamental differences between rowid and WITHOUT ROWID
table structures.
Key Observations:
Rowid Table (t2):
Unique index declared as(a,b,i)
wherei
is the rowid alias.
Query plan showsSEARCH USING COVERING INDEX ... ((a,b)>(?,?))
Only first two columns used despite explicit three-column comparisonWITHOUT ROWID Table (t3):
Same index definition(a,b,i)
(now primary key)
Query plan shows full(a,b,i)>(?,?,?)
usageIndex Column Order Sensitivity:
Changing index column order (e.g.,(a,i,b)
) causes query planner to truncate index usage at first occurrence of rowid alias
Fundamental Concepts at Play:
- Implicit Rowid Storage: In standard tables, every index implicitly appends the rowid to its stored columns
- Index Key Composition: For rowid tables, indexes effectively store
[IndexedColumns,rowid]
even if not explicitly declared - Row Value Comparisons: SQLite evaluates tuple comparisons lexically, requiring precise index alignment for optimal search
Possible Causes: Rowid Storage Mechanics and Index Truncation
1. Implicit Rowid Duplication in Indexes
Root Cause:
When creating an index on a rowid table that explicitly includes the rowid alias column (e.g., i INTEGER PRIMARY KEY
), SQLite creates a redundant index structure. The physical index stores both the declared columns and the implicit rowid, leading to hidden column duplication.
Example Breakdown:
For table t2
with index (a,b,i)
:
- Logical index definition:
a, b, i
- Physical storage:
a, b, i, rowid
(wherei=rowid
) - Effective key becomes
a, b, i, i
due toi
being rowid alias
This redundancy forces the query planner to make non-obvious decisions about index usage boundaries.
2. Query Planner’s Index Truncation Heuristic
Behavior Trigger:
When the query planner detects consecutive identical columns in an index key (here, i
appearing as both explicit column and implicit rowid), it truncates the searchable index columns at the first occurrence of the rowid alias. This occurs because:
- The rowid alias (
i
) and implicit rowid are guaranteed equal - Including both creates logical contradictions in range comparisons
- The planner prioritizes avoiding comparison redundancy
Critical Implications:
- Indexes ending with rowid alias columns become partially unusable for row value comparisons
- Query plans may show fewer columns used than expected
- Actual bytecode execution (via
EXPLAIN
) might still process all columns despite plan output
3. WITHOUT ROWID vs Rowid Table Index Storage
Structural Difference:
WITHOUT ROWID
tables store indexes without implicit rowid appending. An index on (a,b,i)
in a WITHOUT ROWID
table stores exactly those three columns, enabling full usage in row value comparisons. This explains why table t3
shows complete index usage in its query plan.
Storage Comparison:
Table Type | Declared Index | Physical Storage |
---|---|---|
Rowid | (a,b,i) | (a,b,i,rowid) → (a,b,i,i) |
WITHOUT ROWID | (a,b,i) | (a,b,i) |
4. Row Value Comparison Semantics
Lexical Comparison Nuance:
SQLite compares row values component-wise, similar to string sorting:
(a1,b1,i1) > (a2,b2,i2)
⇨ (a1 > a2) OR (a1=a2 AND b1 > b2) OR (a1=a2 AND b1=b2 AND i1 > i2)
When the index contains duplicate columns (due to implicit rowid), the comparison becomes:
(a1,b1,i1,i1) > (a2,b2,i2,i2)
The fourth column (i1
vs i2
) comparison is redundant because i1=i1
and i2=i2
always hold. This redundancy confuses the query planner’s range optimization logic.
Troubleshooting Steps, Solutions & Fixes
Step 1: Diagnose Implicit Index Structure
Action: Use PRAGMA index_xinfo(index_name)
to reveal hidden columns.
Example for t2’s autoindex:
PRAGMA index_xinfo(sqlite_autoindex_t2_1);
Expected output:
seq | name | desc | coll | key | asc |
---|---|---|---|---|---|
0 | a | 0 | BINARY | 1 | 1 |
1 | b | 0 | BINARY | 1 | 1 |
2 | i | 0 | BINARY | 1 | 1 |
3 | rowid | 0 | BINARY | 0 | 1 |
Analysis:
- Column 3 (
rowid
) is implicitly added despitei
being rowid alias - Creates logical duplicate between column 2 (
i
) and 3 (rowid
)
Step 2: Avoid Redundant Rowid Inclusion
Solution: Never explicitly include rowid alias columns in indexes for rowid tables.
Correct Approach for t2:
CREATE TABLE t2 (
i INTEGER PRIMARY KEY,
a INTEGER NOT NULL,
b INTEGER NOT NULL
);
CREATE INDEX t2_ab ON t2(a, b); -- Implicitly includes rowid
Query Adjustment:
SELECT * FROM t2
WHERE (a,b,i) > (?,?,?) -- i=rowid still comparable
ORDER BY a,b,i
LIMIT 10;
Rationale:
- Index
t2_ab
physically stores(a,b,rowid)
i
is equivalent torowid
, so comparison works- Query plan will show
SEARCH ... USING INDEX t2_ab ((a,b) > (?,?))
but bytecode uses full index
Step 3: Bytecode-Level Verification
Action: Use EXPLAIN
instead of EXPLAIN QUERY PLAN
to see actual opcodes.
Example:
EXPLAIN
SELECT * FROM t2 WHERE (a,b,i) > (?,?,?) ORDER BY a,b,i LIMIT 10;
Key Opcodes to Check:
- ColumnCount: Verify all compared columns are accessed
- IdxGE: Look for operands covering all index columns
- DecrJumpZero: Check loop termination conditions
Interpretation:
Even if EXPLAIN QUERY PLAN
shows partial index usage, bytecode often reveals full index traversal due to implicit rowid inclusion.
Step 4: Index Column Order Optimization
Rule: Place rowid alias columns after all other indexed columns to prevent truncation.
Incorrect Index:
CREATE INDEX bad_idx ON t2(a, i, b); -- i=rowid in middle
Query Plan Output:
SEARCH t2 USING INDEX bad_idx (a>?)
Correct Index:
CREATE INDEX good_idx ON t2(a, b); -- rowid appended automatically
Behavior:
- Physical index:
(a, b, rowid)
- Supports
(a,b,i) > (?,?,?)
comparisons via bytecode
Step 5: Consider WITHOUT ROWID When Appropriate
Use Case: Tables requiring clustered indexes or explicit PK ordering.
Conversion Example:
CREATE TABLE t3 (
i INTEGER PRIMARY KEY,
a INTEGER NOT NULL,
b INTEGER NOT NULL
) WITHOUT ROWID;
Index Behavior:
- Primary key
(i)
becomes clustered index - Additional index
(a,b,i)
stores exact columns - Full index usage in row value comparisons
Tradeoffs:
- Increased storage space (no implicit rowid)
- Update overhead for clustered indexes
Step 6: Forced Index Usage Clarification
Technique: Use INDEXED BY
to override planner choices.
Example:
SELECT *
FROM t2 INDEXED BY sqlite_autoindex_t2_1
WHERE (a,b,i) > (?,?,?)
ORDER BY a,b,i
LIMIT 10;
Effect:
- Bypasses query planner’s truncation heuristic
- Forces full index scan despite redundant columns
- Verify with
EXPLAIN
to ensure desired opcodes
Step 7: Query Rewrite for Partial Comparisons
Alternative Approach: Break row value comparison into column-wise clauses.
Original Query:
WHERE (a,b,i) > (?,?,?)
Rewritten:
WHERE a > ?1
OR (a = ?1 AND b > ?2)
OR (a = ?1 AND b = ?2 AND i > ?3)
Advantage:
- Explicit control over comparison logic
- Works better with compound indexes
- Planner may use
MULTI-INDEX OR
optimization
Step 8: Version-Specific Behavior Testing
Consideration: Test across SQLite versions (3.43.0+ shown in examples).
Checklist:
- Confirm
row_value > ?
optimization exists in target version - Review SQLite Release Notes for index changes
- Test with latest version using
sqlite3_version()
function
Step 9: Index Redesign for Rowid Tables
Optimal Index Strategy:
- Never include rowid alias in secondary indexes
- Leverage implicit rowid appending
- Use covering indexes where possible
Example Schema:
CREATE TABLE optimized (
id INTEGER PRIMARY KEY,
x INTEGER NOT NULL,
y INTEGER NOT NULL
);
CREATE INDEX optimized_xy ON optimized(x, y);
Query Efficiency:
-- Uses index (x,y,id) implicitly
SELECT * FROM optimized
WHERE (x,y,id) > (?,?,?)
ORDER BY x,y,id
LIMIT 10;
Step 10: Understanding Storage Engine Internals
Deep Dive: Study SQLite’s index record format from File Format Documentation.
Key Takeaways:
- Index records store:
- Header (type info)
- Key columns
- Rowid (for rowid tables)
WITHOUT ROWID
tables omit the trailing rowid- Comparison logic operates on serialized B-tree cells
Practical Impact:
Explicit rowid inclusion creates larger index records with duplicated data, affecting both storage and comparison efficiency.
Final Recommendation Summary
- Avoid Explicit Rowid in Indexes: Let SQLite handle implicit appending
- Prefer
WITHOUT ROWID
for PK Clustering: When full index control is critical - Validate with
EXPLAIN
: Don’t rely solely onEXPLAIN QUERY PLAN
- Monitor Version Updates: Optimization behaviors evolve
- Index Column Order Matters: Place rowid aliases last if unavoidable
By aligning index design with SQLite’s storage mechanics and comparison logic, developers can achieve optimal row value comparison performance while avoiding query planner confusion. The observed behavior isn’t a bug but rather a consequence of SQLite’s rowid handling optimizations, necessitating deliberate schema design to work harmoniously with the engine’s internals.