Covering Index Regression with Virtual Columns in SQLite 3.40.0
Issue Overview: Virtual Column Index Coverage Failure in Query Plans
A regression was identified in SQLite version 3.40.0 (initially misreported as 3.34.0) where queries leveraging virtual columns in covering indexes exhibit suboptimal execution plans. This manifests as unnecessary table accesses during query execution, resulting in degraded performance due to increased page misses. The issue is particularly evident in schemas designed for interval overlap queries, where virtual columns are defined as expressions over base columns and indexed to optimize range-based filters.
Schema & Query Context
The problematic schema involves a table with two base columns (beg
and end
) and two virtual columns:
CREATE TABLE t(
beg INTEGER NOT NULL,
end INTEGER NOT NULL,
_beg INTEGER AS (beg) VIRTUAL, -- Direct reference to base column
_len INTEGER AS (end - beg) VIRTUAL -- Expression-based virtual column
);
CREATE INDEX t_i ON t(_beg, _len); -- Composite index on virtual columns
The query in question filters on _beg
(a virtual column mirroring beg
) and _beg + _len
(a derived value):
SELECT _rowid_ FROM t INDEXED BY t_i
WHERE _beg BETWEEN 100 AND 200 AND _beg + _len >= 150;
In SQLite versions prior to 3.40.0, the query planner fully utilizes the covering index t_i
, avoiding table access. However, in 3.40.0+, the planner erroneously accesses the base table (cursor 0
) for the _beg
virtual column, despite its presence in the index.
Execution Plan Analysis
Optimal Plan (Pre-3.40.0):
- The
Column
opcodes (addresses 8 & 9) reference cursor 1 (the indext_i
), confirming index-only access. - No table access occurs during the filter evaluation.
Degraded Plan (3.40.0+):
- The first
Column
opcode (address 9) references cursor 0 (the base tablet
), forcing a row fetch. - The
Affinity
opcodes (addresses 10 & 13) further indicate unnecessary data conversion steps tied to table access.
This regression stems from changes in how the query optimizer classifies virtual column expressions when determining index coverage. Specifically, trivial virtual column definitions (those directly referencing a single base column) are no longer recognized as index-covered expressions in certain contexts.
Root Causes: Virtual Column Classification and Optimizer Behavior Changes
1. Trivial vs. Non-Trivial Virtual Column Definitions
Virtual columns defined as direct references to base columns (e.g., _beg AS (beg)
) are classified as trivial expressions. Prior to 3.40.0, these were treated identically to base columns in index coverage analysis. However, optimizer changes introduced in 3.40.0 began distinguishing between:
- Base Column References: Directly stored in indexes.
- Trivial Virtual Columns: Treated as separate entities, leading to redundant storage in indexes and optimizer confusion.
2. Index Coverage Recognition Regression
The query planner’s logic for determining whether an index fully covers a query’s column requirements was modified to handle virtual columns more conservatively. This change introduced a false negative when:
- A virtual column is defined as a trivial expression.
- The same virtual column is referenced in both the index and query filters.
The optimizer fails to recognize that the trivial virtual column’s value is already stored in the index, triggering a table access to re-fetch the base column value.
3. Expression Classification in Index Scans
The _beg + _len
filter combines two virtual columns. The optimizer:
- Correctly identifies
_len
as an indexed expression (end - beg
). - Misclassifies
_beg
as requiring a table fetch due to its trivial definition.
This inconsistency forces partial index coverage, where only non-trivial virtual columns are utilized from the index.
Resolution: Workarounds, Fixes, and Schema Optimization
1. Modify Trivial Virtual Column Definitions
Workaround: Convert trivial virtual columns into non-trivial expressions using a no-op unary +
operator:
-- Original definition causing regression:
_beg INTEGER AS (beg) VIRTUAL
-- Modified definition forcing expression classification:
_beg INTEGER AS (+beg) VIRTUAL
Mechanism: The +
operator forces SQLite to treat _beg
as an expression rather than a direct column reference. This ensures the optimizer recognizes the virtual column’s value as stored in the index.
Example Schema Fix:
CREATE TABLE t(
beg INTEGER NOT NULL,
end INTEGER NOT NULL,
_beg INTEGER AS (+beg) VIRTUAL, -- Now an expression
_len INTEGER AS (end - beg) VIRTUAL
);
CREATE INDEX t_i ON t(_beg, _len);
2. Verify Query Plan Changes
After modifying the virtual column definition, re-examine the query plan using EXPLAIN
:
EXPLAIN SELECT _rowid_ FROM t INDEXED BY t_i
WHERE _beg BETWEEN 100 AND 200 AND _beg + _len >= 150;
Key Validation Points:
- All
Column
opcodes should reference cursor 1 (the index). - No
Affinity
opcodes should be present for the virtual columns. - Absence of
IfNullRow
checks tied to the base table.
3. Upgrade to SQLite 3.41.0+ with Optimizer Fixes
A permanent fix was implemented in SQLite version 3.41.0 (via check-in 40549bacb3):
- The query planner now correctly recognizes trivial virtual columns as index-covered when appropriate.
- No schema modifications are required if using this version or later.
Upgrade Recommendation:
- For projects locked on 3.40.x, apply the unary
+
workaround. - Migrate to 3.41.0+ to resolve the regression without schema changes.
4. Best Practices for Virtual Columns and Indexes
To avoid similar issues:
- Avoid Redundant Virtual Columns: Do not create virtual columns that simply alias base columns unless necessary for constraints or backward compatibility.
- Explicitly Define Expressions: Even trivial expressions should include an operator (e.g.,
+beg
,beg + 0
) to ensure consistent optimizer treatment. - Validate Index Coverage: Use
EXPLAIN
liberally when upgrading SQLite versions or modifying virtual column definitions.
5. Schema Optimization for Interval Queries
For interval overlap use cases, consider these alternatives:
- Use R-Tree Indexes: Native support for spatial queries, though limited to integer ranges.
- Materialize Calculated Columns: Replace virtual columns with stored columns if write performance permits.
- Re-Express Filters: Rewrite
_beg + _len >= 150
asend >= 150
to leverage base column indexes directly.
6. Long-Term Monitoring and Regression Testing
- Version-Specific Test Suites: Include query plan validation in automated tests when upgrading SQLite.
- Community Tracking: Monitor SQLite’s release notes for optimizer-related changes.
- Query Plan Analysis Tools: Use third-party tools (e.g., SQLite Query Planner Visualizers) to detect coverage regressions.