Covering Index Regression with Virtual Columns in SQLite 3.40.0

Issue Overview: Virtual Column Index Coverage Failure in Query Plans

A regression was identified in SQLite version 3.40.0 (initially misreported as 3.34.0) where queries leveraging virtual columns in covering indexes exhibit suboptimal execution plans. This manifests as unnecessary table accesses during query execution, resulting in degraded performance due to increased page misses. The issue is particularly evident in schemas designed for interval overlap queries, where virtual columns are defined as expressions over base columns and indexed to optimize range-based filters.

Schema & Query Context

The problematic schema involves a table with two base columns (beg and end) and two virtual columns:

CREATE TABLE t(
  beg INTEGER NOT NULL, 
  end INTEGER NOT NULL,
  _beg INTEGER AS (beg) VIRTUAL,  -- Direct reference to base column
  _len INTEGER AS (end - beg) VIRTUAL  -- Expression-based virtual column
);
CREATE INDEX t_i ON t(_beg, _len);  -- Composite index on virtual columns

The query in question filters on _beg (a virtual column mirroring beg) and _beg + _len (a derived value):

SELECT _rowid_ FROM t INDEXED BY t_i 
WHERE _beg BETWEEN 100 AND 200 AND _beg + _len >= 150;

In SQLite versions prior to 3.40.0, the query planner fully utilizes the covering index t_i, avoiding table access. However, in 3.40.0+, the planner erroneously accesses the base table (cursor 0) for the _beg virtual column, despite its presence in the index.

Execution Plan Analysis

Optimal Plan (Pre-3.40.0):

  • The Column opcodes (addresses 8 & 9) reference cursor 1 (the index t_i), confirming index-only access.
  • No table access occurs during the filter evaluation.

Degraded Plan (3.40.0+):

  • The first Column opcode (address 9) references cursor 0 (the base table t), forcing a row fetch.
  • The Affinity opcodes (addresses 10 & 13) further indicate unnecessary data conversion steps tied to table access.

This regression stems from changes in how the query optimizer classifies virtual column expressions when determining index coverage. Specifically, trivial virtual column definitions (those directly referencing a single base column) are no longer recognized as index-covered expressions in certain contexts.


Root Causes: Virtual Column Classification and Optimizer Behavior Changes

1. Trivial vs. Non-Trivial Virtual Column Definitions

Virtual columns defined as direct references to base columns (e.g., _beg AS (beg)) are classified as trivial expressions. Prior to 3.40.0, these were treated identically to base columns in index coverage analysis. However, optimizer changes introduced in 3.40.0 began distinguishing between:

  • Base Column References: Directly stored in indexes.
  • Trivial Virtual Columns: Treated as separate entities, leading to redundant storage in indexes and optimizer confusion.

2. Index Coverage Recognition Regression

The query planner’s logic for determining whether an index fully covers a query’s column requirements was modified to handle virtual columns more conservatively. This change introduced a false negative when:

  • A virtual column is defined as a trivial expression.
  • The same virtual column is referenced in both the index and query filters.

The optimizer fails to recognize that the trivial virtual column’s value is already stored in the index, triggering a table access to re-fetch the base column value.

3. Expression Classification in Index Scans

The _beg + _len filter combines two virtual columns. The optimizer:

  • Correctly identifies _len as an indexed expression (end - beg).
  • Misclassifies _beg as requiring a table fetch due to its trivial definition.

This inconsistency forces partial index coverage, where only non-trivial virtual columns are utilized from the index.


Resolution: Workarounds, Fixes, and Schema Optimization

1. Modify Trivial Virtual Column Definitions

Workaround: Convert trivial virtual columns into non-trivial expressions using a no-op unary + operator:

-- Original definition causing regression:
_beg INTEGER AS (beg) VIRTUAL

-- Modified definition forcing expression classification:
_beg INTEGER AS (+beg) VIRTUAL

Mechanism: The + operator forces SQLite to treat _beg as an expression rather than a direct column reference. This ensures the optimizer recognizes the virtual column’s value as stored in the index.

Example Schema Fix:

CREATE TABLE t(
  beg INTEGER NOT NULL, 
  end INTEGER NOT NULL,
  _beg INTEGER AS (+beg) VIRTUAL,  -- Now an expression
  _len INTEGER AS (end - beg) VIRTUAL
);
CREATE INDEX t_i ON t(_beg, _len);

2. Verify Query Plan Changes

After modifying the virtual column definition, re-examine the query plan using EXPLAIN:

EXPLAIN SELECT _rowid_ FROM t INDEXED BY t_i 
WHERE _beg BETWEEN 100 AND 200 AND _beg + _len >= 150;

Key Validation Points:

  • All Column opcodes should reference cursor 1 (the index).
  • No Affinity opcodes should be present for the virtual columns.
  • Absence of IfNullRow checks tied to the base table.

3. Upgrade to SQLite 3.41.0+ with Optimizer Fixes

A permanent fix was implemented in SQLite version 3.41.0 (via check-in 40549bacb3):

  • The query planner now correctly recognizes trivial virtual columns as index-covered when appropriate.
  • No schema modifications are required if using this version or later.

Upgrade Recommendation:

  • For projects locked on 3.40.x, apply the unary + workaround.
  • Migrate to 3.41.0+ to resolve the regression without schema changes.

4. Best Practices for Virtual Columns and Indexes

To avoid similar issues:

  • Avoid Redundant Virtual Columns: Do not create virtual columns that simply alias base columns unless necessary for constraints or backward compatibility.
  • Explicitly Define Expressions: Even trivial expressions should include an operator (e.g., +beg, beg + 0) to ensure consistent optimizer treatment.
  • Validate Index Coverage: Use EXPLAIN liberally when upgrading SQLite versions or modifying virtual column definitions.

5. Schema Optimization for Interval Queries

For interval overlap use cases, consider these alternatives:

  • Use R-Tree Indexes: Native support for spatial queries, though limited to integer ranges.
  • Materialize Calculated Columns: Replace virtual columns with stored columns if write performance permits.
  • Re-Express Filters: Rewrite _beg + _len >= 150 as end >= 150 to leverage base column indexes directly.

6. Long-Term Monitoring and Regression Testing

  • Version-Specific Test Suites: Include query plan validation in automated tests when upgrading SQLite.
  • Community Tracking: Monitor SQLite’s release notes for optimizer-related changes.
  • Query Plan Analysis Tools: Use third-party tools (e.g., SQLite Query Planner Visualizers) to detect coverage regressions.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *