Assertion Failure in sqlite3WhereEnd During Indexed Expression Optimization
Understanding the Assertion Failure in sqlite3WhereEnd with Indexed Generated Columns
Root Cause: Query Planner Misoptimization Involving Index-Only Scans on Generated Columns
The core issue arises when SQLite’s query planner attempts to optimize a query that combines generated columns, indexed expressions, and correlated subqueries. Specifically, the assertion failure occurs in the sqlite3WhereEnd
function during the finalization of the WHERE
clause processing. This function is responsible for cleaning up data structures and closing cursors opened during query execution. The assertion (pLoop->wsFlags & WHERE_IDX_ONLY)==0 || cursorIsOpen(v,pOp->p1,k) || pOp->opcode==OP_Offset
fails because the query planner incorrectly assumes that an index-only scan (WHERE_IDX_ONLY
) is valid for a generated column, even though the underlying cursor for the index is not properly opened or managed.
The failure is tightly coupled with the SQLITE_IndexedExpr
optimization, which allows SQLite to use indexed expressions (including those involving generated columns) to satisfy parts of a query without accessing the base table. When this optimization is enabled, the query planner may incorrectly generate an execution plan that skips opening a cursor for the index, leading to a mismatch between the optimizer’s assumptions and the runtime state of the cursors.
The problematic query structure involves:
- A table (
v0
) with a generated column (c2
). - A composite index (
i
) on the generated column and another column (c0
). - A correlated subquery in the
SELECT
list that references the outer table (a0.c0
) and usesGROUP BY
on the generated column.
This combination confuses the query planner into generating an invalid index-only scan plan, which violates the internal consistency checks in sqlite3WhereEnd
.
Key Contributing Factors: Generated Columns, Correlated Subqueries, and Index-Only Scans
Generated Columns and Indexed Expressions
Generated columns (virtual or stored) compute their values dynamically. When an index includes a generated column, SQLite must ensure that the index correctly reflects the computed value. TheSQLITE_IndexedExpr
optimization allows the query planner to use these indexes to avoid recomputing the generated column’s value during query execution. However, if the index is not fully materialized or the cursor for the index is not properly initialized, the optimization can lead to inconsistencies.Correlated Subqueries in Aggregation Contexts
TheSELECT
list contains a subquery that correlates the outer table’s column (a0.c0
) with the inner subquery’s result. The use ofGROUP BY a5.c2
(the generated column) in the subquery forces the query planner to consider index-based optimizations for aggregation. This creates a dependency chain where the outer query’s generated column (a0.c2
) and the inner subquery’s grouping logic interact in unexpected ways.Index-Only Scan Optimization Flaws
TheWHERE_IDX_ONLY
flag indicates that the query can be satisfied using only the index, without accessing the base table. When the index includes a generated column, the query planner must ensure that the index cursor is opened and maintained correctly. If the cursor is not opened (due to an oversight in the optimization logic), the assertion insqlite3WhereEnd
fails because the code expects the cursor to be valid whenWHERE_IDX_ONLY
is set.Incomplete Patch for Edge Cases
A prior attempt to fix a similar issue (c8bedef0d61731c2) addressed specific scenarios but did not account for cases where correlated subqueries and generated columns interact with aggregate functions. The bisect results indicate that the regression was introduced between commits4dc438a951bdbe27
(good) and40549bacb3923e43
(bad), narrowing down the faulty optimization logic.
Resolving the Assertion Failure: Debugging, Workarounds, and Permanent Fixes
Step 1: Reproduce the Issue with Diagnostic Flags
Compile SQLite with debugging flags to capture the query planner’s decisions:
export CFLAGS="-g -O0 -DSQLITE_DEBUG -DSQLITE_ENABLE_WHERETRACE -DSQLITE_ENABLE_TREETRACE"
./configure
make
Execute the problematic query:
./sqlite3 test.db "CREATE TABLE v0 (c0, c2 AS(c0)); CREATE INDEX i ON v0 (c2, c0); SELECT 1 FROM v0 AS a0 WHERE (SELECT count(+a0.c0 IN (SELECT a5.c2)) FROM v0 a5 GROUP BY a5.c2) GROUP BY a0.c2;"
The WHERE
trace (SQLITE_ENABLE_WHERETRACE
) will log the query planner’s optimization steps, showing how the index i
is selected for an index-only scan.
Step 2: Disable the SQLITE_IndexedExpr Optimization
As a temporary workaround, disable the problematic optimization using sqlite3_test_control
:
.testctrl optimizations 0x01000000; -- Disable SQLITE_IndexedExpr (bit 24)
This forces the query planner to ignore indexed expressions, reverting to a less optimized but safer execution plan.
Step 3: Analyze the Query Plan
Run EXPLAIN
on the query to inspect the generated opcodes:
EXPLAIN SELECT 1 FROM v0 AS a0 WHERE (SELECT count(+a0.c0 IN (SELECT a5.c2)) FROM v0 a5 GROUP BY a5.c2) GROUP BY a0.c2;
Look for opcodes like OpenRead
(cursor opening) and Column
(data retrieval). If the index i
is used without a corresponding OpenRead
on its cursor, the optimization is flawed.
Step 4: Apply the Permanent Fix
The issue was resolved in commit 898bfa1afd8260ea. Update SQLite to a version containing this commit. The fix involves:
- Adding checks in the query planner to ensure cursors for indexed expressions are properly initialized.
- Refining the logic that determines when an index-only scan is valid for generated columns.
Step 5: Validate with Regression Tests
After applying the fix, re-run the original query and verify that the assertion failure no longer occurs. Additionally, test edge cases involving:
- Generated columns with complex expressions.
- Correlated subqueries that reference outer table columns in
IN
clauses. - Composite indexes mixing generated and non-generated columns.
Step 6: Monitor Query Planner Behavior
For future queries, use EXPLAIN
and EXPLAIN QUERY PLAN
to audit the use of index-only scans on generated columns. Enable WHERE
tracing during development to catch similar optimization errors early.
By addressing the root cause—cursor management for indexed generated columns during index-only scans—the assertion failure is resolved, ensuring stable query execution.