Assertion Failure in sqlite3WhereEnd During Indexed Expression Optimization

Understanding the Assertion Failure in sqlite3WhereEnd with Indexed Generated Columns

Root Cause: Query Planner Misoptimization Involving Index-Only Scans on Generated Columns

The core issue arises when SQLite’s query planner attempts to optimize a query that combines generated columns, indexed expressions, and correlated subqueries. Specifically, the assertion failure occurs in the sqlite3WhereEnd function during the finalization of the WHERE clause processing. This function is responsible for cleaning up data structures and closing cursors opened during query execution. The assertion (pLoop->wsFlags & WHERE_IDX_ONLY)==0 || cursorIsOpen(v,pOp->p1,k) || pOp->opcode==OP_Offset fails because the query planner incorrectly assumes that an index-only scan (WHERE_IDX_ONLY) is valid for a generated column, even though the underlying cursor for the index is not properly opened or managed.

The failure is tightly coupled with the SQLITE_IndexedExpr optimization, which allows SQLite to use indexed expressions (including those involving generated columns) to satisfy parts of a query without accessing the base table. When this optimization is enabled, the query planner may incorrectly generate an execution plan that skips opening a cursor for the index, leading to a mismatch between the optimizer’s assumptions and the runtime state of the cursors.

The problematic query structure involves:

  1. A table (v0) with a generated column (c2).
  2. A composite index (i) on the generated column and another column (c0).
  3. A correlated subquery in the SELECT list that references the outer table (a0.c0) and uses GROUP BY on the generated column.

This combination confuses the query planner into generating an invalid index-only scan plan, which violates the internal consistency checks in sqlite3WhereEnd.


Key Contributing Factors: Generated Columns, Correlated Subqueries, and Index-Only Scans

  1. Generated Columns and Indexed Expressions
    Generated columns (virtual or stored) compute their values dynamically. When an index includes a generated column, SQLite must ensure that the index correctly reflects the computed value. The SQLITE_IndexedExpr optimization allows the query planner to use these indexes to avoid recomputing the generated column’s value during query execution. However, if the index is not fully materialized or the cursor for the index is not properly initialized, the optimization can lead to inconsistencies.

  2. Correlated Subqueries in Aggregation Contexts
    The SELECT list contains a subquery that correlates the outer table’s column (a0.c0) with the inner subquery’s result. The use of GROUP BY a5.c2 (the generated column) in the subquery forces the query planner to consider index-based optimizations for aggregation. This creates a dependency chain where the outer query’s generated column (a0.c2) and the inner subquery’s grouping logic interact in unexpected ways.

  3. Index-Only Scan Optimization Flaws
    The WHERE_IDX_ONLY flag indicates that the query can be satisfied using only the index, without accessing the base table. When the index includes a generated column, the query planner must ensure that the index cursor is opened and maintained correctly. If the cursor is not opened (due to an oversight in the optimization logic), the assertion in sqlite3WhereEnd fails because the code expects the cursor to be valid when WHERE_IDX_ONLY is set.

  4. Incomplete Patch for Edge Cases
    A prior attempt to fix a similar issue (c8bedef0d61731c2) addressed specific scenarios but did not account for cases where correlated subqueries and generated columns interact with aggregate functions. The bisect results indicate that the regression was introduced between commits 4dc438a951bdbe27 (good) and 40549bacb3923e43 (bad), narrowing down the faulty optimization logic.


Resolving the Assertion Failure: Debugging, Workarounds, and Permanent Fixes

Step 1: Reproduce the Issue with Diagnostic Flags

Compile SQLite with debugging flags to capture the query planner’s decisions:

export CFLAGS="-g -O0 -DSQLITE_DEBUG -DSQLITE_ENABLE_WHERETRACE -DSQLITE_ENABLE_TREETRACE"
./configure
make

Execute the problematic query:

./sqlite3 test.db "CREATE TABLE v0 (c0, c2 AS(c0)); CREATE INDEX i ON v0 (c2, c0); SELECT 1 FROM v0 AS a0 WHERE (SELECT count(+a0.c0 IN (SELECT a5.c2)) FROM v0 a5 GROUP BY a5.c2) GROUP BY a0.c2;"  

The WHERE trace (SQLITE_ENABLE_WHERETRACE) will log the query planner’s optimization steps, showing how the index i is selected for an index-only scan.

Step 2: Disable the SQLITE_IndexedExpr Optimization

As a temporary workaround, disable the problematic optimization using sqlite3_test_control:

.testctrl optimizations 0x01000000; -- Disable SQLITE_IndexedExpr (bit 24)  

This forces the query planner to ignore indexed expressions, reverting to a less optimized but safer execution plan.

Step 3: Analyze the Query Plan

Run EXPLAIN on the query to inspect the generated opcodes:

EXPLAIN SELECT 1 FROM v0 AS a0 WHERE (SELECT count(+a0.c0 IN (SELECT a5.c2)) FROM v0 a5 GROUP BY a5.c2) GROUP BY a0.c2;  

Look for opcodes like OpenRead (cursor opening) and Column (data retrieval). If the index i is used without a corresponding OpenRead on its cursor, the optimization is flawed.

Step 4: Apply the Permanent Fix

The issue was resolved in commit 898bfa1afd8260ea. Update SQLite to a version containing this commit. The fix involves:

  1. Adding checks in the query planner to ensure cursors for indexed expressions are properly initialized.
  2. Refining the logic that determines when an index-only scan is valid for generated columns.

Step 5: Validate with Regression Tests

After applying the fix, re-run the original query and verify that the assertion failure no longer occurs. Additionally, test edge cases involving:

  • Generated columns with complex expressions.
  • Correlated subqueries that reference outer table columns in IN clauses.
  • Composite indexes mixing generated and non-generated columns.

Step 6: Monitor Query Planner Behavior

For future queries, use EXPLAIN and EXPLAIN QUERY PLAN to audit the use of index-only scans on generated columns. Enable WHERE tracing during development to catch similar optimization errors early.

By addressing the root cause—cursor management for indexed generated columns during index-only scans—the assertion failure is resolved, ensuring stable query execution.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *