Assertion Failure in ANALYZE with STAT4 and Expression Indexes
Understanding the STAT4 Analysis Assertion Failure in SQLite
The core issue revolves around an assertion failure triggered during the execution of the ANALYZE
command in SQLite when specific conditions are met:
- The SQLITE_STAT4 compile-time option is enabled.
- The database contains indexes defined on non-trivial expressions (e.g., arithmetic operations, string concatenation).
- Multiple
ANALYZE
commands are executed consecutively.
This problem manifests as a crash with the error message:
sqlite3: sqlite3.c:92068: sqlite3VdbeExec: Assertion
pIn1!=pOut’ failed.`
The failure occurs deep within SQLite’s virtual database engine (VDBE) during the processing of statistical data for query optimization. The assertion pIn1!=pOut
enforces a critical invariant in the VDBE: the input and output registers for an opcode must not alias. Violation of this invariant indicates a flaw in how temporary registers are allocated or reused during query execution, particularly in the context of statistical sampling for index analysis.
Root Causes: STAT4 Sampling and Expression Index Interactions
1. STAT4’s Dependency on Index Sampling
SQLITE_STAT4 enhances the query planner’s ability to estimate selectivity by collecting detailed statistics about index value distributions. Unlike earlier STAT3/STAT2, STAT4 samples index entries directly, generating histograms for the left-most columns of an index. This process involves executing internal queries that read indexed data using temporary tables or subqueries.
When an index includes expressions (e.g., c1|c1
, c1||c1
), STAT4’s sampling logic must evaluate these expressions dynamically. SQLite generates VDBE code to compute the indexed expressions during analysis. If the generated code inadvertently reuses the same register for both input and output operands in a VDBE instruction, the assertion pIn1!=pOut
fails, as the engine detects an illegal overlap.
2. Compile-Time Optimization Flags and Code Paths
The problem is tightly coupled with the SQLITE_STAT4 directive. Disabling STAT4 (via .testctrl optimizations 0x00000800
or omitting -DSQLITE_ENABLE_STAT4
at compile time) avoids the crash because the faulty code path is excluded. However, this workaround sacrifices optimization capabilities.
The assertion failure is not observed in earlier SQLite versions (pre-3.8.0) because STAT4 was introduced in 3.8.0. Furthermore, the introduction of expression-based indexes (check-in 2131a5ca53f0e9b0) created new interactions between STAT4’s sampling logic and the VDBE’s register management.
3. Register Allocation in the VDBE
The VDBE uses a register-based architecture where each operation (opcode) reads from and writes to specific registers. The assertion pIn1!=pOut
ensures that an operation’s output register does not overlap with its input registers, preventing data corruption.
In the failing scenario, the code generated for STAT4’s internal queries reuses a register incorrectly. For example, an opcode like OP_Column
(which reads a column value) might be instructed to write its output to a register that is simultaneously being used as an input for another part of the expression evaluation. This overlap violates the VDBE’s register integrity rules.
Resolving the STAT4 Assertion Failure: Diagnosis and Mitigation
Step 1: Confirm the Presence of Expression-Based Indexes
Identify indexes defined using expressions involving arithmetic operators, string functions, or other non-column references. In the test case:
CREATE INDEX i2 ON v0 (c1|c1, c1||c1, c1, c1);
The first two index columns are expressions (c1|c1
and c1||c1
), while the latter two are simple column references.
Action:
- Review schema definitions for indexes containing
+
,||
,|
,/
, or function calls. - Use
sqlite3_master
or.schema
to list all indexes.
Step 2: Reproduce Under Debugging Symbols
Compile SQLite with debugging symbols (-g -O0
) and enable internal tracing:
export CFLAGS="-g -O0 -DSQLITE_DEBUG -DSQLITE_ENABLE_TREETRACE -DSQLITE_ENABLE_WHERETRACE -DSQLITE_ENABLE_CURSOR_HINTS -DSQLITE_COUNTOFVIEW_OPTIMIZATION -DSQLITE_ENABLE_STAT4"
./configure
make
Execute the failing script:
./sqlite3 test.db <<EOF
CREATE TABLE v0 ( c1 PRIMARY KEY ) WITHOUT ROWID;
CREATE INDEX i2 ON v0 ( c1|c1, c1||c1, c1, c1 );
INSERT INTO v0 ( c1 ) VALUES ( 0 );
ANALYZE;
ANALYZE;
EOF
The second ANALYZE
triggers the assertion due to stale statistics or cached VDBE programs.
Step 3: Analyze VDBE Bytecode with EXPLAIN
Capture the VDBE instructions generated during the second ANALYZE
:
.echo on
EXPLAIN ANALYZE;
Look for opcodes where the pOut
register overlaps with pIn1
or pIn2
. For example:
addr opcode p1 p2 p3 p4 p5 comment
--- ------------ ---- ---- ------ ------------- -- --------
...
10 Column 0 1 3 r[3] 00 i2.c1|c1
11 BitOr 3 3 2 r[2]=r[3]|r[3] 00
...
Here, OP_BitOr
reads from register 3 and writes to register 2. If pIn1
(register 3) and pOut
(register 2) were the same, the assertion would fire.
Step 4: Apply the Official Patch
The SQLite development team addressed this in check-in 13c8c60bb6b4447b. Update to SQLite 3.41.0 or later, where the fix is included.
Verification:
After updating, rerun the test script. The second ANALYZE
should complete without assertion failures.
Step 5: Workarounds for Unpatchable Systems
If updating SQLite is not feasible, consider:
Disabling STAT4:
.testctrl optimizations 0x00000800; -- Disable STAT4 ANALYZE;
Note: This degrades the query planner’s accuracy for indexes with many duplicate values.
Avoiding Expression Indexes:
Refactor schemas to use generated columns or separate columns for computed values. For example:CREATE TABLE v0 ( c1 INTEGER PRIMARY KEY, c1_or_c1 INTEGER GENERATED ALWAYS AS (c1 | c1), c1_concat_c1 TEXT GENERATED ALWAYS AS (c1 || c1) ) WITHOUT ROWID; CREATE INDEX i2 ON v0 (c1_or_c1, c1_concat_c1, c1, c1);
Generated columns materialize the expression results, allowing simpler indexes.
Step 6: Regression Testing for Complex Indexes
For databases relying heavily on expression-based indexes, implement automated tests that:
- Create indexes with varied expressions (
+
,||
,SUBSTR
, etc.). - Run
ANALYZE
multiple times. - Validate the absence of crashes and correctness of query plans.
Example Test Script:
CREATE TABLE test (a INT, b TEXT);
CREATE INDEX idx_expr1 ON test (a|a, b||b);
CREATE INDEX idx_expr2 ON test (a+a, LENGTH(b));
INSERT INTO test VALUES (1, 'xyz'), (2, 'abc');
ANALYZE;
ANALYZE; -- Second ANALYZE
SELECT COUNT(*) FROM sqlite_stat4; -- Ensure stats are populated
Step 7: Monitor Schema Changes and STAT4 Interactions
Adopt a review process for schema modifications involving expression indexes. Ensure developers:
- Document the use case for each expression index.
- Test
ANALYZE
under STAT4 after schema changes. - Consider alternatives (e.g., triggers, generated columns) where expressions are volatile or complex.
Final Recommendations
- Upgrade SQLite: Deploy version 3.41.0 or newer to benefit from the permanent fix.
- Audit Existing Indexes: Identify and refactor high-risk expression indexes.
- Enable Debug Symbols in Testing: Facilitate rapid diagnosis of similar issues.
- Implement Regression Tests: Cover STAT4 interactions with expression indexes.
By understanding the interplay between STAT4’s sampling mechanism, expression indexes, and VDBE register management, developers can preemptively avoid this class of assertion failures and maintain robust SQLite deployments.