Unexpected Empty Result When Using ANALYZE with Complex Joins and Indexes
Unexpected Behavior in LEFT OUTER JOIN Results After ANALYZE Execution
Issue Overview
The core problem revolves around a SELECT query that returns an empty result set instead of a NULL value when executed under specific conditions involving table statistics collection via the ANALYZE command. The scenario involves three tables (t0, t1, t2) with unconventional indexes, data types, and JOIN conditions. The query structure includes a chain of LEFT OUTER JOIN operations and a WHERE clause that filters on a column from the rightmost table (t2.c0).
Key Observations
-
Schema and Data Setup
t0is a temporary table with aTEXTcolumnc0containing the value'DM'.t1is a table with anINTcolumnc0populated with mixed-type values (strings, integers,NULL).t2is a table with aREALcolumnc0initialized to a BLOB value (x''), later updated toNULL.- Indexes
i54(ont0) andi20(ont2) use complex expressions involving bitwise operations,BETWEENclauses with invalid operands (e.g.,x''), andCOLLATE RTRIM.
-
Query Structure
The problematic query:SELECT ALL t1.c0 FROM t0 LEFT OUTER JOIN t1 LEFT OUTER JOIN t2 ON (t1.c0 NOT NULL) == (t2.c0 IN ()) WHERE t2.c0;- The JOIN condition compares
t1.c0 NOT NULL(a boolean expression) tot2.c0 IN ()(an empty list, which always evaluates toFALSE). - The
WHERE t2.c0clause filters rows wheret2.c0is truthy (non-zero and non-NULL).
- The JOIN condition compares
-
Behavioral Discrepancy
- Case 1 (Without
ANALYZE): The query returns no rows becauset2.c0isNULL(due to theUPDATEstatement), and theWHERE t2.c0clause filters out all rows. However, the user expected aNULLresult, likely due to misunderstanding howLEFT JOINinteracts withWHEREclauses. - Case 2 (With
ANALYZE): After runningANALYZE t0andANALYZE t1, the query returns an empty result set. The user interpreted this as a bug, expectingNULLinstead of an empty output.
- Case 1 (Without
-
Version-Specific Fix
The SQLite maintainer confirmed the issue was resolved in versions after 3.41.1 (specifically in trunk and branch-3.41). The root cause was a query planner bug triggered by stale or incorrect statistics collected viaANALYZE.
Root Causes Behind Incorrect Query Results Post-ANALYZE
-
Query Planner Sensitivity to Table Statistics
TheANALYZEcommand generates statistical metadata about table contents and index selectivity. These statistics influence the query planner’s decisions about join order, index usage, and predicate evaluation. In this case:- The
WHEREclause (t2.c0) depends ont2being part of the JOIN sequence. - After
ANALYZE, outdated or misleading statistics caused the planner to incorrectly optimize the JOIN order or predicate evaluation, leading to an empty result.
- The
-
Index-Driven Optimization Pitfalls
The indexesi54andi20involve non-deterministic or invalid expressions:i54includes((c0) & (c0)) NOT BETWEEN ('?~') AND (x''), which combines bitwise operations with aBETWEENclause comparing aTEXTvalue ('?~') to a BLOB (x''). This index is effectively unused due to itsWHEREclause ((NULL) BETWEEN (c0) AND (c0)), which always evaluates toNULL(equivalent toFALSE).i20uses((c0 OR 0.488...) OR c0) IN (), which simplifies toFALSE.
These indexes confuse the query planner by introducing synthetic columns with hard-coded
FALSEorNULLvalues, skewing statistics and causing incorrect index selections. -
NULL Handling in JOIN Conditions and WHERE Clauses
- The
LEFT OUTER JOIN t2condition(t1.c0 NOT NULL) == (t2.c0 IN ())resolves tot1.c0 NOT NULL == FALSE, or equivalentlyt1.c0 IS NULL. - The
WHERE t2.c0clause filters out all rows wheret2.c0isNULLor0. Sincet2.c0was set toNULLviaUPDATE, no rows satisfy this condition. - The expectation of a
NULLresult stemmed from misunderstanding howLEFT JOINpreserves unmatched rows from the left table but appliesWHEREclause filtering after the JOIN.
- The
-
CLI Output Formatting Misinterpretation
SQLite’s command-line interface (CLI) does not displayNULLvalues as the string "NULL" but instead shows an empty string. This led to confusion between an actualNULLvalue (stored in the result set) and an empty output (no rows).
Resolving Empty Results by Addressing Query Plan and Version-Specific Bugs
Step 1: Verify SQLite Version and Apply Updates
- Check Version:
SELECT sqlite_version();Ensure the version is 3.41.1 or newer. Versions prior to 3.41.1 contain the query planner bug fixed in later releases.
- Update SQLite:
Download the latest precompiled binaries or build from source:wget https://www.sqlite.org/src/tarball/sqlite.tar.gz tar xzf sqlite.tar.gz cd sqlite ./configure && make
Step 2: Analyze Query Logic and Data Flow
-
Simplify the Query:
Break down the JOIN sequence and WHERE clause:-- Original JOIN logic rewritten for clarity SELECT t1.c0 FROM t0 LEFT JOIN ( t1 LEFT JOIN t2 ON (t1.c0 IS NULL) ) ON TRUE WHERE t2.c0;- The
t2JOIN condition simplifies tot1.c0 IS NULL. - The
WHERE t2.c0clause excludes rows wheret2.c0isNULLor0.
- The
-
Validate Data States:
- Confirm
t2.c0isNULLafter theUPDATE:SELECT c0 FROM t2; - Check
t1.c0values forNULL:SELECT c0 FROM t1 WHERE c0 IS NULL;
- Confirm
Step 3: Disable or Correct Indexes
-
Drop Problematic Indexes:
DROP INDEX i54; DROP INDEX i20;Re-run the query to see if the result changes.
-
Rewrite Index Definitions:
Avoid expressions with hard-codedFALSEorNULLconditions. For example:CREATE INDEX i20 ON t2(c0); -- Simple index on t2.c0
Step 4: Bypass ANALYZE-Induced Issues
- Reset Statistics:
Clear existing statistics to force the query planner to recalculate:ANALYZE sqlite_schema; -- Updates internal schema metadata - Use Query-Specific Optimizer Hints:
Force a specific JOIN order usingCROSS JOINorINDEXED BY:SELECT ALL t1.c0 FROM t0 LEFT OUTER JOIN t1 LEFT OUTER JOIN t2 INDEXED BY i20 ON ...
Step 5: Correct NULL Handling and Output Display
- Explicitly Handle NULLs:
UseCOALESCEorIFNULLto distinguishNULLfrom empty strings:SELECT ALL COALESCE(t1.c0, 'NULL') AS c0 FROM ... - Adjust CLI Settings:
Configure the CLI to displayNULLexplicitly:.nullvalue NULL
Step 6: Validate with EXPLAIN and EXPLAIN QUERY PLAN
-
Inspect Query Execution Steps:
EXPLAIN QUERY PLAN SELECT ALL t1.c0 FROM t0 LEFT OUTER JOIN t1 ...;- Look for full table scans vs index usage.
- Verify JOIN order matches expectations.
-
Compare Pre- and Post-ANALYZE Plans:
RunEXPLAINbefore and afterANALYZEto identify differences in opcode sequences.
Step 7: Test with Simplified Schema and Data
- Reproduce with Minimal Components:
Create a minimal test case:CREATE TABLE t1 (c0 INT); INSERT INTO t1 VALUES (NULL); CREATE TABLE t2 (c0 REAL); INSERT INTO t2 VALUES (NULL); SELECT t1.c0 FROM t1 LEFT JOIN t2 ON t1.c0 IS NULL WHERE t2.c0;- This isolates the JOIN logic and
WHEREclause.
- This isolates the JOIN logic and
Step 8: Review SQLite’s NULL and Boolean Logic
-
Understand Implicit Type Conversions:
t2.c0 IN ()evaluates toFALSEregardless oft2.c0.t1.c0 NOT NULLevaluates toTRUEorFALSE, but in SQLite,TRUEis1andFALSEis0.- The comparison
(t1.c0 NOT NULL) == (t2.c0 IN ())becomes1 == 0or0 == 0, depending ont1.c0.
-
Adjust Boolean Comparisons:
Use explicitIS TRUEorIS FALSEto avoid integer coercion:ON (t1.c0 NOT NULL) IS (t2.c0 IN ())
Step 9: Consult SQLite’s Query Planner Documentation
- Reference:
Study sections on The SQLite Query Planner and LEFT JOIN Strength Reduction.
Step 10: Report Edge Cases to SQLite Maintainers
- Contribute to Testing:
If the issue persists after all fixes, submit a reproducible test case to the SQLite team via their bug tracker.
This guide systematically addresses the interplay between query planner optimizations, index design, and NULL handling in SQLite. By methodically isolating each component, developers can diagnose similar issues arising from complex JOINs and statistical metadata.