Unexpected Empty Result When Using ANALYZE with Complex Joins and Indexes
Unexpected Behavior in LEFT OUTER JOIN Results After ANALYZE Execution
Issue Overview
The core problem revolves around a SELECT
query that returns an empty result set instead of a NULL
value when executed under specific conditions involving table statistics collection via the ANALYZE
command. The scenario involves three tables (t0
, t1
, t2
) with unconventional indexes, data types, and JOIN conditions. The query structure includes a chain of LEFT OUTER JOIN
operations and a WHERE
clause that filters on a column from the rightmost table (t2.c0
).
Key Observations
Schema and Data Setup
t0
is a temporary table with aTEXT
columnc0
containing the value'DM'
.t1
is a table with anINT
columnc0
populated with mixed-type values (strings, integers,NULL
).t2
is a table with aREAL
columnc0
initialized to a BLOB value (x''
), later updated toNULL
.- Indexes
i54
(ont0
) andi20
(ont2
) use complex expressions involving bitwise operations,BETWEEN
clauses with invalid operands (e.g.,x''
), andCOLLATE RTRIM
.
Query Structure
The problematic query:SELECT ALL t1.c0 FROM t0 LEFT OUTER JOIN t1 LEFT OUTER JOIN t2 ON (t1.c0 NOT NULL) == (t2.c0 IN ()) WHERE t2.c0;
- The JOIN condition compares
t1.c0 NOT NULL
(a boolean expression) tot2.c0 IN ()
(an empty list, which always evaluates toFALSE
). - The
WHERE t2.c0
clause filters rows wheret2.c0
is truthy (non-zero and non-NULL
).
- The JOIN condition compares
Behavioral Discrepancy
- Case 1 (Without
ANALYZE
): The query returns no rows becauset2.c0
isNULL
(due to theUPDATE
statement), and theWHERE t2.c0
clause filters out all rows. However, the user expected aNULL
result, likely due to misunderstanding howLEFT JOIN
interacts withWHERE
clauses. - Case 2 (With
ANALYZE
): After runningANALYZE t0
andANALYZE t1
, the query returns an empty result set. The user interpreted this as a bug, expectingNULL
instead of an empty output.
- Case 1 (Without
Version-Specific Fix
The SQLite maintainer confirmed the issue was resolved in versions after 3.41.1 (specifically in trunk and branch-3.41). The root cause was a query planner bug triggered by stale or incorrect statistics collected viaANALYZE
.
Root Causes Behind Incorrect Query Results Post-ANALYZE
Query Planner Sensitivity to Table Statistics
TheANALYZE
command generates statistical metadata about table contents and index selectivity. These statistics influence the query planner’s decisions about join order, index usage, and predicate evaluation. In this case:- The
WHERE
clause (t2.c0
) depends ont2
being part of the JOIN sequence. - After
ANALYZE
, outdated or misleading statistics caused the planner to incorrectly optimize the JOIN order or predicate evaluation, leading to an empty result.
- The
Index-Driven Optimization Pitfalls
The indexesi54
andi20
involve non-deterministic or invalid expressions:i54
includes((c0) & (c0)) NOT BETWEEN ('?~') AND (x'')
, which combines bitwise operations with aBETWEEN
clause comparing aTEXT
value ('?~'
) to a BLOB (x''
). This index is effectively unused due to itsWHERE
clause ((NULL) BETWEEN (c0) AND (c0)
), which always evaluates toNULL
(equivalent toFALSE
).i20
uses((c0 OR 0.488...) OR c0) IN ()
, which simplifies toFALSE
.
These indexes confuse the query planner by introducing synthetic columns with hard-coded
FALSE
orNULL
values, skewing statistics and causing incorrect index selections.NULL Handling in JOIN Conditions and WHERE Clauses
- The
LEFT OUTER JOIN t2
condition(t1.c0 NOT NULL) == (t2.c0 IN ())
resolves tot1.c0 NOT NULL == FALSE
, or equivalentlyt1.c0 IS NULL
. - The
WHERE t2.c0
clause filters out all rows wheret2.c0
isNULL
or0
. Sincet2.c0
was set toNULL
viaUPDATE
, no rows satisfy this condition. - The expectation of a
NULL
result stemmed from misunderstanding howLEFT JOIN
preserves unmatched rows from the left table but appliesWHERE
clause filtering after the JOIN.
- The
CLI Output Formatting Misinterpretation
SQLite’s command-line interface (CLI) does not displayNULL
values as the string "NULL" but instead shows an empty string. This led to confusion between an actualNULL
value (stored in the result set) and an empty output (no rows).
Resolving Empty Results by Addressing Query Plan and Version-Specific Bugs
Step 1: Verify SQLite Version and Apply Updates
- Check Version:
SELECT sqlite_version();
Ensure the version is 3.41.1 or newer. Versions prior to 3.41.1 contain the query planner bug fixed in later releases.
- Update SQLite:
Download the latest precompiled binaries or build from source:wget https://www.sqlite.org/src/tarball/sqlite.tar.gz tar xzf sqlite.tar.gz cd sqlite ./configure && make
Step 2: Analyze Query Logic and Data Flow
Simplify the Query:
Break down the JOIN sequence and WHERE clause:-- Original JOIN logic rewritten for clarity SELECT t1.c0 FROM t0 LEFT JOIN ( t1 LEFT JOIN t2 ON (t1.c0 IS NULL) ) ON TRUE WHERE t2.c0;
- The
t2
JOIN condition simplifies tot1.c0 IS NULL
. - The
WHERE t2.c0
clause excludes rows wheret2.c0
isNULL
or0
.
- The
Validate Data States:
- Confirm
t2.c0
isNULL
after theUPDATE
:SELECT c0 FROM t2;
- Check
t1.c0
values forNULL
:SELECT c0 FROM t1 WHERE c0 IS NULL;
- Confirm
Step 3: Disable or Correct Indexes
Drop Problematic Indexes:
DROP INDEX i54; DROP INDEX i20;
Re-run the query to see if the result changes.
Rewrite Index Definitions:
Avoid expressions with hard-codedFALSE
orNULL
conditions. For example:CREATE INDEX i20 ON t2(c0); -- Simple index on t2.c0
Step 4: Bypass ANALYZE-Induced Issues
- Reset Statistics:
Clear existing statistics to force the query planner to recalculate:ANALYZE sqlite_schema; -- Updates internal schema metadata
- Use Query-Specific Optimizer Hints:
Force a specific JOIN order usingCROSS JOIN
orINDEXED BY
:SELECT ALL t1.c0 FROM t0 LEFT OUTER JOIN t1 LEFT OUTER JOIN t2 INDEXED BY i20 ON ...
Step 5: Correct NULL Handling and Output Display
- Explicitly Handle NULLs:
UseCOALESCE
orIFNULL
to distinguishNULL
from empty strings:SELECT ALL COALESCE(t1.c0, 'NULL') AS c0 FROM ...
- Adjust CLI Settings:
Configure the CLI to displayNULL
explicitly:.nullvalue NULL
Step 6: Validate with EXPLAIN and EXPLAIN QUERY PLAN
Inspect Query Execution Steps:
EXPLAIN QUERY PLAN SELECT ALL t1.c0 FROM t0 LEFT OUTER JOIN t1 ...;
- Look for full table scans vs index usage.
- Verify JOIN order matches expectations.
Compare Pre- and Post-ANALYZE Plans:
RunEXPLAIN
before and afterANALYZE
to identify differences in opcode sequences.
Step 7: Test with Simplified Schema and Data
- Reproduce with Minimal Components:
Create a minimal test case:CREATE TABLE t1 (c0 INT); INSERT INTO t1 VALUES (NULL); CREATE TABLE t2 (c0 REAL); INSERT INTO t2 VALUES (NULL); SELECT t1.c0 FROM t1 LEFT JOIN t2 ON t1.c0 IS NULL WHERE t2.c0;
- This isolates the JOIN logic and
WHERE
clause.
- This isolates the JOIN logic and
Step 8: Review SQLite’s NULL and Boolean Logic
Understand Implicit Type Conversions:
t2.c0 IN ()
evaluates toFALSE
regardless oft2.c0
.t1.c0 NOT NULL
evaluates toTRUE
orFALSE
, but in SQLite,TRUE
is1
andFALSE
is0
.- The comparison
(t1.c0 NOT NULL) == (t2.c0 IN ())
becomes1 == 0
or0 == 0
, depending ont1.c0
.
Adjust Boolean Comparisons:
Use explicitIS TRUE
orIS FALSE
to avoid integer coercion:ON (t1.c0 NOT NULL) IS (t2.c0 IN ())
Step 9: Consult SQLite’s Query Planner Documentation
- Reference:
Study sections on The SQLite Query Planner and LEFT JOIN Strength Reduction.
Step 10: Report Edge Cases to SQLite Maintainers
- Contribute to Testing:
If the issue persists after all fixes, submit a reproducible test case to the SQLite team via their bug tracker.
This guide systematically addresses the interplay between query planner optimizations, index design, and NULL handling in SQLite. By methodically isolating each component, developers can diagnose similar issues arising from complex JOINs and statistical metadata.