Unexpected Empty Result with FULL OUTER JOIN and Constant Expression Evaluation in SQLite 3.39.0
Understanding the Core Problem: FULL OUTER JOIN and Constant Expression Evaluation
Issue Overview
The primary issue revolves around a SQLite query that combines a FULL OUTER JOIN
with a constant expression (1 IS TRUE
). The first query in the example returns a row containing data from all joined tables, while the second query, which replaces the SELECT *
with SELECT 1 IS TRUE
, returns an empty result set. This discrepancy is unexpected because the expression 1 IS TRUE
should always evaluate to TRUE
(i.e., 1
), and the FULL OUTER JOIN
should ensure at least one row is returned. The problem occurs in SQLite version 3.39.0 (unreleased at the time of the discussion), and it exposes a subtle interaction between query optimization logic, join reordering, and the handling of constant expressions in the presence of FULL OUTER JOIN
.
Key Observations
Behavior of
FULL OUTER JOIN
:
AFULL OUTER JOIN
combines the results of bothLEFT JOIN
andRIGHT JOIN
, ensuring that all rows from both tables are included in the result set, withNULL
values filling in for unmatched columns. If no rows match, the result should still include all rows from both tables, paired withNULL
values.Constant Expression Evaluation:
The expression1 IS TRUE
is logically equivalent to1 = 1
in SQLite becauseTRUE
is an alias for the integer1
. Thus, this expression should always evaluate toTRUE
, producing a non-empty result when selected.Query Structure:
The problematic query involves a chain of joins:SELECT 1 IS TRUE FROM t3 LEFT OUTER JOIN t1 INNER JOIN t0 ON (t1.c0 IN (t3.c1)) FULL OUTER JOIN t2
The
FULL OUTER JOIN
here complicates the query planner’s ability to correctly determine which rows to include, especially when theSELECT
clause does not reference any columns from the joined tables.Impact of the Unreleased Version:
The issue arises in an unreleased version of SQLite (3.39.0), where the introduction ofFULL OUTER JOIN
support may have introduced regressions in the query optimizer’s handling of join reordering and constant propagation.
Root Causes: Query Optimization, Join Reordering, and Type Handling
Possible Causes
Incorrect Join Reordering by the Query Optimizer:
SQLite’s query optimizer attempts to reorder joins to improve performance. However, whenFULL OUTER JOIN
is involved, reordering joins can violate the logical semantics of the query. In the example, the optimizer may incorrectly transform the query into a form that omits rows due to an invalid assumption about the relationship between tables.For instance, the optimizer might rewrite the original query as:
SELECT 1 FROM t1 JOIN t3 ON (b2 IN (a1)) CROSS JOIN t2 FULL JOIN t4 ON true
This transformation is invalid because the
ON (b2 IN (a1))
clause references columns fromt2
, which is not yet in scope whent3
is being joined.Interaction Between
FULL OUTER JOIN
and Constant Propagation:
When theSELECT
clause contains only constant expressions (e.g.,1 IS TRUE
), the optimizer may erroneously conclude that no rows need to be fetched from the tables, leading to an empty result. This is a failure to account for theFULL OUTER JOIN
’s requirement to generate rows even when no matching data exists.Type Handling and Boolean Logic:
While SQLite does not have a dedicatedBOOLEAN
type (it uses integers1
and0
forTRUE
andFALSE
), theIS
operator performs strict type checking. However, in this case, the issue is not related to type mismatches but rather to the optimizer’s incorrect elimination of rows.Index Usage and Analyzer Statistics:
The presence of the indexi0 ON t0(c0) WHERE c1 ISNULL
and theANALYZE
command may influence the optimizer’s plan. If the analyzer statistics incorrectly suggest that certain tables are empty or that joins can be short-circuited, this could contribute to the empty result.
Diagnosis and Resolution: Addressing Optimization Flaws and Join Semantics
Troubleshooting Steps, Solutions & Fixes
Step 1: Validate the Query Structure
Before diving into optimization issues, confirm that the query is logically correct.
Simplify the Query:
Remove theFULL OUTER JOIN
and test theSELECT 1 IS TRUE
expression in isolation:SELECT 1 IS TRUE FROM t3 LEFT OUTER JOIN t1 INNER JOIN t0 ON (t1.c0 IN (t3.c1));
If this returns a result, the problem lies specifically with the
FULL OUTER JOIN
interaction.Test with Explicit Join Conditions:
Replace implicit joins with explicitJOIN
clauses to eliminate ambiguity:SELECT 1 IS TRUE FROM t3 LEFT JOIN t1 ON true INNER JOIN t0 ON t1.c0 IN (t3.c1) FULL JOIN t2 ON true;
This ensures that all join conditions are explicitly defined.
Step 2: Investigate Query Optimization and Execution Plans
Use SQLite’s EXPLAIN
command to analyze the query plan:
Generate the Execution Plan:
EXPLAIN QUERY PLAN SELECT 1 IS TRUE FROM t3 LEFT OUTER JOIN t1 INNER JOIN t0 ON (t1.c0 IN (t3.c1)) FULL OUTER JOIN t2;
Interpret the Output:
Look for signs of incorrect join reordering, such as aCROSS JOIN
where aFULL OUTER JOIN
was intended. The presence ofCROSS JOIN
in the plan indicates that the optimizer has rewritten the query in a way that violates the original semantics.
Step 3: Workarounds and Immediate Fixes
Disable Specific Optimizations:
Use SQLite’s pragmas to temporarily disable optimizations:PRAGMA optimize = 0; -- Disable all optimizations
Rerun the query to see if the result is correct. If so, re-enable optimizations one by one to isolate the problematic one.
Force Join Order:
UseCROSS JOIN
orLEFT JOIN
to enforce the join order:SELECT 1 IS TRUE FROM t3 CROSS JOIN t1 LEFT JOIN t0 ON t1.c0 IN (t3.c1) FULL JOIN t2 ON true;
Use Subqueries or CTEs:
Break the query into subqueries to prevent the optimizer from reordering joins:WITH subq AS ( SELECT * FROM t3 LEFT JOIN t1 ON true INNER JOIN t0 ON t1.c0 IN (t3.c1) ) SELECT 1 IS TRUE FROM subq FULL JOIN t2 ON true;
Step 4: Apply Long-Term Solutions
Upgrade to a Fixed Version:
The discussion mentions that the issue was resolved in a specific check-in (238453ffab0ba1bd
). Ensure you are using a version of SQLite that includes this fix.Modify the Query to Avoid Edge Cases:
If upgrading is not feasible, rewrite the query to avoid relying on the optimizer’s handling ofFULL OUTER JOIN
:SELECT 1 IS TRUE FROM (SELECT * FROM t3 LEFT JOIN t1 ON true) AS a INNER JOIN t0 ON a.c0 IN (a.c1) FULL JOIN t2 ON true;
Report the Issue:
If the problem persists in newer releases, report it to the SQLite team with a minimal reproducible example.
Step 5: Cross-Database Validation
Compare the query’s behavior with other databases (e.g., PostgreSQL) to confirm the expected result:
-- PostgreSQL-compatible syntax
SELECT 1 IS TRUE
FROM t3
LEFT JOIN t1 ON true
INNER JOIN t0 ON t1.c0 = ANY(t3.c1)
FULL JOIN t2 ON true;
If PostgreSQL returns a row, this confirms that SQLite’s behavior is incorrect.
Step 6: Understand Boolean Handling in SQLite
Reinforce the understanding that 1 IS TRUE
is valid in SQLite:
SELECT 1 IS TRUE; -- Returns 1 (TRUE)
This confirms that the empty result is not due to type mismatches but optimization flaws.
Final Resolution
The root cause is a query optimizer flaw in SQLite’s handling of FULL OUTER JOIN
when combined with other joins and constant expressions. The fix involves correcting the optimizer’s join reordering logic to respect the semantics of FULL OUTER JOIN
. Developers should upgrade to a version containing the fix or use workarounds like explicit join ordering until the fix is available in a stable release.