Unexpected Empty Result with FULL OUTER JOIN and Constant Expression Evaluation in SQLite 3.39.0


Understanding the Core Problem: FULL OUTER JOIN and Constant Expression Evaluation

Issue Overview
The primary issue revolves around a SQLite query that combines a FULL OUTER JOIN with a constant expression (1 IS TRUE). The first query in the example returns a row containing data from all joined tables, while the second query, which replaces the SELECT * with SELECT 1 IS TRUE, returns an empty result set. This discrepancy is unexpected because the expression 1 IS TRUE should always evaluate to TRUE (i.e., 1), and the FULL OUTER JOIN should ensure at least one row is returned. The problem occurs in SQLite version 3.39.0 (unreleased at the time of the discussion), and it exposes a subtle interaction between query optimization logic, join reordering, and the handling of constant expressions in the presence of FULL OUTER JOIN.

Key Observations

  1. Behavior of FULL OUTER JOIN:
    A FULL OUTER JOIN combines the results of both LEFT JOIN and RIGHT JOIN, ensuring that all rows from both tables are included in the result set, with NULL values filling in for unmatched columns. If no rows match, the result should still include all rows from both tables, paired with NULL values.

  2. Constant Expression Evaluation:
    The expression 1 IS TRUE is logically equivalent to 1 = 1 in SQLite because TRUE is an alias for the integer 1. Thus, this expression should always evaluate to TRUE, producing a non-empty result when selected.

  3. Query Structure:
    The problematic query involves a chain of joins:

    SELECT 1 IS TRUE 
    FROM t3 
      LEFT OUTER JOIN t1 
      INNER JOIN t0 ON (t1.c0 IN (t3.c1)) 
      FULL OUTER JOIN t2 
    

    The FULL OUTER JOIN here complicates the query planner’s ability to correctly determine which rows to include, especially when the SELECT clause does not reference any columns from the joined tables.

  4. Impact of the Unreleased Version:
    The issue arises in an unreleased version of SQLite (3.39.0), where the introduction of FULL OUTER JOIN support may have introduced regressions in the query optimizer’s handling of join reordering and constant propagation.


Root Causes: Query Optimization, Join Reordering, and Type Handling

Possible Causes

  1. Incorrect Join Reordering by the Query Optimizer:
    SQLite’s query optimizer attempts to reorder joins to improve performance. However, when FULL OUTER JOIN is involved, reordering joins can violate the logical semantics of the query. In the example, the optimizer may incorrectly transform the query into a form that omits rows due to an invalid assumption about the relationship between tables.

    For instance, the optimizer might rewrite the original query as:

    SELECT 1 
    FROM t1 
      JOIN t3 ON (b2 IN (a1)) 
      CROSS JOIN t2 
      FULL JOIN t4 ON true
    

    This transformation is invalid because the ON (b2 IN (a1)) clause references columns from t2, which is not yet in scope when t3 is being joined.

  2. Interaction Between FULL OUTER JOIN and Constant Propagation:
    When the SELECT clause contains only constant expressions (e.g., 1 IS TRUE), the optimizer may erroneously conclude that no rows need to be fetched from the tables, leading to an empty result. This is a failure to account for the FULL OUTER JOIN’s requirement to generate rows even when no matching data exists.

  3. Type Handling and Boolean Logic:
    While SQLite does not have a dedicated BOOLEAN type (it uses integers 1 and 0 for TRUE and FALSE), the IS operator performs strict type checking. However, in this case, the issue is not related to type mismatches but rather to the optimizer’s incorrect elimination of rows.

  4. Index Usage and Analyzer Statistics:
    The presence of the index i0 ON t0(c0) WHERE c1 ISNULL and the ANALYZE command may influence the optimizer’s plan. If the analyzer statistics incorrectly suggest that certain tables are empty or that joins can be short-circuited, this could contribute to the empty result.


Diagnosis and Resolution: Addressing Optimization Flaws and Join Semantics

Troubleshooting Steps, Solutions & Fixes

Step 1: Validate the Query Structure

Before diving into optimization issues, confirm that the query is logically correct.

  1. Simplify the Query:
    Remove the FULL OUTER JOIN and test the SELECT 1 IS TRUE expression in isolation:

    SELECT 1 IS TRUE FROM t3 
      LEFT OUTER JOIN t1 
      INNER JOIN t0 ON (t1.c0 IN (t3.c1));
    

    If this returns a result, the problem lies specifically with the FULL OUTER JOIN interaction.

  2. Test with Explicit Join Conditions:
    Replace implicit joins with explicit JOIN clauses to eliminate ambiguity:

    SELECT 1 IS TRUE 
    FROM t3 
      LEFT JOIN t1 ON true 
      INNER JOIN t0 ON t1.c0 IN (t3.c1) 
      FULL JOIN t2 ON true;
    

    This ensures that all join conditions are explicitly defined.

Step 2: Investigate Query Optimization and Execution Plans

Use SQLite’s EXPLAIN command to analyze the query plan:

  1. Generate the Execution Plan:

    EXPLAIN QUERY PLAN 
    SELECT 1 IS TRUE 
    FROM t3 
      LEFT OUTER JOIN t1 
      INNER JOIN t0 ON (t1.c0 IN (t3.c1)) 
      FULL OUTER JOIN t2;
    
  2. Interpret the Output:
    Look for signs of incorrect join reordering, such as a CROSS JOIN where a FULL OUTER JOIN was intended. The presence of CROSS JOIN in the plan indicates that the optimizer has rewritten the query in a way that violates the original semantics.

Step 3: Workarounds and Immediate Fixes

  1. Disable Specific Optimizations:
    Use SQLite’s pragmas to temporarily disable optimizations:

    PRAGMA optimize = 0;  -- Disable all optimizations
    

    Rerun the query to see if the result is correct. If so, re-enable optimizations one by one to isolate the problematic one.

  2. Force Join Order:
    Use CROSS JOIN or LEFT JOIN to enforce the join order:

    SELECT 1 IS TRUE 
    FROM t3 
      CROSS JOIN t1 
      LEFT JOIN t0 ON t1.c0 IN (t3.c1) 
      FULL JOIN t2 ON true;
    
  3. Use Subqueries or CTEs:
    Break the query into subqueries to prevent the optimizer from reordering joins:

    WITH subq AS (
      SELECT * 
      FROM t3 
        LEFT JOIN t1 ON true 
        INNER JOIN t0 ON t1.c0 IN (t3.c1)
    )
    SELECT 1 IS TRUE 
    FROM subq 
      FULL JOIN t2 ON true;
    

Step 4: Apply Long-Term Solutions

  1. Upgrade to a Fixed Version:
    The discussion mentions that the issue was resolved in a specific check-in (238453ffab0ba1bd). Ensure you are using a version of SQLite that includes this fix.

  2. Modify the Query to Avoid Edge Cases:
    If upgrading is not feasible, rewrite the query to avoid relying on the optimizer’s handling of FULL OUTER JOIN:

    SELECT 1 IS TRUE 
    FROM (SELECT * FROM t3 LEFT JOIN t1 ON true) AS a 
      INNER JOIN t0 ON a.c0 IN (a.c1) 
      FULL JOIN t2 ON true;
    
  3. Report the Issue:
    If the problem persists in newer releases, report it to the SQLite team with a minimal reproducible example.

Step 5: Cross-Database Validation

Compare the query’s behavior with other databases (e.g., PostgreSQL) to confirm the expected result:

-- PostgreSQL-compatible syntax
SELECT 1 IS TRUE 
FROM t3 
  LEFT JOIN t1 ON true 
  INNER JOIN t0 ON t1.c0 = ANY(t3.c1) 
  FULL JOIN t2 ON true;

If PostgreSQL returns a row, this confirms that SQLite’s behavior is incorrect.

Step 6: Understand Boolean Handling in SQLite

Reinforce the understanding that 1 IS TRUE is valid in SQLite:

SELECT 1 IS TRUE;  -- Returns 1 (TRUE)

This confirms that the empty result is not due to type mismatches but optimization flaws.


Final Resolution
The root cause is a query optimizer flaw in SQLite’s handling of FULL OUTER JOIN when combined with other joins and constant expressions. The fix involves correcting the optimizer’s join reordering logic to respect the semantics of FULL OUTER JOIN. Developers should upgrade to a version containing the fix or use workarounds like explicit join ordering until the fix is available in a stable release.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *