RIGHT JOIN ON Clause Incorrectly Propagates WHERE Constants in SQLite 3.39.0
Unexpected Query Results Due to Constant Propagation in RIGHT JOIN ON Clauses
The core issue revolves around SQLite version 3.39.0 (and earlier unreleased versions) incorrectly propagating constants from the WHERE
clause into the ON
conditions of RIGHT OUTER JOIN
operations. This leads to inconsistent query results when joining virtual or regular tables. The problem manifests in two distinct scenarios:
Empty Result vs. Non-Empty Result Discrepancy:
ASELECT
statement with aRIGHT OUTER JOIN
returns no rows when theWHERE
clause is omitted but returns a valid row when aWHERE
condition referencing a column from the right table is added. This contradicts standard SQL behavior, where theWHERE
clause should filter results after joins are processed.Incorrect Constant Propagation in ON Clauses:
SQLite’s query optimizer incorrectly substitutes values from theWHERE
clause into theON
conditions ofRIGHT JOIN
operations. This substitution alters the join logic and produces invalid comparisons, especially when virtual tables (e.g.,rtree_i32
) are involved.
Example Scenario:
Consider a virtual table rt0
with columns c1
and c2
, and two empty tables t0
and t1
:
CREATE VIRTUAL TABLE rt0 USING rtree_i32(c0, c1, c2);
CREATE TABLE t0 (c0);
CREATE TABLE t1 (c0);
INSERT INTO rt0 VALUES (0, 0, 0);
The first query returns an empty result:
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0; -- {} Empty result
The second query, which adds a WHERE rt0.c1 = 0
clause, incorrectly returns a row:
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0
WHERE rt0.c1 = 0; -- ||0|0|0
The WHERE
clause’s constant (0
) is improperly pushed into the ON
clause of the RIGHT JOIN
, changing the join condition from rt0.c1 > t0.c0
to 0 > t0.c0
. Since t0
is empty, this comparison becomes irrelevant, and the join erroneously returns a row from rt0
.
Root Causes of Constant Propagation and Join Misbehavior
1. Optimizer Over-Aggressiveness in Constant Substitution
SQLite’s query optimizer attempts to simplify expressions by substituting known constants early in the query compilation process. This is normally beneficial for performance but becomes problematic when constants from the WHERE
clause are propagated into ON
conditions of RIGHT JOIN
operations. The substitution occurs before the join is processed, leading to invalid comparisons.
Why This Affects RIGHT JOINs Specifically:
In a RIGHT OUTER JOIN
, the right table’s rows are preserved regardless of whether they match the left side. The optimizer’s substitution logic fails to account for the fact that WHERE
clauses referencing the right table’s columns should not influence the ON
conditions of earlier joins. This violates the SQL standard’s scoping rules for joins and filters.
2. Interaction with Virtual Tables
Virtual tables like rtree_i32
(used for spatial indexing) have custom query optimization logic. The bug is more pronounced in such tables due to their internal handling of constraints. When a virtual table is the right operand of a RIGHT JOIN
, the optimizer’s constant propagation interacts with the virtual table’s internal query planner, causing invalid substitutions in the join conditions.
Case Study:
In the example above, the rt0.c1 = 0
condition in the WHERE
clause is pushed into the ON
clause of the RIGHT JOIN
. The virtual table’s internal logic may treat this as a direct constraint, bypassing the intended join logic. This results in the ON
condition rt0.c1 > t0.c0
being rewritten as 0 > t0.c0
, which is a nonsensical comparison when t0
is empty.
3. Incomplete Fix for Prior Bug
This issue is a regression caused by an incomplete fix for a previous bug (forum post 8e4c352937e82929). The original fix addressed constant propagation in simpler join scenarios but failed to account for edge cases involving RIGHT JOIN
, virtual tables, or multi-level joins. The revised fix in check-in fb0a23b6789da8e9 resolves this by restricting constant propagation in RIGHT JOIN
contexts.
Resolving the Constant Propagation Bug in RIGHT JOINs
Step 1: Verify the SQLite Version and Apply the Fix
- Check Current Version:
RunSELECT sqlite_version();
to confirm the SQLite version. If the result is3.39.0
or earlier, the bug is present. - Update to a Fixed Version:
The fix was introduced in check-infb0a23b6789da8e9
. Obtain the latest trunk version of SQLite from the official repository or wait for an official release containing the fix.
Step 2: Rewrite Queries to Avoid Constant Propagation
If upgrading is not feasible, modify queries to prevent the optimizer from propagating WHERE
constants into ON
clauses:
Workaround 1: Use the Unary + Operator
Add a no-op unary +
operator to the WHERE
clause to prevent constant substitution:
-- Original (buggy):
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0
WHERE rt0.c1 = 0;
-- Fixed:
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0
WHERE +rt0.c1 = 0; -- Unary + blocks propagation
Workaround 2: Restructure Joins
Avoid RIGHT JOIN
syntax, which is less common in SQLite. Use LEFT JOIN
with reordered tables:
-- Original:
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0
WHERE rt0.c1 = 0;
-- Restructured with LEFT JOIN:
SELECT * FROM rt0
LEFT JOIN (t1 JOIN t0 ON rt0.c1 > t0.c0)
WHERE rt0.c1 = 0;
Step 3: Validate Query Logic with Test Cases
Use simplified test cases to confirm the fix:
Test Case 1 (Regular Tables):
CREATE TABLE t1(a INT);
CREATE TABLE t2(b INT);
CREATE TABLE t3(c INTEGER PRIMARY KEY, d INT);
INSERT INTO t3 VALUES(0, 0);
-- Both queries should return no rows:
SELECT * FROM t1 JOIN t2 ON d > b RIGHT JOIN t3 ON true WHERE +d = 0;
SELECT * FROM t1 JOIN t2 ON d > b RIGHT JOIN t3 ON true WHERE d = 0;
Test Case 2 (Virtual Tables):
CREATE VIRTUAL TABLE rt0 USING rtree_i32(c0, c1, c2);
INSERT INTO rt0 VALUES(0, 0, 0);
-- Should return no rows (t1 and t0 are empty):
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0
WHERE +rt0.c1 = 0;
Step 4: Monitor Query Performance and Plan
After applying fixes, use EXPLAIN QUERY PLAN
to verify that constants are not propagated into ON
clauses:
EXPLAIN QUERY PLAN
SELECT * FROM t1
JOIN t0 ON rt0.c1 > t0.c0
RIGHT OUTER JOIN rt0
WHERE rt0.c1 = 0;
The output should show that the WHERE
clause is applied after the join, not merged into the ON
condition.
Key Takeaways
- Constant Propagation Limitations: SQLite’s optimizer aggressively substitutes constants but does not always respect join scoping rules. This can corrupt
RIGHT JOIN
logic. - Virtual Table Considerations: Custom virtual table implementations may exacerbate optimizer bugs due to their unique constraint handling.
- Defensive Query Design: Use the unary
+
operator or avoidRIGHT JOIN
syntax in favor ofLEFT JOIN
to mitigate similar issues.