Regression in SQLite 3.38.0: LEFT JOIN with UNION ALL View Fails Due to Bloom Filter Optimization
Bloom Filter Optimization Interferes with LEFT JOIN Semantics in Complex View-Based Queries
Issue Overview: Query Involving UNION ALL View and LEFT JOIN Returns No Rows in SQLite ≥3.38.0
The core issue manifests when executing a complex query involving a view created with UNION ALL across multiple tables, joined with other tables through INNER JOINs and a LEFT JOIN. In SQLite versions prior to 3.38.0, this query returns expected results matching specific coordinate transformation criteria. Starting with version 3.38.0, the same query returns zero rows despite identical schema and data. The regression stems from changes in query planner optimizations related to Bloom filters introduced in commit a11c5e22b5eb32cdf2061a6c19fd7bf4d2a36e05.
Key components of the failing scenario include:
- View Definition: A
coordinate_operation_view
combininggrid_transformation
andother_transformation
tables via UNION ALL - Join Structure: Two self-joins of the view connected through CRS (Coordinate Reference System) code relationships
- LEFT JOIN Condition: A supersession check that should allow non-matching rows but incorrectly filters them
- Filtering Criteria: Multiple equality conditions on CRS codes and spatial extent checks via joined tables
- Ordering Clause: An ORDER BY based on summed accuracy values from both transformation steps
The query’s semantic intent is to find valid transformation chains between coordinate systems while considering possible supersessions and spatial extents. The UNION ALL view provides a unified interface to different transformation types, while the LEFT JOIN on supersession
allows for optional deprecation information. The regression causes premature filtering of rows that should survive the LEFT JOIN due to Bloom filter misapplication.
Critical observations from the failure pattern:
- Disabling the ORDER BY clause restores results in 3.38.0
- Moving the LEFT JOIN position in the FROM clause resolves the issue
- ANALYZE statistics influence but don’t prevent the regression
- The problem persists in subsequent versions up to latest trunk
Underlying technical factors include:
- Bloom filter optimizations now apply to RIGHT tables of LEFT JOINs
- UNION ALL view materialization interacts with join reordering
- Predicate pushdown through view layers alters null-handling
- Index selection heuristics changed by new optimization flags
Query Planner Misapplies Bloom Filters to LEFT JOIN Right-Hand Side
Possible Causes: Bloom Filter Optimization Disrupts LEFT JOIN Null Preservation
The regression root cause lies in enhanced Bloom filter optimizations for LEFT JOINs introduced in SQLite 3.38.0. Bloom filters are probabilistic data structures used to quickly eliminate impossible join candidates early in query execution. The problematic commit (a11c5e22b5eb32cdf2061a6c19fd7bf4d2a36e05) extended Bloom filter usage to the right-hand side of LEFT JOINs, creating these scenarios:
Premature Row Elimination
Bloom filters built from WHERE clause predicates may incorrectly filter rows from the RIGHT table of a LEFT JOIN before null-extension occurs. This violates SQL standard LEFT JOIN semantics requiring preservation of LEFT table rows regardless of RIGHT table matches.View Materialization Timing
The UNION ALL view’s materialization interacts with Bloom filter application timing. When the query planner pushes Bloom filters below the view layer, they may filter source tables individually rather than considering the unified view output.Join Order Sensitivity
The original query’s join order places the LEFT JOIN earlier than subsequent INNER JOINs. The Bloom filter optimization reorders tables based on selectivity estimates, causing the LEFT JOIN’s right-hand table to be processed with Bloom filters before null-preservation occurs.Index Selection Conflicts
Statistics from ANALYZE (sqlite_stat1 entries) suggest index usage patterns that conflict with Bloom-filtered access paths. Thesupersession_idx
index on (superseded_table_name, superseded_auth_name, superseded_code) becomes entangled in filter pushdown decisions.Expression Reassociation Effects
Complex ON clauses combining multiple equality conditions and boolean logic (e.g.,ss1.same_source_target_crs = 1
) interact with expression tree normalization changes in the optimizer, leading to incorrect null-handling during Bloom filter construction.ORDER BY Interaction
The presence of ORDER BY accuracy1 + accuracy2 influences the query planner’s choice of temporary indexes and sorting methods. This interacts with Bloom filter placement by altering the sequence of data materialization steps.
Specific manifestations in the test case:
- The
supersession
LEFT JOIN’s ON clause contains a self-join condition (ss1.superseded_table_name = ss1.replacement_table_name
) that becomes entangled with Bloom filter predicates - UNION ALL view columns used in JOIN conditions (target_crs_auth_name, source_crs_code) lose null-awareness when pushed into Bloom filters
- Implicit type conversions between INTEGER and TEXT codes (e.g., ‘8675’ vs 8675) create subtle mismatches in Bloom filter hash calculations
Resolving LEFT JOIN Row Loss Through Query Rewrites and Optimization Control
Troubleshooting Steps, Solutions & Fixes: Restoring Correct LEFT JOIN Behavior in SQLite 3.38.0+
Immediate Workarounds
Disable Bloom Filter Pull-Down Optimization
Execute at session start:SELECT sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x100000);
This disables the specific Bloom filter pull-down optimization (value 0x100000 corresponds to SQLITE_BloomPulldown) while keeping other optimizations active.
Reorder JOIN Clauses
Move the LEFT JOIN to after all INNER JOINs:FROM coordinate_operation_view v1 JOIN coordinate_operation_view v2 ON ... /* All INNER JOINs first */ LEFT JOIN supersession ss1 ON ... /* Moved to end */
This ensures Bloom filters applied during INNER JOIN processing don’t interfere with LEFT JOIN null preservation.
Materialize Intermediate Results
Use CTEs or temporary tables to isolate the LEFT JOIN from Bloom filter interference:WITH base_query AS ( SELECT ... /* Original joins without LEFT JOIN */ ) SELECT base_query.*, ss1.* FROM base_query LEFT JOIN supersession ss1 ON ...
Permanent Fixes
Apply Official Patch
Modify SQLite’s WHERE clause processing to skip Bloom filter pull-down for LEFT JOIN right-hand tables:/* In src/where.c */ while( ++iLevel < pWInfo->nLevel ){ const SrcItem *pTabItem; pLevel = &pWInfo->a[iLevel]; pLoop = pLevel->pWLoop; if( NEVER(pLoop==0) ) continue; if( pLoop->prereq & notReady ) continue; pTabItem = &pWInfo->pTabList->a[pLevel->iFrom]; if( pTabItem->fg.jointype & JT_LEFT ) continue; /* Added check */ /* Rest of Bloom filter code */ }
Query Structure Modifications
- Explicit Type Casting
Ensure code comparisons use consistent types:AND v1.source_crs_code = CAST('8675' AS INTEGER)
- Volatile Function Barrier
Prevent predicate pushdown with a no-op function:AND 1 = randomblob(1) /* Prevent Bloom filter formation */
- Redundant Condition Duplication
Repeat critical WHERE clauses in JOIN ON conditions:JOIN extent a1 ON ... AND south_lat1 != 0
- Explicit Type Casting
Diagnostic Techniques
EXPLAIN QUERY PLAN Analysis
Compare outputs between SQLite versions:EXPLAIN QUERY PLAN SELECT ... /* Original query */;
Look for differences in:
USING INDEX
directives on supersession_idxSCAN
vsSEARCH
operations on UNION ALL view components- Temporary b-tree usage for ORDER BY
Optimization Tracing
Enable diagnostic output:EXPLAIN SELECT ... /* Original query */; PRAGMA optimizer_trace = 1; SELECT ...; SELECT * FROM sqlite_optimizer_trace;
Search trace for
bloom-filter
mentions and join order decisions.Constraint Propagation Testing
Isolate Bloom filter impact by:SELECT * FROM supersession WHERE /* LEFT JOIN conditions */ /* With and without Bloom filter enabled */
Preventative Measures
Query Pattern Guidelines
- Avoid mixing LEFT JOINs with UNION ALL views in complex joins
- Place LEFT JOINs after INNER JOINs in FROM clauses
- Use explicit COALESCE() on LEFT JOIN columns in WHERE clauses
Version-Specific Optimization Control
PRAGMA compile_options; /* Check for ENABLE_BLOOM_FILTER_PULLDOWN */ PRAGMA query_only = 1; /* Disables some optimizations */
Statistical Maintenance
Refresh ANALYZE data after schema changes:ANALYZE sqlite_master; /* Rebuild all statistics */ ANALYZE supersession; /* Focus on problem table */
Deep Technical Analysis
The Bloom filter optimization conflict arises from these sequence of operations:
Query Parsing Phase
The UNION ALL view is flattened into outer query joins, creating a compound subquery spanninggrid_transformation
andother_transformation
tables.Predicate Pushdown
WHERE clause conditions (v1.source_crs_auth_name = ‘EPSG’) are pushed into the view’s constituent SELECT statements, creating separate Bloom filters for each UNION ALL branch.Join Order Selection
The query planner reorders tables to process most selective joins first. With Bloom filters enabled,supersession
table access is moved earlier in the sequence.Bloom Filter Construction
A filter is built fromusage.u1
andextent.a1
join conditions, then applied tosupersession
columns before null-extension occurs in LEFT JOIN processing.Null Handling Breakdown
Columns fromsupersession
needed for null-checking (ss1.*) are filtered out by Bloom predicates before the LEFT JOIN completes, violating SQL standard semantics.
The official patch corrects this by adding jointype checks during Bloom filter application:
if( pTabItem->fg.jointype & JT_LEFT ) continue;
This skips Bloom filter formation for any table participating in a LEFT JOIN’s right side, preserving the null-extension behavior required by SQL standards.
Performance Considerations
While disabling Bloom filters restores correctness, it may impact query speed on large datasets. Mitigation strategies include:
Partial Indexing
Create filtered indexes matching common query patterns:CREATE INDEX supersession_active_idx ON supersession(superseded_table_name, superseded_auth_name, superseded_code) WHERE same_source_target_crs = 1;
Materialized Views
Convert the UNION ALL view to a temporary table with periodic refreshes:CREATE TEMP TABLE coord_op_materialized AS SELECT * FROM coordinate_operation_view;
Compound Index Optimization
Combine frequently joined columns in covering indexes:CREATE INDEX usage_extent_idx ON usage( object_table_name, object_auth_name, object_code, extent_auth_name, extent_code );
Cross-Version Compatibility
For codebases requiring SQLite version flexibility:
Runtime Version Checks
Adjust query structure based on detected SQLite version:SELECT sqlite_version() >= '3.38.0' AS is_new; /* Branch query logic accordingly */
Defensive Query Construction
Use query patterns robust to optimizer changes:SELECT ... FROM (SELECT * FROM coordinate_operation_view) v1 /* Prevent view flattening */ JOIN ...
Isolation through Common Table Expressions
Force materialization points with CTEs:WITH v1 AS MATERIALIZED ( SELECT * FROM coordinate_operation_view ) SELECT ... FROM v1 ...;
Conclusion
This regression highlights the delicate balance between SQLite’s advanced optimizations and complex query semantics. By understanding Bloom filter interactions with LEFT JOIN null preservation, developers can employ targeted workarounds while awaiting official patches. The solutions presented offer both immediate mitigation and long-term prevention strategies, ensuring correct query results across SQLite versions while maintaining performance efficiency.