Regression in SQLite 3.38.0: LEFT JOIN with UNION ALL View Fails Due to Bloom Filter Optimization

Bloom Filter Optimization Interferes with LEFT JOIN Semantics in Complex View-Based Queries

Issue Overview: Query Involving UNION ALL View and LEFT JOIN Returns No Rows in SQLite ≥3.38.0

The core issue manifests when executing a complex query involving a view created with UNION ALL across multiple tables, joined with other tables through INNER JOINs and a LEFT JOIN. In SQLite versions prior to 3.38.0, this query returns expected results matching specific coordinate transformation criteria. Starting with version 3.38.0, the same query returns zero rows despite identical schema and data. The regression stems from changes in query planner optimizations related to Bloom filters introduced in commit a11c5e22b5eb32cdf2061a6c19fd7bf4d2a36e05.

Key components of the failing scenario include:

  1. View Definition: A coordinate_operation_view combining grid_transformation and other_transformation tables via UNION ALL
  2. Join Structure: Two self-joins of the view connected through CRS (Coordinate Reference System) code relationships
  3. LEFT JOIN Condition: A supersession check that should allow non-matching rows but incorrectly filters them
  4. Filtering Criteria: Multiple equality conditions on CRS codes and spatial extent checks via joined tables
  5. Ordering Clause: An ORDER BY based on summed accuracy values from both transformation steps

The query’s semantic intent is to find valid transformation chains between coordinate systems while considering possible supersessions and spatial extents. The UNION ALL view provides a unified interface to different transformation types, while the LEFT JOIN on supersession allows for optional deprecation information. The regression causes premature filtering of rows that should survive the LEFT JOIN due to Bloom filter misapplication.

Critical observations from the failure pattern:

  • Disabling the ORDER BY clause restores results in 3.38.0
  • Moving the LEFT JOIN position in the FROM clause resolves the issue
  • ANALYZE statistics influence but don’t prevent the regression
  • The problem persists in subsequent versions up to latest trunk

Underlying technical factors include:

  • Bloom filter optimizations now apply to RIGHT tables of LEFT JOINs
  • UNION ALL view materialization interacts with join reordering
  • Predicate pushdown through view layers alters null-handling
  • Index selection heuristics changed by new optimization flags

Query Planner Misapplies Bloom Filters to LEFT JOIN Right-Hand Side

Possible Causes: Bloom Filter Optimization Disrupts LEFT JOIN Null Preservation

The regression root cause lies in enhanced Bloom filter optimizations for LEFT JOINs introduced in SQLite 3.38.0. Bloom filters are probabilistic data structures used to quickly eliminate impossible join candidates early in query execution. The problematic commit (a11c5e22b5eb32cdf2061a6c19fd7bf4d2a36e05) extended Bloom filter usage to the right-hand side of LEFT JOINs, creating these scenarios:

  1. Premature Row Elimination
    Bloom filters built from WHERE clause predicates may incorrectly filter rows from the RIGHT table of a LEFT JOIN before null-extension occurs. This violates SQL standard LEFT JOIN semantics requiring preservation of LEFT table rows regardless of RIGHT table matches.

  2. View Materialization Timing
    The UNION ALL view’s materialization interacts with Bloom filter application timing. When the query planner pushes Bloom filters below the view layer, they may filter source tables individually rather than considering the unified view output.

  3. Join Order Sensitivity
    The original query’s join order places the LEFT JOIN earlier than subsequent INNER JOINs. The Bloom filter optimization reorders tables based on selectivity estimates, causing the LEFT JOIN’s right-hand table to be processed with Bloom filters before null-preservation occurs.

  4. Index Selection Conflicts
    Statistics from ANALYZE (sqlite_stat1 entries) suggest index usage patterns that conflict with Bloom-filtered access paths. The supersession_idx index on (superseded_table_name, superseded_auth_name, superseded_code) becomes entangled in filter pushdown decisions.

  5. Expression Reassociation Effects
    Complex ON clauses combining multiple equality conditions and boolean logic (e.g., ss1.same_source_target_crs = 1) interact with expression tree normalization changes in the optimizer, leading to incorrect null-handling during Bloom filter construction.

  6. ORDER BY Interaction
    The presence of ORDER BY accuracy1 + accuracy2 influences the query planner’s choice of temporary indexes and sorting methods. This interacts with Bloom filter placement by altering the sequence of data materialization steps.

Specific manifestations in the test case:

  • The supersession LEFT JOIN’s ON clause contains a self-join condition (ss1.superseded_table_name = ss1.replacement_table_name) that becomes entangled with Bloom filter predicates
  • UNION ALL view columns used in JOIN conditions (target_crs_auth_name, source_crs_code) lose null-awareness when pushed into Bloom filters
  • Implicit type conversions between INTEGER and TEXT codes (e.g., ‘8675’ vs 8675) create subtle mismatches in Bloom filter hash calculations

Resolving LEFT JOIN Row Loss Through Query Rewrites and Optimization Control

Troubleshooting Steps, Solutions & Fixes: Restoring Correct LEFT JOIN Behavior in SQLite 3.38.0+

Immediate Workarounds

  1. Disable Bloom Filter Pull-Down Optimization
    Execute at session start:

    SELECT sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0x100000);
    

    This disables the specific Bloom filter pull-down optimization (value 0x100000 corresponds to SQLITE_BloomPulldown) while keeping other optimizations active.

  2. Reorder JOIN Clauses
    Move the LEFT JOIN to after all INNER JOINs:

    FROM coordinate_operation_view v1
    JOIN coordinate_operation_view v2 ON ...
    /* All INNER JOINs first */
    LEFT JOIN supersession ss1 ON ... /* Moved to end */
    

    This ensures Bloom filters applied during INNER JOIN processing don’t interfere with LEFT JOIN null preservation.

  3. Materialize Intermediate Results
    Use CTEs or temporary tables to isolate the LEFT JOIN from Bloom filter interference:

    WITH base_query AS (
      SELECT ... /* Original joins without LEFT JOIN */
    )
    SELECT base_query.*, ss1.* 
    FROM base_query
    LEFT JOIN supersession ss1 ON ...
    

Permanent Fixes

  1. Apply Official Patch
    Modify SQLite’s WHERE clause processing to skip Bloom filter pull-down for LEFT JOIN right-hand tables:

    /* In src/where.c */
    while( ++iLevel < pWInfo->nLevel ){
      const SrcItem *pTabItem;
      pLevel = &pWInfo->a[iLevel];
      pLoop = pLevel->pWLoop;
      if( NEVER(pLoop==0) ) continue;
      if( pLoop->prereq & notReady ) continue;
      pTabItem = &pWInfo->pTabList->a[pLevel->iFrom];
      if( pTabItem->fg.jointype & JT_LEFT ) continue; /* Added check */
      /* Rest of Bloom filter code */
    }
    
  2. Query Structure Modifications

    • Explicit Type Casting
      Ensure code comparisons use consistent types:

      AND v1.source_crs_code = CAST('8675' AS INTEGER)
      
    • Volatile Function Barrier
      Prevent predicate pushdown with a no-op function:

      AND 1 = randomblob(1) /* Prevent Bloom filter formation */
      
    • Redundant Condition Duplication
      Repeat critical WHERE clauses in JOIN ON conditions:

      JOIN extent a1 ON ... AND south_lat1 != 0
      

Diagnostic Techniques

  1. EXPLAIN QUERY PLAN Analysis
    Compare outputs between SQLite versions:

    EXPLAIN QUERY PLAN
    SELECT ... /* Original query */;
    

    Look for differences in:

    • USING INDEX directives on supersession_idx
    • SCAN vs SEARCH operations on UNION ALL view components
    • Temporary b-tree usage for ORDER BY
  2. Optimization Tracing
    Enable diagnostic output:

    EXPLAIN
    SELECT ... /* Original query */;
    
    PRAGMA optimizer_trace = 1;
    SELECT ...;
    SELECT * FROM sqlite_optimizer_trace;
    

    Search trace for bloom-filter mentions and join order decisions.

  3. Constraint Propagation Testing
    Isolate Bloom filter impact by:

    SELECT * FROM supersession WHERE /* LEFT JOIN conditions */
    /* With and without Bloom filter enabled */
    

Preventative Measures

  1. Query Pattern Guidelines

    • Avoid mixing LEFT JOINs with UNION ALL views in complex joins
    • Place LEFT JOINs after INNER JOINs in FROM clauses
    • Use explicit COALESCE() on LEFT JOIN columns in WHERE clauses
  2. Version-Specific Optimization Control

    PRAGMA compile_options;
    /* Check for ENABLE_BLOOM_FILTER_PULLDOWN */
    PRAGMA query_only = 1; /* Disables some optimizations */
    
  3. Statistical Maintenance
    Refresh ANALYZE data after schema changes:

    ANALYZE sqlite_master; /* Rebuild all statistics */
    ANALYZE supersession;  /* Focus on problem table */
    

Deep Technical Analysis

The Bloom filter optimization conflict arises from these sequence of operations:

  1. Query Parsing Phase
    The UNION ALL view is flattened into outer query joins, creating a compound subquery spanning grid_transformation and other_transformation tables.

  2. Predicate Pushdown
    WHERE clause conditions (v1.source_crs_auth_name = ‘EPSG’) are pushed into the view’s constituent SELECT statements, creating separate Bloom filters for each UNION ALL branch.

  3. Join Order Selection
    The query planner reorders tables to process most selective joins first. With Bloom filters enabled, supersession table access is moved earlier in the sequence.

  4. Bloom Filter Construction
    A filter is built from usage.u1 and extent.a1 join conditions, then applied to supersession columns before null-extension occurs in LEFT JOIN processing.

  5. Null Handling Breakdown
    Columns from supersession needed for null-checking (ss1.*) are filtered out by Bloom predicates before the LEFT JOIN completes, violating SQL standard semantics.

The official patch corrects this by adding jointype checks during Bloom filter application:

if( pTabItem->fg.jointype & JT_LEFT ) continue;

This skips Bloom filter formation for any table participating in a LEFT JOIN’s right side, preserving the null-extension behavior required by SQL standards.

Performance Considerations

While disabling Bloom filters restores correctness, it may impact query speed on large datasets. Mitigation strategies include:

  1. Partial Indexing
    Create filtered indexes matching common query patterns:

    CREATE INDEX supersession_active_idx 
    ON supersession(superseded_table_name, superseded_auth_name, superseded_code)
    WHERE same_source_target_crs = 1;
    
  2. Materialized Views
    Convert the UNION ALL view to a temporary table with periodic refreshes:

    CREATE TEMP TABLE coord_op_materialized AS 
    SELECT * FROM coordinate_operation_view;
    
  3. Compound Index Optimization
    Combine frequently joined columns in covering indexes:

    CREATE INDEX usage_extent_idx ON usage(
      object_table_name,
      object_auth_name,
      object_code,
      extent_auth_name,
      extent_code
    );
    

Cross-Version Compatibility

For codebases requiring SQLite version flexibility:

  1. Runtime Version Checks
    Adjust query structure based on detected SQLite version:

    SELECT sqlite_version() >= '3.38.0' AS is_new;
    /* Branch query logic accordingly */
    
  2. Defensive Query Construction
    Use query patterns robust to optimizer changes:

    SELECT ... 
    FROM (SELECT * FROM coordinate_operation_view) v1 /* Prevent view flattening */
    JOIN ... 
    
  3. Isolation through Common Table Expressions
    Force materialization points with CTEs:

    WITH v1 AS MATERIALIZED (
      SELECT * FROM coordinate_operation_view
    )
    SELECT ... FROM v1 ...;
    

Conclusion

This regression highlights the delicate balance between SQLite’s advanced optimizations and complex query semantics. By understanding Bloom filter interactions with LEFT JOIN null preservation, developers can employ targeted workarounds while awaiting official patches. The solutions presented offer both immediate mitigation and long-term prevention strategies, ensuring correct query results across SQLite versions while maintaining performance efficiency.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *