Inconsistent Results with UNION ALL and Mixed Affinities in SQLite

Inconsistent Query Results Due to Affinity Conflicts in Compound SELECT Statements

Mixed Affinity Handling in UNION ALL Queries and Optimization-Dependent Behavior

The core issue arises when executing compound SELECT statements (e.g., UNION ALL) containing columns with conflicting type affinities while toggling query optimizer flags like SQLITE_QueryFlattener. SQLite may return different numbers of rows depending on whether subquery flattening optimizations are enabled. This occurs because the data type affinity of columns in compound queries becomes indeterminate when constituent subqueries return different affinities, leading to inconsistent type conversion behavior during WHERE clause evaluations.

Consider this simplified reproduction:

CREATE TABLE t0(c0 INT, c1 INT);
INSERT INTO t0 VALUES(10,10);
-- First query with optimization enabled
SELECT * FROM t0 JOIN (
  SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
  UNION ALL 
  SELECT c1 FROM t0
) WHERE 10 = c2; -- Returns 2 rows
.testctrl optimizations 0x00000001; -- Disable QueryFlattener
-- Second query after optimization disabled
SELECT * FROM t0 JOIN (
  SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
  UNION ALL 
  SELECT c1 FROM t0
) WHERE 10 = c2; -- Returns 1 row

The discrepancy stems from SQLite’s handling of type affinity in compound queries. When SQLITE_QueryFlattener is active, the optimizer transforms the subquery into a form that alters how column affinities are resolved. With the optimization disabled, the raw compound query structure persists, causing different type coercion rules during the WHERE 10 = c2 comparison.

Determinants of Affinity Resolution in Compound Queries

Three primary factors contribute to this behavior:

  1. Indeterminate Column Affinity in Compound SELECTs
    SQLite documentation explicitly states that when a compound query (e.g., UNION, UNION ALL, EXCEPT, INTERSECT) contains columns with differing affinities, the resulting column affinity becomes undefined. The database engine may use any constituent subquery’s affinity for type conversion decisions, and this choice can vary across SQLite versions, query executions, or even within different parts of the same query. In the example:

    • First subquery: CAST(c0 AS TEXT) creates c2 with TEXT affinity
    • Second subquery: c1 (integer column) gives c2 INTEGER affinity
      The final c2 affinity isn’t guaranteed to adopt either consistently.
  2. Optimizer-Driven Query Structure Changes
    The SQLITE_QueryFlattener optimization rewrites nested subqueries into joins where possible. When enabled:

    SELECT * FROM t0 JOIN (...subquery...) WHERE ...
    

    Becomes equivalent to:

    SELECT * FROM t0, (flattened_subquery) WHERE ...
    

    This structural change impacts how SQLite’s type resolution engine processes column affinities. Flattened queries may resolve affinities earlier in the query pipeline compared to nested compound queries, altering implicit type conversions during WHERE clause evaluation.

  3. Context-Dependent Type Conversion Rules
    SQLite performs affinity-based type conversions during comparisons using the following precedence:

    • If one operand has INTEGER affinity and the other has TEXT, numeric comparison occurs if the TEXT value is a well-formed integer
    • When comparing TEXT to INTEGER affinity columns, TEXT values not convertible to integers are treated as less than any numeric value
      In the example, WHERE 10 = c2 behaves differently depending on whether c2 is considered TEXT or INTEGER:
    • TEXT affinity: Compare string ’10’ to integer 10 → match (type conversion occurs)
    • INTEGER affinity: Compare integer 10 to integer 10 → match (no conversion needed)
      However, when c2 contains mixed affinities from a compound query, the comparison may inconsistently apply these rules across query executions.

Mitigating Affinity Conflicts and Ensuring Result Consistency

Step 1: Standardize Affinities in Compound Queries

Modify all constituent SELECT statements in compound queries to return columns with identical affinities:

-- Explicitly CAST all outputs to TEXT
SELECT * FROM t0 JOIN (
  SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
  UNION ALL 
  SELECT CAST(c1 AS TEXT) FROM t0 -- Now both arms have TEXT affinity
) WHERE 10 = c2;

-- Alternative: Use INTEGER affinity throughout
SELECT * FROM t0 JOIN (
  SELECT c0 AS c2 FROM t0 -- Inherits INTEGER affinity from c0
  UNION ALL 
  SELECT c1 FROM t0 -- Also INTEGER affinity
) WHERE 10 = c2;

This eliminates affinity indeterminacy by ensuring all subquery components agree on column affinity.

Step 2: Use Explicit Type Casting in Comparisons

When modifying existing queries isn’t feasible, enforce type consistency at comparison points:

SELECT * FROM t0 JOIN (
  SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
  UNION ALL 
  SELECT c1 FROM t0
) WHERE CAST(10 AS TEXT) = c2; -- Force TEXT comparison

Or:

SELECT * FROM t0 JOIN (
  SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
  UNION ALL 
  SELECT c1 FROM t0
) WHERE 10 = CAST(c2 AS INTEGER); -- Force INTEGER conversion

Explicit casts make comparison rules deterministic regardless of underlying column affinity.

Step 3: Understand Optimization Flags’ Impact on Type Handling

While not recommended for production use, temporarily modifying optimization flags helps diagnose affinity-related issues:

-- Disable all optimizations for diagnostics
.testctrl optimizations 0xFFFFFFFF
-- Run query
SELECT ...;
-- Re-enable optimizations
.testctrl optimizations 0

Key flags affecting affinity resolution:

  • 0x00000001 (QueryFlattener): Disables subquery-to-join conversion
  • 0x00000004 (PushDown): Disables WHERE clause pushdown optimizations
    Test queries with different flag combinations to identify which optimization triggers behavioral changes.

Step 4: Leverage EXPLAIN to Analyze Affinity Propagation

Use EXPLAIN and EXPLAIN VIRTUAL MACHINE CODE to see how optimizations affect query structure:

EXPLAIN
SELECT * FROM t0 JOIN (
  SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
  UNION ALL 
  SELECT c1 FROM t0
) WHERE 10 = c2;

Compare output before and after applying .testctrl optimizations 0x00000001 to observe:

  • Changes in subquery flattening (look for OpenEphemeral opcodes indicating temporary tables)
  • Type conversion operations (String8, ToNumeric, Affinity opcodes)
    This reveals where and how affinity conversions occur in different optimization scenarios.

Step 5: Implement Permanent Schema-Level Affinity Controls

For persistent data integrity, enforce affinities at the schema level:

-- Create view with explicit affinity
CREATE VIEW v0 AS 
SELECT CAST(c0 AS TEXT) AS c2 FROM t0 
UNION ALL 
SELECT CAST(c1 AS TEXT) FROM t0;

-- Query using affinity-stable view
SELECT * FROM t0 JOIN v0 WHERE 10 = c2;

Views with standardized affinities provide consistent type handling regardless of optimization settings.

Step 6: Monitor Affinity-Related Edge Cases

Develop automated checks for affinity mismatches in compound queries:

-- Detect mixed affinities in UNION queries
SELECT 
  type AS left_affinity,
  (SELECT type FROM pragma_table_info('subquery2') WHERE name = 'c2') AS right_affinity
FROM pragma_table_info('subquery1') 
WHERE name = 'c2';

Regularly audit queries containing UNION, JOIN, or subqueries for potential affinity conflicts.

By systematically applying these techniques, developers can eliminate optimization-dependent result inconsistencies caused by SQLite’s flexible type affinity system. The ultimate solution lies in proactively managing affinities at design time rather than relying on optimization-specific behaviors.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *