Unexpected Query Results Due to Subquery Flattening in SQLite


Understanding Subquery Flattening and Its Impact on LIMIT/ORDER BY Operations

Issue Overview: Subquery Flattening Alters Execution Order of LIMIT and ORDER BY

The core issue arises from SQLite’s query optimizer rewriting the original query to improve performance, a process known as subquery flattening. This optimization merges nested subqueries into the outer query, altering the logical order of operations such as LIMIT and ORDER BY.

Consider the original query:

SELECT * 
FROM (SELECT x, y FROM data LIMIT 100) 
ORDER BY x;

The user intended for the LIMIT 100 to execute first, restricting the dataset to 100 rows, and then sort those 100 rows by x. However, SQLite’s optimizer flattens the subquery into the outer query, effectively rewriting it as:

SELECT x, y FROM data 
ORDER BY x 
LIMIT 100;

This rewritten query performs a full table scan followed by a sort of the entire dataset before applying the LIMIT, which is computationally expensive on large tables (e.g., 45M rows). The query plan confirms this behavior:

QUERY PLAN
|--SCAN data
`--USE TEMP B-TREE FOR ORDER BY

Key observations:

  1. Flattening eliminates the subquery: The optimizer treats the subquery as part of the outer query.
  2. Execution order inversion: The ORDER BY operates on the entire dataset, not the limited subset.
  3. Performance degradation: Sorting 45M rows to retrieve 100 is inefficient.

Possible Causes: Why SQLite Rewrites the Query

  1. Subquery Flattening Conditions
    SQLite merges subqueries into outer queries when specific conditions are met. In this case:

    • The subquery lacks GROUP BY, HAVING, DISTINCT, or LIMIT OFFSET (other than LIMIT 100).
    • The outer query does not use JOIN or UNION.
    • The subquery is a SELECT in the FROM clause.

    Flattening avoids materializing the subquery as a temporary table, which would require additional I/O and memory. However, when combined with ORDER BY in the outer query, this leads to unintended execution plans.

  2. Missing ORDER BY in the Subquery
    Without an explicit ORDER BY in the subquery, the LIMIT clause has no deterministic order to rely on. SQLite may scan the table in storage order (e.g., rowid order) but is free to choose any 100 rows. The outer ORDER BY then sorts these 100 rows, but the optimizer’s flattening forces sorting the entire table.

  3. Temporary Table Materialization
    When subqueries are not flattened, SQLite materializes them as temporary tables. For example:

    SELECT * 
    FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0) 
    ORDER BY x;
    

    Adding OFFSET 0 prevents flattening, forcing the subquery to materialize. The outer ORDER BY then sorts only the 100 rows. However, the original query lacked such a mechanism.

  4. Query Planner Heuristics
    The optimizer prioritizes eliminating temporary tables, even if it increases computational work. This trade-off is beneficial for small datasets but harmful for large ones.


Solutions: Controlling Execution Order with Subquery Materialization

1. Prevent Flattening with OFFSET 0

Add OFFSET 0 to the subquery’s LIMIT clause. This forces SQLite to materialize the subquery as a temporary table:

SELECT * 
FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0) 
ORDER BY x;

Query Plan Changes:

QUERY PLAN
|--CO-ROUTINE (subquery)
|  `--SCAN data
|--SCAN subquery
`--USE TEMP B-TREE FOR ORDER BY

The CO-ROUTINE opcode indicates the subquery is materialized before the outer query processes it. This ensures the LIMIT is applied first.

2. Use MATERIALIZED Common Table Expressions (CTEs)

Explicitly define the CTE as MATERIALIZED to prevent flattening:

WITH cte AS MATERIALIZED (
  SELECT x, y FROM data LIMIT 100
)
SELECT * FROM cte ORDER BY x;

Why This Works:

  • The MATERIALIZED keyword forces the CTE to be stored as a temporary table.
  • The outer ORDER BY operates on the 100-row temporary table.
3. Add a Deterministic ORDER BY in the Subquery

Specify an explicit order in the subquery to control which rows are selected:

SELECT * 
FROM (SELECT x, y FROM data ORDER BY rowid LIMIT 100) 
ORDER BY x;

Behavior:

  • The inner ORDER BY rowid ensures SQLite reads rows in rowid order (storage order).
  • The outer ORDER BY x sorts the limited subset.
4. Use WHERE 1 to Inhibit Flattening

Add a dummy WHERE clause to the outer query:

SELECT * 
FROM (SELECT x, y FROM data LIMIT 100) 
WHERE 1 
ORDER BY x;

Mechanism:

  • The WHERE 1 condition violates flattening rules (specifically, the outer query cannot have a WHERE clause if the subquery uses LIMIT without ORDER BY).
  • Forces materialization of the subquery.
5. Indexing Strategies

If x is indexed, rewrite the query to leverage the index:

SELECT x, y 
FROM data 
ORDER BY x 
LIMIT 100;

Trade-offs:

  • Uses the index to avoid sorting the entire table.
  • Returns the first 100 rows by x, which may differ from the original intent of sampling 100 rows before sorting.

Performance Validation and Testing

Benchmarking Flattened vs. Materialized Queries

Test Case 1: Original Query (Flattened)

-- Flattened (slow)
SELECT * 
FROM (SELECT x, y FROM data LIMIT 100) 
ORDER BY x;
  • Execution Time: 12.189 seconds (45M rows)
  • Fullscan Steps: 47,027,655 (entire table scanned)

Test Case 2: Materialized Subquery

-- Materialized (fast)
SELECT * 
FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0) 
ORDER BY x;
  • Execution Time: 0.002 seconds
  • Fullscan Steps: 100 (only the limited subset scanned)

Analyzing EXPLAIN QUERY PLAN Output

Compare the flattened vs. materialized plans:

Flattened Plan:

QUERY PLAN
|--SCAN data
`--USE TEMP B-TREE FOR ORDER BY
  • Indicates a full table scan followed by a sort.

Materialized Plan:

QUERY PLAN
|--CO-ROUTINE (subquery)
|  `--SCAN data
|--SCAN subquery
`--USE TEMP B-TREE FOR ORDER BY
  • The CO-ROUTINE step materializes the subquery, separating LIMIT and ORDER BY.

Best Practices for Deterministic Results

  1. Always Specify ORDER BY with LIMIT
    If the order of rows matters, use ORDER BY in the subquery:

    SELECT * 
    FROM (SELECT x, y FROM data ORDER BY rowid LIMIT 100) 
    ORDER BY x;
    
  2. Use OFFSET 0 for Subquery Materialization
    Add OFFSET 0 to prevent flattening when order is irrelevant:

    SELECT * 
    FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0) 
    ORDER BY x;
    
  3. Avoid Ambiguous Queries
    Queries without explicit ordering are prone to optimizer changes across SQLite versions. Document assumptions about row selection.


Conclusion

SQLite’s subquery flattening optimizes many queries but can invert the intended execution order of LIMIT and ORDER BY. By forcing materialization via OFFSET 0, MATERIALIZED CTEs, or dummy clauses, developers regain control over the execution plan. Always pair LIMIT with ORDER BY for deterministic results and validate plans with EXPLAIN QUERY PLAN.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *