Unexpected Query Results Due to Subquery Flattening in SQLite
Understanding Subquery Flattening and Its Impact on LIMIT/ORDER BY Operations
Issue Overview: Subquery Flattening Alters Execution Order of LIMIT and ORDER BY
The core issue arises from SQLite’s query optimizer rewriting the original query to improve performance, a process known as subquery flattening. This optimization merges nested subqueries into the outer query, altering the logical order of operations such as LIMIT
and ORDER BY
.
Consider the original query:
SELECT *
FROM (SELECT x, y FROM data LIMIT 100)
ORDER BY x;
The user intended for the LIMIT 100
to execute first, restricting the dataset to 100 rows, and then sort those 100 rows by x
. However, SQLite’s optimizer flattens the subquery into the outer query, effectively rewriting it as:
SELECT x, y FROM data
ORDER BY x
LIMIT 100;
This rewritten query performs a full table scan followed by a sort of the entire dataset before applying the LIMIT
, which is computationally expensive on large tables (e.g., 45M rows). The query plan confirms this behavior:
QUERY PLAN
|--SCAN data
`--USE TEMP B-TREE FOR ORDER BY
Key observations:
- Flattening eliminates the subquery: The optimizer treats the subquery as part of the outer query.
- Execution order inversion: The
ORDER BY
operates on the entire dataset, not the limited subset. - Performance degradation: Sorting 45M rows to retrieve 100 is inefficient.
Possible Causes: Why SQLite Rewrites the Query
Subquery Flattening Conditions
SQLite merges subqueries into outer queries when specific conditions are met. In this case:- The subquery lacks
GROUP BY
,HAVING
,DISTINCT
, orLIMIT OFFSET
(other thanLIMIT 100
). - The outer query does not use
JOIN
orUNION
. - The subquery is a
SELECT
in theFROM
clause.
Flattening avoids materializing the subquery as a temporary table, which would require additional I/O and memory. However, when combined with
ORDER BY
in the outer query, this leads to unintended execution plans.- The subquery lacks
Missing ORDER BY in the Subquery
Without an explicitORDER BY
in the subquery, theLIMIT
clause has no deterministic order to rely on. SQLite may scan the table in storage order (e.g., rowid order) but is free to choose any 100 rows. The outerORDER BY
then sorts these 100 rows, but the optimizer’s flattening forces sorting the entire table.Temporary Table Materialization
When subqueries are not flattened, SQLite materializes them as temporary tables. For example:SELECT * FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0) ORDER BY x;
Adding
OFFSET 0
prevents flattening, forcing the subquery to materialize. The outerORDER BY
then sorts only the 100 rows. However, the original query lacked such a mechanism.Query Planner Heuristics
The optimizer prioritizes eliminating temporary tables, even if it increases computational work. This trade-off is beneficial for small datasets but harmful for large ones.
Solutions: Controlling Execution Order with Subquery Materialization
1. Prevent Flattening with OFFSET 0
Add OFFSET 0
to the subquery’s LIMIT
clause. This forces SQLite to materialize the subquery as a temporary table:
SELECT *
FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0)
ORDER BY x;
Query Plan Changes:
QUERY PLAN
|--CO-ROUTINE (subquery)
| `--SCAN data
|--SCAN subquery
`--USE TEMP B-TREE FOR ORDER BY
The CO-ROUTINE
opcode indicates the subquery is materialized before the outer query processes it. This ensures the LIMIT
is applied first.
2. Use MATERIALIZED Common Table Expressions (CTEs)
Explicitly define the CTE as MATERIALIZED
to prevent flattening:
WITH cte AS MATERIALIZED (
SELECT x, y FROM data LIMIT 100
)
SELECT * FROM cte ORDER BY x;
Why This Works:
- The
MATERIALIZED
keyword forces the CTE to be stored as a temporary table. - The outer
ORDER BY
operates on the 100-row temporary table.
3. Add a Deterministic ORDER BY in the Subquery
Specify an explicit order in the subquery to control which rows are selected:
SELECT *
FROM (SELECT x, y FROM data ORDER BY rowid LIMIT 100)
ORDER BY x;
Behavior:
- The inner
ORDER BY rowid
ensures SQLite reads rows in rowid order (storage order). - The outer
ORDER BY x
sorts the limited subset.
4. Use WHERE 1 to Inhibit Flattening
Add a dummy WHERE
clause to the outer query:
SELECT *
FROM (SELECT x, y FROM data LIMIT 100)
WHERE 1
ORDER BY x;
Mechanism:
- The
WHERE 1
condition violates flattening rules (specifically, the outer query cannot have aWHERE
clause if the subquery usesLIMIT
withoutORDER BY
). - Forces materialization of the subquery.
5. Indexing Strategies
If x
is indexed, rewrite the query to leverage the index:
SELECT x, y
FROM data
ORDER BY x
LIMIT 100;
Trade-offs:
- Uses the index to avoid sorting the entire table.
- Returns the first 100 rows by
x
, which may differ from the original intent of sampling 100 rows before sorting.
Performance Validation and Testing
Benchmarking Flattened vs. Materialized Queries
Test Case 1: Original Query (Flattened)
-- Flattened (slow)
SELECT *
FROM (SELECT x, y FROM data LIMIT 100)
ORDER BY x;
- Execution Time: 12.189 seconds (45M rows)
- Fullscan Steps: 47,027,655 (entire table scanned)
Test Case 2: Materialized Subquery
-- Materialized (fast)
SELECT *
FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0)
ORDER BY x;
- Execution Time: 0.002 seconds
- Fullscan Steps: 100 (only the limited subset scanned)
Analyzing EXPLAIN QUERY PLAN Output
Compare the flattened vs. materialized plans:
Flattened Plan:
QUERY PLAN
|--SCAN data
`--USE TEMP B-TREE FOR ORDER BY
- Indicates a full table scan followed by a sort.
Materialized Plan:
QUERY PLAN
|--CO-ROUTINE (subquery)
| `--SCAN data
|--SCAN subquery
`--USE TEMP B-TREE FOR ORDER BY
- The
CO-ROUTINE
step materializes the subquery, separatingLIMIT
andORDER BY
.
Best Practices for Deterministic Results
Always Specify ORDER BY with LIMIT
If the order of rows matters, useORDER BY
in the subquery:SELECT * FROM (SELECT x, y FROM data ORDER BY rowid LIMIT 100) ORDER BY x;
Use OFFSET 0 for Subquery Materialization
AddOFFSET 0
to prevent flattening when order is irrelevant:SELECT * FROM (SELECT x, y FROM data LIMIT 100 OFFSET 0) ORDER BY x;
Avoid Ambiguous Queries
Queries without explicit ordering are prone to optimizer changes across SQLite versions. Document assumptions about row selection.
Conclusion
SQLite’s subquery flattening optimizes many queries but can invert the intended execution order of LIMIT
and ORDER BY
. By forcing materialization via OFFSET 0
, MATERIALIZED
CTEs, or dummy clauses, developers regain control over the execution plan. Always pair LIMIT
with ORDER BY
for deterministic results and validate plans with EXPLAIN QUERY PLAN
.