Query Scanning Beyond Necessary Rows Due to Incorrect Index Condition in Coroutine
Understanding the Core Performance Anomaly in Coroutine-Driven Queries
The problem revolves around SQLite query execution plans unexpectedly scanning the entire Results
table via an index range condition (PartId>?
) instead of terminating early when using a scalar subquery to filter by PartId
. This occurs despite explicit logic in the view (PartResults
) and outer queries that should limit results to a single PartId
. The anomaly manifests in two variants of a query (C and D) that combine a scalar subquery with a view and JOIN operations, resulting in significantly slower performance compared to logically equivalent standalone queries (A and B). At the heart of the issue is SQLite’s query planner generating a suboptimal index condition during coroutine execution, leading to unnecessary table scans even after all valid rows have been processed.
Technical Context
- Schema Design: The
Results
table stores entries withPartId
,PartSide
, andDiscriminator
, indexed viaIX_Results_PartId_PartSide_Discriminator
. TheResultKindPresences
table is a WITHOUT ROWID table with a composite primary key linking toResults.Id
. - View Definition:
PartResults
usesMAX(Id)
grouped byPartId
,PartSide
, andDiscriminator
to identify the latest applicableResultId
for each partition. - Performance Discrepancy:
- Query A (
WHERE pr.PartId = 88
) and Query B (WHERE Id = 1224
) execute efficiently by leveraging index equality searches. - Query C and D (using scalar subqueries to derive
PartId
) exhibit degraded performance due to the query planner substituting an equality condition (PartId=?
) with a range scan (PartId>?
).
- Query A (
Observed Behavior
- Query Plan Analysis:
- In fast queries (A and B), the index is searched using exact matches (
PartId=?
orrowid=?
). - In slow queries (C and D), the coroutine driving the
PartResults
view uses a range condition (PartId>?
), forcing a full scan of all entries for the targetPartId
instead of stopping at the first valid entry.
- In fast queries (A and B), the index is searched using exact matches (
- Impact on Execution Time: On a 10 GB database, this results in execution times escalating from milliseconds (for direct equality queries) to seconds when using scalar subqueries or JOIN-based filtering.
Key Indicators of the Problem
- Unexpected Range Scans: The presence of
SEARCH Results USING COVERING INDEX ... (PartId>?)
in the query plan when logic dictates only a singlePartId
is relevant. - Coroutine Materialization: SQLite’s coroutine materialization for the
PartResults
view in Query C/D introduces a scanning pattern that bypasses early termination optimizations. - Subquery Interaction: The scalar subquery
(SELECT PartId FROM Results WHERE Id = 1224)
is not being optimized to propagate its equality constraint into the coroutine’s index usage.
Root Causes of the Suboptimal Index Condition
1. Coroutine Materialization and Predicate Pushdown Limitations
SQLite’s query planner materializes coroutines (e.g., views or CTEs) as temporary tables during execution. When a scalar subquery provides a value to the coroutine, the planner may fail to propagate equality constraints into the materialized subquery. This occurs because:
- The scalar subquery’s result is treated as a dynamic value rather than a fixed constant during coroutine initialization.
- The
PartResults
view’sGROUP BY
clause forces the planner to prepare for multiplePartId
values, even when the outer query restricts it to one. This leads to conservative index usage (PartId>?
) to accommodate hypothetical ranges.
2. Index Selection Heuristics
The IX_Results_PartId_PartSide_Discriminator
index is a covering index for the PartResults
view. However, when the query planner cannot statically determine the PartId
value (due to subquery-derived values), it defaults to a range scan. This is a safeguard against incomplete statistics or variable binding delays in nested queries.
3. View Definition and Grouping Logic
The PartResults
view’s GROUP BY PartId, PartSide, Discriminator
creates ambiguity for the optimizer. Even when the outer query filters by a single PartId
, the view’s grouping columns suggest multiple possible combinations, preventing the planner from inferring that the MAX(Id)
per group is unique for a given PartId
.
4. Statistics and Schema Misalignment
Although the user ran ANALYZE
, the distribution of PartId
values might not be adequately represented in SQLite’s internal statistics. For example, if PartId
is highly selective (few rows per value), the optimizer might still favor a range scan due to stale or insufficient statistical data.
Comprehensive Solutions and Workarounds
1. Rewriting the Query to Bypass Coroutine Materialization
The most effective solution involves restructuring the query to eliminate the coroutine’s range scan by explicitly joining tables in a way that enforces equality conditions.
Example Fix Using a CTE (Mark Lawrence’s Approach):
WITH maxResultId AS (
SELECT r2.PartId, MAX(r2.Id) AS ResultId
FROM Results AS r1
INNER JOIN Results AS r2 ON r2.PartID = r1.PartID
WHERE r1.Id = 1224
GROUP BY r2.PartId, r2.PartSide, r2.Discriminator
)
SELECT mr.PartId, rkp.Kind
FROM maxResultID AS mr
INNER JOIN ResultKindPresences AS rkp ON rkp.ResultId = mr.ResultId;
Why This Works:
- The CTE
maxResultId
joinsResults
to itself, bindingr2.PartId
directly tor1.PartId
(fromr1.Id = 1224
). - The optimizer recognizes
r1.PartId
as a constant, enabling an equality search (PartId=?
) onr2
via the covering index. - Coroutine materialization is avoided, and the query plan uses efficient index lookups.
2. Materializing the View with Triggers
If query rewriting is impractical, materializing the PartResults
view as a physical table maintained via triggers ensures the MAX(Id)
logic is precomputed.
Implementation Steps:
- Create a
MaterializedPartResults
table:CREATE TABLE MaterializedPartResults ( PartId INTEGER NOT NULL, ResultId INTEGER NOT NULL, PRIMARY KEY (PartId, PartSide, Discriminator) ) WITHOUT ROWID;
- Add triggers on
Results
to updateMaterializedPartResults
on INSERT/UPDATE/DELETE. - Modify queries to reference
MaterializedPartResults
instead of the view.
Advantages:
- Eliminates runtime computation of
MAX(Id)
and grouping. - Forces the optimizer to use direct lookups via the materialized table’s primary key.
3. Index Adjustments and Query Hints
Force the optimizer to use an equality condition by:
- Adding an index hint (not natively supported in SQLite but achievable via query structure).
- Using a
CASE
expression orLIMIT 1
to constrain the result set.
Example Using Subquery Factoring:
SELECT pr.PartId, rkp.Kind
FROM (
SELECT PartId, MAX(Id) AS ResultId
FROM Results
WHERE PartId = (SELECT PartId FROM Results WHERE Id = 1224)
GROUP BY PartId, PartSide, Discriminator
) AS pr
INNER JOIN ResultKindPresences AS rkp ON rkp.ResultId = pr.ResultId;
Outcome: The inner subquery’s WHERE PartId = (...)
applies an equality filter before grouping, allowing the index to be searched with PartId=?
.
4. Version-Specific Optimizations
While the user is constrained to SQLite 3.40, testing on newer versions (3.47+) may reveal improved query planner behavior. If upgrading is feasible, validate whether the PartId>?
anomaly persists.
5. Statistical Maintenance and Query Plan Forcing
- Manually populate
sqlite_stat1
with precise statistics forPartId
distribution. - Use
INDEXED BY
clauses to override the planner’s index selection (risky but viable in controlled environments).
Example:
SELECT pr.PartId, rkp.Kind
FROM PartResults AS pr INDEXED BY IX_Results_PartId_PartSide_Discriminator
INNER JOIN ResultKindPresences AS rkp ON rkp.ResultId = pr.ResultId
WHERE pr.PartId = (SELECT PartId FROM Results WHERE Id = 1224);
Conclusion
The root cause of the performance degradation lies in SQLite’s query planner conservatively opting for a range scan (PartId>?
) when a scalar subquery provides the PartId
value. This issue is exacerbated by coroutine materialization and grouping logic in the PartResults
view. By restructuring queries to explicitly enforce equality conditions, materializing critical view data, or leveraging CTEs to bypass coroutine limitations, users can restore optimal performance. Persistent issues with index selection may require manual intervention via statistics maintenance or index hints, though these approaches demand careful testing to avoid regressions.