Query Scanning Beyond Necessary Rows Due to Incorrect Index Condition in Coroutine


Understanding the Core Performance Anomaly in Coroutine-Driven Queries

The problem revolves around SQLite query execution plans unexpectedly scanning the entire Results table via an index range condition (PartId>?) instead of terminating early when using a scalar subquery to filter by PartId. This occurs despite explicit logic in the view (PartResults) and outer queries that should limit results to a single PartId. The anomaly manifests in two variants of a query (C and D) that combine a scalar subquery with a view and JOIN operations, resulting in significantly slower performance compared to logically equivalent standalone queries (A and B). At the heart of the issue is SQLite’s query planner generating a suboptimal index condition during coroutine execution, leading to unnecessary table scans even after all valid rows have been processed.

Technical Context

  • Schema Design: The Results table stores entries with PartId, PartSide, and Discriminator, indexed via IX_Results_PartId_PartSide_Discriminator. The ResultKindPresences table is a WITHOUT ROWID table with a composite primary key linking to Results.Id.
  • View Definition: PartResults uses MAX(Id) grouped by PartId, PartSide, and Discriminator to identify the latest applicable ResultId for each partition.
  • Performance Discrepancy:
    • Query A (WHERE pr.PartId = 88) and Query B (WHERE Id = 1224) execute efficiently by leveraging index equality searches.
    • Query C and D (using scalar subqueries to derive PartId) exhibit degraded performance due to the query planner substituting an equality condition (PartId=?) with a range scan (PartId>?).

Observed Behavior

  • Query Plan Analysis:
    • In fast queries (A and B), the index is searched using exact matches (PartId=? or rowid=?).
    • In slow queries (C and D), the coroutine driving the PartResults view uses a range condition (PartId>?), forcing a full scan of all entries for the target PartId instead of stopping at the first valid entry.
  • Impact on Execution Time: On a 10 GB database, this results in execution times escalating from milliseconds (for direct equality queries) to seconds when using scalar subqueries or JOIN-based filtering.

Key Indicators of the Problem

  1. Unexpected Range Scans: The presence of SEARCH Results USING COVERING INDEX ... (PartId>?) in the query plan when logic dictates only a single PartId is relevant.
  2. Coroutine Materialization: SQLite’s coroutine materialization for the PartResults view in Query C/D introduces a scanning pattern that bypasses early termination optimizations.
  3. Subquery Interaction: The scalar subquery (SELECT PartId FROM Results WHERE Id = 1224) is not being optimized to propagate its equality constraint into the coroutine’s index usage.

Root Causes of the Suboptimal Index Condition

1. Coroutine Materialization and Predicate Pushdown Limitations

SQLite’s query planner materializes coroutines (e.g., views or CTEs) as temporary tables during execution. When a scalar subquery provides a value to the coroutine, the planner may fail to propagate equality constraints into the materialized subquery. This occurs because:

  • The scalar subquery’s result is treated as a dynamic value rather than a fixed constant during coroutine initialization.
  • The PartResults view’s GROUP BY clause forces the planner to prepare for multiple PartId values, even when the outer query restricts it to one. This leads to conservative index usage (PartId>?) to accommodate hypothetical ranges.

2. Index Selection Heuristics

The IX_Results_PartId_PartSide_Discriminator index is a covering index for the PartResults view. However, when the query planner cannot statically determine the PartId value (due to subquery-derived values), it defaults to a range scan. This is a safeguard against incomplete statistics or variable binding delays in nested queries.

3. View Definition and Grouping Logic

The PartResults view’s GROUP BY PartId, PartSide, Discriminator creates ambiguity for the optimizer. Even when the outer query filters by a single PartId, the view’s grouping columns suggest multiple possible combinations, preventing the planner from inferring that the MAX(Id) per group is unique for a given PartId.

4. Statistics and Schema Misalignment

Although the user ran ANALYZE, the distribution of PartId values might not be adequately represented in SQLite’s internal statistics. For example, if PartId is highly selective (few rows per value), the optimizer might still favor a range scan due to stale or insufficient statistical data.


Comprehensive Solutions and Workarounds

1. Rewriting the Query to Bypass Coroutine Materialization

The most effective solution involves restructuring the query to eliminate the coroutine’s range scan by explicitly joining tables in a way that enforces equality conditions.

Example Fix Using a CTE (Mark Lawrence’s Approach):

WITH maxResultId AS (
  SELECT r2.PartId, MAX(r2.Id) AS ResultId
  FROM Results AS r1
  INNER JOIN Results AS r2 ON r2.PartID = r1.PartID
  WHERE r1.Id = 1224
  GROUP BY r2.PartId, r2.PartSide, r2.Discriminator
)
SELECT mr.PartId, rkp.Kind
FROM maxResultID AS mr
INNER JOIN ResultKindPresences AS rkp ON rkp.ResultId = mr.ResultId;

Why This Works:

  • The CTE maxResultId joins Results to itself, binding r2.PartId directly to r1.PartId (from r1.Id = 1224).
  • The optimizer recognizes r1.PartId as a constant, enabling an equality search (PartId=?) on r2 via the covering index.
  • Coroutine materialization is avoided, and the query plan uses efficient index lookups.

2. Materializing the View with Triggers

If query rewriting is impractical, materializing the PartResults view as a physical table maintained via triggers ensures the MAX(Id) logic is precomputed.

Implementation Steps:

  1. Create a MaterializedPartResults table:
    CREATE TABLE MaterializedPartResults (
      PartId INTEGER NOT NULL,
      ResultId INTEGER NOT NULL,
      PRIMARY KEY (PartId, PartSide, Discriminator)
    ) WITHOUT ROWID;
    
  2. Add triggers on Results to update MaterializedPartResults on INSERT/UPDATE/DELETE.
  3. Modify queries to reference MaterializedPartResults instead of the view.

Advantages:

  • Eliminates runtime computation of MAX(Id) and grouping.
  • Forces the optimizer to use direct lookups via the materialized table’s primary key.

3. Index Adjustments and Query Hints

Force the optimizer to use an equality condition by:

  • Adding an index hint (not natively supported in SQLite but achievable via query structure).
  • Using a CASE expression or LIMIT 1 to constrain the result set.

Example Using Subquery Factoring:

SELECT pr.PartId, rkp.Kind
FROM (
  SELECT PartId, MAX(Id) AS ResultId
  FROM Results
  WHERE PartId = (SELECT PartId FROM Results WHERE Id = 1224)
  GROUP BY PartId, PartSide, Discriminator
) AS pr
INNER JOIN ResultKindPresences AS rkp ON rkp.ResultId = pr.ResultId;

Outcome: The inner subquery’s WHERE PartId = (...) applies an equality filter before grouping, allowing the index to be searched with PartId=?.

4. Version-Specific Optimizations

While the user is constrained to SQLite 3.40, testing on newer versions (3.47+) may reveal improved query planner behavior. If upgrading is feasible, validate whether the PartId>? anomaly persists.

5. Statistical Maintenance and Query Plan Forcing

  • Manually populate sqlite_stat1 with precise statistics for PartId distribution.
  • Use INDEXED BY clauses to override the planner’s index selection (risky but viable in controlled environments).

Example:

SELECT pr.PartId, rkp.Kind
FROM PartResults AS pr INDEXED BY IX_Results_PartId_PartSide_Discriminator
INNER JOIN ResultKindPresences AS rkp ON rkp.ResultId = pr.ResultId
WHERE pr.PartId = (SELECT PartId FROM Results WHERE Id = 1224);

Conclusion

The root cause of the performance degradation lies in SQLite’s query planner conservatively opting for a range scan (PartId>?) when a scalar subquery provides the PartId value. This issue is exacerbated by coroutine materialization and grouping logic in the PartResults view. By restructuring queries to explicitly enforce equality conditions, materializing critical view data, or leveraging CTEs to bypass coroutine limitations, users can restore optimal performance. Persistent issues with index selection may require manual intervention via statistics maintenance or index hints, though these approaches demand careful testing to avoid regressions.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *