NOT INDEXED Clause Fails to Suppress Multi-Column Index Usage in SQLite Queries


Composite Index Utilization Overrides NOT INDEXED Directive in Multi-Column WHERE Clauses

The core problem arises when a SQLite query includes a NOT INDEXED clause intended to force a full table scan but observes continued usage of a composite index. This occurs specifically when the WHERE clause contains predicates for multiple columns covered by the composite index. For example, given a table mp with a unique composite index theindex on columns (id, tag), executing SELECT * FROM mp NOT INDEXED WHERE id=5 AND tag=9 still shows the query optimizer using theindex in the execution plan. This contradicts the expected behavior seen in single-column WHERE clauses, where NOT INDEXED successfully suppresses index usage and triggers a table scan. The discrepancy stems from how SQLite’s query optimizer prioritizes composite index efficiency over explicit suppression directives under specific conditions.

The schema and query structure play pivotal roles. The composite index (id, tag) provides a covering index for the two-column equality predicate, making it highly efficient for the optimizer to use. When the WHERE clause includes all columns of the composite index, SQLite’s cost-based optimizer may override the NOT INDEXED hint due to the perceived efficiency gains of using the index. This behavior is inconsistent with single-column scenarios, where the absence of a covering index for the full predicate allows NOT INDEXED to function as intended. The issue is exacerbated in SQLite versions prior to 3.38.5 with specific check-ins, where the optimizer’s decision logic did not fully account for explicit index suppression in multi-column predicate contexts.


Optimizer Heuristics and Composite Index Coverage Override Explicit Index Suppression

The primary cause of this behavior lies in SQLite’s query optimizer architecture. When evaluating potential query plans, SQLite assigns a "cost" to each option, favoring plans with lower estimated costs. A composite index that exactly matches the columns in the WHERE clause’s predicates will have an exceptionally low cost due to its ability to resolve the query with minimal disk I/O and row filtering. The NOT INDEXED clause is designed to discourage index usage, but it does not force a table scan unconditionally. Instead, it adds a penalty to the cost of using any index, making table scans relatively more attractive. However, when the composite index’s inherent efficiency outweighs this penalty, the optimizer will still select the index.

A secondary factor is the distinction between single-column and multi-column indexes. In single-column queries, the absence of a covering index (or a partially applicable composite index) allows the NOT INDEXED penalty to tip the balance toward a table scan. For multi-column queries, the composite index acts as a covering index, eliminating the need for additional row lookups. This efficiency gain often nullifies the NOT INDEXED penalty. Additionally, pre-3.38.5 SQLite versions contained a subtle bug in the query planner’s handling of index suppression for composite indexes, where the penalty was not applied correctly. This bug was addressed in check-in bd87d107fe474cee, which refined the cost calculation logic to respect NOT INDEXED even when composite indexes are present.


Resolving Index Suppression Failures: Version Upgrades, Query Plan Analysis, and Schema Adjustments

Step 1: Verify SQLite Version and Apply the Fix
Upgrade to SQLite version 3.38.5 or later, ensuring the inclusion of check-in bd87d107fe474cee. This resolves the planner’s incorrect handling of NOT INDEXED for composite indexes. For embedded environments or systems where upgrading is constrained, backporting the fix to a custom build may be necessary.

Step 2: Analyze Query Plans with EXPLAIN and EXPLAIN QUERY PLAN
Use EXPLAIN and EXPLAIN QUERY PLAN to diagnose index usage. Compare outputs with and without NOT INDEXED:

EXPLAIN QUERY PLAN SELECT * FROM mp WHERE id=5 AND tag=9; -- Uses theindex
EXPLAIN QUERY PLAN SELECT * FROM mp NOT INDEXED WHERE id=5 AND tag=9; -- Should use table scan post-fix

Post-fix, the second query should show SCAN TABLE mp instead of SEARCH TABLE mp USING INDEX theindex.

Step 3: Force Table Scans with Alternative Techniques
If upgrading is not feasible, employ workarounds:

  • Use INDEXED BY with a non-existent index name to trigger an error, forcing a table scan:
    SELECT * FROM mp INDEXED BY (non_existent_index) WHERE id=5 AND tag=9;
    
  • Rewrite the query to obscure predicate coverage, e.g., using expressions that prevent the optimizer from recognizing the composite index’s applicability:
    SELECT * FROM mp WHERE id+0=5 AND tag+0=9; -- Inhibits index usage
    

Step 4: Schema Design Considerations
Avoid over-indexing. If NOT INDEXED is critical for specific queries, consider:

  • Dropping the composite index if it is not needed elsewhere.
  • Using partial indexes or indexed views to limit index scope.
  • Separating columns into single-column indexes, though this may degrade performance for other queries.

Step 5: Monitor Optimizer Statistics
SQLite uses internal statistics (e.g., sqlite_stat1) to guide the optimizer. Outdated stats may cause suboptimal plans. Periodically run ANALYZE to refresh statistics, ensuring the optimizer’s cost calculations reflect current data distributions.

Step 6: Evaluate Query Structure
Re-express queries to reduce the optimizer’s reliance on composite indexes. For example, using OR instead of AND in predicates (where logically valid) may bypass composite index coverage:

SELECT * FROM mp WHERE id=5 OR tag=9; -- Likely uses table scan unless separate indexes exist

Step 7: Leverage Runtime Directives
Use PRAGMA settings to influence optimizer behavior temporarily:

PRAGMA optimize; -- Auto-configures query planner based on current data
PRAGMA automatic_index = OFF; -- Disables automatic index creation

Final Note: The NOT INDEXED clause is a hint, not a mandate. Its effectiveness depends on the optimizer’s cost model, which prioritizes query performance over user directives. Understanding this balance is crucial for schema design and query optimization.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *