Resolving Index Usage Issues with LIKE Patterns and OR Conditions in SQLite

Issue Overview: Index Not Utilized for Multi-LIKE Queries Despite Correct Expression Matching

When working with SQLite, developers often rely on indexes to optimize query performance, especially when dealing with complex text searches. A recurring challenge arises when queries involve multiple LIKE conditions with leading wildcards (e.g., LIKE '%substring%') combined using OR operators. In such cases, SQLite’s query planner may fail to utilize a pre-created index that matches the exact structure of these LIKE patterns, even after running ANALYZE to update statistics. This forces the query to perform a full table scan, resulting in degraded performance. The problem persists regardless of whether the LIKE operator is replaced with INSTR or REGEXP functions. However, manually forcing index usage via INDEXED BY improves performance significantly (2x–9x gains), indicating that the index is valid but not being selected automatically. This discrepancy highlights a gap between the query’s logical structure and SQLite’s index selection logic.

Technical Context

  • Index Structure: The index is explicitly defined to cover the LIKE patterns (e.g., CREATE INDEX idx_text ON main_table (TextColumn) WHERE TextColumn LIKE '%String1%' OR ...).
  • Query Structure: The query filters rows using the same LIKE patterns (e.g., WHERE TextColumn LIKE '%String1%' OR TextColumn LIKE '%String2%' ...).
  • Observed Behavior: SQLite ignores the index unless forced via INDEXED BY, even when the query and index expressions are identical.
  • Secondary Observations: Similar issues occur with INSTR-based indexes and REGEXP patterns, though these alternatives perform worse due to inherent limitations.

This issue is particularly impactful when dealing with large datasets where full table scans are prohibitively slow. Developers are left with a dilemma: either accept suboptimal performance or use INDEXED BY, which the SQLite documentation discourages due to its fragility (e.g., index renaming or deletion breaks queries).


Root Causes: Why SQLite Fails to Use the Index Automatically

1. Expression Matching Limitations in SQLite’s Query Planner

SQLite’s query planner uses cost-based heuristics to select indexes. For an index to be considered, the query’s WHERE clause must directly reference the index’s expressions in a form the planner recognizes. When LIKE patterns with leading wildcards are combined via OR, the planner struggles to map the conditions to the index. This is because:

  • Leading Wildcards Disable Prefix Scanning: Indexes are optimized for prefix matches (e.g., LIKE 'abc%'). Leading wildcards (%abc%) require scanning the entire index, which the planner may deem less efficient than a table scan.
  • OR Conditions Fragment Logic: The planner evaluates each OR branch independently. If any branch cannot use the index, the entire clause may default to a table scan. This is exacerbated when 13+ LIKE conditions are used, as the planner’s cost estimates skew toward full scans.

2. Boolean Expression Ambiguity

The original query uses a structure like:

WHERE TextColumn LIKE '%A%' OR TextColumn LIKE '%B%' ...

SQLite does not recognize that the entire OR chain evaluates to a boolean result that matches the index’s WHERE clause. The index is defined using the same boolean logic, but the planner fails to associate the two. Explicitly casting the OR conditions as a boolean expression (e.g., (LIKE ... OR ...) = 1) resolves this ambiguity, enabling index usage.

3. Function-Based Index Handling

SQLite’s support for function-based indexes is limited. Indexes on expressions like INSTR(TextColumn, 'A') or TextColumn LIKE '%A%' require the query to reference the exact expression used in the index. Minor syntactic differences (e.g., omitting parentheses or using equivalent functions) prevent recognition. For example:

  • Index: CREATE INDEX idx_instr ON main_table (INSTR(TextColumn, 'A'))
  • Query: WHERE INSTR(TextColumn, 'A') > 0 (uses index)
  • Query: WHERE INSTR(TextColumn, 'A') (ignores index; lacks > 0)

This rigidity forces developers to mirror index expressions precisely in queries.

4. Statistics and Selectivity Misestimation

The ANALYZE command generates table statistics (stored in sqlite_stat1) to help the planner choose indexes. However:

  • High Cardinality: If the TextColumn contains many unique values, the planner assumes the index has low selectivity for LIKE patterns, favoring a table scan.
  • Small Tables: For small tables, the planner may ignore indexes regardless of structure, as scans are faster for minimal data.

Solutions: Ensuring Index Usage for Multi-LIKE Queries

1. Rewrite the Query to Explicitly Reference the Index’s Boolean Logic

Modify the WHERE clause to treat the OR chain as a single boolean expression:

-- Original (ignored index)
SELECT TextColumn FROM main_table 
WHERE TextColumn LIKE '%A%' OR TextColumn LIKE '%B%';

-- Revised (uses index)
SELECT TextColumn FROM main_table 
WHERE (TextColumn LIKE '%A%' OR TextColumn LIKE '%B%') = 1;

By wrapping the OR conditions and comparing to 1, SQLite recognizes the expression as a boolean match for the index’s WHERE clause. This aligns the query’s syntax with the index’s definition, bypassing the planner’s ambiguity.

2. Use Materialized Views or Shadow Tables

For highly static datasets, precompute matches and store them in a shadow table:

-- Shadow table
CREATE TABLE shadow_matches (
    main_table_id INTEGER PRIMARY KEY,
    has_match BOOLEAN,
    FOREIGN KEY (main_table_id) REFERENCES main_table(id)
);

-- Index the boolean column
CREATE INDEX idx_shadow ON shadow_matches (has_match);

-- Populate the table (run after secondary_table updates)
INSERT INTO shadow_matches (main_table_id, has_match)
SELECT id, 1 FROM main_table
WHERE EXISTS (SELECT 1 FROM secondary_table WHERE main_table.TextColumn LIKE '%'||secondary_table.String||'%');

Queries then filter using the shadow table:

SELECT TextColumn FROM main_table
WHERE id IN (SELECT main_table_id FROM shadow_matches WHERE has_match = 1);

This approach shifts computational overhead to the shadow table’s maintenance but guarantees index usage.

3. Combine LIKE Conditions into a Single GLOB or REGEXP (With Caveats)

While GLOB and REGEXP have their own limitations, they can sometimes simplify patterns:

-- Using GLOB (case-sensitive, no wildcard flexibility)
WHERE TextColumn GLOB '*A*' OR TextColumn GLOB '*B*';

-- Using REGEXP (requires extension)
WHERE TextColumn REGEXP 'A|B';

However:

  • GLOB lacks case-insensitivity without additional functions (e.g., LOWER(TextColumn) GLOB '*a*'), which may disable index usage.
  • REGEXP is not natively supported in SQLite and requires loading an extension (e.g., ICU or regexp).

4. Leverage Partial Indexes with Coverage for All Patterns

Define a partial index that covers all possible LIKE patterns from secondary_table:

-- Example for 3 patterns
CREATE INDEX idx_patterns ON main_table (TextColumn)
WHERE TextColumn LIKE '%A%' 
   OR TextColumn LIKE '%B%'
   OR TextColumn LIKE '%C%';

Ensure the index’s WHERE clause exactly matches the query’s conditions. Rebuild the index when secondary_table changes.

5. Use INDEXED BY as a Last Resort

If all else fails, force index usage:

SELECT TextColumn FROM main_table
INDEXED BY idx_patterns
WHERE TextColumn LIKE '%A%' OR TextColumn LIKE '%B%';

Risks:

  • Breaks if the index is renamed or dropped.
  • Bypasses the query planner’s adaptability to future schema changes.

6. Optimize secondary_table for String Lookups

If secondary_table is static, precompute all possible substrings and store them in a column optimized for joins:

-- Add a column with the pattern '%String%'
ALTER TABLE secondary_table ADD Pattern TEXT;
UPDATE secondary_table SET Pattern = '%' || String || '%';

-- Create a covering index
CREATE INDEX idx_secondary ON secondary_table (Pattern);

-- Revise the query to join using LIKE
SELECT m.TextColumn FROM main_table m
WHERE EXISTS (
    SELECT 1 FROM secondary_table s 
    WHERE m.TextColumn LIKE s.Pattern
);

This allows SQLite to use indexes on secondary_table.Pattern for faster lookups, though main_table scans may still occur.


Final Recommendations

  1. Prefer Boolean Casting: Rewrite WHERE (LIKE ... OR ...) = 1 to align with index logic.
  2. Profile Query Plans: Use EXPLAIN QUERY PLAN to verify index usage and refine conditions.
  3. Benchmark Forced vs. Automatic Indexing: If INDEXED BY yields significant gains, document the rationale and risks.
  4. Consider Shadow Tables: For read-heavy workloads, materialized views reduce runtime overhead.
  5. Monitor secondary_table Changes: Rebuild indexes or shadow tables when source data changes.

By understanding SQLite’s query planner limitations and restructuring queries to match its expectations, developers can achieve optimal performance without resorting to fragile fixes.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *