Resolving Index Usage Issues with LIKE Patterns and OR Conditions in SQLite
Issue Overview: Index Not Utilized for Multi-LIKE Queries Despite Correct Expression Matching
When working with SQLite, developers often rely on indexes to optimize query performance, especially when dealing with complex text searches. A recurring challenge arises when queries involve multiple LIKE
conditions with leading wildcards (e.g., LIKE '%substring%'
) combined using OR
operators. In such cases, SQLite’s query planner may fail to utilize a pre-created index that matches the exact structure of these LIKE
patterns, even after running ANALYZE
to update statistics. This forces the query to perform a full table scan, resulting in degraded performance. The problem persists regardless of whether the LIKE
operator is replaced with INSTR
or REGEXP
functions. However, manually forcing index usage via INDEXED BY
improves performance significantly (2x–9x gains), indicating that the index is valid but not being selected automatically. This discrepancy highlights a gap between the query’s logical structure and SQLite’s index selection logic.
Technical Context
- Index Structure: The index is explicitly defined to cover the
LIKE
patterns (e.g.,CREATE INDEX idx_text ON main_table (TextColumn) WHERE TextColumn LIKE '%String1%' OR ...
). - Query Structure: The query filters rows using the same
LIKE
patterns (e.g.,WHERE TextColumn LIKE '%String1%' OR TextColumn LIKE '%String2%' ...
). - Observed Behavior: SQLite ignores the index unless forced via
INDEXED BY
, even when the query and index expressions are identical. - Secondary Observations: Similar issues occur with
INSTR
-based indexes andREGEXP
patterns, though these alternatives perform worse due to inherent limitations.
This issue is particularly impactful when dealing with large datasets where full table scans are prohibitively slow. Developers are left with a dilemma: either accept suboptimal performance or use INDEXED BY
, which the SQLite documentation discourages due to its fragility (e.g., index renaming or deletion breaks queries).
Root Causes: Why SQLite Fails to Use the Index Automatically
1. Expression Matching Limitations in SQLite’s Query Planner
SQLite’s query planner uses cost-based heuristics to select indexes. For an index to be considered, the query’s WHERE
clause must directly reference the index’s expressions in a form the planner recognizes. When LIKE
patterns with leading wildcards are combined via OR
, the planner struggles to map the conditions to the index. This is because:
- Leading Wildcards Disable Prefix Scanning: Indexes are optimized for prefix matches (e.g.,
LIKE 'abc%'
). Leading wildcards (%abc%
) require scanning the entire index, which the planner may deem less efficient than a table scan. - OR Conditions Fragment Logic: The planner evaluates each
OR
branch independently. If any branch cannot use the index, the entire clause may default to a table scan. This is exacerbated when 13+LIKE
conditions are used, as the planner’s cost estimates skew toward full scans.
2. Boolean Expression Ambiguity
The original query uses a structure like:
WHERE TextColumn LIKE '%A%' OR TextColumn LIKE '%B%' ...
SQLite does not recognize that the entire OR
chain evaluates to a boolean result that matches the index’s WHERE
clause. The index is defined using the same boolean logic, but the planner fails to associate the two. Explicitly casting the OR
conditions as a boolean expression (e.g., (LIKE ... OR ...) = 1
) resolves this ambiguity, enabling index usage.
3. Function-Based Index Handling
SQLite’s support for function-based indexes is limited. Indexes on expressions like INSTR(TextColumn, 'A')
or TextColumn LIKE '%A%'
require the query to reference the exact expression used in the index. Minor syntactic differences (e.g., omitting parentheses or using equivalent functions) prevent recognition. For example:
- Index:
CREATE INDEX idx_instr ON main_table (INSTR(TextColumn, 'A'))
- Query:
WHERE INSTR(TextColumn, 'A') > 0
(uses index) - Query:
WHERE INSTR(TextColumn, 'A')
(ignores index; lacks> 0
)
This rigidity forces developers to mirror index expressions precisely in queries.
4. Statistics and Selectivity Misestimation
The ANALYZE
command generates table statistics (stored in sqlite_stat1
) to help the planner choose indexes. However:
- High Cardinality: If the
TextColumn
contains many unique values, the planner assumes the index has low selectivity forLIKE
patterns, favoring a table scan. - Small Tables: For small tables, the planner may ignore indexes regardless of structure, as scans are faster for minimal data.
Solutions: Ensuring Index Usage for Multi-LIKE Queries
1. Rewrite the Query to Explicitly Reference the Index’s Boolean Logic
Modify the WHERE
clause to treat the OR
chain as a single boolean expression:
-- Original (ignored index)
SELECT TextColumn FROM main_table
WHERE TextColumn LIKE '%A%' OR TextColumn LIKE '%B%';
-- Revised (uses index)
SELECT TextColumn FROM main_table
WHERE (TextColumn LIKE '%A%' OR TextColumn LIKE '%B%') = 1;
By wrapping the OR
conditions and comparing to 1
, SQLite recognizes the expression as a boolean match for the index’s WHERE
clause. This aligns the query’s syntax with the index’s definition, bypassing the planner’s ambiguity.
2. Use Materialized Views or Shadow Tables
For highly static datasets, precompute matches and store them in a shadow table:
-- Shadow table
CREATE TABLE shadow_matches (
main_table_id INTEGER PRIMARY KEY,
has_match BOOLEAN,
FOREIGN KEY (main_table_id) REFERENCES main_table(id)
);
-- Index the boolean column
CREATE INDEX idx_shadow ON shadow_matches (has_match);
-- Populate the table (run after secondary_table updates)
INSERT INTO shadow_matches (main_table_id, has_match)
SELECT id, 1 FROM main_table
WHERE EXISTS (SELECT 1 FROM secondary_table WHERE main_table.TextColumn LIKE '%'||secondary_table.String||'%');
Queries then filter using the shadow table:
SELECT TextColumn FROM main_table
WHERE id IN (SELECT main_table_id FROM shadow_matches WHERE has_match = 1);
This approach shifts computational overhead to the shadow table’s maintenance but guarantees index usage.
3. Combine LIKE Conditions into a Single GLOB or REGEXP (With Caveats)
While GLOB
and REGEXP
have their own limitations, they can sometimes simplify patterns:
-- Using GLOB (case-sensitive, no wildcard flexibility)
WHERE TextColumn GLOB '*A*' OR TextColumn GLOB '*B*';
-- Using REGEXP (requires extension)
WHERE TextColumn REGEXP 'A|B';
However:
GLOB
lacks case-insensitivity without additional functions (e.g.,LOWER(TextColumn) GLOB '*a*'
), which may disable index usage.REGEXP
is not natively supported in SQLite and requires loading an extension (e.g.,ICU
orregexp
).
4. Leverage Partial Indexes with Coverage for All Patterns
Define a partial index that covers all possible LIKE
patterns from secondary_table
:
-- Example for 3 patterns
CREATE INDEX idx_patterns ON main_table (TextColumn)
WHERE TextColumn LIKE '%A%'
OR TextColumn LIKE '%B%'
OR TextColumn LIKE '%C%';
Ensure the index’s WHERE
clause exactly matches the query’s conditions. Rebuild the index when secondary_table
changes.
5. Use INDEXED BY as a Last Resort
If all else fails, force index usage:
SELECT TextColumn FROM main_table
INDEXED BY idx_patterns
WHERE TextColumn LIKE '%A%' OR TextColumn LIKE '%B%';
Risks:
- Breaks if the index is renamed or dropped.
- Bypasses the query planner’s adaptability to future schema changes.
6. Optimize secondary_table for String Lookups
If secondary_table
is static, precompute all possible substrings and store them in a column optimized for joins:
-- Add a column with the pattern '%String%'
ALTER TABLE secondary_table ADD Pattern TEXT;
UPDATE secondary_table SET Pattern = '%' || String || '%';
-- Create a covering index
CREATE INDEX idx_secondary ON secondary_table (Pattern);
-- Revise the query to join using LIKE
SELECT m.TextColumn FROM main_table m
WHERE EXISTS (
SELECT 1 FROM secondary_table s
WHERE m.TextColumn LIKE s.Pattern
);
This allows SQLite to use indexes on secondary_table.Pattern
for faster lookups, though main_table
scans may still occur.
Final Recommendations
- Prefer Boolean Casting: Rewrite
WHERE (LIKE ... OR ...) = 1
to align with index logic. - Profile Query Plans: Use
EXPLAIN QUERY PLAN
to verify index usage and refine conditions. - Benchmark Forced vs. Automatic Indexing: If
INDEXED BY
yields significant gains, document the rationale and risks. - Consider Shadow Tables: For read-heavy workloads, materialized views reduce runtime overhead.
- Monitor secondary_table Changes: Rebuild indexes or shadow tables when source data changes.
By understanding SQLite’s query planner limitations and restructuring queries to match its expectations, developers can achieve optimal performance without resorting to fragile fixes.