Controlling Case Sensitivity in SQLite LIKE Queries Dynamically
SQLite LIKE Operator Case Sensitivity Mechanics and Limitations
The SQLite LIKE
operator’s default case sensitivity behavior depends on compile-time options and runtime pragmas. By default, when SQLite is built without ICU (International Components for Unicode) support, the LIKE
operator performs case-insensitive matches for ASCII characters only, while treating Unicode characters as case-sensitive. This behavior can be altered using the PRAGMA case_sensitive_like
setting. When this pragma is set to true
, the LIKE
operator becomes case-sensitive for all characters. However, this pragma has been deprecated as of SQLite 3.39.0 and may be excluded from future builds via the SQLITE_OMIT_DEPRECATED
flag.
A critical limitation arises when attempting to override case sensitivity for individual queries. Unlike the COLLATE NOCASE
clause, which works with equality comparisons (=
) and IN
operators, the COLLATE
modifier does not influence the LIKE
operator’s behavior. For example:
SELECT * FROM t WHERE a COLLATE NOCASE LIKE 'TeXt%';
This query will not enforce case insensitivity for the LIKE
operation. The COLLATE
clause here applies only to the comparison of the column a
itself, not to the pattern matching logic of LIKE
.
The deprecated PRAGMA case_sensitive_like
affects the entire database connection globally. Changing this pragma alters the behavior of all subsequent LIKE
operations within that connection until it is reset. This creates challenges for applications requiring per-query control over case sensitivity, particularly in multi-threaded environments where a single connection might handle concurrent requests.
Obstacles to Dynamic Case Sensitivity Configuration
1. Global Scope of PRAGMA case_sensitive_like
The PRAGMA case_sensitive_like
setting is connection-specific and not bound to individual statements or transactions. If an application toggles this pragma frequently, race conditions may occur when multiple threads or operations share the same database connection. For instance:
- Thread A sets
PRAGMA case_sensitive_like = true
for a case-sensitive query. - Before Thread A completes, Thread B sets
PRAGMA case_sensitive_like = false
for a case-insensitive query. - Thread A’s subsequent
LIKE
operations now inadvertently use case-insensitive matching.
2. Collation Sequences and LIKE
Incompatibility
SQLite’s COLLATE
clauses modify how string comparisons are performed for operators like =
and IN
, but they do not extend to the LIKE
operator’s pattern-matching logic. This is because LIKE
uses a fixed collation sequence determined at compile time or via the case_sensitive_like
pragma. Consequently, the following attempts to force case insensitivity will fail:
-- These do NOT work:
SELECT * FROM t WHERE a LIKE 'text%' COLLATE NOCASE;
SELECT * FROM t WHERE (a COLLATE NOCASE) LIKE 'text%';
3. Deprecation of case_sensitive_like
Pragma
The deprecation of PRAGMA case_sensitive_like
signals that SQLite’s maintainers discourage its use. Applications relying on this pragma risk future incompatibility, especially if SQLite is compiled with SQLITE_OMIT_DEPRECATED
. Developers must adopt alternative strategies to avoid dependency on deprecated features.
4. Unicode Case Folding Limitations
SQLite’s built-in case folding only applies to ASCII characters (a-z, A-Z). For Unicode characters outside this range (e.g., accented letters or Cyrillic script), LIKE
and GLOB
operators treat them as case-sensitive unless ICU extensions are integrated. This complicates applications requiring locale-aware case insensitivity.
Solutions for Per-Query Case Sensitivity Control
1. Replace LIKE
with GLOB
for Case-Sensitive Matching
The GLOB
operator in SQLite is inherently case-sensitive and uses Unix-style wildcards (*
for any string, ?
for a single character). To achieve case-sensitive pattern matching without altering pragmas:
-- Case-sensitive match for 'Foo' at start:
SELECT * FROM t WHERE a GLOB 'F*';
For case-insensitive matches using GLOB
, convert both the column and search term to lowercase:
SELECT * FROM t WHERE LOWER(a) GLOB LOWER('TeXt') || '*';
Caveats:
GLOB
does not support%
or_
wildcards; use*
and?
instead.- Using functions like
LOWER()
on indexed columns prevents index utilization, degrading performance on large datasets.
2. Canonical Column Duplication with Indexing
Create a shadow column storing a case-folded version of the original data. For example:
ALTER TABLE t ADD COLUMN a_canonical TEXT;
UPDATE t SET a_canonical = LOWER(a);
CREATE INDEX idx_t_a_canonical ON t(a_canonical);
Perform case-insensitive searches on the canonical column:
SELECT * FROM t WHERE a_canonical LIKE LOWER('TeXt') || '%';
This approach allows index-assisted searches while decoupling case sensitivity from the original data.
3. User-Defined Functions for Custom Collation
SQLite allows registering custom collation sequences or functions. For example, create a LIKENOCASE
function:
// C API example:
void likenocase(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
const char *col = (const char*)sqlite3_value_text(argv[0]);
const char *pattern = (const char*)sqlite3_value_text(argv[1]);
// Implement case-insensitive matching logic here
// ...
}
sqlite3_create_function(db, "LIKENOCASE", 2, SQLITE_UTF8, NULL, likenocase, NULL, NULL);
Use the function in queries:
SELECT * FROM t WHERE LIKENOCASE(a, 'TeXt%');
Advantages:
- Full control over matching logic.
- Avoids pragma-related issues.
4. Conditional Pragma Toggling with Connection Pooling
If using PRAGMA case_sensitive_like
is unavoidable, mitigate race conditions by:
- Dedicating a connection per thread: Ensure each thread has its own database connection with independent pragma settings.
- Wrapping pragma changes in transactions:
BEGIN IMMEDIATE;
PRAGMA case_sensitive_like = true;
SELECT * FROM t WHERE a LIKE 'Foo%';
PRAGMA case_sensitive_like = false;
COMMIT;
Note: This does not isolate pragma changes from other operations on the same connection.
5. ICU Extension for Unicode-Aware Case Folding
Compile SQLite with the ICU extension to enable Unicode-aware case folding. This allows LIKE
and GLOB
to handle non-ASCII characters case-insensitively. After enabling ICU, use the ICU
collation:
SELECT * FROM t WHERE a LIKE 'ït%' COLLATE ICU;
Implementation Steps:
- Download the ICU library and SQLite ICU extension.
- Compile SQLite with
-DSQLITE_ENABLE_ICU
. - Load the ICU extension at runtime:
SELECT load_extension('icu');
6. Hybrid Approach for Partial Word Matching
For scenarios requiring case-sensitive matching of word fragments (e.g., Bob%
matching "Bob" but not "bob"):
SELECT * FROM t
WHERE (a GLOB 'Bob*' AND a LIKE 'Bob%')
OR (a GLOB 'B*b*' AND a LIKE '%b%');
This combines GLOB
for case-sensitive prefix matching and LIKE
for broader pattern inclusion.
Final Recommendations
- Avoid
PRAGMA case_sensitive_like
: Given its deprecation, useGLOB
or canonical columns instead. - Prefer Canonical Columns for Indexing: Essential for performance-critical applications with large datasets.
- Use ICU for Unicode Support: When dealing with non-ASCII characters, ICU ensures correct linguistic case folding.
- Isolate Connections in Multi-Threaded Apps: Prevent pragma conflicts by assigning dedicated connections to threads.
By adopting these strategies, developers can achieve flexible, per-query control over case sensitivity in SQLite without relying on deprecated features or risking concurrency issues.