Expression Indexes Discarding Subtype Values in SQLite Functions
How Subtype Metadata Is Lost When Using Expression Indexes in SQLite
The core issue arises when custom SQLite functions that utilize value subtypes (metadata attached to values via sqlite3_result_subtype
and sqlite3_value_subtype
) are used in expression indexes. Subtypes are designed to pass transient context between functions during query execution. However, when an expression index precomputes and stores the result of a function that sets a subtype, the subtype metadata is not preserved in the index. This leads to incorrect query results when the optimizer uses the indexed value instead of recomputing the function.
For example, consider two functions:
inner(col)
sets a subtype usingsqlite3_result_subtype
.outer(inner(col))
reads this subtype viasqlite3_value_subtype
.
If an index is created on inner(col)
, queries relying on outer(inner(col))
may return incorrect results because the indexed value of inner(col)
lacks the subtype. The optimizer replaces the computation of inner(col)
with the stored index value, stripping away the subtype. This violates the expectation that subtypes persist through function calls, creating a discrepancy between indexed and non-indexed query execution paths.
Why Expression Indexes Fail to Preserve Subtypes in Function Chains
1. Subtypes Are Ephemeral and Not Stored in Indexes
SQLite’s subtype system is designed for in-memory value propagation, not persistent storage. When a function like inner(col)
sets a subtype using sqlite3_result_subtype
, this metadata exists only in the runtime environment. Indexes store the raw value of the expression (e.g., the integer or text result) but discard subtypes. Consequently, any query using the index retrieves the raw value without the subtype, breaking assumptions made by functions like outer()
that expect the subtype to be present.
2. Optimizer Substitution Ignores Subtype Dependencies
SQLite’s query optimizer replaces expressions with precomputed index values to speed up queries. However, this substitution does not account for subtype dependencies between functions. If inner(col)
is indexed, the optimizer will replace inner(col)
in outer(inner(col))
with the indexed value, unaware that outer()
requires the subtype set by inner()
. This optimization is valid for raw values but invalid when subtypes affect downstream computations.
3. Missing Function Registration Flags for Subtype Awareness
Custom functions that set subtypes must be registered with the SQLITE_SUBTYPE
flag (previously referred to as SQLITE_FUNC_SUBTYPE
in internal code). This flag informs SQLite that the function may set a subtype, altering the optimizer’s behavior. If this flag is omitted, SQLite assumes the function does not interact with subtypes, leading the optimizer to freely substitute indexed values even when subtypes are required by dependent functions.
Correcting Subtype Loss in Indexed Queries: Workarounds and Permanent Fixes
Step 1: Upgrade to SQLite 3.44.0 or Apply the Specific Fix
The root cause was addressed in SQLite version 3.44.0 via check-in ba789a7804ab96d8. This update modifies the optimizer to disable expression substitution when:
- The indexed expression is a function marked with
SQLITE_SUBTYPE
. - The indexed value is used as an argument to another function.
Action:
- Upgrade your SQLite library to version 3.44.0 or newer. Verify the version using
SELECT sqlite_version();
. - If upgrading is impossible, backport the fix by modifying the optimizer logic to check for
SQLITE_SUBTYPE
flags during expression substitution.
Step 2: Annotate Subtype Functions with SQLITE_SUBTYPE
When registering custom functions that call sqlite3_result_subtype()
, explicitly enable the SQLITE_SUBTYPE
flag. This tells SQLite that the function’s output may carry subtype metadata, preventing the optimizer from substituting its value from an index when used as a function parameter.
Example in C:
sqlite3_create_function_v2(
db,
"inner",
1,
SQLITE_UTF8 | SQLITE_SUBTYPE, // Critical flag
NULL,
inner_func,
NULL,
NULL,
NULL
);
Validation:
Compile SQLite with -DSQLITE_STRICT_SUBTYPE=1
. This causes sqlite3_result_subtype()
to raise an error if the function is not registered with SQLITE_SUBTYPE
. Use this to audit your code for missing flags.
Step 3: Avoid Expression Indexes on Subtype-Dependent Functions
If upgrading is not feasible, redesign queries or schemas to avoid indexing expressions that set subtypes. For example, instead of indexing inner(col)
, index col
directly and compute inner(col)
at query time. This ensures subtypes are always generated fresh.
Trade-off:
Sacrificing index optimizations for correctness. Evaluate query performance to determine if this is acceptable.
Step 4: Validate Query Plans with EXPLAIN
Use EXPLAIN QUERY PLAN
to detect whether the optimizer is using an index for a subtype-dependent expression. If the index is used, force recomputation by wrapping the expression in a no-op function or disabling index usage with INDEXED BY
.
Example:
SELECT id FROM my_table
WHERE outer(inner(col)) > 1
INDEXED BY none; -- Disables index usage
Step 5: Update Documentation and Code Comments
Explicitly document subtype dependencies in your codebase. Highlight functions that require SQLITE_SUBTYPE
and flag queries where index usage might interfere with subtypes. This reduces future maintenance risks.
Subtype-Aware Indexing Best Practices
Audit Functions for Subtype Usage:
Review all custom functions to identify those callingsqlite3_result_subtype()
. Ensure every such function is registered withSQLITE_SUBTYPE
.Test Indexed and Non-Indexed Query Paths:
For queries involving subtypes, run tests with and without relevant indexes. Compare results to catch silent failures.Monitor Query Plan Changes:
UseEXPLAIN QUERY PLAN
regularly to detect when new indexes alter the execution path of subtype-dependent queries.Prefer Computed Columns Over Expression Indexes:
SQLite 3.31.0+ supports generated columns. UseVIRTUAL
generated columns to persist subtype-dependent values if feasible, though note that subtypes themselves still cannot be stored.
By addressing optimizer behavior, function registration, and query design, developers can ensure subtypes propagate correctly even when expression indexes are involved. The fix in SQLite 3.44.0 resolves the core issue, but proper function configuration remains critical.