Expression Indexes Discarding Subtype Values in SQLite Functions


How Subtype Metadata Is Lost When Using Expression Indexes in SQLite

The core issue arises when custom SQLite functions that utilize value subtypes (metadata attached to values via sqlite3_result_subtype and sqlite3_value_subtype) are used in expression indexes. Subtypes are designed to pass transient context between functions during query execution. However, when an expression index precomputes and stores the result of a function that sets a subtype, the subtype metadata is not preserved in the index. This leads to incorrect query results when the optimizer uses the indexed value instead of recomputing the function.

For example, consider two functions:

  • inner(col) sets a subtype using sqlite3_result_subtype.
  • outer(inner(col)) reads this subtype via sqlite3_value_subtype.

If an index is created on inner(col), queries relying on outer(inner(col)) may return incorrect results because the indexed value of inner(col) lacks the subtype. The optimizer replaces the computation of inner(col) with the stored index value, stripping away the subtype. This violates the expectation that subtypes persist through function calls, creating a discrepancy between indexed and non-indexed query execution paths.


Why Expression Indexes Fail to Preserve Subtypes in Function Chains

1. Subtypes Are Ephemeral and Not Stored in Indexes

SQLite’s subtype system is designed for in-memory value propagation, not persistent storage. When a function like inner(col) sets a subtype using sqlite3_result_subtype, this metadata exists only in the runtime environment. Indexes store the raw value of the expression (e.g., the integer or text result) but discard subtypes. Consequently, any query using the index retrieves the raw value without the subtype, breaking assumptions made by functions like outer() that expect the subtype to be present.

2. Optimizer Substitution Ignores Subtype Dependencies

SQLite’s query optimizer replaces expressions with precomputed index values to speed up queries. However, this substitution does not account for subtype dependencies between functions. If inner(col) is indexed, the optimizer will replace inner(col) in outer(inner(col)) with the indexed value, unaware that outer() requires the subtype set by inner(). This optimization is valid for raw values but invalid when subtypes affect downstream computations.

3. Missing Function Registration Flags for Subtype Awareness

Custom functions that set subtypes must be registered with the SQLITE_SUBTYPE flag (previously referred to as SQLITE_FUNC_SUBTYPE in internal code). This flag informs SQLite that the function may set a subtype, altering the optimizer’s behavior. If this flag is omitted, SQLite assumes the function does not interact with subtypes, leading the optimizer to freely substitute indexed values even when subtypes are required by dependent functions.


Correcting Subtype Loss in Indexed Queries: Workarounds and Permanent Fixes

Step 1: Upgrade to SQLite 3.44.0 or Apply the Specific Fix

The root cause was addressed in SQLite version 3.44.0 via check-in ba789a7804ab96d8. This update modifies the optimizer to disable expression substitution when:

  • The indexed expression is a function marked with SQLITE_SUBTYPE.
  • The indexed value is used as an argument to another function.

Action:

  • Upgrade your SQLite library to version 3.44.0 or newer. Verify the version using SELECT sqlite_version();.
  • If upgrading is impossible, backport the fix by modifying the optimizer logic to check for SQLITE_SUBTYPE flags during expression substitution.

Step 2: Annotate Subtype Functions with SQLITE_SUBTYPE

When registering custom functions that call sqlite3_result_subtype(), explicitly enable the SQLITE_SUBTYPE flag. This tells SQLite that the function’s output may carry subtype metadata, preventing the optimizer from substituting its value from an index when used as a function parameter.

Example in C:

sqlite3_create_function_v2(
  db, 
  "inner", 
  1, 
  SQLITE_UTF8 | SQLITE_SUBTYPE,  // Critical flag
  NULL, 
  inner_func, 
  NULL, 
  NULL, 
  NULL
);

Validation:
Compile SQLite with -DSQLITE_STRICT_SUBTYPE=1. This causes sqlite3_result_subtype() to raise an error if the function is not registered with SQLITE_SUBTYPE. Use this to audit your code for missing flags.

Step 3: Avoid Expression Indexes on Subtype-Dependent Functions

If upgrading is not feasible, redesign queries or schemas to avoid indexing expressions that set subtypes. For example, instead of indexing inner(col), index col directly and compute inner(col) at query time. This ensures subtypes are always generated fresh.

Trade-off:
Sacrificing index optimizations for correctness. Evaluate query performance to determine if this is acceptable.

Step 4: Validate Query Plans with EXPLAIN

Use EXPLAIN QUERY PLAN to detect whether the optimizer is using an index for a subtype-dependent expression. If the index is used, force recomputation by wrapping the expression in a no-op function or disabling index usage with INDEXED BY.

Example:

SELECT id FROM my_table 
WHERE outer(inner(col)) > 1 
INDEXED BY none;  -- Disables index usage

Step 5: Update Documentation and Code Comments

Explicitly document subtype dependencies in your codebase. Highlight functions that require SQLITE_SUBTYPE and flag queries where index usage might interfere with subtypes. This reduces future maintenance risks.


Subtype-Aware Indexing Best Practices

  1. Audit Functions for Subtype Usage:
    Review all custom functions to identify those calling sqlite3_result_subtype(). Ensure every such function is registered with SQLITE_SUBTYPE.

  2. Test Indexed and Non-Indexed Query Paths:
    For queries involving subtypes, run tests with and without relevant indexes. Compare results to catch silent failures.

  3. Monitor Query Plan Changes:
    Use EXPLAIN QUERY PLAN regularly to detect when new indexes alter the execution path of subtype-dependent queries.

  4. Prefer Computed Columns Over Expression Indexes:
    SQLite 3.31.0+ supports generated columns. Use VIRTUAL generated columns to persist subtype-dependent values if feasible, though note that subtypes themselves still cannot be stored.


By addressing optimizer behavior, function registration, and query design, developers can ensure subtypes propagate correctly even when expression indexes are involved. The fix in SQLite 3.44.0 resolves the core issue, but proper function configuration remains critical.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *