Query Performance Discrepancy Between SQLite CLI and C API: Subquery Materialization and Join Ordering
Issue Overview: Query Plan Variance Across SQLite Versions and Compile Options
The core issue revolves around a SQL query exhibiting drastically different execution times (100x slower) when executed via a C program using the SQLite C API compared to the SQLite command-line interface (CLI). This discrepancy persists even when the CLI is compiled from source with identical compiler toolchains. The problem stems from diverging query execution plans generated by different SQLite versions and compile-time configurations.
The query involves nested subqueries, JSON function usage (json_each), and aggregation across multiple joins. Key schema elements include:
- A clist table containing JSON array data in the w_ids column
- A dat table referenced via foreign key relationships
- Subqueries calculating normalization factors using json_array_length() and count()
Two critical factors emerge:
- SQLite Version-Specific Query Planner Behavior: Versions 3.31.1 and 3.47.0 generate fundamentally different execution plans due to algorithm changes in subquery materialization and join ordering.
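- Compile-Time Option Divergence: The precompiled Ubuntu SQLite CLI includes statistics collection (SQLITE_ENABLE_STAT4), the JSON1 extension, and other options that a default custom build of the library may lack. (From 3.38.0 onward the JSON functions are built in by default, so JSON1 only matters for older builds.)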
Query plan comparison reveals:
Fast Plan (v3.31.1):
- Materializes subquery results early
- Uses covering index seeks on clist.c_id
- Looks up dat rows directly by rowid
- Uses temporary B-trees for grouping operations
Slow Plan (v3.47.0):
- Employs co-routines for deferred subquery execution
- Introduces Bloom filters for join optimization
- Scans the dat table sequentially
- Omits temporary B-trees for final grouping
The performance degradation likely occurs because the newer planner:
- Reorders the joins so that dat is scanned in the outer loop and the JSON expansion runs inside it
- Adds Bloom filter checks, which cost extra work on every probe and pay off only when they reject many rows
- Defers or skips materialization of subquery results, so work the old plan did once up front can be repeated inside the join loop
Possible Causes: Query Planner Regression and Schema Interaction
Three primary factors contribute to the performance disparity:
1. Join Ordering Sensitivity to JSON Virtual Tables
The json_each virtual table in the FROM clause depends on c.w_ids from a preceding table, and newer SQLite versions appear to misjudge the cost of that dependency. Instead of driving the join from clist, the newer planner moves the dat table scan into the outer loop, which breaks the optimal data flow:
Original Logical Flow:
1. Calculate normalization factors from clist
2. Expand w_ids JSON array via json_each
3. Join expanded w_ids to dat.id
4. Aggregate results
Faulty Physical Execution (newer versions, as inferred from the slow plan):
1. Scan entire dat table
2. For each dat row, probe clist via Bloom filter
3. Expand JSON arrays for matching clist rows
4. Aggregate in dat.id order (dat is scanned in rowid order, so GROUP BY dat.id needs no temporary B-tree)
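This inversion turns roughly linear work into quadratic work: JSON arrays are expanded once per (dat row, clist row) pair, about O(|dat| × |clist|), instead of once per clist row, which is proportional to the total number of JSON array elements.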
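To make the loop-order argument concrete, here is a toy cost model in C. It is not SQLite's executor: the table sizes, the fixed array length, and the function names (build_data, plan_clist_outer, plan_dat_outer) are hypothetical. Both functions compute the same per-dat weights; they differ only in how many JSON array elements they visit.

```c
#include <string.h>

/* Toy model only, not SQLite's executor. Sizes are hypothetical. */
#define N_DAT   1000 /* rows in dat */
#define N_CLIST 1000 /* rows in clist */
#define LEN     4    /* elements in every clist.w_ids array */

/* w_ids[i][j] is the dat.id referenced by element j of clist row i. */
int w_ids[N_CLIST][LEN];

void build_data(void) {
    for (int i = 0; i < N_CLIST; i++)
        for (int j = 0; j < LEN; j++)
            w_ids[i][j] = (i * LEN + j) % N_DAT;
}

/* clist in the outer loop: each array is expanded once and every element
   becomes a direct O(1) lookup of dat by id, like a rowid seek. */
long long plan_clist_outer(double weight[N_DAT]) {
    long long visits = 0;
    memset(weight, 0, N_DAT * sizeof weight[0]);
    for (int i = 0; i < N_CLIST; i++)
        for (int j = 0; j < LEN; j++) {
            visits++;
            weight[w_ids[i][j]] += 1.0 / LEN;
        }
    return visits;
}

/* dat in the outer loop: for every dat row, every clist array is expanded
   again to find the elements that reference it. */
long long plan_dat_outer(double weight[N_DAT]) {
    long long visits = 0;
    memset(weight, 0, N_DAT * sizeof weight[0]);
    for (int d = 0; d < N_DAT; d++)
        for (int i = 0; i < N_CLIST; i++)
            for (int j = 0; j < LEN; j++) {
                visits++;
                if (w_ids[i][j] == d)
                    weight[d] += 1.0 / LEN;
            }
    return visits;
}
```

With 1,000 rows in each table and 4 elements per array, the clist-outer order visits 4,000 elements while the dat-outer order visits 4,000,000, and both produce identical weights.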
2. Statistics-Aware Optimization Mismatch
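SQLITE_ENABLE_STAT4 only has an effect after ANALYZE has populated the sqlite_stat4 table. Without it (or without ANALYZE at all), the query planner cannot:
- Use sampled key distributions from the indexes on clist.c_id and dat.id (STAT4 samples each index separately; it does not model correlation between tables)
- Detect skew in indexed values, for example in an index on json_array_length(w_ids)
- Estimate selectivity well enough to pick the right join order or to decide when an automatic index pays off
SQLite executes every join as a nested loop; it has no hash join. With poor estimates, the planner can choose a nested-loop order that walks the large tables in the wrong sequence instead of materializing the subqueries first or building an automatic (transient) index.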
3. Materialization Strategy Changes
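Later SQLite releases changed how subqueries and common table expressions are implemented (materialized into a temporary table, run as a co-routine, or flattened into the outer query); for example, 3.35.0 added the AS MATERIALIZED and AS NOT MATERIALIZED hints for CTEs. For this query the newer planner apparently decides that materialization is not worthwhile, possibly because of:
- Overestimated JSON processing costs
- Undervalued index seeks on dat.id
- Misjudged Bloom filter effectiveness on clist.c_id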
Troubleshooting Steps, Solutions & Fixes
Step 1: Align SQLite Versions and Compile Options
Replicate the Ubuntu CLI environment in the C program:
Version Matching:
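wget https://sqlite.org/2020/sqlite-autoconf-3310100.tar.gz
tar xzf sqlite-autoconf-3310100.tar.gz
cd sqlite-autoconf-3310100
./configure --enable-json1 --enable-rtree CPPFLAGS="-DSQLITE_ENABLE_STAT4"
make
(The autoconf tarball has no STAT4 switch, so it is passed as a preprocessor define.)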
Compile-Time Options Verification:
Execute in the CLI: PRAGMA compile_options;
Ensure C program links against a library with identical options.
Shared Library Override:
LD_PRELOAD=/path/to/custom/libsqlite3.so ./your_program
Step 2: Query Plan Analysis and Forced Materialization
Capture Baseline Plans:
CLI: EXPLAIN QUERY PLAN <your_query>;
C Program:
sqlite3_exec(db, "EXPLAIN QUERY PLAN <your_query>", callback, 0, &errmsg);
Force Subquery Materialization (requires SQLite 3.35.0 or later; 3.31.1 rejects AS MATERIALIZED as a syntax error):
Modify the query to use explicit materialization:
WITH c AS MATERIALIZED (
  SELECT c_id, w_ids, 1.0/json_array_length(w_ids) AS ww
  FROM clist
  WHERE w_ids != '[]'
)
SELECT dat.id, dat.k, dat.name,
       SUM(c.ww) AS weight,
       SUM(c.ww * n.c_norm) AS norm
FROM c
JOIN (
  SELECT c_id, 1.0/COUNT(*) AS c_norm
  FROM clist
  GROUP BY c_id
) n ON n.c_id = c.c_id
LEFT JOIN json_each(c.w_ids) w
JOIN dat ON w.value = dat.id
GROUP BY dat.id;
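Override Join Ordering:
Use CROSS JOIN to enforce evaluation order; SQLite's planner always keeps the left-hand table of a CROSS JOIN in an outer loop relative to the right-hand table:
SELECT ... FROM clist CROSS JOIN json_each(...)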
Step 3: Schema Optimization and Index Tuning
Functional Index on JSON Array Length:
CREATE INDEX clist_w_ids_length ON clist (json_array_length(w_ids)) WHERE w_ids != '[]';
Covering Index for Subqueries:
CREATE INDEX clist_c_id_covering ON clist(c_id, w_ids);
Summary Table for Frequent Aggregates (SQLite has no materialized views, so this is a snapshot that must be rebuilt whenever clist changes):
CREATE TABLE clist_c_id_stats AS SELECT c_id, 1.0/COUNT(*) AS c_norm FROM clist GROUP BY c_id;
ANALYZE clist_c_id_stats;
Step 4: Runtime Configuration Tweaks
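Limit ANALYZE Cost and Block Writes (neither setting changes query plans directly):
sqlite3_exec(db, "PRAGMA query_only=1;", 0, 0, 0); /* reject any statement that writes to the database */
sqlite3_exec(db, "PRAGMA analysis_limit=1000;", 0, 0, 0); /* ANALYZE examines roughly 1000 rows per index (3.32.0+) */
Adjust Memory Limits (SQLITE_CONFIG_HEAP works only in builds compiled with SQLITE_ENABLE_MEMSYS5 or SQLITE_ENABLE_MEMSYS3, and must be called before sqlite3_initialize() or the first connection is opened; it affects speed, not plan choice):
sqlite3_config(SQLITE_CONFIG_HEAP, malloc(256*1024*1024), 256*1024*1024, 64);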
Control Temporary Storage:
sqlite3_exec(db, "PRAGMA temp_store=MEMORY;", 0, 0, 0);
Step 5: Advanced Debugging Techniques
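Optimization Bisection:
sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, mask) disables the planner optimizations selected by mask (0xffffffff disables all of them, 0 re-enables them). Comparing plans under different masks helps identify which optimization causes the regression; the CLI exposes the same control as .testctrl optimizations.
sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0xffffffff);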
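Virtual Table Cost Adjustment:
The cost the planner assigns to json_each comes from the virtual table's xBestIndex method (estimatedCost and estimatedRows), which an application cannot change for the built-in JSON functions. sqlite3_vtab_config(db, SQLITE_VTAB_DIRECTONLY) is only callable from inside a virtual table's xCreate or xConnect method and prevents the table from being used in triggers and views; it does not affect cost estimates. Use CROSS JOIN to control where json_each runs instead.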
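Execution Timing Profiling (C++; sqlite3_profile is deprecated in favor of sqlite3_trace_v2 with SQLITE_TRACE_PROFILE):
sqlite3_profile(db, [](void*, const char* sql, sqlite3_uint64 ns) { std::cout << "Query took " << ns/1e6 << " ms: " << sql << "\n"; }, nullptr);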
Final Solution: Hybrid Approach with Version-Specific Optimization
For production deployments requiring newer SQLite features:
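Query Plan Fixation:
SQLite does not support optimizer hints in comments; a comment such as /*+ NO_COALESCE_JOIN */ is simply ignored. Pin the plan with constructs the planner does honor: CROSS JOIN for join order, AS MATERIALIZED for CTEs (3.35.0+), the unary + operator in front of a column (as in +c_id) to stop that term from driving an index lookup, and likelihood()/unlikely() to adjust selectivity estimates.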
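Plan Capture for Regression Testing:
The session extension (sqlite3session_create(), sqlite3session_attach()) records data changes as changesets; it does not capture query plans. To detect plan changes across upgrades, store the EXPLAIN QUERY PLAN output of critical queries and compare it in a test whenever the SQLite library is updated.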
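Cost Threshold Adjustment:
SQLite has no pragmas for tuning cost thresholds. optimizer_cost_limit and index_cost do not exist, and unknown pragmas are silently ignored, so PRAGMA optimizer_cost_limit=1000; PRAGMA index_cost=50; has no effect. Influence cost estimates through statistics instead: run ANALYZE or PRAGMA optimize so that sqlite_stat1 (and sqlite_stat4 in STAT4 builds) reflect the real data.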
Custom SQLite Build with Backported Logic:
Carrying planner logic from SQLite 3.31.1 forward into a newer release is a large, unsupported patch. The relevant areas are:
- where.c: wherePathSolver() (join ordering)
- select.c: the subquery flattening and co-routine versus materialization decisions in sqlite3Select()
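Illustrative patch sketch (not an upstream commit; WHERE_BY_PASS is not a flag in the SQLite sources, so this hunk will not apply or compile as-is):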
--- src/wherecode.c (new)
+++ src/wherecode.c (old)
@@ -1234,6 +1234,7 @@
if( pOrderBy->nExpr==1
&& pOrderBy->a[0].pExpr->op==TK_COLLATE
&& IsVirtual(pTab)
+ && pTab->aCol[pOrderBy->a[0].pExpr->iColumn].colFlags & COLFLAG_HASTYPE
){
wsFlags |= WHERE_BY_PASS;
}
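Together, these steps address version differences, compile-option gaps, and query planner behavior changes. Query-level rewrites (CROSS JOIN, AS MATERIALIZED) are the most direct way to keep the plan stable across SQLite upgrades, because they do not depend on the planner's cost estimates.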