Issue Overview: Query Plan Variance Across SQLite Versions and Compile Options

The core issue is a SQL query that runs about 100x slower when executed from a C program through the SQLite C API than when run in the SQLite command-line interface (CLI). The discrepancy persists even when the CLI is compiled from source with the same compiler toolchain. The cause is that the two binaries embed different SQLite versions and compile-time configurations, which produce different query execution plans.

The query involves nested subqueries, JSON function usage (json_each), and aggregation across multiple joins. Key schema elements include:

- A clist table containing JSON array data in its w_ids column
- A dat table referenced via foreign key relationships
- Subqueries calculating normalization factors using json_array_length and count()

Two critical factors emerge:

- SQLite Version-Specific Query Planner Behavior: Versions 3.31.1 and 3.47.0 generate fundamentally different execution plans due to algorithm changes in subquery materialization and join ordering.
- Compile-Time Option Divergence: The precompiled Ubuntu SQLite CLI includes statistics collection (SQLITE_ENABLE_STAT4), JSON1 extension, and optimizations absent in custom C API builds.

Query plan comparison reveals:

Fast Plan (v3.31.1):
- Materializes subquery results early
- Uses covering index seeks on clist.c_id
- Leverages rowid index for dat table lookups
- Applies temporary B-trees for grouping operations

Slow Plan (v3.47.0):
- Employs co-routines for deferred subquery execution
- Introduces bloom filters for join optimization
- Scans dat table sequentially
- Omits temporary B-trees for final grouping

The performance degradation occurs because newer SQLite versions:

- Reorder joins to position JSON virtual tables earlier in the execution pipeline
- Use probabilistic bloom filters that increase cache pressure
- Defer materialization of subquery results, causing repetitive computation

Possible Causes: Query Planner Regression and Schema Interaction

Three primary factors contribute to the performance disparity:

1. Join Ordering Sensitivity to JSON Virtual Tables
The json_each virtual table in the FROM clause creates implicit dependencies that newer SQLite versions misjudge. Version 3.33.0+ prioritizes pushing dat table scans earlier in the join order, assuming json_each output is independent of preceding tables. This breaks the optimal data flow:

Original Logical Flow:

1. Calculate normalization factors from clist  
2. Expand w_ids JSON array via json_each  
3. Join expanded w_ids to dat.id  
4. Aggregate results

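Faulty Physical Execution (v3.33.0+; the Bloom filter probe in step 2 exists only from 3.38.0, the release that introduced Bloom filters):

1. Scan entire dat table  
2. For each dat row, probe clist via bloom filter  
3. Expand JSON arrays for matching clist rows  
4. Perform late aggregation after all joins have run

This inversion expands JSON arrays once per (dat row, clist row) pair instead of once per clist row, so the work grows with the product of the two table sizes rather than linearly with the size of clist.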
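To make the cost difference concrete, the following Python sketch counts JSON array expansions under both loop orders. It is a toy model under stated assumptions: the array contents and dat ids are made up for illustration, and it ignores the Bloom filter, which would skip some pairs without changing the shape of the loops.

```python
def expansions_clist_outer(clist_arrays, dat_ids):
    """clist drives the join: each JSON array is expanded exactly once."""
    dat = set(dat_ids)  # stands in for the rowid lookup on dat.id
    expansions = matches = 0
    for arr in clist_arrays:
        expansions += 1
        matches += sum(1 for v in arr if v in dat)
    return expansions, matches


def expansions_dat_outer(clist_arrays, dat_ids):
    """dat drives the join: every array is expanded again for every dat row."""
    expansions = matches = 0
    for d in dat_ids:
        for arr in clist_arrays:
            expansions += 1
            matches += sum(1 for v in arr if v == d)
    return expansions, matches


# Hypothetical data: three clist rows and six dat rows.
clist_arrays = [[1, 2], [2, 3], [3, 4, 5]]
dat_ids = [1, 2, 3, 4, 5, 6]

fast = expansions_clist_outer(clist_arrays, dat_ids)
slow = expansions_dat_outer(clist_arrays, dat_ids)
print("clist outer:", fast)
print("dat outer:  ", slow)
```

Both orders find the same matches; only the number of expansions differs, and that gap widens as either table grows.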

2. Statistics-Aware Optimization Mismatch
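SQLITE_ENABLE_STAT4 only takes effect after ANALYZE has populated the sqlite_stat4 table. In a build without it, or in a database that was never analyzed, the query planner works from coarser estimates and cannot:

- Estimate how selective a lookup on an indexed column such as clist.c_id is for a particular value
- Detect skew in the distribution of indexed values
- Choose reliably between join orders and candidate indexes

SQLite implements every join as a nested loop and has no hash join. With STAT4 unavailable, the planner is more likely to pick a poor loop order across large tables, or to skip an automatic (transient) index that would have sped up the subquery lookups.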
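One way to check what statistics a given library can actually collect is to run ANALYZE and list the statistics tables that appear. The sketch below uses Python's standard sqlite3 module, which wraps whatever SQLite library the interpreter was linked against; the two-column clist schema and the index name are assumptions for illustration.

```python
import sqlite3


def stat_tables_after_analyze(conn):
    """Run ANALYZE and return the names of the statistics tables it created."""
    conn.execute("ANALYZE")
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE name LIKE 'sqlite_stat%'"
    )
    return sorted(name for (name,) in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: only the indexed column matters here.
conn.execute("CREATE TABLE clist (c_id INTEGER, w_ids TEXT)")
conn.execute("CREATE INDEX clist_c_id ON clist(c_id)")
conn.executemany(
    "INSERT INTO clist VALUES (?, ?)", [(i % 3, "[1]") for i in range(30)]
)

tables = stat_tables_after_analyze(conn)
print(tables)
```

sqlite_stat1 should always be listed; sqlite_stat4 is expected only when the library was built with SQLITE_ENABLE_STAT4.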

3. Materialization Strategy Changes
SQLite 3.32.0 introduced cost-based materialization decisions for subqueries and common table expressions. The newer versions incorrectly deem materialization too expensive due to:

- Overestimation of JSON processing costs
- Undervaluation of index seek benefits on dat.id
- Misjudgment of Bloom filter effectiveness on clist.c_id

Troubleshooting Steps, Solutions & Fixes

Step 1: Align SQLite Versions and Compile Options
Replicate the Ubuntu CLI environment in the C program:

Version Matching:

wget https://sqlite.org/2020/sqlite-autoconf-3310100.tar.gz  
tar xzf sqlite-autoconf-3310100.tar.gz  
cd sqlite-autoconf-3310100  
CFLAGS="-O2 -DSQLITE_ENABLE_STAT4" ./configure --enable-json1 --enable-rtree  
make

Compile-Time Options Verification:
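Execute in the CLI:
```
SELECT sqlite_version();
PRAGMA compile_options;  
```
Ensure the C program links against a library that reports the same version and options; sqlite3_libversion() and sqlite3_compileoption_get() return the same information from C.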
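The same check can be run from inside any program that embeds SQLite. As a minimal sketch, the Python snippet below collects the version and compile options of the library it is linked against, so the output can be compared with the CLI's; the helper name library_fingerprint is hypothetical.

```python
import sqlite3


def library_fingerprint(conn):
    """Return (version, sorted compile options) of the linked SQLite library."""
    version = conn.execute("SELECT sqlite_version()").fetchone()[0]
    options = sorted(row[0] for row in conn.execute("PRAGMA compile_options"))
    return version, options


conn = sqlite3.connect(":memory:")
version, options = library_fingerprint(conn)
print("SQLite version:", version)
# Option names are reported without the SQLITE_ prefix.
print("STAT4 enabled:", "ENABLE_STAT4" in options)
```

Saving this output for both the fast and the slow environment gives a concrete list of differences to eliminate one at a time.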

Shared Library Override:

LD_PRELOAD=/path/to/custom/libsqlite3.so ./your_program

Step 2: Query Plan Analysis and Forced Materialization

Capture Baseline Plans:
CLI:

EXPLAIN QUERY PLAN <your_query>;

C Program:

sqlite3_exec(db, "EXPLAIN QUERY PLAN <your_query>", callback, 0, &errmsg);
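For side-by-side comparison it helps to capture the plan as data rather than reading it off the screen. The following is a small sketch in Python; the helper name query_plan and the column types in the schema are assumptions, and the exact wording of each plan line varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail text is the last column


conn = sqlite3.connect(":memory:")
# Hypothetical column types; the article only names the columns.
conn.execute("CREATE TABLE dat (id INTEGER PRIMARY KEY, k TEXT, name TEXT)")
conn.execute("CREATE TABLE clist (c_id INTEGER, w_ids TEXT)")

plan = query_plan(conn, "SELECT name FROM dat WHERE id = ?", (1,))
for line in plan:
    print(line)
```

Storing these lines next to sqlite_version() for each environment makes plan changes visible with an ordinary text diff.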

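Force Subquery Materialization:
Modify the query to use explicit materialization. The MATERIALIZED keyword requires SQLite 3.35.0 or later, so it is available in 3.47.0 but is a syntax error in 3.31.1: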

WITH c AS MATERIALIZED (
  SELECT c_id, w_ids, 1.0/json_array_length(w_ids) AS ww  
  FROM clist  
  WHERE w_ids != '[]'  
)  
SELECT dat.id, dat.k, dat.name, SUM(c.ww) AS weight, SUM(c.ww * n.c_norm) AS norm  
FROM c  
JOIN (  
  SELECT c_id, 1.0/COUNT(*) AS c_norm  
  FROM clist  
  GROUP BY c_id  
) n ON n.c_id = c.c_id  
LEFT JOIN json_each(c.w_ids) w  
JOIN dat ON w.value = dat.id  
GROUP BY dat.id;
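The rewritten query can be checked against a tiny data set before it is tried on production data. The sketch below is illustrative only: the rows are invented, the column types are assumed, it needs a library with the JSON functions available, and it drops the MATERIALIZED keyword on versions older than 3.35.0 so that the same script runs in both environments.

```python
import sqlite3

# MATERIALIZED is only accepted from SQLite 3.35.0; use a plain CTE otherwise.
HINT = "MATERIALIZED" if sqlite3.sqlite_version_info >= (3, 35, 0) else ""

QUERY = f"""
WITH c AS {HINT} (
  SELECT c_id, w_ids, 1.0/json_array_length(w_ids) AS ww
  FROM clist
  WHERE w_ids != '[]'
)
SELECT dat.id, dat.k, dat.name, SUM(c.ww) AS weight, SUM(c.ww * n.c_norm) AS norm
FROM c
JOIN (
  SELECT c_id, 1.0/COUNT(*) AS c_norm
  FROM clist
  GROUP BY c_id
) n ON n.c_id = c.c_id
LEFT JOIN json_each(c.w_ids) w
JOIN dat ON w.value = dat.id
GROUP BY dat.id;
"""

conn = sqlite3.connect(":memory:")
# Hypothetical column types and sample rows.
conn.execute("CREATE TABLE dat (id INTEGER PRIMARY KEY, k TEXT, name TEXT)")
conn.execute("CREATE TABLE clist (c_id INTEGER, w_ids TEXT)")
conn.executemany("INSERT INTO dat VALUES (?, ?, ?)",
                 [(10, "a", "ten"), (20, "b", "twenty")])
conn.executemany("INSERT INTO clist VALUES (?, ?)",
                 [(1, "[10,20]"), (1, "[10]"), (2, "[20]"), (3, "[]")])

# GROUP BY does not guarantee output order, so sort before comparing.
results = sorted(conn.execute(QUERY).fetchall())
for row in results:
    print(row)
```

Running the original and the rewritten query over the same small data set and comparing the sorted rows confirms that the rewrite changes only the plan, not the answer.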

Override Join Ordering:
Use CROSS JOIN to enforce evaluation sequence:
```
SELECT ...  
FROM clist  
CROSS JOIN json_each(...)  
```
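The effect of CROSS JOIN on loop order can be observed directly in the plan. The sketch below assumes a simplified schema in which a plain dat_id column stands in for the JSON array, so that it runs without the JSON functions; the helper name outer_loop is hypothetical.

```python
import sqlite3


def outer_loop(conn, sql):
    """Return the plan line for the first (outermost) loop of a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return rows[0][-1]


conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: dat_id replaces the w_ids JSON array.
conn.execute("CREATE TABLE dat (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE clist (c_id INTEGER, dat_id INTEGER)")

# With CROSS JOIN the left-hand table is always the outer loop.
clist_first = outer_loop(
    conn, "SELECT * FROM clist CROSS JOIN dat ON dat.id = clist.dat_id"
)
dat_first = outer_loop(
    conn, "SELECT * FROM dat CROSS JOIN clist ON dat.id = clist.dat_id"
)
print(clist_first)
print(dat_first)
```

Swapping the two tables around the CROSS JOIN keyword swaps the outer loop, which is the lever used to keep clist ahead of json_each and dat.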

Step 3: Schema Optimization and Index Tuning

Functional Index on JSON Array Length:

CREATE INDEX clist_w_ids_length ON clist  
  (json_array_length(w_ids)) WHERE w_ids != '[]';

Covering Index for Subqueries:

CREATE INDEX clist_c_id_covering ON clist(c_id, w_ids);

Summary Table for Frequent Aggregates (SQLite has no materialized views, so this table must be rebuilt whenever clist changes):

CREATE TABLE clist_c_id_stats AS  
SELECT c_id, 1.0/COUNT(*) AS c_norm  
FROM clist  
GROUP BY c_id;  

ANALYZE clist_c_id_stats;
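Because the summary table is an ordinary table, something has to refresh it. A minimal sketch of one approach, rebuilding it inside a single transaction, is shown below in Python; the function name refresh_stats and the sample rows are assumptions, and a trigger-based or incremental refresh may suit write-heavy workloads better.

```python
import sqlite3

SUMMARY_SQL = "SELECT c_id, 1.0/COUNT(*) AS c_norm FROM clist GROUP BY c_id"


def refresh_stats(conn):
    """Rebuild clist_c_id_stats atomically so readers never see a partial table."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM clist_c_id_stats")
        conn.execute("INSERT INTO clist_c_id_stats " + SUMMARY_SQL)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clist (c_id INTEGER, w_ids TEXT)")
conn.executemany("INSERT INTO clist VALUES (?, ?)",
                 [(1, "[10]"), (1, "[20]"), (2, "[10]")])
conn.execute("CREATE TABLE clist_c_id_stats AS " + SUMMARY_SQL)

# New rows make the summary stale until it is refreshed.
conn.executemany("INSERT INTO clist VALUES (?, ?)", [(2, "[20]"), (3, "[10]")])
refresh_stats(conn)

stats = dict(conn.execute("SELECT c_id, c_norm FROM clist_c_id_stats"))
print(stats)
```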

Step 4: Runtime Configuration Tweaks

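Refresh Planner Statistics:
No pragma switches individual optimizations off, and PRAGMA query_only only makes the connection reject writes. What the planner does respond to is fresh statistics. PRAGMA analysis_limit (3.32.0+) caps the rows ANALYZE reads per index; set query_only last, because ANALYZE itself writes to the database:

sqlite3_exec(db, "PRAGMA analysis_limit=1000;", 0, 0, 0);  
sqlite3_exec(db, "ANALYZE;", 0, 0, 0);  
sqlite3_exec(db, "PRAGMA query_only=1;", 0, 0, 0);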

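Adjust Memory Limits:
SQLITE_CONFIG_HEAP works only in builds compiled with SQLITE_ENABLE_MEMSYS3 or SQLITE_ENABLE_MEMSYS5, and sqlite3_config() must be called before sqlite3_initialize() or the first sqlite3_open(). It changes the allocator, not the query plan:

sqlite3_config(SQLITE_CONFIG_HEAP, malloc(256*1024*1024), 256*1024*1024, 64);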

Control Temporary Storage:

sqlite3_exec(db, "PRAGMA temp_store=MEMORY;", 0, 0, 0);

Step 5: Advanced Debugging Techniques

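Selectively Disabling Optimizations (testing only):
sqlite3_test_control() is an unsupported testing interface. With SQLITE_TESTCTRL_OPTIMIZATIONS the last argument is a bitmask of optimizations to turn off, so 0xffffffff disables all of them; the bit assignments are internal and can change between releases. Use it to find out which optimization causes a plan change, never in production:

sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 0xffffffff);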

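Virtual Table Cost Adjustment:
sqlite3_vtab_config() is a C function, not a table, and it can only be called from a virtual table's own xCreate or xConnect method. SQLITE_VTAB_DIRECTONLY is a security setting that blocks use from triggers and views; it has no effect on cost. A virtual table reports its cost through estimatedCost and estimatedRows in xBestIndex, so the estimates for the built-in json_each cannot be changed from application code:

sqlite3_vtab_config(db, SQLITE_VTAB_DIRECTONLY);  /* valid only inside xCreate/xConnect */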

Execution Timing Profiling (C++ shown; sqlite3_profile is deprecated in favor of sqlite3_trace_v2 with SQLITE_TRACE_PROFILE):

sqlite3_profile(db, [](void*, const char* sql, sqlite3_uint64 ns) {  
  std::cout << "Query took " << ns/1e6 << " ms\n";  
}, nullptr);
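When the profiling hook is not available, wall-clock timing around the full fetch is a reasonable first measurement. The Python sketch below is one way to do it; the helper name timed is hypothetical, and it measures statement preparation plus stepping through every row, which is what the 100x difference is about.

```python
import sqlite3
import time


def timed(conn, sql, params=()):
    """Run a query to completion and return (rows, elapsed milliseconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()  # fetch all rows, not just the first
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return rows, elapsed_ms


conn = sqlite3.connect(":memory:")
rows, elapsed_ms = timed(conn, "SELECT 1")
print(f"Query took {elapsed_ms:.3f} ms")
```

Timing the same statement several times and discarding the first run reduces the influence of cold caches on the comparison.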

Final Solution: Hybrid Approach with Version-Specific Optimization

For production deployments requiring newer SQLite features:

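Query Plan Fixation:
SQLite has no optimizer-hint comments; text such as /*+ ... */ is parsed as an ordinary comment and ignored. Fix the plan with syntax the planner honors: CROSS JOIN for join order, and MATERIALIZED or NOT MATERIALIZED on CTEs (3.35.0+).
```
SELECT ...  
FROM clist  
CROSS JOIN json_each(clist.w_ids) AS w  
JOIN dat ON dat.id = w.value  
```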

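SQLite Session Extension:
The session extension records row changes to the attached tables for building changesets; it does not capture or pin query plans, so it does not address this problem. The calls below compile only in builds with SQLITE_ENABLE_SESSION and SQLITE_ENABLE_PREUPDATE_HOOK. To track plans across releases, store the EXPLAIN QUERY PLAN output instead.

sqlite3_session *sess;  
sqlite3session_create(db, "main", &sess);  
sqlite3session_attach(sess, "clist");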

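Cost Threshold Adjustment:
SQLite has no optimizer_cost_limit or index_cost pragma, and it silently ignores pragmas it does not recognize, so such statements have no effect. Cost estimates are adjusted through statistics instead:

PRAGMA analysis_limit=1000;  
PRAGMA optimize;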

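Custom SQLite Build with Reverted Planner Logic:
As a last resort, restore the 3.31.1 behavior in a newer source tree. The areas to compare between the two releases are:
- where.c:wherePathSolver() (join ordering logic)
- select.c (subquery materialization and co-routine decisions)

Illustrative Patch Shape (not taken from a released source tree; check every identifier against the version being built):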

--- src/wherecode.c (new)  
+++ src/wherecode.c (old)  
@@ -1234,6 +1234,7 @@  
     if( pOrderBy->nExpr==1  
      && pOrderBy->a[0].pExpr->op==TK_COLLATE  
      && IsVirtual(pTab)  
+     && pTab->aCol[pOrderBy->a[0].pExpr->iColumn].colFlags & COLFLAG_HASTYPE  
     ){  
       wsFlags |= WHERE_BY_PASS;  
     }

This comprehensive approach addresses version discrepancies, schema deficiencies, and query planner regressions while providing long-term stability across SQLite versions.
