Segmentation Fault in decodeIntArray During Schema Analysis of Corrupted sqlite_stat4 Data
Crash Context: Schema Initialization & decodeIntArray Memory Access Violation
The segmentation fault occurs during SQLite’s schema initialization, while decoding integer array data stored in the SQLITE_STAT4 table. The crash manifests in the decodeIntArray() function as an invalid memory write at address 0x000000000001, which points to a near-null pointer dereference or an out-of-bounds write. The stack trace shows this happens while loading statistical data for query optimization via sqlite3AnalysisLoad(), which is part of the automatic schema parsing workflow.
Key contextual details:
- Trigger Command Sequence: The crash reproduces when creating a virtual RTREE table after loading a malformed database file via the .clone or .open commands. The malformed file contains corrupted statistics data that bypasses initial consistency checks.
- Function Chain: sqlite3AnalysisLoad() → loadStat4() → loadStatTbl() → decodeIntArray(). The fault occurs during deserialization of blobs stored in SQLITE_STAT4, which are expected to contain encoded integer arrays for index statistics.
- Memory Violation Type: AddressSanitizer detects a write operation targeting the zero page (low memory address 0x1), which is always invalid in user-space processes. This suggests the function is calculating an invalid pointer offset or processing a blob with an incorrect format.
Root Candidates: Invalid Blob Structure, Heap Corruption, or STAT4 Configuration
1. Malformed Blob in SQLITE_STAT4
The decodeIntArray() function expects blobs with a specific header format: a 1-byte flag followed by a variable-length integer (varint) indicating array size. If the blob lacks this header or contains invalid varint encoding, the function miscalculates array dimensions, leading to out-of-bounds writes.
2. Heap Corruption Prior to decodeIntArray Execution
If prior operations (e.g., cloning the malformed database) corrupted the heap, the sqlite3_value objects passed to decodeIntArray() might reference invalid memory regions. This could occur if the malformed database has mismatched schema versions or truncated system tables.
3. Incorrect STAT4 Configuration During Compilation
Enabling SQLITE_ENABLE_STAT4 alters how statistical data is stored and parsed. If the malformed database was created without this flag but is opened with it enabled (or vice versa), the SQLITE_STAT4 table’s blob format becomes incompatible with the decoder’s expectations.
4. Race Conditions in Schema Initialization
Though less likely in single-threaded CLI usage, improper synchronization during schema parsing (e.g., concurrent writes to SQLITE_MASTER) could leave the database in an inconsistent state. This risk is amplified when cloning databases with .clone, which may not fully lock the source database.
5. Overflow in Varint Decoding
The decodeIntArray() function, as analyzed here, parses array sizes with SQLite’s internal varint reader sqlite3GetVarint32(). If a malformed varint exceeds 32-bit bounds, the decoded size becomes excessively large, causing the subsequent array write loop to exceed buffer limits.
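The overflow risk in the last candidate can be illustrated with a small standalone C sketch. It assumes a decoded element count held in a uint32_t and a destination buffer of known capacity. The helper names naive_bytes and array_fits are hypothetical and do not exist in SQLite. The point is only that multiplying an untrusted count by an element size can wrap, whereas dividing the capacity by the element size cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the unchecked size computation, truncated to 32 bits
 * the way a 32-bit size variable would be. A huge count can wrap around and
 * appear to need only a few bytes. */
uint32_t naive_bytes(uint32_t n_items, uint32_t elem_size)
{
    return (uint32_t)((uint64_t)n_items * elem_size);
}

/* Hypothetical helper: returns 1 when n_items elements of elem_size bytes
 * fit in capacity bytes, 0 otherwise. Dividing the capacity instead of
 * multiplying the untrusted count avoids any overflow. */
int array_fits(uint32_t n_items, size_t elem_size, size_t capacity)
{
    assert(elem_size > 0);
    return n_items <= capacity / elem_size;
}
```

With a count of 0x40000001 and 4-byte elements, the naive product wraps to 4 bytes, while array_fits rejects the same count for a 16-byte buffer.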
Resolution Workflow: Diagnosing Blob Corruption and Hardening decodeIntArray
Step 1: Validate the SQLITE_STAT4 Blob Structure
Extract the offending row from the malformed database (sqlite_stat4 stores its data in the neq, nlt, ndlt, and sample columns):
SELECT tbl, idx, neq, nlt, ndlt, hex(sample) FROM SQLITE_STAT4 WHERE rowid = <crash_rowid>;
Use a hex editor or the sqlite3_blob APIs to inspect the blob’s header:
- Byte 0: Flags (expected 0x01 for integer arrays).
- Bytes 1–5: Varint encoding the array length. Validate using sqlite3GetVarint32().
- Remaining Bytes: Array of encoded integers (varints).
If the header is missing or the varint length exceeds the blob’s actual size, the database is corrupt.
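The header check above can be sketched as a standalone C routine that works on the raw bytes copied out of a hex dump. It assumes the layout exactly as listed in this step (flag byte 0x01, then a varint element count, then the elements), which is taken from this write-up and not verified against SQLite’s source. read_varint32 is a simplified stand-in for a varint reader, and check_stat_blob and the blob_status values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum blob_status {
    BLOB_OK = 0, BLOB_EMPTY, BLOB_BAD_FLAG, BLOB_BAD_VARINT, BLOB_TOO_SHORT
};

/* Simplified stand-in for a varint reader: big-endian, 7 bits per byte,
 * at most 5 bytes. Returns the bytes consumed, or 0 if the input is
 * truncated or the value does not fit in 32 bits. */
static size_t read_varint32(const unsigned char *p, size_t avail, uint32_t *out)
{
    uint64_t v = 0;
    size_t i;
    for (i = 0; i < avail && i < 5; i++) {
        v = (v << 7) | (uint64_t)(p[i] & 0x7f);
        if ((p[i] & 0x80) == 0) {          /* high bit clear: last byte */
            if (v > UINT32_MAX) return 0;
            *out = (uint32_t)v;
            return i + 1;
        }
    }
    return 0;
}

/* Classifies a blob according to the assumed header layout and, on success,
 * stores the declared element count in *n_items. */
enum blob_status check_stat_blob(const unsigned char *blob, size_t n_blob,
                                 uint32_t *n_items)
{
    uint32_t declared = 0;
    size_t used;

    assert(n_items != NULL);
    if (blob == NULL || n_blob == 0) return BLOB_EMPTY;
    if (blob[0] != 0x01) return BLOB_BAD_FLAG;
    used = read_varint32(blob + 1, n_blob - 1, &declared);
    if (used == 0) return BLOB_BAD_VARINT;
    /* Every encoded element occupies at least one byte. */
    if (declared > n_blob - 1 - used) return BLOB_TOO_SHORT;
    *n_items = declared;
    return BLOB_OK;
}
```

A result of BLOB_TOO_SHORT corresponds to the case where the declared length exceeds what the blob can hold, which is the corruption signature this step looks for.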
Step 2: Reproduce with Minimal STAT4 Data
Create a minimal test case to isolate the corruption:
-- Rebuild the malformed database with only essential STAT4 entries
PRAGMA writable_schema=ON;
DELETE FROM SQLITE_STAT4 WHERE rowid != <crash_rowid>;
PRAGMA writable_schema=OFF;
VACUUM;
Reopen the database and attempt to create the RTREE table. If the crash persists, the isolated blob is defective.
Step 3: Patch decodeIntArray with Bounds Checks
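Modify decodeIntArray() in sqlite3.c to validate the blob size before decoding:
static void decodeIntArray(...) {
  /* Sketch: pVal (the stat blob), aArray (output) and nArray (its capacity)
  ** stand in for the elided parameters; these names are assumptions. */
  const unsigned char *a = sqlite3_value_blob(pVal);
  int nBlob = sqlite3_value_bytes(pVal);
  u32 n = 0, v = 0;
  int i;
  u8 sz;
  if( a==0 || nBlob<2 ) return;      /* Early exit on empty or header-only blob */
  a++; nBlob--;                      /* Skip the flag byte */
  /* Decode array size with explicit bounds checks. sqlite3GetVarint32()
  ** returns the number of bytes consumed but takes no length argument, so
  ** a production patch must also keep the read itself inside the blob. */
  sz = sqlite3GetVarint32(a, &n);
  if( sz>nBlob ) return;             /* Size varint runs past the blob */
  a += sz; nBlob -= sz;
  /* Each element needs at least one byte and must fit the output array */
  if( n>(u32)nBlob || n>(u32)nArray ) return;
  for(i=0; i<(int)n; i++){
    sz = sqlite3GetVarint32(a, &v);
    if( sz>nBlob ) return;           /* Truncated element: stop decoding */
    aArray[i] = (int)v;
    a += sz; nBlob -= sz;
  }
}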
This adds explicit validation for blob length and varint decoding failures.
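Because the patch above depends on SQLite internals, the same bounds-checking pattern can be exercised in isolation with a standalone sketch. It again assumes the blob layout described in Step 1, and every name in it (read_varint32, decode_int_array) is hypothetical. Unlike a reader that takes no length, the varint helper here receives the number of bytes still available, so no read can leave the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a varint reader: big-endian, 7 bits per byte,
 * at most 5 bytes. Returns the bytes consumed, or 0 if the input is
 * truncated or the value does not fit in 32 bits. */
static size_t read_varint32(const unsigned char *p, size_t avail, uint32_t *out)
{
    uint64_t v = 0;
    size_t i;
    for (i = 0; i < avail && i < 5; i++) {
        v = (v << 7) | (uint64_t)(p[i] & 0x7f);
        if ((p[i] & 0x80) == 0) {
            if (v > UINT32_MAX) return 0;
            *out = (uint32_t)v;
            return i + 1;
        }
    }
    return 0;
}

/* Decodes at most out_cap integers from blob. Returns 0 and sets *n_out on
 * success, or -1 on malformed input; out[] may be partially written on
 * failure, but never beyond out_cap elements. */
int decode_int_array(const unsigned char *blob, size_t n_blob,
                     uint32_t *out, size_t out_cap, size_t *n_out)
{
    size_t pos = 1, used;
    uint32_t n = 0, i;

    assert(out != NULL && n_out != NULL);
    if (blob == NULL || n_blob < 2 || blob[0] != 0x01) return -1;
    used = read_varint32(blob + pos, n_blob - pos, &n);
    if (used == 0) return -1;
    pos += used;
    /* The count must fit the output and cannot exceed the bytes left. */
    if (n > out_cap || n > n_blob - pos) return -1;
    for (i = 0; i < n; i++) {
        used = read_varint32(blob + pos, n_blob - pos, &out[i]);
        if (used == 0) return -1;          /* truncated or oversized element */
        pos += used;
    }
    *n_out = n;
    return 0;
}
```

Feeding this routine the bytes of a suspect blob gives a quick way to confirm whether a declared count or a truncated element is what drives the decoder out of bounds.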
Step 4: Enable SQLITE_DEBUG and Heap Sanitization
Recompile SQLite with additional debug flags to trace decodeIntArray() execution:
export CFLAGS="-g -DSQLITE_DEBUG -DSQLITE_ENABLE_STAT4 -fsanitize=address"
./configure --enable-debug --disable-shared && make
Run the CLI under ASAN and GDB, passing the database as an argument, to capture the exact blob content and pointer state at crash time:
gdb --args ./sqlite3_asan malform.db
(gdb) break decodeIntArray
(gdb) run
Inspect pVal (the blob value) and compare the expected array dimensions against the actual blob size.
Step 5: Rebuild Database Statistics
If the SQLITE_STAT4 table is corrupt, regenerate it:
ANALYZE;
-- Or for specific tables/indexes
ANALYZE <table>;
This rebuilds statistical data, replacing malformed blobs with properly encoded ones.
Step 6: Schema Version Validation
Ensure the malformed database’s schema version matches the expected format. Query PRAGMA schema_version; and compare it with a known-good database. If they do not match, export the data and recreate the database:
sqlite3 malform.db ".dump" | sqlite3 repaired.db
Step 7: Update SQLITE_STAT4 Encoding Logic
If the corruption stems from a SQLite bug in STAT4 encoding, patch stat4Push() in analyze.c to validate arrays before blob serialization:
static void stat4Push(...) {
// Validate array size before encoding
assert( nSample<=STAT4_SAMPLE_MAX );
// Encode with additional sanity checks
putVarint32(a, nSample*sizeof(int) + 1);
}
Final Recommendation:
The most probable fix involves hardening decodeIntArray() with rigorous bounds checks and regenerating the SQLITE_STAT4 table. If corruption is systemic, consider disabling SQLITE_ENABLE_STAT4 or implementing a custom stats loader that bypasses the faulty decoder. For production systems, always validate databases after cloning or schema changes, for example with PRAGMA integrity_check.