Segmentation Fault in decodeIntArray During Schema Analysis of Corrupted sqlite_stat4 Data


Crash Context: Schema Initialization & decodeIntArray Memory Access Violation

The segmentation fault occurs during SQLite’s schema initialization, while the integer arrays stored in the sqlite_stat4 table are being decoded. The crash is in decodeIntArray(), on an invalid memory write to address 0x000000000001. A target address that low suggests a write through a null or otherwise invalid pointer plus a small offset, rather than a modest overrun of a valid heap buffer. The stack trace shows that this happens while sqlite3AnalysisLoad() loads the statistics used for query optimization, which is part of the automatic schema parsing workflow.

Key contextual details:

  1. Trigger Command Sequence: The crash reproduces when creating a virtual RTREE table after loading a malformed database file via .clone or .open commands. The malformed file contains corrupted statistics data that bypasses initial consistency checks.
  2. Function Chain: sqlite3AnalysisLoad() → loadStat4() → loadStatTbl() → decodeIntArray(). The fault occurs while decoding the integer arrays stored in SQLITE_STAT4, which hold the per-sample index statistics.
  3. Memory Violation Type: AddressSanitizer detects a write operation targeting the zero page (low memory address 0x1), which is always invalid in user-space processes. This suggests the function is calculating an invalid pointer offset or processing a blob with an incorrect format.

Root Candidates: Invalid Blob Structure, Heap Corruption, or STAT4 Configuration

1. Malformed Data in SQLITE_STAT4

In upstream SQLite, decodeIntArray() parses a text string of space-separated decimal integers (the neq, nlt, and ndlt columns of sqlite_stat4) into a caller-supplied array of nOut slots. The sample column is a separate blob in record format. The decoder is not told how much memory was actually allocated, so if a corrupted row leads the caller to pass an output pointer or slot count that does not match the allocation, the decoder writes out of bounds.
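As a rough illustration of the format check this implies, the sketch below validates a space-separated list of decimal integers before it is handed to a decoder. It assumes the lists are plain decimal text, as in the neq, nlt, and ndlt columns written by upstream SQLite. The function name stat_list_count and its nMax parameter are hypothetical and not part of SQLite.

```c
#include <stddef.h>

/* Hypothetical helper, not part of SQLite.
** Returns the number of integers in z if z is a well-formed list
** (runs of decimal digits separated by single spaces) holding at most
** nMax entries. Returns -1 for NULL input or any malformed list. */
int stat_list_count(const char *z, int nMax){
  int n = 0;
  if( z==NULL || nMax<=0 ) return -1;
  while( *z ){
    if( *z<'0' || *z>'9' ) return -1;   /* every entry must start with a digit */
    while( *z>='0' && *z<='9' ) z++;    /* consume the digits of this entry */
    if( ++n>nMax ) return -1;           /* more entries than available slots */
    if( *z==' ' ){
      z++;
      if( *z=='\0' ) return -1;         /* trailing space: malformed */
    }else if( *z!='\0' ){
      return -1;                        /* unexpected character */
    }
  }
  return n;
}
```

A caller would compare the returned count with the number of slots it allocated and skip the row when they disagree or when the result is -1.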

2. Heap Corruption Prior to decodeIntArray Execution

If prior operations (e.g., cloning the malformed database) corrupted the heap, the string and output-array pointers passed to decodeIntArray() might reference invalid memory regions. This could occur if the malformed database has mismatched schema versions or truncated system tables.

3. Incorrect STAT4 Configuration During Compilation

Enabling SQLITE_ENABLE_STAT4 changes which statistics tables are read. A build without the flag ignores sqlite_stat4 when loading statistics, so a mismatch matters mainly in one direction: a STAT4-enabled build reading sqlite_stat4 rows that were written by another tool, a different version, or a fuzzer, and that do not match what the loader expects.

4. Race Conditions in Schema Initialization

Though less likely in single-threaded CLI usage, improper synchronization during schema parsing (e.g., concurrent writes to SQLITE_MASTER) could leave the database in an inconsistent state. This is amplified when cloning databases with .clone, which may not fully lock the source database.

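5. Overflow in Integer Decoding

In upstream analyze.c, decodeIntArray() accumulates each decimal value digit by digit without an overflow check. An oversized number in a corrupted list can therefore wrap the accumulator silently, and any later code that treats a decoded value as a count or size may then index past its buffer. Varints do appear in the record header of the sample column, where a claimed length larger than the blob would cause a similar overrun.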
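One way to guard against wraparound while accumulating a decoded integer is sketched below. It assumes the values are decimal text and that a 64-bit unsigned accumulator is wanted; parse_u64_checked is a hypothetical name, not a SQLite function.

```c
#include <stdint.h>

/* Hypothetical helper, not part of SQLite.
** Parse one unsigned decimal integer starting at *pz. On success, store
** the value in *pOut, advance *pz past the digits, and return 0.
** Return -1 if there are no digits or the value exceeds UINT64_MAX. */
int parse_u64_checked(const char **pz, uint64_t *pOut){
  const char *z = *pz;
  uint64_t v = 0;
  if( *z<'0' || *z>'9' ) return -1;
  while( *z>='0' && *z<='9' ){
    unsigned d = (unsigned)(*z - '0');
    if( v > (UINT64_MAX - d)/10 ) return -1;  /* v*10+d would wrap around */
    v = v*10 + d;
    z++;
  }
  *pz = z;
  *pOut = v;
  return 0;
}
```

On failure the cursor and output are left untouched, so the caller can reject the whole row instead of continuing with a wrapped value.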


Resolution Workflow: Diagnosing Blob Corruption and Hardening decodeIntArray

Step 1: Validate the SQLITE_STAT4 Blob Structure

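Extract the offending row from the malformed database:

SELECT tbl, idx, neq, nlt, ndlt, hex(sample) FROM sqlite_stat4 WHERE rowid = <crash_rowid>;

Inspect each column. The shell output is usually enough; a hex editor or the sqlite3_blob APIs also work for the sample blob:

  • neq, nlt, ndlt: Each should be a space-separated list of non-negative decimal integers, with the same number of entries in all three.
  • Entry count: The lists should have one entry per column of the index, which is the slot count the loader passes to decodeIntArray().
  • sample: A blob in SQLite’s record format. It starts with a varint giving the header size in bytes (the encoding that sqlite3GetVarint32() decodes), followed by one serial-type varint per column, then the column values.

If a list contains other characters, the lists disagree in length, or the sample header claims more bytes than the blob holds, the row is corrupt.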
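The header check can be sketched as a bounded varint decode followed by a length comparison. The code assumes SQLite's documented varint layout (up to nine bytes, big-endian, seven payload bits in each of the first eight bytes and eight in the ninth). Both function names are hypothetical and not part of SQLite.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of SQLite.
** Decode one SQLite-style varint from buf[0..n-1]. Returns the number of
** bytes consumed (1..9), or 0 if the buffer ends before the varint does. */
size_t varint_decode_bounded(const unsigned char *buf, size_t n, uint64_t *pOut){
  uint64_t v = 0;
  size_t i;
  for(i=0; i<8; i++){
    if( i>=n ) return 0;                       /* ran off the end of the blob */
    v = (v<<7) | (uint64_t)(buf[i] & 0x7f);
    if( (buf[i] & 0x80)==0 ){ *pOut = v; return i+1; }
  }
  if( n<9 ) return 0;
  *pOut = (v<<8) | buf[8];                     /* ninth byte carries 8 bits */
  return 9;
}

/* Returns 1 if the header size claimed by the leading varint is at least
** as long as that varint and no longer than the blob itself. */
int record_header_fits(const unsigned char *blob, size_t nBlob){
  uint64_t nHdr = 0;
  size_t used = varint_decode_bounded(blob, nBlob, &nHdr);
  return used>0 && nHdr>=used && nHdr<=nBlob;
}
```

A blob for which record_header_fits() returns 0 cannot be a valid record, so the row can be flagged without decoding any column values.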

Step 2: Reproduce with Minimal STAT4 Data

Create a minimal test case to isolate the corruption:

-- Work on a copy of the malformed database and keep only the suspect row
-- sqlite_stat4 is an ordinary table, so PRAGMA writable_schema is not required
DELETE FROM sqlite_stat4 WHERE rowid != <crash_rowid>;
VACUUM;

Reopen the database and attempt to create the RTREE table. If the crash persists, the isolated row is defective.

Step 3: Patch decodeIntArray with Bounds Checks

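Modify sqlite3.c’s decodeIntArray() to validate its inputs before decoding. The sketch below follows the upstream signature in analyze.c; verify it against your own source tree before applying it:

static void decodeIntArray(
  char *zIntArray,   /* Space-separated decimal integers to decode */
  int nOut,          /* Number of slots in aOut[] or aLog[] */
  tRowcnt *aOut,     /* Store integers here */
  LogEst *aLog,      /* Or, if aOut==0, here */
  Index *pIndex      /* Index whose flags may be updated, if not NULL */
){
  const char *z = zIntArray ? zIntArray : "";    /* Treat NULL as an empty list */
  int i;
  if( nOut<=0 || (aOut==0 && aLog==0) ) return;  /* No valid destination: do not write */
  for(i=0; *z && i<nOut; i++){                   /* Never write past slot nOut-1 */
    tRowcnt v = 0;
    while( z[0]>='0' && z[0]<='9' ){
      v = v*10 + (tRowcnt)(z[0]-'0');
      z++;
    }
    if( aOut ) aOut[i] = v;
    if( aLog ) aLog[i] = sqlite3LogEst(v);
    if( *z==' ' ) z++;
  }
  /* Handling of trailing keywords such as "unordered" and "sz=" is unchanged and omitted here */
  (void)pIndex;
}

This adds explicit checks for a NULL input string, a non-positive slot count, and a missing destination array. These guards only help if nOut and aOut are themselves trustworthy. Because the reported write targets address 0x1, also audit the caller, loadStatTbl(), to confirm that the sample arrays were allocated for the column count it passes.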

Step 4: Enable SQLITE_DEBUG and Heap Sanitization

Recompile SQLite with additional debug flags to trace decodeIntArray execution:

export CFLAGS="-g -DSQLITE_DEBUG -DSQLITE_ENABLE_STAT4 -fsanitize=address"  
./configure --enable-debug --disable-shared && make  

Run the CLI with ASAN and GDB to capture the exact blob content and pointer state at crash time:

gdb --args ./sqlite3_asan malform.db
(gdb) break decodeIntArray  
(gdb) run  

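Inspect the arguments at the breakpoint (in upstream analyze.c these are zIntArray, nOut, and aOut) and compare the number of integers in the input with the number of slots the caller actually allocated.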
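When comparing expected and actual sizes, the size arithmetic itself can overflow if the element count comes from untrusted data. The sketch below shows one overflow-safe way to compute an array's byte size; array_bytes_checked is a hypothetical helper, not a SQLite function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of SQLite.
** Compute n*elemSize into *pBytes. Returns 0 on success, or -1 if the
** product would not fit in size_t (in which case *pBytes is untouched). */
int array_bytes_checked(size_t n, size_t elemSize, size_t *pBytes){
  if( elemSize!=0 && n > SIZE_MAX/elemSize ) return -1;
  *pBytes = n*elemSize;
  return 0;
}
```

Comparing the checked result against the real allocation size shows whether a decoder given n slots could ever write past the end of the buffer.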

Step 5: Rebuild Database Statistics

If the SQLITE_STAT4 table is corrupt, regenerate it:

ANALYZE;  
-- Or for specific tables/indexes  
ANALYZE <table>;  

This rebuilds the statistical data, replacing malformed rows with properly encoded ones.

Step 6: Schema Version Validation

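Check the malformed database for wider structural damage. PRAGMA schema_version; returns the schema cookie, a counter that changes with every schema modification, so a value that differs from a known-good database is not by itself a sign of corruption. Run PRAGMA integrity_check; as well. If problems are reported, export the data and recreate the database: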

sqlite3 malform.db ".dump" | sqlite3 repaired.db  

Step 7: Update SQLITE_STAT4 Encoding Logic

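If the corruption stems from a SQLite bug in how STAT4 data is written, add sanity checks to the sample-collection code in analyze.c before rows are serialized. The fragment below is illustrative pseudocode rather than upstream code; stat4Push(), nSample, and STAT4_SAMPLE_MAX are placeholder names:

static void stat4Push(...) {
  /* Validate the sample count before encoding (placeholder names) */
  assert( nSample<=STAT4_SAMPLE_MAX );
  /* ...then encode the integer lists and the sample record as before... */
}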

Final Recommendation:

The most probable fix involves hardening decodeIntArray() with rigorous bounds checks and regenerating the SQLITE_STAT4 table. If corruption is systemic, consider disabling SQLITE_ENABLE_STAT4 or implementing a custom stats loader that bypasses the faulty decoder. For production systems, always validate databases after cloning or schema changes.
