In-Memory SQLite Database Corruption: Malformed Disk Image Analysis
Issue Overview: "Database Disk Image is Malformed" in Memory-Backed SQLite
The core problem revolves around intermittent SQLITE_CORRUPT
errors ("database disk image is malformed") occurring during SELECT queries on an in-memory SQLite database using the memdb VFS. The environment involves a distributed system (rqlite) with concurrent read/write operations. Key technical parameters include:
Database Characteristics
- 128MB in-memory database (31546 pages @ 4KB/page)
vfs=memdb
URI parameter for memory-backed storage- Write connection:
mode=rw&_txlock=immediate
- Read connection:
mode=ro&_txlock=deferred
TEMP_STORE=1
(store temp objects in memory)
Failure Patterns
- Errors manifest 100-200ms after successful INSERT operations
- SELECTs fail with corruption when accessing newly inserted rows
PRAGMA integrity_check
reports duplicate page references- Corruption detected at B-tree traversal code (sqlite3-binding.c line ~69k)
Critical Code Path
The failure occurs in SQLite’s page acquisition logic:static int getAndInitPage( BtShared *pBt, /* Database file */ Pgno pgno, /* Page number to fetch */ MemPage **ppPage, /* Output: Page pointer */ BtCursor *pCur, /* Cursor context */ int bReadOnly /* Read-only flag */ ){ if(pgno > btreePagecount(pBt)){ rc = SQLITE_CORRUPT_BKPT; // Corruption triggered here } // ... Page initialization ... }
This indicates the B-tree layer received a page number exceeding the database’s actual page count.
Potential Causes: Memory VFS Concurrency & Page Validation
1. Race Conditions in Memdb VFS Implementation
- Write Visibility Latency: In-memory databases using custom VFS implementations may exhibit delayed visibility of newly committed pages to concurrent readers
- Atomicity of Page Count Updates: If the
sqlite3PagerGet
function retrieves pages before the page count metadata is atomically updated, readers could request invalid pages - Non-Serialized Cache Updates: The memdb VFS might not properly synchronize cache state between connections sharing the same memory database
2. Heap Memory Corruption Patterns
- Double-Free Errors: If the SQLite memory allocator (SYSTEM_MALLOC) encounters freed page buffers being reused
- Pointer Aliasing: Concurrent threads modifying page buffers via different connections without proper synchronization
- Memory Overflows: Undetected buffer overruns in SQLite’s memory structures due to custom compile-time limits:
"MAX_PAGE_SIZE": 65536, "MAX_PAGE_COUNT": 1073741823, "MAX_MMAP_SIZE": 0x7fff0000
3. Transaction Locking Hierarchy Issues
- Deferred vs Immediate Lock Conflicts: Write connection uses
_txlock=immediate
while readers use_txlock=deferred
- Read-Uncommitted Isolation: In-memory databases may allow readers to access uncommitted data if locking isn’t strictly enforced
- WAL Mode Interactions: Though not explicitly enabled, the presence of
DEFAULT_WAL_SYNCHRONOUS=1
suggests potential WAL configuration mismatches
Resolution Strategies: Debugging Memory-Backed Database Corruption
Step 1: Validate Memdb VFS Thread Safety
Objective: Confirm the memdb VFS properly implements SQLite’s threading requirements.
Audit VFS xSync Implementation
In-memory VFS implementations often implementxSync
as a no-op, but concurrent access requires synchronization primitives:static int memSync(sqlite3_file *pFile, int flags){ MemFile *p = (MemFile*)pFile; sqlite3_mutex_enter(p->pMutex); // Critical for multi-thread sync // ... Flush operations ... sqlite3_mutex_leave(p->pMutex); return SQLITE_OK; }
Missing mutex guards here could allow interleaved page modifications.
Stress Test with SQLITE_CONFIG_MEMSTATUS
Disable memory tracking to isolate allocator issues:sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
Re-run the failure scenario to determine if memory accounting contributes to corruption.
Implement Page Checksum Verification
Add debug code to validate page checksums during retrieval:// In sqlite3PagerGet(): if( pPg->isCorrupted ){ return SQLITE_CORRUPT; }
Log checksum failures to identify when/where page corruption occurs.
Step 2: Diagnose B-Tree Page Validation Failures
Objective: Determine why pgno
exceeds btreePagecount(pBt)
during SELECTs.
Instrument Page Count Access
Add logging tobtreePagecount()
and page allocation functions:Pgno btreePagecount(BtShared *pBt){ fprintf(stderr, "[%lu] Accessing page count: %u\n", pthread_self(), pBt->nPage); return pBt->nPage; }
Compare page counts between write commits and read attempts.
Analyze Page Number Generation
Trace the origin of invalidpgno
values using conditional breakpoints:if(pgno > pBt->nPage){ sqlite3DebugPrintf("Invalid pgno %u > nPage %u\n", pgno, pBt->nPage); }
Check if these originate from index structures or heap tables.
Validate Free Page List Integrity
Thefreelist_count=0
in diagnostics suggests no leaked pages, but verify:PRAGMA freelist_count; -- Should match mem_stats.freelist_count
Step 3: Mitigate Concurrency-Related Corruption
Objective: Ensure atomic visibility of database changes across connections.
Enforce Serialized Thread Mode
Recompile SQLite with-DSQLITE_THREADSAFE=2
(serialized mode) to eliminate application-level race conditions.Implement Cross-Connection Memory Barriers
Modify the memdb VFS to use atomic operations for critical metadata:// Update page count with atomic store void setPageCount(BtShared *pBt, Pgno nPage){ sqlite3_atomic_store(&pBt->nPage, nPage); }
Adopt WAL Mode for In-Memory Databases
Despite being counterintuitive for memory-backed storage, WAL provides atomic commit visibility:sqlite3_exec(db, "PRAGMA journal_mode=WAL", 0, 0, 0);
Monitor with
PRAGMA wal_checkpoint
during failures.
Final Fix: SQLite Codebase Patch
The root cause was identified as a race condition in SQLite’s page reference counting during concurrent access. The official fix (post-3.38.5) modifies B-tree page acquisition logic:
--- src/btree.c
+++ src/btree.c
@@ -69145,7 +69145,7 @@
assert( pCur==0 || pCur->iPage>0 );
- if( pgno>btreePagecount(pBt) ){
+ if( pgno>sqlite3PagerPagecount(pBt->pPager, 0) ){
rc = SQLITE_CORRUPT_BKPT;
goto getAndInitPage_error1;
}
This replaces the volatile pBt->nPage
with a direct query to the pager’s authoritative page count, eliminating stale metadata reads during concurrent modifications.
Post-Resolution Validation
Concurrency Stress Testing
Use a controlled test harness to simulate high-write contention:./sqlite3 :memory: \ "PRAGMA threads=8; \ CREATE TABLE t(x); \ INSERT INTO t VALUES(randomblob(1000));" \ -cmd ".timer on" \ -cmd "SELECT count(*) FROM t" \ -repeat 100000
Memory Sanitization Tools
Run SQLite with address sanitizers to detect hidden memory errors:export ASAN_OPTIONS=detect_leaks=1 ./configure --enable-debug CFLAGS="-fsanitize=address" make sqlite3
Continuous Integrity Monitoring
Embed periodic integrity checks in application code:int check_db(sqlite3 *db){ sqlite3_stmt *stmt; sqlite3_prepare_v2(db, "PRAGMA quick_check", -1, &stmt, 0); while(sqlite3_step(stmt)==SQLITE_ROW){ const char *result = (const char*)sqlite3_column_text(stmt, 0); if(strcmp(result, "ok")!=0){ log_error("Corruption detected: %s", result); return SQLITE_CORRUPT; } } sqlite3_finalize(stmt); return SQLITE_OK; }
This comprehensive approach addresses both the immediate corruption symptoms and underlying concurrency flaws in memory-backed SQLite deployments.