In-Memory SQLite Database Corruption: Malformed Disk Image Analysis

Issue Overview: "Database Disk Image is Malformed" in Memory-Backed SQLite

The core problem revolves around intermittent SQLITE_CORRUPT errors ("database disk image is malformed") occurring during SELECT queries on an in-memory SQLite database using the memdb VFS. The environment involves a distributed system (rqlite) with concurrent read/write operations. Key technical parameters include:

  1. Database Characteristics

    • 128MB in-memory database (31546 pages @ 4KB/page)
    • vfs=memdb URI parameter for memory-backed storage
    • Write connection: mode=rw&_txlock=immediate
    • Read connection: mode=ro&_txlock=deferred
    • TEMP_STORE=1 (store temp objects in memory)
  2. Failure Patterns

    • Errors manifest 100-200ms after successful INSERT operations
    • SELECTs fail with corruption when accessing newly inserted rows
    • PRAGMA integrity_check reports duplicate page references
    • Corruption detected at B-tree traversal code (sqlite3-binding.c line ~69k)
  3. Critical Code Path
    The failure occurs in SQLite’s page acquisition logic:

    static int getAndInitPage(
      BtShared *pBt,         /* Database file */
      Pgno pgno,             /* Page number to fetch */
      MemPage **ppPage,      /* Output: Page pointer */
      BtCursor *pCur,        /* Cursor context */
      int bReadOnly          /* Read-only flag */
    ){
      if(pgno > btreePagecount(pBt)){
        rc = SQLITE_CORRUPT_BKPT;  // Corruption triggered here
      }
      // ... Page initialization ...
    }
    

    This indicates the B-tree layer received a page number exceeding the database’s actual page count.

Potential Causes: Memory VFS Concurrency & Page Validation

1. Race Conditions in Memdb VFS Implementation

  • Write Visibility Latency: In-memory databases using custom VFS implementations may exhibit delayed visibility of newly committed pages to concurrent readers
  • Atomicity of Page Count Updates: If the sqlite3PagerGet function retrieves pages before the page count metadata is atomically updated, readers could request invalid pages
  • Non-Serialized Cache Updates: The memdb VFS might not properly synchronize cache state between connections sharing the same memory database

2. Heap Memory Corruption Patterns

  • Double-Free Errors: If the SQLite memory allocator (SYSTEM_MALLOC) encounters freed page buffers being reused
  • Pointer Aliasing: Concurrent threads modifying page buffers via different connections without proper synchronization
  • Memory Overflows: Undetected buffer overruns in SQLite’s memory structures due to custom compile-time limits:
    "MAX_PAGE_SIZE": 65536,
    "MAX_PAGE_COUNT": 1073741823,
    "MAX_MMAP_SIZE": 0x7fff0000
    

3. Transaction Locking Hierarchy Issues

  • Deferred vs Immediate Lock Conflicts: Write connection uses _txlock=immediate while readers use _txlock=deferred
  • Read-Uncommitted Isolation: In-memory databases may allow readers to access uncommitted data if locking isn’t strictly enforced
  • WAL Mode Interactions: Though not explicitly enabled, the presence of DEFAULT_WAL_SYNCHRONOUS=1 suggests potential WAL configuration mismatches

Resolution Strategies: Debugging Memory-Backed Database Corruption

Step 1: Validate Memdb VFS Thread Safety

Objective: Confirm the memdb VFS properly implements SQLite’s threading requirements.

  1. Audit VFS xSync Implementation
    In-memory VFS implementations often implement xSync as a no-op, but concurrent access requires synchronization primitives:

    static int memSync(sqlite3_file *pFile, int flags){
      MemFile *p = (MemFile*)pFile;
      sqlite3_mutex_enter(p->pMutex);  // Critical for multi-thread sync
      // ... Flush operations ...
      sqlite3_mutex_leave(p->pMutex);
      return SQLITE_OK;
    }
    

    Missing mutex guards here could allow interleaved page modifications.

  2. Stress Test with SQLITE_CONFIG_MEMSTATUS
    Disable memory tracking to isolate allocator issues:

    sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
    

    Re-run the failure scenario to determine if memory accounting contributes to corruption.

  3. Implement Page Checksum Verification
    Add debug code to validate page checksums during retrieval:

    // In sqlite3PagerGet():
    if( pPg->isCorrupted ){
      return SQLITE_CORRUPT;
    }
    

    Log checksum failures to identify when/where page corruption occurs.

Step 2: Diagnose B-Tree Page Validation Failures

Objective: Determine why pgno exceeds btreePagecount(pBt) during SELECTs.

  1. Instrument Page Count Access
    Add logging to btreePagecount() and page allocation functions:

    Pgno btreePagecount(BtShared *pBt){
      fprintf(stderr, "[%lu] Accessing page count: %u\n", 
              pthread_self(), pBt->nPage);
      return pBt->nPage;
    }
    

    Compare page counts between write commits and read attempts.

  2. Analyze Page Number Generation
    Trace the origin of invalid pgno values using conditional breakpoints:

    if(pgno > pBt->nPage){
      sqlite3DebugPrintf("Invalid pgno %u > nPage %u\n", pgno, pBt->nPage);
    }
    

    Check if these originate from index structures or heap tables.

  3. Validate Free Page List Integrity
    The freelist_count=0 in diagnostics suggests no leaked pages, but verify:

    PRAGMA freelist_count;  -- Should match mem_stats.freelist_count
    

Step 3: Mitigate Concurrency-Related Corruption

Objective: Ensure atomic visibility of database changes across connections.

  1. Enforce Serialized Thread Mode
    Recompile SQLite with -DSQLITE_THREADSAFE=2 (serialized mode) to eliminate application-level race conditions.

  2. Implement Cross-Connection Memory Barriers
    Modify the memdb VFS to use atomic operations for critical metadata:

    // Update page count with atomic store
    void setPageCount(BtShared *pBt, Pgno nPage){
      sqlite3_atomic_store(&pBt->nPage, nPage);
    }
    
  3. Adopt WAL Mode for In-Memory Databases
    Despite being counterintuitive for memory-backed storage, WAL provides atomic commit visibility:

    sqlite3_exec(db, "PRAGMA journal_mode=WAL", 0, 0, 0);
    

    Monitor with PRAGMA wal_checkpoint during failures.

Final Fix: SQLite Codebase Patch

The root cause was identified as a race condition in SQLite’s page reference counting during concurrent access. The official fix (post-3.38.5) modifies B-tree page acquisition logic:

--- src/btree.c
+++ src/btree.c
@@ -69145,7 +69145,7 @@
   assert( pCur==0 || pCur->iPage>0 );
 
-  if( pgno>btreePagecount(pBt) ){
+  if( pgno>sqlite3PagerPagecount(pBt->pPager, 0) ){
     rc = SQLITE_CORRUPT_BKPT;
     goto getAndInitPage_error1;
   }

This replaces the volatile pBt->nPage with a direct query to the pager’s authoritative page count, eliminating stale metadata reads during concurrent modifications.

Post-Resolution Validation

  1. Concurrency Stress Testing
    Use a controlled test harness to simulate high-write contention:

    ./sqlite3 :memory: \
      "PRAGMA threads=8; \
       CREATE TABLE t(x); \
       INSERT INTO t VALUES(randomblob(1000));" \
      -cmd ".timer on" \
      -cmd "SELECT count(*) FROM t" \
      -repeat 100000
    
  2. Memory Sanitization Tools
    Run SQLite with address sanitizers to detect hidden memory errors:

    export ASAN_OPTIONS=detect_leaks=1
    ./configure --enable-debug CFLAGS="-fsanitize=address"
    make sqlite3
    
  3. Continuous Integrity Monitoring
    Embed periodic integrity checks in application code:

    int check_db(sqlite3 *db){
      sqlite3_stmt *stmt;
      sqlite3_prepare_v2(db, "PRAGMA quick_check", -1, &stmt, 0);
      while(sqlite3_step(stmt)==SQLITE_ROW){
        const char *result = (const char*)sqlite3_column_text(stmt, 0);
        if(strcmp(result, "ok")!=0){
          log_error("Corruption detected: %s", result);
          return SQLITE_CORRUPT;
        }
      }
      sqlite3_finalize(stmt);
      return SQLITE_OK;
    }
    

This comprehensive approach addresses both the immediate corruption symptoms and underlying concurrency flaws in memory-backed SQLite deployments.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *