Detecting SQLite Database Corruption in Multi-Attached Environments When Integrity Check Fails


Understanding SQLITE_CORRUPT Errors in Multi-Attached Databases and Limitations of PRAGMA integrity_check

Issue Overview

When working with multiple attached databases in SQLite, an SQLITE_CORRUPT error can occur during transactions that interact with multiple databases. This error indicates structural corruption but does not specify which attached database is responsible. The standard diagnostic tool, PRAGMA integrity_check, validates database structures such as B-trees, indexes, and page linkages. However, it does not validate the free page list, a critical metadata structure that tracks unused pages in the database file. Corruption in the free page list can go undetected by PRAGMA integrity_check, leading to silent false negatives. This creates a scenario where a database operation (e.g., page allocation during a write) triggers SQLITE_CORRUPT due to free list inconsistencies, yet no attached database is flagged as corrupt by conventional checks.

The free page list corruption described in the discussion arises when the "total free pages" value stored in page 1 of a database exceeds the maximum page count (mxPage). SQLite’s allocateBtreePage function validates this during page allocation and raises SQLITE_CORRUPT_BKPT if the count is invalid. Since PRAGMA integrity_check skips free list validation, developers are left without a straightforward method to identify the corrupted database when multiple databases are attached.

Root Causes of Undetected Free Page List Corruption

  1. Incomplete Validation by PRAGMA integrity_check:
    The PRAGMA integrity_check command focuses on logical consistency of data structures (e.g., index integrity, foreign key constraints) but omits validation of metadata structures like the free page list. This is intentional, as free list validation requires simulating low-level operations (e.g., page allocation) that are resource-intensive and rarely necessary for typical integrity checks.

  2. Multi-Database Transaction Complexity:
    Transactions spanning multiple attached databases complicate error attribution. SQLite’s error reporting mechanism returns a generic SQLITE_CORRUPT code without specifying the database handle responsible. This is because corruption detection often occurs during low-level operations (e.g., writing to a b-tree page), which may involve cross-database interdependencies.

  3. Ephemeral Corruption States:
    Free page list corruption may only manifest under specific conditions, such as when a transaction attempts to allocate a new page. For example, if a database has a corrupted free list but no write operations are performed during PRAGMA integrity_check, the corruption remains undetected until a later write operation triggers the error.

  4. Database File Manipulation Outside SQLite:
    If external tools or processes modify the database file (e.g., manual edits, unsafe file copies), they may introduce inconsistencies in the free page list that evade SQLite’s standard checks.

Diagnosing and Resolving Free Page List Corruption in Multi-Attached Databases

Step 1: Manual Free Page List Validation

Since PRAGMA integrity_check does not validate the free page list, developers must perform targeted checks using SQLite’s internal metadata tables and pragmas.

Procedure:

  1. For each attached database, query the mxPage value (total pages in the database) using:
    PRAGMA [database_name].page_count;  
    
  2. Extract the free page count from the database’s page 1 header. The free page count is stored as a 4-byte big-endian integer at offset 36 (decimal) of page 1. This can be read using the hex and substr functions on the sqlite_dbpage virtual table (requires SQLite compiled with -DSQLITE_ENABLE_DBPAGE_VTAB):
    SELECT hex(substr(data, 37, 4)) AS free_page_count  
    FROM [database_name].sqlite_dbpage  
    WHERE pgno = 1;  
    

    Convert the hexadecimal result to a decimal integer.

  3. Compare the free page count to mxPage. If the free page count exceeds mxPage, the database is corrupt.

Example:
For an attached database aux_db:

ATTACH 'aux.db' AS aux_db;  
PRAGMA aux_db.page_count; -- Returns mxPage = 100  
SELECT hex(substr(data, 37, 4)) FROM aux_db.sqlite_dbpage WHERE pgno = 1; -- Returns '00000064' (100 in decimal)  

A free page count of 100 with mxPage=100 is valid. A value of 101 would indicate corruption.

Step 2: Forcing Free Page List Validation via Vacuum

The VACUUM command rebuilds the entire database, which implicitly validates and reconstructs the free page list. If a database has free list corruption, VACUUM will fail with SQLITE_CORRUPT and point to the problematic database.

Procedure:

  1. Detach all databases except one.
  2. Execute VACUUM on the remaining database.
  3. Repeat for each attached database.

Limitation:
VACUUM requires exclusive access to the database and may not be feasible in live production environments.

Step 3: Leveraging SQLITE_DEBUG and Custom Builds

SQLite’s debugging interfaces provide granular control over corruption checks. Developers can compile a custom SQLite build with debug flags to enable free page list validation.

Procedure:

  1. Compile SQLite with -DSQLITE_DEBUG and -DSQLITE_ENABLE_CORRUPT_PGNO.
  2. Use sqlite3_test_control(SQLITE_TESTCTRL_ISKEYWORD, ...) to enable verbose corruption logging.
  3. Execute transactions while monitoring debug output for corruption warnings tied to specific databases.

Example Debug Hook:

void corruption_callback(void* ctx, int errCode, const char* errMsg) {  
    if (errCode == SQLITE_CORRUPT) {  
        fprintf(stderr, "Corruption detected in database: %s\n", (const char*)ctx);  
    }  
}  
// Register the callback for each database  
sqlite3_db_config(db, SQLITE_DBCONFIG_CORRUPT_CB, corruption_callback, "main", NULL);  
Step 4: Incremental Isolation of Corrupted Databases

When debugging in production is impractical, isolate databases by incrementally attaching them and testing operations known to trigger free page list access.

Procedure:

  1. Start with a fresh connection and attach one database.
  2. Execute a write operation (e.g., INSERT INTO test VALUES (1);).
  3. If SQLITE_CORRUPT occurs, the attached database is corrupt.
  4. If no error, repeat with the next database.

Optimization:
Use a binary search pattern: Attach half of the databases, test for corruption, and iteratively narrow down the culprit.

Step 5: Salvaging Data from Corrupted Databases

Once the corrupted database is identified, use .dump or sqlite3_db_recover to extract intact data.

Procedure:

  1. For the corrupted database, run:
    sqlite3 corrupted.db ".dump" > salvage.sql  
    
  2. Rebuild the database:
    sqlite3 new.db < salvage.sql  
    

Note:
The .dump command may fail if critical system tables (e.g., sqlite_master) are corrupted. Use sqlite3_db_recover (a third-party tool) for advanced recovery.


Summary

Detecting free page list corruption in multi-attached SQLite databases requires bypassing PRAGMA integrity_check and directly validating metadata structures. By combining manual free page count checks, forced validation via VACUUM, debug-enabled builds, and incremental isolation, developers can pinpoint the corrupted database and recover data. These steps address the limitations of standard tools and provide a systematic approach to diagnosing elusive corruption scenarios.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *