Detecting SQLite Database Corruption in Multi-Attached Environments When Integrity Check Fails
Understanding SQLITE_CORRUPT Errors in Multi-Attached Databases and Limitations of PRAGMA integrity_check
Issue Overview
When working with multiple attached databases in SQLite, an SQLITE_CORRUPT
error can occur during transactions that interact with multiple databases. This error indicates structural corruption but does not specify which attached database is responsible. The standard diagnostic tool, PRAGMA integrity_check
, validates database structures such as B-trees, indexes, and page linkages. However, it does not validate the free page list, a critical metadata structure that tracks unused pages in the database file. Corruption in the free page list can go undetected by PRAGMA integrity_check
, leading to silent false negatives. This creates a scenario where a database operation (e.g., page allocation during a write) triggers SQLITE_CORRUPT
due to free list inconsistencies, yet no attached database is flagged as corrupt by conventional checks.
The free page list corruption described in the discussion arises when the "total free pages" value stored in page 1 of a database exceeds the maximum page count (mxPage
). SQLite’s allocateBtreePage
function validates this during page allocation and raises SQLITE_CORRUPT_BKPT
if the count is invalid. Since PRAGMA integrity_check
skips free list validation, developers are left without a straightforward method to identify the corrupted database when multiple databases are attached.
Root Causes of Undetected Free Page List Corruption
Incomplete Validation by PRAGMA integrity_check:
ThePRAGMA integrity_check
command focuses on logical consistency of data structures (e.g., index integrity, foreign key constraints) but omits validation of metadata structures like the free page list. This is intentional, as free list validation requires simulating low-level operations (e.g., page allocation) that are resource-intensive and rarely necessary for typical integrity checks.Multi-Database Transaction Complexity:
Transactions spanning multiple attached databases complicate error attribution. SQLite’s error reporting mechanism returns a genericSQLITE_CORRUPT
code without specifying the database handle responsible. This is because corruption detection often occurs during low-level operations (e.g., writing to a b-tree page), which may involve cross-database interdependencies.Ephemeral Corruption States:
Free page list corruption may only manifest under specific conditions, such as when a transaction attempts to allocate a new page. For example, if a database has a corrupted free list but no write operations are performed duringPRAGMA integrity_check
, the corruption remains undetected until a later write operation triggers the error.Database File Manipulation Outside SQLite:
If external tools or processes modify the database file (e.g., manual edits, unsafe file copies), they may introduce inconsistencies in the free page list that evade SQLite’s standard checks.
Diagnosing and Resolving Free Page List Corruption in Multi-Attached Databases
Step 1: Manual Free Page List Validation
Since PRAGMA integrity_check
does not validate the free page list, developers must perform targeted checks using SQLite’s internal metadata tables and pragmas.
Procedure:
- For each attached database, query the
mxPage
value (total pages in the database) using:PRAGMA [database_name].page_count;
- Extract the free page count from the database’s page 1 header. The free page count is stored as a 4-byte big-endian integer at offset 36 (decimal) of page 1. This can be read using the
hex
andsubstr
functions on thesqlite_dbpage
virtual table (requires SQLite compiled with-DSQLITE_ENABLE_DBPAGE_VTAB
):SELECT hex(substr(data, 37, 4)) AS free_page_count FROM [database_name].sqlite_dbpage WHERE pgno = 1;
Convert the hexadecimal result to a decimal integer.
- Compare the free page count to
mxPage
. If the free page count exceedsmxPage
, the database is corrupt.
Example:
For an attached database aux_db
:
ATTACH 'aux.db' AS aux_db;
PRAGMA aux_db.page_count; -- Returns mxPage = 100
SELECT hex(substr(data, 37, 4)) FROM aux_db.sqlite_dbpage WHERE pgno = 1; -- Returns '00000064' (100 in decimal)
A free page count of 100 with mxPage=100
is valid. A value of 101 would indicate corruption.
Step 2: Forcing Free Page List Validation via Vacuum
The VACUUM
command rebuilds the entire database, which implicitly validates and reconstructs the free page list. If a database has free list corruption, VACUUM
will fail with SQLITE_CORRUPT
and point to the problematic database.
Procedure:
- Detach all databases except one.
- Execute
VACUUM
on the remaining database. - Repeat for each attached database.
Limitation:
VACUUM
requires exclusive access to the database and may not be feasible in live production environments.
Step 3: Leveraging SQLITE_DEBUG and Custom Builds
SQLite’s debugging interfaces provide granular control over corruption checks. Developers can compile a custom SQLite build with debug flags to enable free page list validation.
Procedure:
- Compile SQLite with
-DSQLITE_DEBUG
and-DSQLITE_ENABLE_CORRUPT_PGNO
. - Use
sqlite3_test_control(SQLITE_TESTCTRL_ISKEYWORD, ...)
to enable verbose corruption logging. - Execute transactions while monitoring debug output for corruption warnings tied to specific databases.
Example Debug Hook:
void corruption_callback(void* ctx, int errCode, const char* errMsg) {
if (errCode == SQLITE_CORRUPT) {
fprintf(stderr, "Corruption detected in database: %s\n", (const char*)ctx);
}
}
// Register the callback for each database
sqlite3_db_config(db, SQLITE_DBCONFIG_CORRUPT_CB, corruption_callback, "main", NULL);
Step 4: Incremental Isolation of Corrupted Databases
When debugging in production is impractical, isolate databases by incrementally attaching them and testing operations known to trigger free page list access.
Procedure:
- Start with a fresh connection and attach one database.
- Execute a write operation (e.g.,
INSERT INTO test VALUES (1);
). - If
SQLITE_CORRUPT
occurs, the attached database is corrupt. - If no error, repeat with the next database.
Optimization:
Use a binary search pattern: Attach half of the databases, test for corruption, and iteratively narrow down the culprit.
Step 5: Salvaging Data from Corrupted Databases
Once the corrupted database is identified, use .dump
or sqlite3_db_recover
to extract intact data.
Procedure:
- For the corrupted database, run:
sqlite3 corrupted.db ".dump" > salvage.sql
- Rebuild the database:
sqlite3 new.db < salvage.sql
Note:
The .dump
command may fail if critical system tables (e.g., sqlite_master
) are corrupted. Use sqlite3_db_recover
(a third-party tool) for advanced recovery.
Summary
Detecting free page list corruption in multi-attached SQLite databases requires bypassing PRAGMA integrity_check
and directly validating metadata structures. By combining manual free page count checks, forced validation via VACUUM
, debug-enabled builds, and incremental isolation, developers can pinpoint the corrupted database and recover data. These steps address the limitations of standard tools and provide a systematic approach to diagnosing elusive corruption scenarios.