Identifying Tables Affected by SQLite Database Page Corruption
Database Corruption Manifestation & Diagnostic Challenges
Issue Overview
Database corruption in SQLite manifests through errors such as SQLITE_CORRUPT (database disk image is malformed)
during query execution. This error indicates structural inconsistencies in the database file, often caused by incomplete writes due to power outages, hardware failures, or filesystem errors. The PRAGMA integrity_check
command is the primary diagnostic tool for detecting corruption. When executed, it traverses the database’s B-tree structures, verifying page linkages, cell pointers, and metadata.
In the example provided, PRAGMA integrity_check
reports failures on pages 3284, 3283, and 3282 with the error btreeInitPage() returns error code 11
. These pages belong to the B-tree structures that underpin SQLite tables or indexes. The btreeInitPage
function initializes a page by parsing its header and cell pointers. Error code 11 (corresponding to SQLITE_CORRUPT
) suggests that the page header or cell content is invalid, rendering the page unreadable.
The .recover
command attempts to salvage data by reconstructing a database from corrupt pages. However, this process often fails when encountering structural anomalies. In the case described, .recover
generates a lost_and_found
table with an excessive number of columns (exceeding SQLite’s 2048-column limit), leading to syntax errors. Additionally, attempts to reinsert recovered data trigger UNIQUE constraint
violations, indicating duplicate entries in tables that enforce uniqueness. These errors stem from the recovery process’s inability to reconcile partially written or overlapping data from corrupted pages.
The core challenge lies in mapping corrupt pages (e.g., 3282–3284) to specific tables. SQLite’s internal page allocation metadata may itself be corrupted, making it impossible to rely on standard tools like .tables
or sqlite_master
queries. Without knowing which tables are affected, targeted recovery strategies (e.g., exporting valid data from intact tables) become impractical.
Potential Causes of Page Corruption & Recovery Failures
Hardware-Induced Page Zeroing
Power outages or storage media failures can truncate writes, leaving pages partially overwritten or filled with zeros. A zeroed page lacks valid header information (e.g., page type, free block offsets), causing btreeInitPage
to fail. Pages containing critical metadata (e.g., the sqlite_master
table) exacerbate corruption, as they prevent the database from bootstrapping its schema.
B-Tree Structure Inconsistencies
SQLite stores tables and indexes as B-trees. Each B-tree root page (stored in sqlite_master
) points to child pages. Corruption in a root page or intermediate node disrupts access to entire subtrees. For example, if page 3282 is a leaf page in a table’s B-tree, its corruption affects a subset of records. However, if it is an internal node, entire branches of the tree become inaccessible.
Schema Decoupling During Recovery
The .recover
command operates by extracting raw data from pages without fully relying on the schema. When pages are missing or unreadable, it creates a lost_and_found
table with generic columns (c0
, c1
, etc.). This approach ignores constraints and indexes, leading to duplicate entries when reinserting data into reconstructed tables. Additionally, .recover
may misattribute pages to incorrect tables if schema metadata is corrupted.
Toolchain Misconfiguration
The showdb
utility, essential for analyzing page usage, is absent from the SQLite amalgamation distribution. Users unaware of the distinction between amalgamation and canonical source distributions will fail to build showdb
, hindering low-level diagnostics.
Diagnostic Workflow & Remediation Strategies
Building the showdb Utility
- Obtain Canonical Source Code
Download the canonical source tarball from SQLite’s download page under Alternative Source Code Formats or clone the Fossil repository:fossil clone https://www.sqlite.org/src sqlite.fossil mkdir sqlite-src cd sqlite-src fossil open ../sqlite.fossil
- Compile showdb
Navigate to the source directory and compile:./configure make showdb
This generates the
showdb
executable in the root directory.
Analyzing Page Usage with showdb
Generate Page Allocation Report
Executeshowdb
with thepgidx
option to list page usage:./showdb MyVideos116.db pgidx
The output categorizes pages by type (freelist, overflow, table/index B-trees) and maps them to schema objects where possible.
Inspect Corrupt Pages
Dump the content of corrupt pages (e.g., 3282):./showdb MyVideos116.db page 3282
A zeroed page displays as a block of zeros or meaningless bytes. Valid pages show B-tree headers, cell pointers, and payload data.
Cross-Reference sqlite_master
Query thesqlite_master
table to map root pages to tables/indexes:SELECT name, type, rootpage FROM sqlite_master;
Corrupt root pages indicate damaged tables/indexes. For example, if
rootpage
3284 corresponds to an index, its corruption affects query performance but not data integrity.
Salvaging Data Without showdb
Export Intact Tables
Use.dump
to export tables unaffected by corruption:sqlite3 MyVideos116.db ".dump --preserve-rowids" > backup.sql
The
--preserve-rowids
flag retainsROWID
values, crucial for tables relying on implicit rowids.Isolate Corrupt Tables
If.dump
fails on specific tables, exclude them using--skip
:sqlite3 MyVideos116.db ".dump --preserve-rowids --skip corrupt_table" > partial_backup.sql
Manual Recovery via Hex Editor
For critical data in corrupt pages:- Identify the table’s storage layout (column types, indexes) from application code or schema backups.
- Use a hex editor (e.g.,
hexdump -C
) to extract raw bytes from the database file. - Parse B-tree cells manually, extracting payloads and reconstructing records.
Mitigating UNIQUE Constraint Violations
Temporarily Disable Constraints
During recovery, disable constraints to bypass duplication errors:PRAGMA ignore_check_constraints = ON; PRAGMA defer_foreign_keys = ON;
Re-enable them after data insertion.
Deduplicate Recovered Data
UseINSERT OR IGNORE
orINSERT OR REPLACE
to handle duplicates:INSERT OR IGNORE INTO bookmark (idBookmark, ...) VALUES (...);
Preventing Future Corruption
Enable Write-Ahead Logging (WAL)
WAL reduces the chance of corruption by separating writes into a log file:PRAGMA journal_mode = WAL;
Use Atomic Transactions
Group writes into transactions to ensure atomicity:BEGIN; -- Multiple INSERT/UPDATE statements COMMIT;
Regular Backups with .dump
Schedule periodic backups using.dump
to create schema-aware SQL scripts.
This guide provides a systematic approach to diagnosing and recovering from SQLite page corruption, emphasizing the use of low-level tools like showdb
and strategic recovery techniques to minimize data loss.