Identifying Tables Affected by SQLite Database Page Corruption

Database Corruption Manifestation & Diagnostic Challenges

Issue Overview

Database corruption in SQLite manifests through errors such as SQLITE_CORRUPT (database disk image is malformed) during query execution. This error indicates structural inconsistencies in the database file, often caused by incomplete writes due to power outages, hardware failures, or filesystem errors. The PRAGMA integrity_check command is the primary diagnostic tool for detecting corruption. When executed, it traverses the database’s B-tree structures, verifying page linkages, cell pointers, and metadata.

In the example provided, PRAGMA integrity_check reports failures on pages 3284, 3283, and 3282 with the error btreeInitPage() returns error code 11. These pages belong to the B-tree structures that underpin SQLite tables or indexes. The btreeInitPage function initializes a page by parsing its header and cell pointers. Error code 11 (corresponding to SQLITE_CORRUPT) suggests that the page header or cell content is invalid, rendering the page unreadable.

The .recover command attempts to salvage data by reconstructing a database from corrupt pages. However, this process often fails when encountering structural anomalies. In the case described, .recover generates a lost_and_found table with an excessive number of columns (exceeding SQLite’s 2048-column limit), leading to syntax errors. Additionally, attempts to reinsert recovered data trigger UNIQUE constraint violations, indicating duplicate entries in tables that enforce uniqueness. These errors stem from the recovery process’s inability to reconcile partially written or overlapping data from corrupted pages.

The core challenge lies in mapping corrupt pages (e.g., 3282–3284) to specific tables. SQLite’s internal page allocation metadata may itself be corrupted, making it impossible to rely on standard tools like .tables or sqlite_master queries. Without knowing which tables are affected, targeted recovery strategies (e.g., exporting valid data from intact tables) become impractical.

Potential Causes of Page Corruption & Recovery Failures

Hardware-Induced Page Zeroing

Power outages or storage media failures can truncate writes, leaving pages partially overwritten or filled with zeros. A zeroed page lacks valid header information (e.g., page type, free block offsets), causing btreeInitPage to fail. Pages containing critical metadata (e.g., the sqlite_master table) exacerbate corruption, as they prevent the database from bootstrapping its schema.

B-Tree Structure Inconsistencies

SQLite stores tables and indexes as B-trees. Each B-tree root page (stored in sqlite_master) points to child pages. Corruption in a root page or intermediate node disrupts access to entire subtrees. For example, if page 3282 is a leaf page in a table’s B-tree, its corruption affects a subset of records. However, if it is an internal node, entire branches of the tree become inaccessible.

Schema Decoupling During Recovery

The .recover command operates by extracting raw data from pages without fully relying on the schema. When pages are missing or unreadable, it creates a lost_and_found table with generic columns (c0, c1, etc.). This approach ignores constraints and indexes, leading to duplicate entries when reinserting data into reconstructed tables. Additionally, .recover may misattribute pages to incorrect tables if schema metadata is corrupted.

Toolchain Misconfiguration

The showdb utility, essential for analyzing page usage, is absent from the SQLite amalgamation distribution. Users unaware of the distinction between amalgamation and canonical source distributions will fail to build showdb, hindering low-level diagnostics.

Diagnostic Workflow & Remediation Strategies

Building the showdb Utility

  1. Obtain Canonical Source Code
    Download the canonical source tarball from SQLite’s download page under Alternative Source Code Formats or clone the Fossil repository:

    fossil clone https://www.sqlite.org/src sqlite.fossil  
    mkdir sqlite-src  
    cd sqlite-src  
    fossil open ../sqlite.fossil  
    
  2. Compile showdb
    Navigate to the source directory and compile:

    ./configure  
    make showdb  
    

    This generates the showdb executable in the root directory.

Analyzing Page Usage with showdb

  1. Generate Page Allocation Report
    Execute showdb with the pgidx option to list page usage:

    ./showdb MyVideos116.db pgidx  
    

    The output categorizes pages by type (freelist, overflow, table/index B-trees) and maps them to schema objects where possible.

  2. Inspect Corrupt Pages
    Dump the content of corrupt pages (e.g., 3282):

    ./showdb MyVideos116.db page 3282  
    

    A zeroed page displays as a block of zeros or meaningless bytes. Valid pages show B-tree headers, cell pointers, and payload data.

  3. Cross-Reference sqlite_master
    Query the sqlite_master table to map root pages to tables/indexes:

    SELECT name, type, rootpage FROM sqlite_master;  
    

    Corrupt root pages indicate damaged tables/indexes. For example, if rootpage 3284 corresponds to an index, its corruption affects query performance but not data integrity.

Salvaging Data Without showdb

  1. Export Intact Tables
    Use .dump to export tables unaffected by corruption:

    sqlite3 MyVideos116.db ".dump --preserve-rowids" > backup.sql  
    

    The --preserve-rowids flag retains ROWID values, crucial for tables relying on implicit rowids.

  2. Isolate Corrupt Tables
    If .dump fails on specific tables, exclude them using --skip:

    sqlite3 MyVideos116.db ".dump --preserve-rowids --skip corrupt_table" > partial_backup.sql  
    
  3. Manual Recovery via Hex Editor
    For critical data in corrupt pages:

    • Identify the table’s storage layout (column types, indexes) from application code or schema backups.
    • Use a hex editor (e.g., hexdump -C) to extract raw bytes from the database file.
    • Parse B-tree cells manually, extracting payloads and reconstructing records.

Mitigating UNIQUE Constraint Violations

  1. Temporarily Disable Constraints
    During recovery, disable constraints to bypass duplication errors:

    PRAGMA ignore_check_constraints = ON;  
    PRAGMA defer_foreign_keys = ON;  
    

    Re-enable them after data insertion.

  2. Deduplicate Recovered Data
    Use INSERT OR IGNORE or INSERT OR REPLACE to handle duplicates:

    INSERT OR IGNORE INTO bookmark (idBookmark, ...) VALUES (...);  
    

Preventing Future Corruption

  1. Enable Write-Ahead Logging (WAL)
    WAL reduces the chance of corruption by separating writes into a log file:

    PRAGMA journal_mode = WAL;  
    
  2. Use Atomic Transactions
    Group writes into transactions to ensure atomicity:

    BEGIN;  
    -- Multiple INSERT/UPDATE statements  
    COMMIT;  
    
  3. Regular Backups with .dump
    Schedule periodic backups using .dump to create schema-aware SQL scripts.

This guide provides a systematic approach to diagnosing and recovering from SQLite page corruption, emphasizing the use of low-level tools like showdb and strategic recovery techniques to minimize data loss.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *