Detecting Missing Rows in SQLite: PRAGMA integrity_check Limitations and Solutions

Issue Overview: How PRAGMA integrity_check Handles Missing Row Detection

The core issue revolves around whether SQLite’s PRAGMA integrity_check command reliably detects missing rows caused by database corruption, such as hardware failures (e.g., bad RAM). This question arises from ambiguity in SQLite’s documentation, which explicitly states that integrity_check verifies index consistency but does not directly clarify whether it identifies rows that are entirely absent from tables.

At a technical level, SQLite databases store data in B-tree structures. Each table and index exists as a separate B-tree. The integrity_check command performs two primary tasks:

  1. Structural Validation: It ensures that all B-trees (tables and indexes) adhere to SQLite’s file format specifications. This includes verifying page integrity, free-list consistency, and pointer validity.
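  2. Cross-Consistency Checks: It verifies that tables and their indexes agree: every table row must have a matching entry in each index, and no index may contain missing or surplus entries. It also looks for UNIQUE, CHECK, and NOT NULL constraint violations.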

However, detecting missing rows requires a different type of verification. A “missing row” implies that a row that should exist (based on application logic or transaction history) is no longer present in the table. SQLite’s integrity_check does not track historical data or application-specific expectations. Instead, it focuses on the current state of the database. If a row is missing due to corruption—for example, if the database page containing the row was overwritten or deleted—integrity_check can only detect this indirectly through inconsistencies between tables and indexes.

For example, if an index entry references a row that no longer exists in the table, integrity_check flags this as an error. Conversely, if both the row and its index entries are missing (e.g., due to corruption affecting multiple pages), integrity_check will not detect the absence of the row because there is no orphaned index entry to trigger a warning.
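
The undetectable case can be reproduced without real corruption. The following is a minimal sketch, assuming Python's standard sqlite3 module and a hypothetical items table: an ordinary DELETE removes a row and its index entry together, which leaves the same kind of self-consistent database as corruption that drops both.

```python
import sqlite3


def integrity_after_clean_delete():
    """Remove a row and its index entry together, then run integrity_check."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX idx_items_name ON items(name)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "gamma")],
    )
    # DELETE updates the table B-tree and the index B-tree consistently,
    # so no orphaned index entry is left behind for the check to find.
    conn.execute("DELETE FROM items WHERE id = 2")
    conn.commit()
    messages = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    conn.close()
    return messages
```

The function returns the messages produced by the pragma. Because nothing in the file records that row 2 ever existed, the check has no basis for reporting it as missing.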

Possible Causes: Scenarios Where Missing Rows May Go Undetected

Missing rows can result from various forms of corruption, but not all scenarios are detectable by PRAGMA integrity_check. The following conditions determine whether the command will identify the issue:

1. Silent Data Deletion Without Index Inconsistencies

If corruption removes a row together with all of its index entries (e.g., because stale or damaged versions of both the table page and the related index pages end up on disk), integrity_check will not report an error as long as the remaining B-trees are well formed. This is because the command only checks for disagreement between the table and its indexes, not for expected data that might be missing.

2. Partial Page Corruption

SQLite stores rows in database pages. If a page is partially corrupted (say, a single row's payload is overwritten with garbage data), the B-tree structure might still appear valid. For instance, if the page header, cell pointers, and child-page pointers remain intact, integrity_check may not flag the page. The damage would then only be detectable if the row no longer matches its index entries or is no longer a well-formed record.

3. Index-Free Tables

Tables without indexes rely solely on their B-tree structure for validation. If a row is missing from such a table, integrity_check will not detect it unless the table’s B-tree structure is invalid (e.g., incorrect key order or broken page links).

4. Application-Level Corruption

If the application logic inadvertently deletes a row (e.g., due to a bug), this is not considered database corruption. integrity_check only identifies structural issues, not logical errors in data content.

5. Write-Ahead Log (WAL) Mode Artifacts

In WAL mode, recently committed changes may exist only in the WAL file until a checkpoint copies them into the main database file. If a crash occurs during a write operation, the database normally recovers correctly, but if the WAL file is lost or damaged by hardware-level corruption, the newest transactions can disappear while the remaining database stays structurally valid. integrity_check examines the database as the connection currently sees it, so it reports ok for a database that has cleanly lost its most recent changes.

Troubleshooting Steps, Solutions & Fixes: Ensuring Comprehensive Data Integrity

To address missing rows and other forms of corruption, a multi-layered approach is required. Below are detailed steps to diagnose, resolve, and prevent such issues:


Step 1: Run PRAGMA integrity_check and Review Its Output

Execute PRAGMA integrity_check and review its output. If the command returns ok, the database has no structural issues or index inconsistencies. However, this does not guarantee that all rows are present. If errors are reported (e.g., row missing from index), investigate the affected tables and indexes.

PRAGMA integrity_check has no separate verbose mode. It returns either the single row ok or one row per problem found, up to a default limit of 100 messages; an integer argument, as in PRAGMA integrity_check(500), changes that limit. A typical run looks like this:

PRAGMA integrity_check;
-- Output example: 
-- 'row 5 missing from index idx_mycolumn'
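
To run the check from a script and capture every message, something like the following could be used. This is a sketch assuming Python's standard sqlite3 module; run_integrity_check and is_structurally_ok are hypothetical helper names, not part of SQLite.

```python
import sqlite3


def run_integrity_check(db_path, max_errors=100):
    """Return the messages from PRAGMA integrity_check as a list of strings."""
    conn = sqlite3.connect(db_path)
    try:
        # The argument caps how many problems are reported. The value is
        # formatted into the SQL text, so int() guards against non-numeric input.
        sql = "PRAGMA integrity_check({})".format(int(max_errors))
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


def is_structurally_ok(messages):
    """True when the pragma returned the single row 'ok'."""
    return messages == ["ok"]
```

A scheduled job could call run_integrity_check("app.db") (the path is a placeholder) and log any result for which is_structurally_ok returns False.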

Step 2: Cross-Validate with PRAGMA foreign_key_check

If the schema declares foreign key constraints, run PRAGMA foreign_key_check to identify orphaned rows (child rows without corresponding parent rows). The pragma reports violations whether or not enforcement (PRAGMA foreign_keys) is switched on. While this does not directly detect missing rows, it can reveal inconsistencies caused by missing parent rows.

PRAGMA foreign_key_check;
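
A small sketch of what the pragma reports, assuming Python's standard sqlite3 module and hypothetical parent and child tables. Enforcement is turned off so that a child row pointing at a nonexistent parent can be created, standing in for a parent row that went missing.

```python
import sqlite3


def find_orphaned_children():
    """Build a child row whose parent is absent and return the violations."""
    # isolation_level=None keeps the connection in autocommit mode, so the
    # foreign_keys pragma is not issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES parent(id))"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    # Child 11 references parent 2, which does not exist.
    conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1), (11, 2)")
    rows = conn.execute("PRAGMA foreign_key_check").fetchall()
    conn.close()
    return rows
```

Each returned row names the child table, the rowid of the offending child row, the referenced parent table, and the index of the violated constraint.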

Step 3: Dump and Rebuild the Database

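Use the .dump command in the SQLite CLI to generate a SQL script of the entire database. The dump reads every row of every table, so unreadable pages or malformed records tend to surface as errors. Rows that are cleanly absent raise no error, so a successful dump is not proof that all rows are present. The basic dump-and-rebuild sequence is: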

sqlite3 corrupted.db ".dump" > dump.sql
sqlite3 new.db < dump.sql

If the dump encounters errors (e.g., database disk image is malformed), the file is damaged badly enough that a plain rebuild will lose data, and the recovery tooling in Step 4 is the better option.
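
The same round trip can be scripted so that row counts are compared afterwards. The sketch below assumes Python's standard sqlite3 module, whose Connection.iterdump() yields SQL text similar to the CLI's .dump output; table_counts and rebuild_and_compare are hypothetical helper names.

```python
import sqlite3


def table_counts(conn):
    """Map each user table name to its current row count."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    counts = {}
    for name in names:
        # Double any embedded quotes so unusual table names stay valid SQL.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(
            "SELECT COUNT(*) FROM " + quoted
        ).fetchone()[0]
    return counts


def rebuild_and_compare(source):
    """Replay a dump of source into a fresh in-memory database.

    Returns (source_counts, rebuilt_counts) so the caller can diff them.
    """
    rebuilt = sqlite3.connect(":memory:")
    rebuilt.executescript("\n".join(source.iterdump()))
    result = (table_counts(source), table_counts(rebuilt))
    rebuilt.close()
    return result
```

Matching counts show only that the dump round-tripped. They say nothing about rows that were already gone before the dump was taken.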


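Step 4: Use SQLite's Recovery Tooling

The sqlite3 command-line shell includes a .recover command, developed by the SQLite project, that reconstructs a damaged database by extracting as much data as possible. It salvages content from readable pages even when parts of the B-tree structure are broken, which makes it more suitable than .dump for badly damaged files. It cannot bring back rows whose pages have been overwritten:

sqlite3 corrupted.db ".recover" > recovered.sql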
sqlite3 new.db < recovered.sql

Step 5: Implement Application-Level Checksums

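Add checksums or hash columns to critical tables. For example, store a SHA-256 hash of the row's contents in a separate column and verify it periodically to detect altered rows. A per-row checksum cannot reveal a row that is gone entirely, so also keep an independent record (such as expected IDs or row counts) to detect missing rows. Note that sha256() is not a built-in SQL function in stock SQLite; the query below assumes the application has registered one: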

-- Example table with checksum column
CREATE TABLE sensitive_data (
    id INTEGER PRIMARY KEY,
    data TEXT,
    checksum TEXT
);

-- Verify checksums periodically
SELECT id FROM sensitive_data WHERE checksum != sha256(data);
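
The verification query needs a sha256() SQL function, which stock SQLite does not provide. One possible approach, sketched here with Python's standard sqlite3 and hashlib modules, registers an application-defined function and also compares the stored IDs against an independently kept list. The expected_ids argument and the audit helper are assumptions for illustration; where that list comes from is up to the application.

```python
import hashlib
import sqlite3


def sha256_hex(text):
    """Hex SHA-256 of a TEXT value; NULL stays NULL."""
    if text is None:
        return None
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit(conn, expected_ids):
    """Return (altered_ids, missing_ids) for the sensitive_data table."""
    # Register sha256() for this connection only.
    conn.create_function("sha256", 1, sha256_hex)
    altered = [
        row[0]
        for row in conn.execute(
            # IS NOT also flags rows where exactly one side is NULL.
            "SELECT id FROM sensitive_data "
            "WHERE checksum IS NOT sha256(data) ORDER BY id"
        )
    ]
    present = {row[0] for row in conn.execute("SELECT id FROM sensitive_data")}
    missing = sorted(set(expected_ids) - present)
    return altered, missing
```

The checksum comparison catches rows whose content changed, while the ID comparison catches rows that vanished. Neither check alone covers both cases.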

Step 6: Keep a Redundant FTS Index for Critical Data

SQLite's FTS5 extension stores its index in separate shadow tables. If the application writes each row to both the main table and an FTS table under the same rowid, a row that vanishes from the main table while its FTS entry survives can be found with a join:

-- Create the main table and the FTS table
CREATE TABLE main_data (id INTEGER PRIMARY KEY, data TEXT);
CREATE VIRTUAL TABLE fts_data USING fts5(data);

-- Insert data into both tables, reusing the primary key as the FTS rowid
INSERT INTO main_data (id, data) VALUES (1, 'critical info');
INSERT INTO fts_data (rowid, data) VALUES (1, 'critical info');

-- Query for orphaned FTS entries
SELECT fts_data.rowid FROM fts_data
LEFT JOIN main_data ON fts_data.rowid = main_data.id
WHERE main_data.id IS NULL;

Step 7: Monitor Hardware and Filesystem Health

Since hardware failures are a common cause of corruption, use tools like smartctl (for disk health) and memtester (for RAM) to diagnose underlying issues. For filesystems, enable journaling or use Copy-on-Write (CoW) filesystems like ZFS or Btrfs to reduce corruption risks.


Step 8: Regularly Back Up the Database

Schedule frequent backups using the SQLite CLI's .backup command, which uses SQLite's online backup API to take a consistent copy while the database remains in use:

sqlite3 original.db ".backup backup.db"
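
A backup is only useful if it is itself intact, so it can be worth checking each copy right after it is made. The sketch below assumes Python 3.7 or later, where sqlite3.Connection.backup() wraps the online backup API; backup_and_verify is a hypothetical helper name.

```python
import sqlite3


def backup_and_verify(source, dest_path):
    """Copy source into dest_path and return integrity_check messages for the copy."""
    dest = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source database into the destination.
        source.backup(dest)
        return [row[0] for row in dest.execute("PRAGMA integrity_check")]
    finally:
        dest.close()
```

As with any integrity_check result, an ok here confirms that the copy is structurally sound, not that the source still contained every row it should.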

Step 9: Configure Journaling for Crash Resilience

Set PRAGMA journal_mode = WAL; and PRAGMA synchronous = FULL; to enhance crash resilience. These settings neither detect nor prevent hardware-induced corruption, but they reduce the likelihood of incomplete writes after a crash or power loss.
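
Because the journal_mode pragma can decline a request (an in-memory database, for instance, stays in memory mode), it helps to read the settings back rather than assume they took effect. A minimal sketch, assuming Python's standard sqlite3 module and a hypothetical open_durable helper:

```python
import sqlite3


def open_durable(db_path):
    """Open db_path, request WAL and FULL sync, and report what was applied."""
    # Autocommit mode, so the pragmas are not issued inside a transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    # The pragma returns the journal mode actually in effect afterwards.
    journal_mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    conn.execute("PRAGMA synchronous = FULL")
    # synchronous is reported as an integer: 0 OFF, 1 NORMAL, 2 FULL, 3 EXTRA.
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, journal_mode, synchronous
```

The caller can then refuse to continue, or at least log a warning, if the returned values are not the ones requested.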


Step 10: Perform Offline Consistency Checks

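For mission-critical systems, periodically take the database offline (or work on a fresh backup copy) and run comprehensive checks: PRAGMA integrity_check, PRAGMA foreign_key_check, and external auditing scripts that compare the data against independently recorded expectations. Note that sqlite3_test_control is a testing interface for verifying the SQLite library itself and is not intended for use by applications.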
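
One form an external auditing script could take is a row-count manifest stored outside the database. The sketch below assumes Python's standard sqlite3 and json modules; the helper names and the JSON manifest format are illustrative assumptions. It suits tables that are append-only by design, since legitimate deletes also shrink a table.

```python
import json
import sqlite3


def row_counts(conn):
    """Map each user table name to its current row count."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(
            "SELECT COUNT(*) FROM " + quoted
        ).fetchone()[0]
    return counts


def save_manifest(conn, manifest_path):
    """Write the current row counts to a JSON file kept outside the database."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(row_counts(conn), handle)


def tables_that_shrank(conn, manifest_path):
    """Return {table: (recorded, current)} for tables with fewer rows than recorded."""
    with open(manifest_path, encoding="utf-8") as handle:
        expected = json.load(handle)
    current = row_counts(conn)
    return {
        name: (old, current.get(name, 0))
        for name, old in expected.items()
        if current.get(name, 0) < old
    }
```

Running save_manifest after each verified backup and tables_that_shrank during each offline check gives a coarse signal that integrity_check cannot provide on its own.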


By combining these strategies, developers can compensate for the limitations of PRAGMA integrity_check and substantially improve their chances of detecting missing rows and other forms of database corruption.
