FTS4 Virtual Table Corruption: SQLITE_CORRUPT_VTAB When Querying Columns Despite Integrity Check
FTS4 Shadow Table Corruption Leading to Column-Specific Query Failures
The core issue involves an SQLite database with an FTS4 virtual table that passes PRAGMA integrity_check but throws a SQLITE_CORRUPT_VTAB error when querying specific columns or invoking built-in FTS4 auxiliary functions such as matchinfo() or snippet(). The virtual table returns all data via SELECT *, but column-level access fails with a "database disk image is malformed" error. This discrepancy arises because FTS4 virtual tables rely on hidden shadow tables to manage indexed content and metadata. The main database structure can pass integrity checks while corruption in these shadow tables, or inconsistencies between the virtual table’s logical schema and its underlying physical storage, disrupts columnar queries and auxiliary function execution. The error manifests specifically when SQLite attempts to access damaged or missing data within the shadow tables during column projection or FTS4-specific operations.
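FTS4 also ships its own consistency check, the 'integrity-check' special command, which compares the full-text index against the stored content and fails with SQLITE_CORRUPT_VTAB on a mismatch. The sketch below is a minimal illustration using Python's sqlite3 module. The table name docs and the helper fts4_index_is_consistent are hypothetical, and the damage is simulated by deleting a row from the shadow table directly. It assumes the linked SQLite library has FTS4 compiled in and does not run in defensive mode. Newer SQLite releases (3.44.0 and later, going by the release notes) also extend PRAGMA integrity_check to built-in virtual tables, so the discrepancy described above is most likely to appear on older libraries.

```python
import sqlite3

def fts4_index_is_consistent(conn, table):
    """Run FTS4's 'integrity-check' command; return True if index and content agree."""
    try:
        # `table` is assumed to be a trusted identifier (hypothetical helper).
        conn.execute(f"INSERT INTO {table}({table}) VALUES('integrity-check')")
        conn.commit()
        return True
    except sqlite3.DatabaseError:
        # SQLITE_CORRUPT_VTAB surfaces as sqlite3.DatabaseError in Python.
        conn.rollback()
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")
conn.executemany(
    "INSERT INTO docs(title, body) VALUES(?, ?)",
    [("alpha", "sqlite full text search"), ("beta", "shadow tables hold the index")],
)
conn.commit()

healthy = fts4_index_is_consistent(conn, "docs")

# Simulate damage: remove a content row behind the virtual table's back,
# leaving index entries that point at a document that no longer exists.
conn.execute("DELETE FROM docs_content WHERE docid = 1")
conn.commit()

damaged = fts4_index_is_consistent(conn, "docs")
print("before damage:", healthy, "after damage:", damaged)
```

Running this check before any repair attempt gives a quick yes/no answer on whether the index and the content table still agree.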
Root Causes: Shadow Table Integrity and FTS4 Implementation Nuances
The SQLITE_CORRUPT_VTAB error in this context is tied to three primary factors: incomplete or corrupted shadow table records, mismatched schema definitions, or custom build anomalies.
FTS4 virtual tables depend on a set of shadow tables prefixed with the virtual table’s name (e.g., %_content, %_segdir, %_segments, %_docsize, %_stat). These tables store the document content, the segment directory and segment blocks of the full-text index, per-document sizes, and table-wide statistics. Corruption in these tables, such as missing rows in %_content, invalid segment pointers in %_segdir, or malformed %_stat entries, can leave the virtual table partially functional. For example, SELECT * may work because it reads directly from the %_content table, but column-specific queries require SQLite to reconstruct data from indexed segments, which involves cross-referencing shadow tables. If a shadow table entry is missing or contains invalid binary data, columnar access fails.
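To see which shadow tables a given FTS4 table actually owns, they can be listed from sqlite_master. This is a small sketch with a hypothetical table named docs. The exact set of shadow tables depends on the FTS version and table options, so the output is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")

# GLOB treats '_' literally, so this matches names that start with "docs_".
shadow_tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'docs_*' ORDER BY name"
    )
]
print(shadow_tables)
```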
A second cause is schema mismatches. If the virtual table’s declared schema (e.g., column count, data types) does not align with the shadow tables’ stored data, SQLite’s query planner may misinterpret the physical storage layout. This can occur if the virtual table was altered without rebuilding its shadow tables or if manual edits were made to the shadow tables.
The third factor is the use of a custom SQLite build. Modifications to SQLite’s core or FTS4 module, such as altered tokenizers, custom functions, or changes to the shadow table management logic, can introduce edge cases where the database’s internal consistency checks pass (via PRAGMA integrity_check) but runtime operations fail. Built-in FTS4 functions like matchinfo() rely on low-level APIs that assume specific shadow table structures; custom builds that alter these APIs may break compatibility. However, user-defined functions that do not interact with shadow tables (e.g., scalar utilities) will continue to work, as observed in the original issue.
Resolving Shadow Table Corruption and Restoring FTS4 Functionality
Step 1: Rebuild the FTS4 Index
The first mitigation is to rebuild the FTS4 virtual table’s index using the rebuild command. Execute:
INSERT INTO fts_table(fts_table) VALUES('rebuild');
This discards the full-text index and regenerates it from the stored content (the %_content table, or the external content table if one is configured). If the content is intact, this resolves inconsistencies caused by partial writes, interrupted optimizations, or transient corruption of the index. Verify success by re-executing a column-specific query (e.g., SELECT column1 FROM fts_table). If the error persists, proceed to deeper diagnostics.
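The effect of the rebuild command can be sketched in a few lines of Python. The table name docs and the helper match_count are hypothetical, and the lost index is simulated by emptying the %_segdir and %_segments shadow tables by hand, which assumes the connection is not in defensive mode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")
conn.executemany(
    "INSERT INTO docs(title, body) VALUES(?, ?)",
    [
        ("one", "sqlite stores the index in segments"),
        ("two", "the content table keeps the original text"),
        ("three", "sqlite can rebuild the index from content"),
    ],
)
conn.commit()

def match_count(term):
    """Number of documents the full-text index returns for a single term."""
    return conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (term,)
    ).fetchone()[0]

# Simulate a lost index: the content rows survive, the index rows do not.
conn.execute("DELETE FROM docs_segdir")
conn.execute("DELETE FROM docs_segments")
conn.commit()
before = match_count("sqlite")

# Rebuild the index from the rows still present in docs_content.
conn.execute("INSERT INTO docs(docs) VALUES('rebuild')")
conn.commit()
after = match_count("sqlite")

print("matches before rebuild:", before, "after rebuild:", after)
```

Because the rebuild reads from the content table, it cannot recover rows that are already missing from %_content.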
Step 2: Audit Shadow Tables Manually
Inspect the shadow tables directly to identify missing or malformed records. For a virtual table named fts_table, query:
SELECT COUNT(*) FROM fts_table_content; -- Expect non-zero
SELECT * FROM fts_table_segdir ORDER BY level, idx; -- Check for valid (start_block, end_block) pairs
SELECT id, typeof(value), length(value) FROM fts_table_stat; -- 'value' is a BLOB of packed varints, not a plain integer
Compare the results against a known-good FTS4 table. If fts_table_content is empty although the table should hold data, or fts_table_segdir contains negative block indices, the shadow tables are likely beyond what a rebuild can repair. Export whatever source data is still readable, drop the virtual table, and recreate it.
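One check from this audit can be automated: every block id that a %_segdir row spans should exist in %_segments. The helper below, missing_segment_blocks, is a hypothetical sketch. It assumes that a segment's blocks occupy the contiguous range start_block through end_block, that a start_block of 0 means the segment is stored entirely in the root column, and that end_block may be stored as text of the form "blockid nbytes". Verify these assumptions against the FTS3/FTS4 documentation for the SQLite version in use.

```python
import sqlite3

def missing_segment_blocks(conn, table):
    """Return (level, idx, blockid) for each block a segdir row spans but %_segments lacks."""
    missing = []
    segdir_rows = conn.execute(
        f"SELECT level, idx, start_block, end_block FROM {table}_segdir "
        "ORDER BY level, idx"
    ).fetchall()
    for level, idx, start, end in segdir_rows:
        if not start:
            continue  # assumed: small segments live entirely in segdir.root
        if isinstance(end, bytes):
            end = end.decode()
        last = int(str(end).split()[0])  # tolerate the "blockid nbytes" text form
        present = {
            row[0]
            for row in conn.execute(
                f"SELECT blockid FROM {table}_segments WHERE blockid BETWEEN ? AND ?",
                (start, last),
            )
        }
        missing.extend(
            (level, idx, blockid)
            for blockid in range(start, last + 1)
            if blockid not in present
        )
    return missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
# Enough distinct terms that the index spills out of segdir.root into docs_segments.
conn.executemany(
    "INSERT INTO docs(body) VALUES(?)",
    [(f"term{i:05d} filler{i:05d}",) for i in range(2000)],
)
conn.commit()
healthy = missing_segment_blocks(conn, "docs")

# Simulate damage by removing one index block.
conn.execute(
    "DELETE FROM docs_segments WHERE blockid = (SELECT min(blockid) FROM docs_segments)"
)
conn.commit()
damaged = missing_segment_blocks(conn, "docs")
print(len(healthy), "missing before damage,", len(damaged), "missing after")
```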
Step 3: Migrate to FTS5
FTS5’s improved storage format and consistency checks reduce susceptibility to shadow table corruption. Export the FTS4 data to a temporary table, drop the FTS4 virtual table, and recreate it with FTS5:
CREATE VIRTUAL TABLE fts_table_new USING fts5(column1, column2);
INSERT INTO fts_table_new SELECT * FROM fts_table;
DROP TABLE fts_table;
ALTER TABLE fts_table_new RENAME TO fts_table;
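FTS5’s stricter transaction handling and atomic writes mitigate many issues caused by interrupted operations. Note that FTS5 is not a drop-in replacement: matchinfo() is not among its built-in auxiliary functions, and snippet() takes different arguments, so queries that use them must be ported.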
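The migration can be exercised end to end with Python's sqlite3 module, assuming the linked SQLite library has both FTS4 and FTS5 compiled in. This sketch differs from the statements above in one respect, which is a choice made here rather than something the original prescribes: it copies docid into rowid explicitly so document ids survive the move. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE fts_table USING fts4(column1, column2)")
conn.executemany(
    "INSERT INTO fts_table(column1, column2) VALUES(?, ?)",
    [("first", "shadow table corruption"), ("second", "healthy index data")],
)
conn.commit()

# Each statement runs in autocommit mode inside executescript().
conn.executescript(
    """
    CREATE VIRTUAL TABLE fts_table_new USING fts5(column1, column2);
    INSERT INTO fts_table_new(rowid, column1, column2)
        SELECT docid, column1, column2 FROM fts_table;
    DROP TABLE fts_table;
    ALTER TABLE fts_table_new RENAME TO fts_table;
    """
)

definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'fts_table'"
).fetchone()[0]
hits = conn.execute(
    "SELECT rowid, column1 FROM fts_table WHERE fts_table MATCH 'corruption'"
).fetchall()
print(definition)
print(hits)
```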
Step 4: Validate the Custom SQLite Build
If rebuilding the index and migrating to FTS5 fail, audit the custom SQLite build for FTS4-specific modifications. Revert to an unmodified SQLite build and test the database. If the error disappears, the custom build likely introduced instability in FTS4’s shadow table management. Key areas to review include:
- Tokenizer implementations that mishandle edge cases (e.g., non-ASCII text).
- Overrides of xCreate, xConnect, or xBestIndex in the FTS4 virtual table module.
- Memory allocation changes (e.g., custom sqlite3_malloc hooks) that disrupt FTS4’s storage layer.
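When comparing a custom build against a stock one, it helps to record exactly which library each process is linked against. A minimal sketch using Python's sqlite3 module follows. It only reports what the library says about itself, so run it once per build and compare the results. PRAGMA compile_options may return nothing if the build omits compile-option diagnostics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version and source id of the SQLite library actually loaded at runtime.
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
source_id = conn.execute("SELECT sqlite_source_id()").fetchone()[0]

# Compile-time options; filter for the full-text search ones.
options = [row[0] for row in conn.execute("PRAGMA compile_options")]
fts_options = [opt for opt in options if "FTS" in opt]

print("version:", version)
print("source id:", source_id)
print("FTS-related compile options:", fts_options)
```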
Step 5: Salvage Data via .dump and Reimport
For severe corruption, use SQLite’s .dump command to export the database schema and data:
sqlite3 corrupted.db .dump > backup.sql
Edit backup.sql to remove FTS4 virtual table definitions and shadow tables. Recreate the database:
sqlite3 new.db < backup.sql
Recreate the FTS4 table and repopulate it from the salvaged data. This bypasses corrupted shadow tables entirely.
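The repopulation step can also be scripted. The sketch below is one possible approach, not the only one: it reads the surviving rows straight from the old %_content shadow table and inserts them into a fresh FTS4 table in a new database. The helper salvage_fts4 is hypothetical, and it assumes the default %_content layout (docid followed by one column per declared column, no languageid option). The damaged database is simulated in memory.

```python
import sqlite3

def salvage_fts4(old_conn, new_conn, table, columns):
    """Copy rows from old `<table>_content` into a freshly created FTS4 table."""
    col_list = ", ".join(columns)
    new_conn.execute(f"CREATE VIRTUAL TABLE {table} USING fts4({col_list})")
    # Assumed layout of %_content: docid first, then one column per FTS column.
    rows = old_conn.execute(f"SELECT * FROM {table}_content ORDER BY docid")
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    new_conn.executemany(
        f"INSERT INTO {table}(docid, {col_list}) VALUES({placeholders})", rows
    )
    new_conn.commit()

# Simulated damaged database: content intact, index rows gone.
old = sqlite3.connect(":memory:")
old.execute("CREATE VIRTUAL TABLE fts_table USING fts4(column1, column2)")
old.executemany(
    "INSERT INTO fts_table(column1, column2) VALUES(?, ?)",
    [("a", "salvaged text one"), ("b", "salvaged text two")],
)
old.commit()
old.execute("DELETE FROM fts_table_segdir")
old.commit()

new = sqlite3.connect(":memory:")
salvage_fts4(old, new, "fts_table", ["column1", "column2"])

recovered = new.execute(
    "SELECT docid, column1 FROM fts_table WHERE fts_table MATCH 'salvaged' ORDER BY docid"
).fetchall()
print(recovered)
```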
Final Measure: Enable Write-Ahead Logging (WAL)
To prevent future corruption, enable WAL mode:
PRAGMA journal_mode=WAL;
WAL reduces the risk of partial writes during crashes, which is a common cause of shadow table inconsistencies. Combine this with periodic VACUUM operations to defragment the database.
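The PRAGMA returns the journal mode actually in effect, so it is worth checking the result instead of assuming the switch succeeded. The sketch below uses a temporary file, since in-memory databases cannot use WAL. The file name example.db is a placeholder.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "example.db"))
    # The pragma reports the mode now in effect; it stays on the old mode
    # if WAL cannot be enabled for this database.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.close()

print("journal mode:", mode)
```

WAL mode is persistent: once set, it stays in effect for the database file across connections until it is changed again.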
By systematically addressing shadow table integrity, validating the SQLite build, and migrating to FTS5, the SQLITE_CORRUPT_VTAB error can be resolved while hardening the database against recurrence.