FTS5 Index Corruption Due to Mismatched INSERT/DELETE Trigger Values

Inconsistent External Content FTS5 Trigger Implementation Causes Malformed Database Image

Trigger Logic Discrepancy Between Insertion and Deletion Operations

The core issue arises when maintaining an external content FTS5 virtual table through manually defined triggers. The problem manifests as a "database disk image is malformed" error during query execution, specifically when searching for terms that should have been removed from the index but persist due to incomplete trigger logic.

In the provided scenario, a base table (tbl) is linked to an FTS5 index (fts_idx) using content=’tbl’. Three triggers (tbl_ai, tbl_ad, tbl_au) manage synchronization between the base table and the FTS5 index. The INSERT trigger (tbl_ai) erroneously hardcodes a value ('abc wrong') instead of using the new.b column from the inserted row. Meanwhile, the DELETE trigger (tbl_ad) correctly references old.b when removing the indexed entry. This creates a mismatch: the FTS5 index contains tokens from the hardcoded string 'abc wrong' that never existed in the base table. When a row is deleted, the trigger removes the actual old.b value but leaves the phantom 'wrong' token in the index. Subsequent queries attempting to access these orphaned tokens trigger a malformed database error because the index contains references to nonexistent base table content.

The error occurs specifically when querying the phantom term because SQLite’s FTS5 module attempts to locate the source data referenced by the index entry. When the base table row associated with the index entry no longer exists (or never existed), the internal consistency checks fail, resulting in a disk image malformation error. This is distinct from ordinary search misses – the database engine detects structural incompatibility between the index’s expectations and the actual table state.

Primary Causes of Orphaned FTS5 Index Entries and Database Corruption

1. Asymmetric Data Handling in INSERT/DELETE Triggers
The root cause lies in the lack of parity between the data transformation steps applied during insertion and deletion. The INSERT trigger introduces an artificial value ('abc wrong') that diverges from the base table’s new.b column. However, the DELETE trigger operates on the authentic old.b value. Consequently, the FTS5 index accumulates tokens ('wrong') that have no correspondence to any actual base table content. When the deletion trigger removes the valid old.b tokens ('abc'), the phantom 'wrong' token remains unaccounted for. This violates the FTS5 external content table’s fundamental requirement that all index entries must map to valid content_rowid values in the base table.

2. Absence of Preprocessing Consistency in Data Lifecycle Management
Custom preprocessing functions (explicit or implicit) must be applied uniformly across all triggers interacting with the FTS5 index. In this case, the INSERT trigger’s substitution of new.b with 'abc wrong' acts as an undocumented preprocessing step. The DELETE trigger’s failure to reverse this substitution – by using the raw old.b value instead of the transformed 'abc wrong' – means the deletion operation targets different lexical content than what was originally indexed. This breaks the bidirectional linkage between base table rows and their FTS5 index entries.

3. Overreliance on Manual Trigger Synchronization
External content FTS5 tables require meticulous manual trigger design to ensure that every modification to the base table (INSERT/UPDATE/DELETE) is mirrored accurately in the index. The example demonstrates how even a minor inconsistency in trigger logic (hardcoding a value in one trigger but not another) cascades into database corruption. Unlike contentless FTS5 tables (which automatically manage content deletion via the contentless_delete option), external content tables place the burden of data consistency entirely on the trigger implementer.

Comprehensive Remediation Strategy for FTS5 Trigger-Induced Corruption

Step 1: Align Data Transformation Across All Triggers
Revise the INSERT, UPDATE, and DELETE triggers to ensure identical data handling logic. If a preprocessing step (like substituting new.b with 'abc wrong') is necessary during insertion, the inverse operation must occur during deletion.

Before:

-- Faulty INSERT trigger using hardcoded value
CREATE TRIGGER tbl_ai AFTER INSERT ON tbl BEGIN 
  INSERT INTO fts_idx(rowid, b, c) VALUES (new.a, 'abc wrong', new.c); 
END;

After:

-- Corrected INSERT trigger using actual new.b value
CREATE TRIGGER tbl_ai AFTER INSERT ON tbl BEGIN 
  INSERT INTO fts_idx(rowid, b, c) VALUES (new.a, new.b, new.c); 
END;

If preprocessing is genuinely required (e.g., data enrichment or sanitization), ensure that the DELETE trigger references the processed value, not the raw old.b value. For example, if 'abc wrong' was derived from new.b via a function preprocess(), the DELETE trigger must apply preprocess(old.b) to compute the value used during insertion:

CREATE TRIGGER tbl_ad AFTER DELETE ON tbl BEGIN 
  INSERT INTO fts_idx(fts_idx, rowid, b, c) VALUES(
    'delete', 
    old.a, 
    preprocess(old.b),  -- Match INSERT's preprocessing
    old.c
  ); 
END;

Step 2: Rebuild Corrupted FTS5 Indexes
Once trigger logic is corrected, existing corrupted indexes must be rebuilt to purge orphaned entries. Two methods exist:

Method A: Manual Rebuild via Data Reinsertion

-- Disable triggers to prevent redundant FTS5 updates
DROP TRIGGER tbl_ai;
DROP TRIGGER tbl_ad;
DROP TRIGGER tbl_au;

-- Rebuild FTS5 index from base table
DELETE FROM fts_idx;
INSERT INTO fts_idx(rowid, b, c) SELECT a, b, c FROM tbl;

-- Recreate triggers with corrected logic
CREATE TRIGGER tbl_ai ... ; -- Corrected logic
CREATE TRIGGER tbl_ad ... ; -- Corrected logic
CREATE TRIGGER tbl_au ... ; -- Corrected logic

Method B: Leverage FTS5 Integrity Check and Repair
SQLite’s FTS5 module provides an integrity-check pragma, though its capabilities are limited compared to SQLite’s core integrity_check. To diagnose and resolve index corruption:

-- Check for structural issues (returns 1 if ok, raises error if not)
INSERT INTO fts_idx(fts_idx) VALUES('integrity-check');

-- If corruption is detected, rebuild the entire index
INSERT INTO fts_idx(fts_idx) VALUES('rebuild');

Step 3: Implement Contentless FTS5 Tables Where Appropriate
If the FTS5 index does not require external content (i.e., the indexed data exists solely within the FTS5 table), consider converting to a contentless table with the contentless_delete option. This eliminates trigger maintenance overhead by enabling automatic deletion of index entries when base table rows are removed.

-- Contentless FTS5 table with automatic deletion
CREATE VIRTUAL TABLE fts_idx USING fts5(b, c, contentless_delete=1);

Note: Contentless tables store a copy of the indexed data within the FTS5 table itself. This duplicates storage but simplifies synchronization. External content tables remain necessary when the FTS5 index must reference large or externally managed content.

Step 4: Enforce Trigger Symmetry Through Automated Testing
Develop test harnesses to validate trigger behavior after schema changes. For example, use transactional test cases to verify that INSERT followed by DELETE leaves no residual FTS5 entries:

BEGIN;
  INSERT INTO tbl(b, c) VALUES ('test', 'test');
  SELECT COUNT(*) FROM fts_idx WHERE fts_idx MATCH 'test'; -- Expect 1
  DELETE FROM tbl WHERE a = last_insert_rowid();
  SELECT COUNT(*) FROM fts_idx WHERE fts_idx MATCH 'test'; -- Expect 0
ROLLBACK;

Automate these tests using SQLite’s .test infrastructure or external testing frameworks.

Step 5: Utilize SQLite’s FTS5 Auxiliary Functions for Debugging
Leverage fts5vocab virtual tables to inspect tokenization outcomes and identify orphaned entries:

-- Create vocabulary table for inspecting 'b' column
CREATE VIRTUAL TABLE fts_idx_v USING fts5vocab(fts_idx, 'col');

-- Query for unexpected tokens
SELECT * FROM fts_idx_v WHERE term = 'wrong';

If the vocabulary table reveals lingering terms after deletions, it indicates unresolved trigger asymmetry.

Step 6: Adopt Defensive Trigger Design Patterns

  • Use Row-Level Snapshots: In triggers, reference OLD.* and NEW.* values directly without transformation unless absolutely necessary. If transformation is unavoidable, centralize the logic in a SQL function or application-layer code to ensure consistency.
  • Atomic Trigger Updates: For UPDATE operations, ensure the DELETE and INSERT phases of the trigger use identical preprocessing for OLD and NEW values respectively.
  • Logging Triggers: Implement companion triggers to audit FTS5 operations, logging changes to a separate table for post-hoc analysis.

Final Note on External Content Table Maintenance
External content FTS5 tables offer flexibility at the cost of increased complexity. Every manual trigger introduces a potential failure vector. Mitigate this by:

  • Avoiding ad-hoc preprocessing in triggers without corresponding cleanup logic.
  • Conducting periodic integrity checks using PRAGMA quick_check and FTS5’s integrity-check.
  • Documenting all data transformations applied during FTS5 index updates to ensure parity across triggers.

By rigorously applying these steps, developers can prevent FTS5 index corruption caused by trigger mismatches and maintain robust full-text search functionality in SQLite.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *