Corrupt FTS5 Index Due to Incorrect Trigger Configuration with External Content
FTS5 External Content Table Corruption and Trigger-Induced Inconsistencies
Root Cause: Mismanagement of FTS5 Rowid and External Content Table Relationships
The core issue arises from improper synchronization between an external content table (my_table) and its associated FTS5 virtual table (my_fts) due to misconfigured triggers. The original trigger logic failed to respect the rowid-based linkage required by FTS5 for external content tables. Specifically:
-
Misalignment of FTS5 Rowid and Content Table Identifier:
The FTS5 virtual table relies on an implicit or explicitrowidcolumn to map indexed entries to rows in the external content table. In the original triggers, inserts intomy_ftsexplicitly populated theidcolumn (a user-defined field markedUNINDEXED) instead of the FTS5rowid. This caused the FTS5 index to reference invalid or mismatched rows inmy_table, as therowidof the FTS5 table (which defaults to an auto-incremented value if not explicitly set) no longer aligned with theidcolumn ofmy_table(which serves as itsINTEGER PRIMARY KEYand thus corresponds to SQLite’s implicitrowid). -
Incomplete Handling of Updates and Deletions:
The originalBEFORE UPDATEandBEFORE DELETEtriggers attempted to remove entries frommy_ftsusing aWHERE id=old.idclause. However, this directly manipulated theidcolumn of the FTS5 table rather than targeting itsrowid. Since FTS5’s internal index is organized aroundrowid, deletions based onidleft orphaned index entries, destabilizing the relationship between the virtual table and the content table. Concurrent operations (e.g., rapid updates) exacerbated this inconsistency, leading to corruption. -
Race Conditions from Trigger Execution Order:
Themy_table_sort_default_valuetrigger, which updates thesortcolumn after inserts, introduced a subtle race condition. If this trigger fired after the FTS5 sync triggers, the FTS5 index could capture stale data frommy_tablebefore thesortcolumn was finalized. While not directly responsible for corruption, this created opportunities for transient inconsistencies during high-frequency write operations.
Mechanisms of Corruption in FTS5 External Content Tables
FTS5 external content tables operate under strict assumptions:
- The
rowidof the FTS5 table must correspond to therowid(or aliasedINTEGER PRIMARY KEY) of the external content table. - Direct modifications to the FTS5 table (via
INSERT,UPDATE, orDELETE) must use FTS5-specific syntax to ensure index integrity. For example, deletions require the'delete'command verb to signal index maintenance.
In the original setup:
- Inserting into
my_fts(id, meaning)populated theidcolumn (a data field) but left the FTS5rowidauto-assigned. This decoupled the index frommy_table’sid, causing FTS5 to reference incorrect rows during queries. - Deletions via
DELETE FROM my_fts WHERE id=old.idfailed to remove the correct index entries, as they targeted a data column instead of therowid. Over time, these orphaned entries corrupted the index structure.
Resolving Corruption Through Rowid-Centric Trigger Design
The corrected triggers address these issues by:
-
Explicitly Setting FTS5 Rowid During Inserts:
UsingINSERT INTO my_fts(rowid, meaning)ensures the FTS5 index’srowidmatchesmy_table’sid, preserving the 1:1 mapping required for external content tables. -
Using FTS5’s Delete Command for Index Maintenance:
TheINSERT INTO my_fts(my_fts, rowid, ...) VALUES('delete', ...)syntax triggers FTS5’s internal deletion mechanism, which properly invalidates index entries associated with the specifiedrowid. -
Sequencing Updates as Delete-Insert Pairs:
The revisedAFTER UPDATEtrigger first removes the old index entry (via'delete') and then inserts the new state, preventing overlapping references to the samerowid.
Implementation Guidelines for Stable FTS5 External Content Tables
-
Schema Design:
- Ensure the external content table declares an
INTEGER PRIMARY KEY(e.g.,id) to alias SQLite’srowid. - Define FTS5 virtual tables with
content='my_table'and explicitly map therowidduring trigger operations.
- Ensure the external content table declares an
-
Trigger Best Practices:
- Use
AFTERtriggers instead ofBEFOREto guarantee all modifications to the content table are finalized before updating the FTS5 index. - Avoid referencing data columns (e.g.,
id) in FTS5WHEREclauses; always targetrowid.
- Use
-
Concurrency and Isolation:
- Wrap operations affecting both the content table and FTS5 index in transactions to atomicity.
- Monitor trigger execution order when multiple triggers exist on the same table (e.g.,
my_table_sort_default_value).
By adhering to these principles, developers can eliminate corruption risks and maintain robust full-text search functionality in SQLite.