Corrupt FTS5 Index Due to Incorrect Trigger Configuration with External Content

FTS5 External Content Table Corruption and Trigger-Induced Inconsistencies

Root Cause: Mismanagement of FTS5 Rowid and External Content Table Relationships

The core issue arises from improper synchronization between an external content table (my_table) and its associated FTS5 virtual table (my_fts) due to misconfigured triggers. The original trigger logic failed to respect the rowid-based linkage required by FTS5 for external content tables. Specifically:

  1. Misalignment of FTS5 Rowid and Content Table Identifier:
    The FTS5 virtual table relies on an implicit or explicit rowid column to map indexed entries to rows in the external content table. In the original triggers, inserts into my_fts explicitly populated the id column (a user-defined field marked UNINDEXED) instead of the FTS5 rowid. This caused the FTS5 index to reference invalid or mismatched rows in my_table, as the rowid of the FTS5 table (which defaults to an auto-incremented value if not explicitly set) no longer aligned with the id column of my_table (which serves as its INTEGER PRIMARY KEY and thus corresponds to SQLite’s implicit rowid).

  2. Incomplete Handling of Updates and Deletions:
    The original BEFORE UPDATE and BEFORE DELETE triggers attempted to remove entries from my_fts using a WHERE id=old.id clause. However, this directly manipulated the id column of the FTS5 table rather than targeting its rowid. Since FTS5’s internal index is organized around rowid, deletions based on id left orphaned index entries, destabilizing the relationship between the virtual table and the content table. Concurrent operations (e.g., rapid updates) exacerbated this inconsistency, leading to corruption.

  3. Race Conditions from Trigger Execution Order:
    The my_table_sort_default_value trigger, which updates the sort column after inserts, introduced a subtle race condition. If this trigger fired after the FTS5 sync triggers, the FTS5 index could capture stale data from my_table before the sort column was finalized. While not directly responsible for corruption, this created opportunities for transient inconsistencies during high-frequency write operations.

Mechanisms of Corruption in FTS5 External Content Tables

FTS5 external content tables operate under strict assumptions:

  • The rowid of the FTS5 table must correspond to the rowid (or aliased INTEGER PRIMARY KEY) of the external content table.
  • Direct modifications to the FTS5 table (via INSERT, UPDATE, or DELETE) must use FTS5-specific syntax to ensure index integrity. For example, deletions require the 'delete' command verb to signal index maintenance.

In the original setup:

  • Inserting into my_fts(id, meaning) populated the id column (a data field) but left the FTS5 rowid auto-assigned. This decoupled the index from my_table’s id, causing FTS5 to reference incorrect rows during queries.
  • Deletions via DELETE FROM my_fts WHERE id=old.id failed to remove the correct index entries, as they targeted a data column instead of the rowid. Over time, these orphaned entries corrupted the index structure.

Resolving Corruption Through Rowid-Centric Trigger Design

The corrected triggers address these issues by:

  1. Explicitly Setting FTS5 Rowid During Inserts:
    Using INSERT INTO my_fts(rowid, meaning) ensures the FTS5 index’s rowid matches my_table’s id, preserving the 1:1 mapping required for external content tables.

  2. Using FTS5’s Delete Command for Index Maintenance:
    The INSERT INTO my_fts(my_fts, rowid, ...) VALUES('delete', ...) syntax triggers FTS5’s internal deletion mechanism, which properly invalidates index entries associated with the specified rowid.

  3. Sequencing Updates as Delete-Insert Pairs:
    The revised AFTER UPDATE trigger first removes the old index entry (via 'delete') and then inserts the new state, preventing overlapping references to the same rowid.

Implementation Guidelines for Stable FTS5 External Content Tables

  1. Schema Design:

    • Ensure the external content table declares an INTEGER PRIMARY KEY (e.g., id) to alias SQLite’s rowid.
    • Define FTS5 virtual tables with content='my_table' and explicitly map the rowid during trigger operations.
  2. Trigger Best Practices:

    • Use AFTER triggers instead of BEFORE to guarantee all modifications to the content table are finalized before updating the FTS5 index.
    • Avoid referencing data columns (e.g., id) in FTS5 WHERE clauses; always target rowid.
  3. Concurrency and Isolation:

    • Wrap operations affecting both the content table and FTS5 index in transactions to atomicity.
    • Monitor trigger execution order when multiple triggers exist on the same table (e.g., my_table_sort_default_value).

By adhering to these principles, developers can eliminate corruption risks and maintain robust full-text search functionality in SQLite.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *