SQLite FTS5 Shadow Table Mismatch Due to Non-Recursive Triggers

FTS5 Shadow Table Row Count Mismatch with Source Table

When using SQLite’s FTS5 (Full-Text Search) module, it is expected that the shadow tables associated with the virtual FTS5 table will maintain a consistent relationship with the source table. However, in this scenario, the licenses_fts_docsize shadow table, which should ideally have the same number of rows as the licenses table (7 rows), has ballooned to 9,141 rows. This discrepancy indicates a significant misalignment between the FTS5 index and the underlying data.

The licenses_fts table is created as a virtual table using the FTS5 module, with its content sourced from the licenses table. The licenses_fts_docsize table, a shadow table, is automatically managed by SQLite to store document sizes for the FTS5 index. The licenses_fts_data table, another shadow table, stores the actual indexed data. Under normal circumstances, the number of rows in licenses_fts_docsize should match the number of rows in the licenses table, as each row in the source table should correspond to a single entry in the licenses_fts_docsize table.

The issue becomes more perplexing when considering that the licenses_fts table itself correctly reflects the 7 rows from the licenses table, but its associated shadow tables do not. This suggests that the FTS5 indexing mechanism is functioning correctly at a high level, but the underlying data structures are being corrupted or misaligned during data manipulation operations.

Non-Recursive Triggers and INSERT OR REPLACE Conflicts

The root cause of this issue lies in the interaction between SQLite’s INSERT OR REPLACE statement and the behavior of non-recursive triggers. When INSERT OR REPLACE is used on a table with a TEXT PRIMARY KEY, SQLite performs a DELETE operation followed by an INSERT operation internally. However, if the PRAGMA recursive_triggers setting is not enabled, the DELETE triggers associated with the table will not fire. This means that the licenses_ad trigger, which is responsible for removing entries from the licenses_fts table when a row is deleted from the licenses table, will not execute.

The licenses table has three triggers defined: licenses_ai (after insert), licenses_ad (after delete), and licenses_au (after update). These triggers are designed to keep the licenses_fts table in sync with the licenses table. However, due to the non-recursive nature of the triggers, the licenses_ad trigger does not fire when INSERT OR REPLACE is used. As a result, the licenses_fts table is not properly updated, leading to orphaned rows in the licenses_fts_docsize table.

The licenses table uses a TEXT PRIMARY KEY, which means that the rowid column is not explicitly defined. While this does not inherently cause issues, it does mean that the rowid values are not persistent and can change under certain conditions, such as when a VACUUM operation is performed. However, in this case, the rowid values remain consistent even after a VACUUM operation, indicating that the issue is not related to rowid instability.

Enabling Recursive Triggers and Rebuilding FTS5 Index

To resolve this issue, the first step is to enable recursive triggers by executing PRAGMA recursive_triggers = 1;. This ensures that the licenses_ad trigger will fire when INSERT OR REPLACE is used, allowing the licenses_fts table to be properly updated. Enabling recursive triggers prevents the accumulation of orphaned rows in the licenses_fts_docsize table.

Once recursive triggers are enabled, the next step is to clean up the existing orphaned rows in the licenses_fts_docsize table. This can be done by running a DELETE statement to remove rows that do not have corresponding entries in the licenses_fts table:

DELETE FROM licenses_fts_docsize
WHERE rowid NOT IN (SELECT rowid FROM licenses_fts);

However, a more robust solution is to rebuild the FTS5 index from scratch. This can be achieved using the rebuild command, which reconstructs the FTS5 index based on the current contents of the source table:

INSERT INTO licenses_fts(licenses_fts) VALUES('rebuild');

The rebuild command ensures that the licenses_fts_docsize and licenses_fts_data tables are correctly populated, eliminating any orphaned rows and ensuring that the FTS5 index is in sync with the licenses table. This approach is more efficient and less error-prone than manually deleting orphaned rows.

In addition to these steps, it is important to ensure that all future operations on the licenses table are performed with recursive triggers enabled. This can be done by setting the PRAGMA recursive_triggers option at the beginning of any script or application that interacts with the database.

Finally, it is worth noting that the licenses_fts_data table, which stores the actual indexed data, may also contain orphaned rows. While these rows do not directly impact the functionality of the FTS5 index, they can be cleaned up using a similar approach:

DELETE FROM licenses_fts_data
WHERE rowid NOT IN (SELECT rowid FROM licenses_fts);

However, the rebuild command will also address any inconsistencies in the licenses_fts_data table, making manual cleanup unnecessary in most cases.

By following these steps, the issue of mismatched row counts between the licenses table and its associated FTS5 shadow tables can be effectively resolved, ensuring that the FTS5 index remains accurate and consistent with the underlying data.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *