SQLite FTS5 Shadow Table Mismatch Due to Non-Recursive Triggers
FTS5 Shadow Table Row Count Mismatch with Source Table
When using SQLite’s FTS5 (Full-Text Search) module, it is expected that the shadow tables associated with the virtual FTS5 table will maintain a consistent relationship with the source table. However, in this scenario, the licenses_fts_docsize
shadow table, which should ideally have the same number of rows as the licenses
table (7 rows), has ballooned to 9,141 rows. This discrepancy indicates a significant misalignment between the FTS5 index and the underlying data.
The licenses_fts
table is created as a virtual table using the FTS5 module, with its content sourced from the licenses
table. The licenses_fts_docsize
table, a shadow table, is automatically managed by SQLite to store document sizes for the FTS5 index. The licenses_fts_data
table, another shadow table, stores the actual indexed data. Under normal circumstances, the number of rows in licenses_fts_docsize
should match the number of rows in the licenses
table, as each row in the source table should correspond to a single entry in the licenses_fts_docsize
table.
The issue becomes more perplexing when considering that the licenses_fts
table itself correctly reflects the 7 rows from the licenses
table, but its associated shadow tables do not. This suggests that the FTS5 indexing mechanism is functioning correctly at a high level, but the underlying data structures are being corrupted or misaligned during data manipulation operations.
Non-Recursive Triggers and INSERT OR REPLACE Conflicts
The root cause of this issue lies in the interaction between SQLite’s INSERT OR REPLACE
statement and the behavior of non-recursive triggers. When INSERT OR REPLACE
is used on a table with a TEXT PRIMARY KEY
, SQLite performs a DELETE
operation followed by an INSERT
operation internally. However, if the PRAGMA recursive_triggers
setting is not enabled, the DELETE
triggers associated with the table will not fire. This means that the licenses_ad
trigger, which is responsible for removing entries from the licenses_fts
table when a row is deleted from the licenses
table, will not execute.
The licenses
table has three triggers defined: licenses_ai
(after insert), licenses_ad
(after delete), and licenses_au
(after update). These triggers are designed to keep the licenses_fts
table in sync with the licenses
table. However, due to the non-recursive nature of the triggers, the licenses_ad
trigger does not fire when INSERT OR REPLACE
is used. As a result, the licenses_fts
table is not properly updated, leading to orphaned rows in the licenses_fts_docsize
table.
The licenses
table uses a TEXT PRIMARY KEY
, which means that the rowid
column is not explicitly defined. While this does not inherently cause issues, it does mean that the rowid
values are not persistent and can change under certain conditions, such as when a VACUUM
operation is performed. However, in this case, the rowid
values remain consistent even after a VACUUM
operation, indicating that the issue is not related to rowid
instability.
Enabling Recursive Triggers and Rebuilding FTS5 Index
To resolve this issue, the first step is to enable recursive triggers by executing PRAGMA recursive_triggers = 1;
. This ensures that the licenses_ad
trigger will fire when INSERT OR REPLACE
is used, allowing the licenses_fts
table to be properly updated. Enabling recursive triggers prevents the accumulation of orphaned rows in the licenses_fts_docsize
table.
Once recursive triggers are enabled, the next step is to clean up the existing orphaned rows in the licenses_fts_docsize
table. This can be done by running a DELETE
statement to remove rows that do not have corresponding entries in the licenses_fts
table:
DELETE FROM licenses_fts_docsize
WHERE rowid NOT IN (SELECT rowid FROM licenses_fts);
However, a more robust solution is to rebuild the FTS5 index from scratch. This can be achieved using the rebuild
command, which reconstructs the FTS5 index based on the current contents of the source table:
INSERT INTO licenses_fts(licenses_fts) VALUES('rebuild');
The rebuild
command ensures that the licenses_fts_docsize
and licenses_fts_data
tables are correctly populated, eliminating any orphaned rows and ensuring that the FTS5 index is in sync with the licenses
table. This approach is more efficient and less error-prone than manually deleting orphaned rows.
In addition to these steps, it is important to ensure that all future operations on the licenses
table are performed with recursive triggers enabled. This can be done by setting the PRAGMA recursive_triggers
option at the beginning of any script or application that interacts with the database.
Finally, it is worth noting that the licenses_fts_data
table, which stores the actual indexed data, may also contain orphaned rows. While these rows do not directly impact the functionality of the FTS5 index, they can be cleaned up using a similar approach:
DELETE FROM licenses_fts_data
WHERE rowid NOT IN (SELECT rowid FROM licenses_fts);
However, the rebuild
command will also address any inconsistencies in the licenses_fts_data
table, making manual cleanup unnecessary in most cases.
By following these steps, the issue of mismatched row counts between the licenses
table and its associated FTS5 shadow tables can be effectively resolved, ensuring that the FTS5 index remains accurate and consistent with the underlying data.