Malformed Database Error When FTS5 Content Table and Replace Conflict Handling Interact via Triggers
Understanding the Malformed Database Error in FTS5 External Content Tables with Replace Operations
Scenario: Trigger-Based FTS5 Index Updates and Replace Conflict Handling
The core issue revolves around a malformed database error (SQLITE_CORRUPT
, error code 11) that occurs when querying an FTS5 virtual table configured with an external content table (a user-defined table acting as the data source). The error manifests after executing INSERT OR REPLACE
statements on the content table, which are propagated to the FTS5 index via an AFTER INSERT
trigger. The problem arises specifically when querying the FTS5 table using a LIKE
operator with a trigram tokenizer. The sequence of operations involves:
- Creating a content table
t
with a composite primary key on columnsa
andb
. - Defining an FTS5 virtual table
ti
configured to uset
as its content table, with the trigram tokenizer. - Setting up an
AFTER INSERT
trigger ont
to propagate new rows toti
. - Executing two identical
INSERT OR REPLACE
statements ont
with the same primary key values. - Querying
ti
using aWHERE b LIKE '%alu%'
clause, which triggers the malformed database error.
The critical observation is that the REPLACE
conflict resolution strategy on the content table interacts poorly with the FTS5 external content table’s constraints. The documentation explicitly states that external content tables do not support REPLACE
conflict handling at the FTS5 level. However, the error here stems from the implicit interaction between the content table’s REPLACE
operations and the absence of corresponding cleanup logic in triggers when rows are replaced or deleted.
Root Causes: Missing Delete Propagation and FTS5 Index Inconsistency
The malformed database error is caused by an inconsistency between the content table t
and the FTS5 index ti
. When an INSERT OR REPLACE
operation is executed on t
, SQLite first attempts to delete any existing row with the same primary key before inserting the new row. However, the AFTER INSERT
trigger only propagates the insertion to ti
, not the deletion. This leaves the FTS5 index with stale entries that reference non-existent rows in the content table. The inconsistency is exposed when querying the FTS5 index, as the trigram tokenizer attempts to resolve invalid row references, leading to a corrupted database state.
Three primary factors contribute to this issue:
Incomplete Trigger Logic for Replace Operations: The
AFTER INSERT
trigger handles new rows but does not account for the implicit deletion caused byREPLACE
conflict resolution. Without anAFTER DELETE
trigger, the FTS5 index retains outdated entries, creating referential mismatches.FTS5 External Content Table Constraints: The FTS5 documentation explicitly prohibits
REPLACE
conflict handling on the virtual table itself. However, when the conflict resolution is applied to the content table (viaINSERT OR REPLACE
), the restriction is not inherently enforced. The FTS5 index assumes that the content table’s modifications are transactionally consistent, which is violated when deletions are not propagated.Trigram Tokenizer and Query Execution: The
LIKE '%alu%'
query leverages the trigram tokenizer’s internal index structures. Stale entries in the FTS5 index may reference invalid row IDs or data offsets, causing the tokenizer to access corrupted data pages during query execution.
Resolution: Ensuring Transactional Consistency Between Content and FTS5 Tables
To resolve the malformed database error, the content table’s REPLACE
operations must be fully propagated to the FTS5 index, ensuring that deletions and updates are reflected in the virtual table. The following steps outline the necessary fixes and preventative measures:
Step 1: Correct Schema Definition and Trigger Configuration
First, ensure the content table and triggers are defined with proper syntax and conflict handling:
-- Fix missing closing parenthesis in content table definition
CREATE TABLE t(a, b, PRIMARY KEY(a, b));
-- Define FTS5 virtual table with explicit column mapping
CREATE VIRTUAL TABLE ti USING fts5(a, b, content='t', tokenize='trigram');
-- Add triggers for INSERT, DELETE, and UPDATE operations
CREATE TRIGGER t_after_insert AFTER INSERT ON t BEGIN
INSERT INTO ti(rowid, a, b) VALUES (new.rowid, new.a, new.b);
END;
CREATE TRIGGER t_after_delete AFTER DELETE ON t BEGIN
DELETE FROM ti WHERE rowid = old.rowid;
END;
CREATE TRIGGER t_after_update AFTER UPDATE ON t BEGIN
DELETE FROM ti WHERE rowid = old.rowid;
INSERT INTO ti(rowid, a, b) VALUES (new.rowid, new.a, new.b);
END;
The AFTER DELETE
and AFTER UPDATE
triggers ensure that the FTS5 index is updated when rows are replaced or modified. Without these, the FTS5 index will accumulate orphaned entries.
Step 2: Validate SQLite Version and Configuration
The behavior of FTS5 external content tables and conflict handling can vary between SQLite versions. Verify that the SQLite version in use is 3.43.0 or newer (as of 2023), as earlier versions may have undocumented constraints. Use the following command to check the version:
SELECT sqlite_version();
If using an older version, upgrade to the latest stable release to ensure compatibility with FTS5 external content table best practices.
Step 3: Use Proper String Literals and Syntax
While not directly causing the corruption, using double quotes for string literals ("key1"
, "value1"
) is non-standard and can lead to parsing ambiguities. Use single quotes for literals to adhere to SQL standards:
INSERT OR REPLACE INTO t(a, b) VALUES ('key1', 'value1');
Step 4: Test with a Minimal Reproducible Example
After implementing the fixes, validate the solution with a minimal test case:
-- Clean up previous objects
DROP TABLE IF EXISTS t;
DROP TABLE IF EXISTS ti;
-- Recreate tables and triggers with corrected syntax
CREATE TABLE t(a, b, PRIMARY KEY(a, b));
CREATE VIRTUAL TABLE ti USING fts5(a, b, content='t', tokenize='trigram');
CREATE TRIGGER t_after_insert AFTER INSERT ON t BEGIN
INSERT INTO ti(rowid, a, b) VALUES (new.rowid, new.a, new.b);
END;
CREATE TRIGGER t_after_delete AFTER DELETE ON t BEGIN
DELETE FROM ti WHERE rowid = old.rowid;
END;
-- Insert initial data
INSERT OR REPLACE INTO t(a, b) VALUES ('key1', 'value1');
INSERT OR REPLACE INTO t(a, b) VALUES ('key1', 'value1'); -- Triggers REPLACE
-- Query FTS5 index
SELECT * FROM ti WHERE b LIKE '%alu%';
With the AFTER DELETE
trigger in place, the second INSERT OR REPLACE
will first delete the existing row in t
, which the trigger propagates to ti
. The subsequent insert updates ti
with the new row ID, maintaining consistency.
Step 5: Monitor Transaction Boundaries and Isolation
In complex applications, ensure that operations on the content table and FTS5 index are wrapped in explicit transactions to prevent partial updates:
BEGIN;
INSERT OR REPLACE INTO t(a, b) VALUES ('key1', 'value1');
COMMIT;
This guarantees that the delete and insert operations triggered by REPLACE
are atomic, avoiding intermediate states where the FTS5 index is out of sync with the content table.
Step 6: Analyze FTS5 Integrity (Advanced)
For databases that have already encountered corruption, use the integrity-check
pragma to identify inconsistencies:
PRAGMA quick_check;
If corruption is detected, restore the database from a backup or rebuild the FTS5 index. To rebuild ti
:
-- WARNING: This will delete all data in ti
INSERT INTO ti(ti) VALUES('rebuild');
Rebuilding the FTS5 index ensures that it accurately reflects the current state of the content table.
Key Takeaways and Best Practices
Complete Trigger Coverage: Always define
AFTER DELETE
andAFTER UPDATE
triggers when using FTS5 external content tables withINSERT OR REPLACE
. This ensures that all modifications to the content table are mirrored in the FTS5 index.Conflict Handling Awareness: Recognize that
REPLACE
conflict resolution on the content table implicitly deletes rows. These deletions must be propagated to dependent structures like FTS5 indexes.Version Compatibility: Regularly update SQLite to leverage fixes and improvements in FTS5 and virtual table implementations.
Syntax Rigor: Use single quotes for string literals and validate schema definitions to avoid parser-related ambiguities.
By addressing the trigger logic gap and adhering to transactional best practices, developers can prevent malformed database errors in FTS5-backed systems.