Resolving FTS5 Integrity Check Failures with Soft Deletes and Content Tables


Understanding FTS5 Content Table Mismatches and Soft Delete Scenarios

Issue Overview: Integrity Check Failures in FTS5 with Separable Content Tables

When using SQLite’s FTS5 extension with a separable content table (via the content= option), developers often encounter a "database disk image is malformed" error during integrity checks. This occurs when the content table and the FTS5 index become inconsistent. A common scenario involves implementing soft deletes, where rows are marked as inactive (e.g., via an is_deleted flag) but remain in the content table. The FTS5 index, however, may exclude these rows, leading to discrepancies that trigger integrity check failures.

In the provided example, the user table serves as the content source for the user_fts FTS5 virtual table. A trigger populates the FTS index when new users are added. When a row is soft-deleted by flagging is_deleted=1, the corresponding entry is removed from the FTS index via an INSERT ... VALUES('delete', ...) statement. However, the original row remains in the user table. The 'integrity-check' command then detects a mismatch between the content table and the FTS index, resulting in the malformed database error.

The root challenge lies in reconciling the FTS5 index’s expectations of parity with the content table when soft-deleted rows are excluded from searches but retained in the underlying data. This conflict arises from FTS5’s design: when using a separable content table, the extension assumes that the content table and FTS index are exact mirrors of each other. Any deviation—such as retaining soft-deleted rows in the content table but removing them from the FTS index—violates this assumption.


Possible Causes: Why FTS5 and Content Tables Become Desynchronized

  1. Direct Modifications to the FTS Index Without Updating the Content Table
    When using INSERT INTO user_fts(user_fts, ...) VALUES('delete', ...) to remove entries from the FTS index, the corresponding rows in the content table (user) remain untouched. FTS5’s integrity-check mechanism compares the content table’s data against the index and flags inconsistencies as malformations. This is intentional: the content= option delegates synchronization responsibility to the developer, and FTS5 does not automatically propagate changes between the two structures.

  2. Soft Delete Workflows That Bypass FTS5 Index Updates
    Soft delete implementations often mark rows as inactive without deleting them outright. If the FTS index is updated to exclude these rows (via 'delete' commands), but the content table retains them, the integrity check fails. This is exacerbated when triggers or application logic selectively modify the FTS index without reflecting those changes in the content table.

  3. Use of Views or Filtered Content Sources Without Proper Isolation
    Attempting to use a view (e.g., CREATE VIEW userview AS SELECT ... WHERE is_deleted=0) as the content table for FTS5 can resolve integrity issues if the view excludes soft-deleted rows. However, improper configuration—such as omitting the content_rowid parameter or failing to maintain the view’s schema—can reintroduce discrepancies. Views must precisely mirror the structure and row identifiers of the underlying table to ensure FTS5 can reconcile the index with the content source.

  4. Transactional Isolation and Partial Updates
    If updates to the content table and FTS index are not wrapped in transactions, partial failures can leave the two structures out of sync. For example, a soft delete operation that updates is_deleted=1 in the user table but fails to delete the corresponding FTS index entry (or vice versa) creates an inconsistency detectable by the integrity check.


Troubleshooting Steps, Solutions, and Fixes

Step 1: Validate the Content Table and FTS Index Relationship
Begin by confirming that the FTS5 virtual table is correctly configured to reference the content table. Verify the content= and content_rowid= parameters in the CREATE VIRTUAL TABLE statement. For example:

CREATE VIRTUAL TABLE user_fts USING fts5(
  bio,
  content='user',       -- Content table name
  content_rowid='id'    -- Rowid column in the content table
);

If the content_rowid is omitted or incorrectly specified, FTS5 cannot map index entries to content table rows, leading to integrity check failures.

Step 2: Reconcile Soft Deletes with FTS5 Index Management
To support soft deletes while maintaining integrity:

  • Option A: Use a Filtered View as the Content Source
    Replace the direct reference to the user table with a view that excludes soft-deleted rows:

    CREATE VIEW userview AS
    SELECT id, name, bio FROM user WHERE is_deleted = 0;
    
    CREATE VIRTUAL TABLE user_fts USING fts5(
      bio,
      content='userview',   -- Reference the view instead of the base table
      content_rowid='id'
    );
    

    The FTS5 index will now reflect only non-deleted rows, and integrity checks will compare the index against the filtered view. Ensure the view includes all columns referenced in the FTS5 table definition.

  • Option B: Temporarily Adjust Content Table During Integrity Checks
    If retaining soft-deleted rows in the content table is non-negotiable, disable integrity checks during normal operation. Before running an integrity check, temporarily remove soft-deleted rows in a transaction:

    BEGIN TRANSACTION;
    -- Create a backup of soft-deleted rows
    CREATE TEMP TABLE deleted_users_backup AS 
    SELECT * FROM user WHERE is_deleted = 1;
    
    -- Remove soft-deleted rows from the content table
    DELETE FROM user WHERE is_deleted = 1;
    
    -- Perform integrity check
    INSERT INTO user_fts(user_fts, rank) VALUES('integrity-check', 1);
    
    -- Restore soft-deleted rows
    INSERT INTO user SELECT * FROM deleted_users_backup;
    DROP TABLE deleted_users_backup;
    COMMIT;
    

    This approach allows periodic integrity verification without permanently altering the content table.

Step 3: Revise Triggers to Align with Soft Delete Logic
Ensure triggers update both the content table and FTS index consistently. For soft deletes, modify the AFTER UPDATE trigger to conditionally remove entries from the FTS index:

CREATE TRIGGER user_au AFTER UPDATE ON user
WHEN (OLD.is_deleted <> NEW.is_deleted AND NEW.is_deleted = 1)
BEGIN
  -- Remove from FTS index
  INSERT INTO user_fts(user_fts, rowid, bio) VALUES ('delete', NEW.id, NEW.bio);
END;

For hard deletes (physical row removal), add a corresponding AFTER DELETE trigger:

CREATE TRIGGER user_ad AFTER DELETE ON user
BEGIN
  INSERT INTO user_fts(user_fts, rowid, bio) VALUES ('delete', OLD.id, OLD.bio);
END;

Step 4: Test Integrity Checks with Controlled Workflows
After implementing the above changes, validate the setup:

  1. Insert test data with mixed soft-deleted and active rows.
  2. Run the integrity check:
    INSERT INTO user_fts(user_fts, rank) VALUES('integrity-check', 1);
    
  3. If using a view, confirm that the view’s WHERE clause correctly excludes soft-deleted rows:
    SELECT * FROM userview WHERE id IN (SELECT rowid FROM user_fts);
    

    This query should return no results if the FTS index and view are synchronized.

Step 5: Monitor Transaction Boundaries and Concurrency
Wrap operations affecting both the content table and FTS index in transactions to prevent partial updates:

BEGIN TRANSACTION;
-- Soft delete a user
UPDATE user SET is_deleted = 1 WHERE id = 1;
-- Remove from FTS index
INSERT INTO user_fts(user_fts, rowid, bio) VALUES ('delete', 1, (SELECT bio FROM user WHERE id = 1));
COMMIT;

This ensures atomicity: either both the content table and FTS index are updated, or neither is.

Step 6: Consider Alternative Architectures Without content=
If maintaining synchronization proves too complex, avoid the content= option. Let FTS5 manage its own copy of the data:

CREATE VIRTUAL TABLE user_fts USING fts5(
  bio,
  content='',       -- Omit content= to let FTS5 store its own data
  content_rowid='id'
);

In this mode, FTS5 automatically mirrors the user table, and soft deletes require updating the FTS index via DELETE FROM user_fts WHERE rowid = ?. However, this duplicates storage and may not be suitable for large datasets.


By systematically addressing the relationship between content tables and FTS5 indexes, developers can implement soft deletes without triggering integrity errors. The choice between filtered views, transactional workarounds, or architectural adjustments depends on specific use cases and performance requirements.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *