Resolving FTS5 External Content Table Errors: Malformed Database & Missing Data


Issue Overview: FTS5 External Content Tables Failing to Populate & Causing Database Corruption

When working with SQLite’s Full-Text Search (FTS5) extension and external content tables, developers often encounter two critical issues:

  1. The FTS5 virtual table remains empty after inserting data into the external content table.
  2. Querying the FTS5 table triggers a "database disk image is malformed" error.

These problems stem from mismatches between the schema design of the external content table and the configuration of the FTS5 virtual table. Specifically, the interaction between the content_rowid parameter in FTS5 and the primary key definition of the external content table is often misunderstood. The FTS5 module expects a stable integer reference to rows in the external content table, and deviations from this requirement lead to silent failures or catastrophic database errors.

The confusion arises because SQLite allows developers to define non-integer primary keys for external content tables without immediate validation. The FTS5 module only detects schema incompatibilities at runtime, often during query execution rather than at table creation. This delayed error reporting complicates debugging, as the connection between schema design flaws and runtime failures is not immediately obvious.


Root Causes: Schema Design Flaws in FTS5 External Content Configuration

1. Invalid content_rowid Specification

The content_rowid parameter in FTS5 must reference an INTEGER PRIMARY KEY column in the external content table. SQLite uses this column to maintain a stable 1:1 relationship between FTS5 index entries and their corresponding rows in the external table. Common mistakes include:

  • Using a TEXT primary key for the external content table while specifying content_rowid=Key
  • Relying on SQLite’s implicit rowid without understanding its volatility

When content_rowid points to a non-integer column or an unstable row identifier, FTS5 cannot correctly map index entries to external table rows. This corrupts the internal data structures of the FTS5 table, eventually triggering the "malformed database" error during queries.

2. Implicit rowid Instability

SQLite automatically creates an implicit rowid column (aliased as ROWID, _ROWID_, or OID) for tables without an explicit INTEGER PRIMARY KEY. However:

  • This value can change during VACUUM operations or row deletions
  • FTS5 does not track changes to the implicit rowid, leading to orphaned index entries

Developers who configure content_rowid=rowid without declaring an explicit integer primary key introduce hidden instability. While this setup may appear functional initially, it becomes unreliable over time as row IDs shift.

3. Schema Validation Gaps

SQLite does not enforce type compatibility between the content_rowid column and FTS5’s internal requirements at table creation time. The system permits:

  • Virtual FTS5 tables referencing non-integer content_rowid columns
  • Mismatched data types between external content keys and FTS5 index entries

These oversights are only detected during data insertion or query execution, making debugging challenging. The absence of upfront schema validation shifts the burden to developers to manually verify type compatibility.


Solutions: Correctly Configuring FTS5 External Content Tables

1. Define an Explicit INTEGER PRIMARY Key in External Content Table

Modify the external content table schema to include a stable integer identifier:

-- Incorrect: TEXT primary key
CREATE TABLE t1 (
  Key TEXT PRIMARY KEY,
  Value TEXT
);

-- Correct: INTEGER primary key
CREATE TABLE t1 (
  Key INTEGER PRIMARY KEY,
  Value TEXT
);

This ensures the content_rowid parameter in FTS5 references a guaranteed-stable integer value. If business logic requires a non-integer key (e.g., UUIDs), add a separate integer column dedicated to FTS5 indexing:

CREATE TABLE t1 (
  BusinessKey TEXT PRIMARY KEY,  -- Natural key for application use
  FtsRowId INTEGER PRIMARY KEY,  -- Dedicated column for FTS5
  Value TEXT
);

2. Configure FTS5 with Valid content_rowid Parameter

Explicitly map the FTS5 virtual table to the external content’s integer key:

-- Incorrect: Points to TEXT column
CREATE VIRTUAL TABLE tfts USING fts5(
  Key,
  Value,
  content=t1,
  content_rowid=Key  -- Fails if Key is TEXT
);

-- Correct: References INTEGER column
CREATE VIRTUAL TABLE tfts USING fts5(
  Key,
  Value,
  content=t1,
  content_rowid=FtsRowId  -- Explicit integer column
);

When using the external table’s implicit rowid, omit content_rowid or set it explicitly:

-- Using implicit rowid (not recommended for production)
CREATE VIRTUAL TABLE tfts USING fts5(
  Key,
  Value,
  content=t1,
  content_rowid=rowid
);

3. Implement Robust Data Synchronization

FTS5 does not automatically mirror changes to the external content table. Developers must manually maintain parity:

A. Insert Triggers

CREATE TRIGGER t1_after_insert AFTER INSERT ON t1
BEGIN
  INSERT INTO tfts(rowid, Key, Value)
  VALUES (NEW.FtsRowId, NEW.Key, NEW.Value);
END;

B. Update Triggers

CREATE TRIGGER t1_after_update AFTER UPDATE ON t1
BEGIN
  UPDATE tfts
  SET Key = NEW.Key,
      Value = NEW.Value
  WHERE rowid = OLD.FtsRowId;
END;

C. Delete Triggers

CREATE TRIGGER t1_after_delete AFTER DELETE ON t1
BEGIN
  DELETE FROM tfts WHERE rowid = OLD.FtsRowId;
END;

4. Validate Schema Compatibility Programmatically

Add pre-flight checks in application code to verify FTS5 configuration:

def validate_fts5_config(connection, fts_table, content_table):
    # Get FTS5 parameters
    fts_info = connection.execute(f"""
        SELECT *
        FROM pragma_table_info('{fts_table}')
    """).fetchall()
    
    # Retrieve content_table's primary key
    content_pk = connection.execute(f"""
        SELECT name 
        FROM pragma_table_info('{content_table}')
        WHERE pk > 0
    """).fetchone()
    
    # Verify content_rowid is INTEGER
    if content_pk:
        pk_type = connection.execute(f"""
            SELECT type 
            FROM pragma_table_info('{content_table}')
            WHERE name = ?
        """, (content_pk[0],)).fetchone()[1]
        if pk_type.upper() != 'INTEGER':
            raise TypeError(f"Content table PK {content_pk[0]} must be INTEGER")

5. Handle Implicit rowid Volatility

If using the external table’s implicit rowid (not recommended for production):

  • Disable VACUUM operations that rebuild the database
  • Avoid deleting rows from the external content table
  • Regularly rebuild the FTS5 index to purge orphaned entries:
INSERT INTO tfts(tfts) VALUES('rebuild');

6. Debugging "Malformed Database" Errors

When encountering database corruption:

  1. Dump FTS5 Table Structure:
    SELECT * FROM sqlite_master WHERE name = 'tfts';
    
  2. Check External Content Integrity:
    PRAGMA quick_check;
    
  3. Rebuild FTS5 Index:
    INSERT INTO tfts(tfts) VALUES('rebuild');
    
  4. Migrate to New Database:
    sqlite3 corrupted.db ".clone new.db"
    

Key Takeaways

  1. Stable Integer Keys Are Mandatory: FTS5’s content_rowid must reference an INTEGER PRIMARY KEY column in the external content table.
  2. Implicit rowid Is Fragile: Avoid relying on SQLite’s automatic row IDs for FTS5 integration in production systems.
  3. Manual Synchronization Required: Implement triggers or application logic to keep FTS5 indexes synchronized with external content changes.
  4. Schema Validation Is Critical: Add programmatic checks to enforce compatibility between external tables and FTS5 configurations during application startup.

By strictly adhering to these practices, developers can leverage FTS5’s external content tables while avoiding database corruption and indexing failures. The solution requires upfront schema design rigor but pays dividends in maintainability and performance.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *