Using FTS5 with Metadata: UNINDEXED Columns vs. External Content Tables
FTS5 Indexing Strategies for Combining Searchable Content and Non-Indexed Metadata
Core Challenge: Selective Column Indexing in Full-Text Search Tables
The central technical challenge revolves around designing an SQLite FTS5 virtual table that indexes specific text content (e.g., message bodies) while excluding associated metadata (e.g., timestamps, author names) from the full-text search index. This requires balancing three objectives:
- Efficient full-text query performance on targeted columns
- Retention of non-searchable metadata within the same logical row structure
- Prevention of accidental metadata inclusion in FTS5 tokenization workflows
Two primary SQLite mechanisms address this: UNINDEXED column declarations and external content tables. Each approach modifies how FTS5 stores, indexes, and retrieves data, with implications for schema design, query patterns, and maintenance overhead.
Anatomy of UNINDEXED Columns
When defining an FTS5 table, appending UNINDEXED to a column definition excludes it from tokenization while retaining it as a standard SQL column. For example:
CREATE VIRTUAL TABLE messages USING fts5(
message_body,
channel UNINDEXED,
post_date UNINDEXED,
author UNINDEXED
);
The message_body column undergoes FTS5 tokenization and appears in the internal messages_idx index structure. The channel, post_date, and author columns are stored as raw values in the underlying messages_content shadow table but bypass all text processing. Queries using MATCH operate exclusively on message_body, while standard SQL operators (=, LIKE) can filter UNINDEXED columns.
External Content Table Paradigm
External content tables decouple the FTS5 index from the source data storage:
-- Base table holding all data
CREATE TABLE messages_data(
id INTEGER PRIMARY KEY,
message_body TEXT,
channel TEXT,
post_date DATETIME,
author TEXT
);
-- FTS5 table referencing external content
CREATE VIRTUAL TABLE messages_fts USING fts5(
message_body,
content='messages_data',
content_rowid='id'
);
Here, messages_fts contains only the tokenized message_body data. The content parameter links it to messages_data, with content_rowid aligning the FTS5 rowid to the base table’s primary key. The FTS5 table becomes a lightweight index that must be synchronized with the base table through triggers or application logic.
Pitfalls in Metadata Handling Across Indexing Strategies
Both approaches introduce potential failure modes that manifest during schema changes, query execution, or data synchronization:
UNINDEXED Column Misconfiguration
- Omitting
UNINDEXEDqualifiers causes metadata columns to undergo tokenization, bloating the FTS5 index and enabling unintendedMATCHoperations on non-content fields. - Attempting to use
MATCHon UNINDEXED columns yields silent failures (no results) since they’re absent from the FTS5 index.
External Content Synchronization Failures
- Direct inserts/updates to the FTS5 table without corresponding base table modifications create phantom rows visible only to full-text queries.
- Base table changes not propagated to the FTS5 table cause search results to reference stale or nonexistent content.
Schema Drift Vulnerabilities
- Adding or modifying columns in the base table without updating associated FTS5 triggers or queries breaks referential integrity.
- Altering UNINDEXED column types (e.g., changing
post_datefromDATETIMEtoINTEGER) in FTS5 tables requires full table rebuilds due to SQLite’sALTER TABLElimitations.
Resolution Framework: Schema Design and Query Optimization
Addressing these issues requires methodical validation of schema definitions, trigger logic (for external content), and query patterns.
Step 1: Validate UNINDEXED Column Usage
For schemas using UNINDEXED columns, confirm that:
- All non-content columns are explicitly marked
UNINDEXED - The FTS5 table’s
CREATE VIRTUAL TABLEstatement doesn’t include tokenization options (e.g.,tokenize=porter) for UNINDEXED columns, as they’re ignored
Diagnostic Query
SELECT
name,
fts5(:tableName, name) AS indexed
FROM pragma_table_info(:tableName);
Replace :tableName with your FTS5 table name. The indexed column returns 1 for indexed (tokenized) columns and 0 for UNINDEXED.
Step 2: External Content Trigger Auditing
When using external content tables, three triggers must exist to maintain consistency:
- AFTER INSERT on base table: Inserts corresponding FTS5 entry
- AFTER UPDATE on base table: Updates corresponding FTS5 entry
- AFTER DELETE on base table: Deletes corresponding FTS5 entry
Trigger Implementation Example
-- After Insert
CREATE TRIGGER messages_data_ai AFTER INSERT ON messages_data BEGIN
INSERT INTO messages_fts(rowid, message_body) VALUES (new.id, new.message_body);
END;
-- After Update
CREATE TRIGGER messages_data_au AFTER UPDATE ON messages_data BEGIN
UPDATE messages_fts SET message_body = new.message_body WHERE rowid = old.id;
END;
-- After Delete
CREATE TRIGGER messages_data_ad AFTER DELETE ON messages_data BEGIN
DELETE FROM messages_fts WHERE rowid = old.id;
END;
Validation Procedure
- Confirm trigger existence via
sqlite_masterquery:SELECT name, sql FROM sqlite_master WHERE type = 'trigger' AND tbl_name = 'messages_data'; - Test trigger functionality by inserting/updating/deleting base table rows and verifying FTS5 table changes.
Step 3: Query Pattern Enforcement
Ensure application queries adhere to the indexing strategy:
- UNINDEXED Tables: Use
MATCHonly on indexed columns; filter others with standard SQL.-- Correct SELECT * FROM messages WHERE messages MATCH 'search_term' AND author = 'John'; -- Incorrect (MATCH on UNINDEXED column) SELECT * FROM messages WHERE author MATCH 'John'; -- Fails silently - External Content Tables: Join FTS5 and base tables using
rowid:SELECT m.* FROM messages_data m JOIN messages_fts f ON m.id = f.rowid WHERE f.message_body MATCH 'search_term';
Step 4: Migration and Schema Change Protocols
Modifying existing FTS5 schemas requires careful planning:
Adding Columns to UNINDEXED FTS5 Tables
- Create new FTS5 table with additional UNINDEXED columns
- Copy data from old FTS5 table:
INSERT INTO new_fts SELECT * FROM old_fts; - Recreate dependent triggers/views
Altering External Content Schemas
- Update base table schema
- Adjust FTS5 triggers to handle new/modified columns
- Rebuild FTS5 index:
INSERT INTO messages_fts(messages_fts) VALUES ('rebuild');
Performance Optimization Techniques
- UNINDEXED Column Storage: UNINDEXED columns are stored in the
contentshadow table, which uses a B-tree structure. Frequent queries filtering on UNINDEXED columns benefit from adding explicit indexes:CREATE INDEX messages_channel_idx ON messages(channel); - External Content Query Planning: Use
EXPLAIN QUERY PLANto verify joins utilize base table indexes:EXPLAIN QUERY PLAN SELECT m.* FROM messages_data m JOIN messages_fts f ON m.id = f.rowid WHERE f MATCH 'sqlite' AND m.channel = 'general';Optimal output should show
SEARCH messages_fts VIRTUAL TABLE INDEX 0:Mfollowed bySEARCH messages_data USING INTEGER PRIMARY KEY (rowid=?).
Recovery from Common Failure Scenarios
Scenario 1: Accidental Tokenization of Metadata Columns
Symptoms: Full-text queries returning results based on metadata fields; inflated index size.
Resolution:
- Create new FTS5 table with proper UNINDEXED designations
- Migrate data:
INSERT INTO new_fts SELECT * FROM old_fts; - Drop old table and rename new one
Scenario 2: External Content Table Orphaned Rows
Symptoms: FTS5 search returns entries not present in the base table.
Resolution:
- Identify orphans:
SELECT f.rowid FROM messages_fts f LEFT JOIN messages_data m ON f.rowid = m.id WHERE m.id IS NULL; - Delete orphans:
DELETE FROM messages_fts WHERE rowid IN (:orphan_ids); - Audit triggers for missing logic
Scenario 3: Inconsistent FTS5 Index After Base Table Updates
Symptoms: Search results don’t reflect recent base table changes.
Resolution:
- Verify trigger activity using
PRAGMA trigger_list; - Manually repopulate the FTS5 index:
DELETE FROM messages_fts; INSERT INTO messages_fts SELECT rowid, message_body FROM messages_data;
Advanced Considerations: Hybrid Approaches
For scenarios requiring both UNINDEXED columns and external content benefits (e.g., large text blobs in the base table), combine the two methods:
-- Base table with large text field
CREATE TABLE messages_data(
id INTEGER PRIMARY KEY,
message_body TEXT,
channel TEXT,
post_date DATETIME,
author TEXT
);
-- FTS5 table with UNINDEXED metadata
CREATE VIRTUAL TABLE messages_fts USING fts5(
message_body,
channel UNINDEXED,
content='messages_data',
content_rowid='id'
);
This structure stores message_body in both the base table and FTS5 index (duplicated), while channel is stored as UNINDEXED in FTS5 but also resides in the base table. Trade-offs include increased storage for duplicated columns versus reduced join complexity.
Final Schema Recommendations
-
Use UNINDEXED Columns When:
- Metadata fields are small (e.g., strings under 255 chars)
- Queries frequently filter on metadata without requiring joins
- Simplified schema maintenance is preferred
-
Use External Content Tables When:
- Base table contains large non-text data (BLOBs, JSON)
- Existing normalization requires separate table storage
- FTS5 index size must be minimized
By methodically applying these diagnostics, repair strategies, and design patterns, developers can robustly implement FTS5 tables with non-indexed metadata while avoiding common pitfalls in query behavior, data integrity, and index maintenance.