Using FTS5 with Metadata: UNINDEXED Columns vs. External Content Tables
FTS5 Indexing Strategies for Combining Searchable Content and Non-Indexed Metadata
Core Challenge: Selective Column Indexing in Full-Text Search Tables
The central technical challenge revolves around designing an SQLite FTS5 virtual table that indexes specific text content (e.g., message bodies) while excluding associated metadata (e.g., timestamps, author names) from the full-text search index. This requires balancing three objectives:
- Efficient full-text query performance on targeted columns
- Retention of non-searchable metadata within the same logical row structure
- Prevention of accidental metadata inclusion in FTS5 tokenization workflows
Two primary SQLite mechanisms address this: UNINDEXED column declarations and external content tables. Each approach modifies how FTS5 stores, indexes, and retrieves data, with implications for schema design, query patterns, and maintenance overhead.
Anatomy of UNINDEXED Columns
When defining an FTS5 table, appending UNINDEXED
to a column definition excludes it from tokenization while retaining it as a standard SQL column. For example:
CREATE VIRTUAL TABLE messages USING fts5(
message_body,
channel UNINDEXED,
post_date UNINDEXED,
author UNINDEXED
);
The message_body
column undergoes FTS5 tokenization and appears in the internal messages_idx
index structure. The channel
, post_date
, and author
columns are stored as raw values in the underlying messages_content
shadow table but bypass all text processing. Queries using MATCH
operate exclusively on message_body
, while standard SQL operators (=
, LIKE
) can filter UNINDEXED columns.
External Content Table Paradigm
External content tables decouple the FTS5 index from the source data storage:
-- Base table holding all data
CREATE TABLE messages_data(
id INTEGER PRIMARY KEY,
message_body TEXT,
channel TEXT,
post_date DATETIME,
author TEXT
);
-- FTS5 table referencing external content
CREATE VIRTUAL TABLE messages_fts USING fts5(
message_body,
content='messages_data',
content_rowid='id'
);
Here, messages_fts
contains only the tokenized message_body
data. The content
parameter links it to messages_data
, with content_rowid
aligning the FTS5 rowid to the base table’s primary key. The FTS5 table becomes a lightweight index that must be synchronized with the base table through triggers or application logic.
Pitfalls in Metadata Handling Across Indexing Strategies
Both approaches introduce potential failure modes that manifest during schema changes, query execution, or data synchronization:
UNINDEXED Column Misconfiguration
- Omitting
UNINDEXED
qualifiers causes metadata columns to undergo tokenization, bloating the FTS5 index and enabling unintendedMATCH
operations on non-content fields. - Attempting to use
MATCH
on UNINDEXED columns yields silent failures (no results) since they’re absent from the FTS5 index.
External Content Synchronization Failures
- Direct inserts/updates to the FTS5 table without corresponding base table modifications create phantom rows visible only to full-text queries.
- Base table changes not propagated to the FTS5 table cause search results to reference stale or nonexistent content.
Schema Drift Vulnerabilities
- Adding or modifying columns in the base table without updating associated FTS5 triggers or queries breaks referential integrity.
- Altering UNINDEXED column types (e.g., changing
post_date
fromDATETIME
toINTEGER
) in FTS5 tables requires full table rebuilds due to SQLite’sALTER TABLE
limitations.
Resolution Framework: Schema Design and Query Optimization
Addressing these issues requires methodical validation of schema definitions, trigger logic (for external content), and query patterns.
Step 1: Validate UNINDEXED Column Usage
For schemas using UNINDEXED columns, confirm that:
- All non-content columns are explicitly marked
UNINDEXED
- The FTS5 table’s
CREATE VIRTUAL TABLE
statement doesn’t include tokenization options (e.g.,tokenize=porter
) for UNINDEXED columns, as they’re ignored
Diagnostic Query
SELECT
name,
fts5(:tableName, name) AS indexed
FROM pragma_table_info(:tableName);
Replace :tableName
with your FTS5 table name. The indexed
column returns 1 for indexed (tokenized) columns and 0 for UNINDEXED.
Step 2: External Content Trigger Auditing
When using external content tables, three triggers must exist to maintain consistency:
- AFTER INSERT on base table: Inserts corresponding FTS5 entry
- AFTER UPDATE on base table: Updates corresponding FTS5 entry
- AFTER DELETE on base table: Deletes corresponding FTS5 entry
Trigger Implementation Example
-- After Insert
CREATE TRIGGER messages_data_ai AFTER INSERT ON messages_data BEGIN
INSERT INTO messages_fts(rowid, message_body) VALUES (new.id, new.message_body);
END;
-- After Update
CREATE TRIGGER messages_data_au AFTER UPDATE ON messages_data BEGIN
UPDATE messages_fts SET message_body = new.message_body WHERE rowid = old.id;
END;
-- After Delete
CREATE TRIGGER messages_data_ad AFTER DELETE ON messages_data BEGIN
DELETE FROM messages_fts WHERE rowid = old.id;
END;
Validation Procedure
- Confirm trigger existence via
sqlite_master
query:SELECT name, sql FROM sqlite_master WHERE type = 'trigger' AND tbl_name = 'messages_data';
- Test trigger functionality by inserting/updating/deleting base table rows and verifying FTS5 table changes.
Step 3: Query Pattern Enforcement
Ensure application queries adhere to the indexing strategy:
- UNINDEXED Tables: Use
MATCH
only on indexed columns; filter others with standard SQL.-- Correct SELECT * FROM messages WHERE messages MATCH 'search_term' AND author = 'John'; -- Incorrect (MATCH on UNINDEXED column) SELECT * FROM messages WHERE author MATCH 'John'; -- Fails silently
- External Content Tables: Join FTS5 and base tables using
rowid
:SELECT m.* FROM messages_data m JOIN messages_fts f ON m.id = f.rowid WHERE f.message_body MATCH 'search_term';
Step 4: Migration and Schema Change Protocols
Modifying existing FTS5 schemas requires careful planning:
Adding Columns to UNINDEXED FTS5 Tables
- Create new FTS5 table with additional UNINDEXED columns
- Copy data from old FTS5 table:
INSERT INTO new_fts SELECT * FROM old_fts;
- Recreate dependent triggers/views
Altering External Content Schemas
- Update base table schema
- Adjust FTS5 triggers to handle new/modified columns
- Rebuild FTS5 index:
INSERT INTO messages_fts(messages_fts) VALUES ('rebuild');
Performance Optimization Techniques
- UNINDEXED Column Storage: UNINDEXED columns are stored in the
content
shadow table, which uses a B-tree structure. Frequent queries filtering on UNINDEXED columns benefit from adding explicit indexes:CREATE INDEX messages_channel_idx ON messages(channel);
- External Content Query Planning: Use
EXPLAIN QUERY PLAN
to verify joins utilize base table indexes:EXPLAIN QUERY PLAN SELECT m.* FROM messages_data m JOIN messages_fts f ON m.id = f.rowid WHERE f MATCH 'sqlite' AND m.channel = 'general';
Optimal output should show
SEARCH messages_fts VIRTUAL TABLE INDEX 0:M
followed bySEARCH messages_data USING INTEGER PRIMARY KEY (rowid=?)
.
Recovery from Common Failure Scenarios
Scenario 1: Accidental Tokenization of Metadata Columns
Symptoms: Full-text queries returning results based on metadata fields; inflated index size.
Resolution:
- Create new FTS5 table with proper UNINDEXED designations
- Migrate data:
INSERT INTO new_fts SELECT * FROM old_fts;
- Drop old table and rename new one
Scenario 2: External Content Table Orphaned Rows
Symptoms: FTS5 search returns entries not present in the base table.
Resolution:
- Identify orphans:
SELECT f.rowid FROM messages_fts f LEFT JOIN messages_data m ON f.rowid = m.id WHERE m.id IS NULL;
- Delete orphans:
DELETE FROM messages_fts WHERE rowid IN (:orphan_ids);
- Audit triggers for missing logic
Scenario 3: Inconsistent FTS5 Index After Base Table Updates
Symptoms: Search results don’t reflect recent base table changes.
Resolution:
- Verify trigger activity using
PRAGMA trigger_list;
- Manually repopulate the FTS5 index:
DELETE FROM messages_fts; INSERT INTO messages_fts SELECT rowid, message_body FROM messages_data;
Advanced Considerations: Hybrid Approaches
For scenarios requiring both UNINDEXED columns and external content benefits (e.g., large text blobs in the base table), combine the two methods:
-- Base table with large text field
CREATE TABLE messages_data(
id INTEGER PRIMARY KEY,
message_body TEXT,
channel TEXT,
post_date DATETIME,
author TEXT
);
-- FTS5 table with UNINDEXED metadata
CREATE VIRTUAL TABLE messages_fts USING fts5(
message_body,
channel UNINDEXED,
content='messages_data',
content_rowid='id'
);
This structure stores message_body
in both the base table and FTS5 index (duplicated), while channel
is stored as UNINDEXED in FTS5 but also resides in the base table. Trade-offs include increased storage for duplicated columns versus reduced join complexity.
Final Schema Recommendations
Use UNINDEXED Columns When:
- Metadata fields are small (e.g., strings under 255 chars)
- Queries frequently filter on metadata without requiring joins
- Simplified schema maintenance is preferred
Use External Content Tables When:
- Base table contains large non-text data (BLOBs, JSON)
- Existing normalization requires separate table storage
- FTS5 index size must be minimized
By methodically applying these diagnostics, repair strategies, and design patterns, developers can robustly implement FTS5 tables with non-indexed metadata while avoiding common pitfalls in query behavior, data integrity, and index maintenance.