Resolving SQL Logic Errors in FTS4 Virtual Tables Joined with Views in SQLite
Understanding the "SQL Logic Error" in FTS4 Virtual Table Queries Involving Joined Views
The core issue revolves around a "SQL logic error" occurring when querying a view (wholeIndex
) that joins a standard table (searchIndex
) with an FTS4 virtual table (queryIndex
). The error manifests in two scenarios:
- Direct queries against the FTS4 virtual table (
SELECT * FROM queryIndex
) - Queries against the view that joins the FTS4 table with the standard table (
SELECT * FROM wholeIndex
)
The error message "SQL logic error" is generic but typically indicates structural inconsistencies between the virtual table implementation and SQLite’s query execution engine. Key characteristics of the environment include:
- FTS4 virtual table created with non-standard tokenizer configuration
- View implementation using
JOIN
onrowid
between physical and virtual tables - Trigger-based population mechanism for synchronizing data between tables
- Empty query results from FTS4 match operations despite existing segment data
The operational context reveals several red flags:
- The
queryIndex
FTS4 table uses a custom tokenizer declaration (tokenize=simple XX [* ]
) - The
content=""
parameter in FTS4 table creation suggests external content handling - Integrity checks pass for the database file but fail for FTS4-specific operations
- Application code (doc2dash) shows no obvious FTS4 population logic
Potential Root Causes of FTS4-Related SQL Logic Errors
1. Invalid FTS4 Table Configuration Syntax
The virtual table declaration contains several non-standard parameters:
CREATE VIRTUAL TABLE queryIndex USING FTS4 (
content="",
perfect,
prefix,
suffixes,
titleDescriptionPerfect,
titleDescriptionPrefix,
titleDescriptionSuffixes,
matchinfo=fts3,
tokenize=simple XX [* ]
)
Critical analysis reveals three problematic areas:
A. Tokenizer Configuration Syntax
The tokenize=simple XX [* ]
clause violates FTS4 tokenizer specification rules:
- Only predefined tokenizers (
simple
,porter
,unicode61
) or custom tokenizers registered viasqlite3_tokenizer
may be used - The
XX [* ]
suffix appears to be an invalid attempt to modify tokenizer parameters - Proper syntax for tokenizer arguments requires parentheses:
tokenize=simple(args)
B. Content Table Declaration
The content=""
parameter normally points to an external content table but is empty here. This configuration:
- Creates an FTS4 table that stores content in both the virtual table and underlying %_content table
- May conflict with the view/trigger-based population strategy
- Could cause inconsistencies when joined with the
searchIndex
table
C. Column Name Conflicts
Columns named prefix
, suffixes
, and similar may collide with:
- FTS4’s internal naming conventions for prefix indexes
- Reserved keywords in certain query contexts
2. FTS4 Index Corruption
The presence of data in queryIndex_segments
(43 rows) suggests partial index population, but:
- Failed
integrity-check
andrebuild
commands indicate structural damage - Mismatch between segment metadata and actual content
- Possible version incompatibility between FTS4 implementation and SQLite 3.37.0
3. Improper Data Population via Triggers
The index_insert
trigger attempts to synchronize inserts between tables:
CREATE TRIGGER index_insert INSTEAD OF INSERT ON wholeIndex
BEGIN
INSERT INTO searchIndex (...) VALUES (...);
INSERT INTO queryIndex (...) VALUES (last_insert_rowid(), ...);
END
Potential issues include:
- Reliance on
last_insert_rowid()
across separate INSERT statements - Missing population of FTS4-specific columns (
perfect
,prefix
, etc.) - Failure to handle FTS4’s hidden columns (like
docid
)
4. View Join Incompatibility
The view’s join condition searchIndex.rowid = queryIndex.rowid
assumes:
- Direct correspondence between physical table rowids and FTS4 docids
- Synchronized insertion order between tables
- Identical row counts in both tables
These assumptions are fragile when:
- Using triggers for population
- FTS4 undergoes optimization operations (segment merging)
- Rows are deleted from one table but not the other
Comprehensive Diagnosis and Resolution Strategies
Step 1: Validate FTS4 Table Configuration
A. Tokenizer Correction
Recreate the FTS4 table with proper tokenizer syntax:
-- Remove invalid tokenizer parameters
CREATE VIRTUAL TABLE queryIndex_new USING FTS4 (
content="searchIndex",
perfect,
prefix,
suffixes,
titleDescriptionPerfect,
titleDescriptionPrefix,
titleDescriptionSuffixes,
matchinfo=fts3,
tokenize=simple
);
Key changes:
- Removed
XX [* ]
from tokenizer declaration - Set
content="searchIndex"
to link with source table - Simplified column list to essential fields
B. Content Table Configuration
If using external content:
- Create shadow tables manually:
CREATE TABLE queryIndex_content( docid INTEGER PRIMARY KEY, perfect TEXT, prefix TEXT, suffixes TEXT, titleDescriptionPerfect TEXT, titleDescriptionPrefix TEXT, titleDescriptionSuffixes TEXT );
- Recreate FTS4 table with explicit content reference:
CREATE VIRTUAL TABLE queryIndex USING FTS4( content="queryIndex_content", matchinfo=fts3, tokenize=simple );
Step 2: Repair FTS4 Index Structure
A. Forced Rebuild
Bypass the broken rebuild
command with direct SQL:
-- Create new FTS4 table
CREATE VIRTUAL TABLE queryIndex_temp USING FTS4(...);
-- Copy data from old table
INSERT INTO queryIndex_temp(docid, ...)
SELECT rowid, ... FROM queryIndex;
-- Drop corrupted table
DROP TABLE queryIndex;
-- Rename temp table
ALTER TABLE queryIndex_temp RENAME TO queryIndex;
B. Low-Level Segment Repair
For advanced users only – directly modify FTS4 shadow tables:
-- WARNING: Experimental procedure
DELETE FROM queryIndex_segments;
DELETE FROM queryIndex_segdir;
DELETE FROM queryIndex_docsize;
DELETE FROM queryIndex_stat;
-- Rebuild using content table
INSERT INTO queryIndex(queryIndex) VALUES('rebuild');
Step 3: Validate Trigger-Based Population
A. Trigger Modification
Enhance the population trigger to handle FTS4 requirements:
CREATE TRIGGER index_insert INSTEAD OF INSERT ON wholeIndex
BEGIN
-- Insert into physical table
INSERT INTO searchIndex (name, type, path, titleDescription)
VALUES (NEW.name, NEW.type, NEW.path, NEW.titleDescription);
-- Insert into FTS4 with explicit docid
INSERT INTO queryIndex (
docid,
perfect,
prefix,
suffixes,
titleDescriptionPerfect,
titleDescriptionPrefix,
titleDescriptionSuffixes
) VALUES (
last_insert_rowid(),
NEW.perfect,
NEW.prefix,
NEW.suffixes,
NEW.titleDescriptionPerfect,
NEW.titleDescriptionPrefix,
NEW.titleDescriptionSuffixes
);
END;
Critical improvements:
- Explicit
docid
column mapping - Column order alignment with FTS4 structure
- Removal of implicit
rowid
assumptions
B. Population Testing
Execute controlled insertions to validate the trigger:
-- Test insert through view
INSERT INTO wholeIndex(
name, type, path, titleDescription,
perfect, prefix, suffixes,
titleDescriptionPerfect, titleDescriptionPrefix, titleDescriptionSuffixes
) VALUES (
'test', 'Function', '/test', 'Test description',
'perfect', 'pre', 'suf',
'tdperfect', 'tdpre', 'tdsuf'
);
-- Verify physical table
SELECT * FROM searchIndex WHERE name = 'test';
-- Verify FTS4 table
SELECT * FROM queryIndex WHERE queryIndex MATCH 'test';
Step 4: View Join Optimization
A. Explicit DocID Mapping
Modify the view to use FTS4’s docid
instead of implicit rowid
:
CREATE VIEW wholeIndex AS
SELECT
queryIndex.docid AS rowid,
searchIndex.name,
searchIndex.type,
searchIndex.path,
searchIndex.titleDescription,
queryIndex.perfect,
queryIndex.prefix,
queryIndex.suffixes,
queryIndex.titleDescriptionPerfect,
queryIndex.titleDescriptionPrefix,
queryIndex.titleDescriptionSuffixes
FROM searchIndex
JOIN queryIndex ON searchIndex.rowid = queryIndex.docid;
B. Query Plan Analysis
Use EXPLAIN
to diagnose join execution:
EXPLAIN QUERY PLAN
SELECT * FROM wholeIndex WHERE name = 'test';
Validate that:
- FTS4 table uses efficient docid lookups
- No full table scans occur on virtual tables
- Join order optimizes virtual table access
Step 5: Advanced FTS4 Diagnostics
A. Internal State Inspection
Query FTS4 shadow tables to diagnose index health:
-- Check segment distribution
SELECT level, idx, count(*)
FROM queryIndex_segdir
GROUP BY level, idx;
-- Analyze segment sizes
SELECT blockid, length(block)
FROM queryIndex_segments
ORDER BY blockid DESC
LIMIT 10;
B. Matchinfo Debugging
Use FTS4 auxiliary functions to test tokenization:
-- Test tokenizer output
SELECT fts3_tokenizer_test('simple', 'Test input for tokenization');
-- Validate match patterns
SELECT snippet(queryIndex), offsets(queryIndex)
FROM queryIndex
WHERE queryIndex MATCH 'pre*';
Step 6: Application Code Reconciliation
A. Population Workflow Audit
Inspect the doc2dash codebase for:
- Missing FTS4 population logic
- Incorrect column mappings
- Transaction management around FTS4 inserts
B. Batch Insert Optimization
Modify insertion logic to:
- Use transactions for bulk inserts
- Separate physical and FTS4 insertions
- Leverage
INSERT ... SELECT
for FTS4 population:INSERT INTO queryIndex (docid, ...) SELECT rowid, ... FROM searchIndex;
Step 7: Fallback Strategies
A. FTS5 Migration
If FTS4 proves irreparable, recreate with FTS5:
CREATE VIRTUAL TABLE queryIndex USING FTS5(
perfect,
prefix,
suffixes,
titleDescriptionPerfect,
titleDescriptionPrefix,
titleDescriptionSuffixes,
content='searchIndex',
tokenize='simple'
);
B. Alternative Indexing Approaches
For non-full-text search requirements:
- Use standard SQL indexes on
searchIndex
- Implement prefix search with
LIKE
and indexed columns - Utilize SQLite’s JSON1 extension for structured data
This comprehensive approach addresses the SQL logic error through systematic validation of FTS4 configuration, index integrity verification, trigger modification, and query optimization. Each step incrementally isolates potential failure points while maintaining compatibility with the existing database structure.