Optimizing FTS5 Queries with Numeric Comparisons in SQLite
Understanding FTS5 Performance with Numeric Comparisons
FTS5, or Full-Text Search version 5, is a powerful extension in SQLite designed to handle text-based searches efficiently. It excels in scenarios where you need to search through large volumes of text data, such as documents, articles, or logs, and retrieve results quickly. However, FTS5 is not optimized for numeric comparisons, such as filtering results based on timestamps or other numeric columns. This limitation becomes evident when you attempt to combine full-text search with numeric range queries, leading to significant performance degradation.
The core issue arises because FTS5 is designed to index and search text data, not numeric data. When you include a numeric comparison in your query, such as columnLastmod >= 1710025200
, SQLite cannot leverage the FTS5 index to speed up the search. Instead, it must perform a full scan of the FTS5 table to find rows that match both the text search condition and the numeric comparison. This full scan is computationally expensive, especially when dealing with large datasets, and results in query times that are orders of magnitude slower than pure text searches.
To illustrate, consider a table t
with approximately 8 million rows, where one of the columns, columnLastmod
, stores timestamps. An FTS5 table t_fts5
is created to index the text data in t
. A simple text search query like SELECT rowid FROM t_fts5 WHERE t_fts5 MATCH 'SEARCHTEXT' LIMIT 1000
executes in about 100 milliseconds. However, when you add a numeric comparison to filter results based on columnLastmod
, the query time balloons to 146 seconds. This dramatic increase in query time highlights the inefficiency of combining FTS5 text searches with numeric comparisons.
Why Numeric Comparisons Slow Down FTS5 Queries
The primary reason for the performance degradation when combining FTS5 text searches with numeric comparisons lies in the way FTS5 indexes data. FTS5 is optimized for text-based searches, using inverted indexes to quickly locate rows that contain specific words or phrases. However, FTS5 does not maintain indexes for numeric columns, which means that any numeric comparison must be evaluated by scanning the entire table.
When you execute a query like SELECT rowid FROM t_fts5 WHERE t_fts5 MATCH 'SEARCHTEXT' AND columnLastmod >= 1710025200 LIMIT 1000
, SQLite first uses the FTS5 index to find rows that match the text search condition. However, it then needs to scan these rows to filter out those that do not meet the numeric comparison condition. This scan is particularly costly if the text search condition yields a large number of matches, as is often the case with full-text searches.
Another factor contributing to the slowdown is the inability to create indexes on FTS5 tables. In a standard SQLite table, you can create indexes on numeric columns to speed up range queries. However, FTS5 tables do not support the creation of such indexes, which means that numeric comparisons must be evaluated without the benefit of an index. This limitation forces SQLite to perform a full table scan, which is inherently slow, especially for large datasets.
Additionally, the way SQLite processes queries involving both FTS5 and numeric comparisons can lead to suboptimal execution plans. SQLite’s query planner may not always choose the most efficient strategy for combining text searches with numeric filters, further exacerbating the performance issues. This is particularly true when the query involves complex joins or subqueries, as the planner may struggle to find an optimal path through the data.
Strategies for Optimizing FTS5 Queries with Numeric Comparisons
Given the limitations of FTS5 when it comes to numeric comparisons, several strategies can be employed to optimize queries that require both text search and numeric filtering. These strategies involve restructuring the data, using joins, and leveraging SQLite’s query planner to achieve better performance.
1. Separate Numeric Data into a Standard Table
One effective approach is to separate the numeric data from the FTS5 table and store it in a standard SQLite table. This allows you to create indexes on the numeric columns, which can significantly speed up range queries. You can then join the FTS5 table with the standard table to combine text search results with numeric filters.
For example, instead of storing columnLastmod
in the FTS5 table, you can store it in a separate table t_numeric
with an index on columnLastmod
. The FTS5 table t_fts5
would then only contain the text data. To perform a combined search, you can use a query like:
SELECT t_fts5.rowid
FROM t_fts5
JOIN t_numeric ON t_fts5.rowid = t_numeric.rowid
WHERE t_fts5 MATCH 'SEARCHTEXT'
AND t_numeric.columnLastmod >= 1710025200
AND t_numeric.columnLastmod < 1710111600
LIMIT 1000;
This approach leverages the index on columnLastmod
in the t_numeric
table to quickly filter rows based on the numeric condition, while still using the FTS5 index for the text search. The join operation ensures that only rows that meet both conditions are returned.
2. Use Subqueries to Filter Numeric Data First
Another strategy is to use subqueries to filter the numeric data first, and then apply the text search condition to the filtered results. This approach can be more efficient because it reduces the number of rows that need to be processed by the FTS5 search.
For example, you can first filter the rows based on the numeric condition using a subquery, and then join the results with the FTS5 table:
SELECT t_fts5.rowid
FROM t_fts5
JOIN (
SELECT rowid
FROM t_numeric
WHERE columnLastmod >= 1710025200
AND columnLastmod < 1710111600
) AS filtered
ON t_fts5.rowid = filtered.rowid
WHERE t_fts5 MATCH 'SEARCHTEXT'
LIMIT 1000;
In this query, the subquery filtered
first retrieves the rows that meet the numeric condition, and then the FTS5 search is applied to these filtered rows. This reduces the number of rows that need to be processed by the FTS5 search, leading to faster query execution.
3. Leverage SQLite’s Query Planner with EXPLAIN QUERY PLAN
SQLite provides a powerful tool for analyzing and optimizing queries: the EXPLAIN QUERY PLAN
statement. This tool allows you to see how SQLite plans to execute a query, including which indexes it will use and how it will join tables. By analyzing the query plan, you can identify potential bottlenecks and optimize your queries accordingly.
For example, you can use EXPLAIN QUERY PLAN
to analyze the performance of a query that combines FTS5 search with numeric filtering:
EXPLAIN QUERY PLAN
SELECT t_fts5.rowid
FROM t_fts5
JOIN t_numeric ON t_fts5.rowid = t_numeric.rowid
WHERE t_fts5 MATCH 'SEARCHTEXT'
AND t_numeric.columnLastmod >= 1710025200
AND t_numeric.columnLastmod < 1710111600
LIMIT 1000;
The output of EXPLAIN QUERY PLAN
will show you how SQLite plans to execute the query, including whether it is using the index on columnLastmod
and how it is joining the tables. Based on this information, you can adjust your query or schema to improve performance.
4. Use INNER JOINs Instead of IN Clauses
In some cases, using an INNER JOIN
instead of an IN
clause can improve query performance. The IN
clause can be inefficient when dealing with large datasets, as it may not leverage indexes effectively. By contrast, an INNER JOIN
can be more efficient because it allows SQLite to use indexes to speed up the join operation.
For example, consider the following query that uses an IN
clause:
SELECT r
FROM (
SELECT t_fts5.rowid AS r
FROM t_fts5
WHERE t_fts5 MATCH 'SEARCHTEXT'
LIMIT 1000
)
WHERE r IN (
SELECT rowid
FROM t_numeric
WHERE columnLastmod >= 1710025200
AND columnLastmod < 1710111600
);
This query can be optimized by replacing the IN
clause with an INNER JOIN
:
SELECT r
FROM (
SELECT t_fts5.rowid AS r
FROM t_fts5
WHERE t_fts5 MATCH 'SEARCHTEXT'
LIMIT 1000
)
INNER JOIN (
SELECT rowid
FROM t_numeric
WHERE columnLastmod >= 1710025200
AND columnLastmod < 1710111600
)
ON r = rowid;
In this optimized query, the INNER JOIN
allows SQLite to use indexes to speed up the join operation, leading to faster query execution.
5. Consider Using a Hybrid Approach
In some cases, a hybrid approach that combines multiple optimization strategies may be necessary to achieve the best performance. For example, you can use a combination of subqueries, joins, and indexes to optimize a query that combines FTS5 search with numeric filtering.
Consider the following query, which uses a hybrid approach to optimize performance:
SELECT r
FROM (
SELECT t_fts5.rowid AS r
FROM t_fts5
WHERE t_fts5 MATCH 'SEARCHTEXT'
LIMIT 1000
)
INNER JOIN (
SELECT rowid
FROM t_numeric
WHERE columnLastmod >= 1710025200
AND columnLastmod < 1710111600
)
ON r = rowid;
In this query, the subquery filtered
first retrieves the rows that meet the numeric condition, and then the FTS5 search is applied to these filtered rows. The INNER JOIN
ensures that only rows that meet both conditions are returned, while leveraging indexes to speed up the join operation.
6. Monitor and Adjust Query Performance
Finally, it is important to monitor the performance of your queries and make adjustments as needed. SQLite provides several tools for monitoring query performance, including the EXPLAIN QUERY PLAN
statement and the .timer
command. By regularly monitoring query performance, you can identify potential bottlenecks and make adjustments to your schema or queries to improve performance.
For example, you can use the .timer
command to measure the execution time of a query:
.timer on
SELECT t_fts5.rowid
FROM t_fts5
JOIN t_numeric ON t_fts5.rowid = t_numeric.rowid
WHERE t_fts5 MATCH 'SEARCHTEXT'
AND t_numeric.columnLastmod >= 1710025200
AND t_numeric.columnLastmod < 1710111600
LIMIT 1000;
The output of the .timer
command will show you the execution time of the query, allowing you to measure the impact of any optimizations you make.
In conclusion, while FTS5 is a powerful tool for text-based searches in SQLite, it is not optimized for numeric comparisons. By separating numeric data into a standard table, using subqueries, leveraging SQLite’s query planner, and employing other optimization strategies, you can significantly improve the performance of queries that combine FTS5 search with numeric filtering. Regular monitoring and adjustment of query performance will help you maintain optimal performance as your data grows and evolves.