Optimizing Slow Combined Queries in SQLite with FTS and Low-Cardinality Indexes
Understanding the Performance Discrepancy Between Fast and Slow Combined Queries
The core issue revolves around a significant performance discrepancy observed when combining two fast-running queries into a single query. The individual queries are efficient, but their combination results in a drastic slowdown. This phenomenon is particularly puzzling because the individual components of the query—filtering by root_id
, checking for deleted_at
being NULL
, and performing a full-text search using MATCH
—are all optimized and perform well in isolation. However, when these conditions are combined, the query execution time increases from milliseconds to several seconds.
The first query, which counts posts where root_id
is in a specific set and deleted_at
is NULL
, executes in 0.078 seconds. The second query, which performs a full-text search using MATCH
on the posts_fts
table, executes in 0.003 seconds. However, when these conditions are combined, the query execution time balloons to 6.783 seconds. This discrepancy suggests that the interaction between the full-text search and the filtering conditions is causing inefficiencies in query execution.
The alternate version of the query, which uses a subquery to first filter posts based on the full-text search and then applies the root_id
and deleted_at
conditions, performs well (0.002 seconds). However, this approach has a limitation: it does not allow for the use of the rank
column from the posts_fts
table in an ORDER BY
clause, which is a requirement for the user. This limitation necessitates a different approach to optimize the combined query while still being able to use the rank
column.
The Role of Indexes and Low-Cardinality Columns in Query Performance
One of the key insights from the discussion is the role of indexes, particularly on low-cardinality columns, in query performance. The root_id
column, which has only 10 distinct values, is a low-cardinality column. Indexes on such columns can often be counterproductive because they do not provide significant selectivity. In other words, the index does not effectively narrow down the search space, and the database engine may end up scanning a large portion of the index, leading to inefficiencies.
In this case, the presence of an index on the root_id
column was identified as the primary cause of the slowdown in the combined query. When the index was removed, the query execution time dropped to 0.003 seconds, matching the performance of the individual full-text search query. This suggests that the index on root_id
was interfering with the query optimizer’s ability to efficiently combine the conditions from the full-text search and the root_id
filter.
The issue is further compounded by the fact that the query involves a join between the posts
table and the posts_fts
table. Joins can introduce additional complexity, especially when combined with full-text search and filtering conditions. The query optimizer must decide the most efficient way to combine these conditions, and the presence of a low-cardinality index can lead to suboptimal execution plans.
Strategies for Optimizing Combined Queries with Full-Text Search and Low-Cardinality Filters
To address the performance issues in the combined query, several strategies can be employed. The first and most straightforward approach is to remove the index on the low-cardinality root_id
column, as this was shown to significantly improve query performance. However, this may not always be feasible, especially if the index is required for other queries.
Another approach is to use the ANALYZE
command to gather statistics about the distribution of data in the tables. These statistics can help the query optimizer make more informed decisions about how to execute the query. In some cases, running ANALYZE
can lead to significant performance improvements, as the optimizer can better estimate the selectivity of different conditions and choose more efficient execution plans.
Additionally, the use of the .expert
command in SQLite can provide recommendations for indexes that might improve query performance. This command analyzes the query and suggests indexes that could help the optimizer execute the query more efficiently. In some cases, these suggested indexes may differ from the existing indexes and can lead to better performance.
Finally, it is important to consider the structure of the query itself. The alternate version of the query, which uses a subquery to first filter posts based on the full-text search and then applies the root_id
and deleted_at
conditions, performs well but does not allow for the use of the rank
column. To address this limitation, one could consider restructuring the query to first perform the full-text search and then join the results with the posts
table to apply the additional filters. This approach can help ensure that the query optimizer can efficiently combine the conditions while still allowing for the use of the rank
column.
In conclusion, the performance issues in the combined query are primarily due to the interaction between the full-text search and the low-cardinality root_id
index. By removing the index, using ANALYZE
to gather statistics, and restructuring the query, it is possible to achieve significant performance improvements while still meeting the requirements for using the rank
column.