Optimizing Slow Combined Queries in SQLite with FTS and Low-Cardinality Indexes

Understanding the Performance Discrepancy Between Fast and Slow Combined Queries

The core issue revolves around a significant performance discrepancy observed when combining two fast-running queries into a single query. The individual queries are efficient, but their combination results in a drastic slowdown. This phenomenon is particularly puzzling because the individual components of the query—filtering by root_id, checking for deleted_at being NULL, and performing a full-text search using MATCH—are all optimized and perform well in isolation. However, when these conditions are combined, the query execution time increases from milliseconds to several seconds.

The first query, which counts posts where root_id is in a specific set and deleted_at is NULL, executes in 0.078 seconds. The second query, which performs a full-text search using MATCH on the posts_fts table, executes in 0.003 seconds. However, when these conditions are combined, the query execution time balloons to 6.783 seconds. This discrepancy suggests that the interaction between the full-text search and the filtering conditions is causing inefficiencies in query execution.

The alternate version of the query, which uses a subquery to first filter posts based on the full-text search and then applies the root_id and deleted_at conditions, performs well (0.002 seconds). However, this approach has a limitation: it does not allow for the use of the rank column from the posts_fts table in an ORDER BY clause, which is a requirement for the user. This limitation necessitates a different approach to optimize the combined query while still being able to use the rank column.

The Role of Indexes and Low-Cardinality Columns in Query Performance

One of the key insights from the discussion is the role of indexes, particularly on low-cardinality columns, in query performance. The root_id column, which has only 10 distinct values, is a low-cardinality column. Indexes on such columns can often be counterproductive because they do not provide significant selectivity. In other words, the index does not effectively narrow down the search space, and the database engine may end up scanning a large portion of the index, leading to inefficiencies.

In this case, the presence of an index on the root_id column was identified as the primary cause of the slowdown in the combined query. When the index was removed, the query execution time dropped to 0.003 seconds, matching the performance of the individual full-text search query. This suggests that the index on root_id was interfering with the query optimizer’s ability to efficiently combine the conditions from the full-text search and the root_id filter.

The issue is further compounded by the fact that the query involves a join between the posts table and the posts_fts table. Joins can introduce additional complexity, especially when combined with full-text search and filtering conditions. The query optimizer must decide the most efficient way to combine these conditions, and the presence of a low-cardinality index can lead to suboptimal execution plans.

Strategies for Optimizing Combined Queries with Full-Text Search and Low-Cardinality Filters

To address the performance issues in the combined query, several strategies can be employed. The first and most straightforward approach is to remove the index on the low-cardinality root_id column, as this was shown to significantly improve query performance. However, this may not always be feasible, especially if the index is required for other queries.

Another approach is to use the ANALYZE command to gather statistics about the distribution of data in the tables. These statistics can help the query optimizer make more informed decisions about how to execute the query. In some cases, running ANALYZE can lead to significant performance improvements, as the optimizer can better estimate the selectivity of different conditions and choose more efficient execution plans.

Additionally, the use of the .expert command in SQLite can provide recommendations for indexes that might improve query performance. This command analyzes the query and suggests indexes that could help the optimizer execute the query more efficiently. In some cases, these suggested indexes may differ from the existing indexes and can lead to better performance.

Finally, it is important to consider the structure of the query itself. The alternate version of the query, which uses a subquery to first filter posts based on the full-text search and then applies the root_id and deleted_at conditions, performs well but does not allow for the use of the rank column. To address this limitation, one could consider restructuring the query to first perform the full-text search and then join the results with the posts table to apply the additional filters. This approach can help ensure that the query optimizer can efficiently combine the conditions while still allowing for the use of the rank column.

In conclusion, the performance issues in the combined query are primarily due to the interaction between the full-text search and the low-cardinality root_id index. By removing the index, using ANALYZE to gather statistics, and restructuring the query, it is possible to achieve significant performance improvements while still meeting the requirements for using the rank column.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *