SQLite Query Planner Suboptimal Index Selection Issue

SQLite Query Planner Prefers Superset Index Over More Efficient Option

The core issue revolves around the SQLite query planner’s tendency to favor an index that covers more columns in the WHERE clause, even when a more efficient index exists. In this scenario, the query involves joining three tables: Games, Relations, and Classifications. The query planner selects Relations_Index2 over Relations_Index1, despite the latter being statistically more efficient for the given query. This behavior is rooted in SQLite’s internal heuristics, which prioritize indexes that cover more WHERE clause conditions, even when the difference in efficiency is negligible or counterproductive.

The query in question is:

SELECT
 Relations.IDSecondDatum, Games.Result
FROM
 Games, Relations, Classifications
WHERE
 (Games.BlackID=?1) AND
 (Relations.TypeFirstDatum==2) AND
 (Relations.IDFirstDatum=Games.ID) AND
 (Relations.TypeSecondDatum=3) AND
 (Relations.IDSecondDatum=Classifications.ID) AND
 (Classifications.System=?4) AND
 (Classifications.GameType=?5);

The indices on the Relations table are:

CREATE INDEX Relations_Index1 ON Relations (IDFirstDatum, IDSecondDatum, TypeFirstDatum, TypeSecondDatum);
CREATE INDEX Relations_Index2 ON Relations (TypeFirstDatum, TypeSecondDatum, IDFirstDatum, IDSecondDatum);
CREATE INDEX Relations_Index3 ON Relations (IDSecondDatum, IDFirstDatum, TypeSecondDatum, TypeFirstDatum);
CREATE INDEX Relations_Index4 ON Relations (TypeSecondDatum, TypeFirstDatum, IDSecondDatum, IDFirstDatum);

The query planner’s choice of Relations_Index2 over Relations_Index1 is based on the assumption that an index covering more WHERE clause conditions is inherently better. However, in this case, Relations_Index1 is statistically more efficient, as it narrows down the search to a single row with fewer iterations.

SQLite Heuristics and Index Selection Logic

SQLite’s query planner employs a heuristic that favors indexes covering more WHERE clause conditions, even when the statistical data suggests that a simpler index would be more efficient. This heuristic is designed to account for potential variations in data distribution and to ensure that the planner makes a conservative choice. However, this approach can lead to suboptimal performance in cases where the statistical data is accurate and the simpler index is demonstrably more efficient.

The statistical data for the indices on the Relations table is as follows:

Relations|Relations_Index4|80965 80965 80965 40 1
Relations|Relations_Index3|80965 40 1 1 1
Relations|Relations_Index2|80965 80965 80965 1 1
Relations|Relations_Index1|80965 1 1 1 1

The numbers represent the expected number of rows that would be selected using equality constraints on the first, second, third, and fourth terms of the index, respectively. For Relations_Index1, the fourth number is 1, indicating that an equality constraint on all four columns would narrow the search to a single row. Similarly, for Relations_Index2, the fourth number is also 1, suggesting that it would also narrow the search to a single row.

However, the key difference lies in the number of iterations required to reach that single row. Relations_Index1 requires only one iteration, while Relations_Index2 may require up to 80965 iterations in the worst case. This discrepancy is due to the order of the columns in the index and how SQLite traverses the index tree.

Forcing Index Usage and Optimizing Query Performance

To address the issue of suboptimal index selection, SQLite provides the INDEXED BY clause, which allows developers to force the query planner to use a specific index. This can be particularly useful for debugging and performance tuning, although it is generally not recommended for production use due to the potential for changes in data distribution over time.

For example, to force the use of Relations_Index1, the query can be modified as follows:

SELECT
 Relations.IDSecondDatum, Games.Result
FROM
 Games, Relations INDEXED BY Relations_Index1, Classifications
WHERE
 (Games.BlackID=?1) AND
 (Relations.TypeFirstDatum==2) AND
 (Relations.IDFirstDatum=Games.ID) AND
 (Relations.TypeSecondDatum=3) AND
 (Relations.IDSecondDatum=Classifications.ID) AND
 (Classifications.System=?4) AND
 (Classifications.GameType=?5);

In this case, forcing the use of Relations_Index1 results in a significant performance improvement, with the query execution time dropping from 99.35 seconds to 0.0177 seconds. This demonstrates the importance of understanding the underlying statistical data and the potential impact of index selection on query performance.

Additionally, it is crucial to ensure that the statistical data used by the query planner is up-to-date. Running the ANALYZE command can help refresh the statistics and improve the planner’s decision-making process. For example:

ANALYZE;

This command updates the statistical data for all indexes in the database, ensuring that the query planner has the most accurate information available when selecting an index.

In conclusion, while SQLite’s query planner is generally effective at selecting the most efficient index, there are cases where its heuristics can lead to suboptimal performance. By understanding the underlying statistical data and using tools like the INDEXED BY clause and the ANALYZE command, developers can fine-tune their queries to achieve optimal performance.

SQLite Query Planner Suboptimal Index Selection Issue

SQLite Query Planner Prefers Superset Index Over More Efficient Option

SQLite Heuristics and Index Selection Logic

Forcing Index Usage and Optimizing Query Performance

SQLite Database Growth Due to Table Corruption and Embedded NULs

Optimizing SQLite Queries and Tkinter GUI for Large Text Data

SQLite Deterministic Function Flag Misinterpretation and Call Count Issue

Assertion Failure in agginfoPersistExprCb During Complex Aggregation Query

Automatic Index Not Created in SQLite 3.43 with Co-Routine Usage

In-Memory SQLite Single-Thread Configuration Slower Than Multi-Thread with Shared Cache

Leave a Reply Cancel reply

SQLite Query Planner Prefers Superset Index Over More Efficient Option

SQLite Heuristics and Index Selection Logic

Forcing Index Usage and Optimizing Query Performance

Related Guides

Leave a Reply Cancel reply