Crash in SQLite’s isLikeOrGlob() Function Due to NULL Pointer Dereference


Issue Overview: NULL Pointer Dereference in isLikeOrGlob() Function

The core issue revolves around a crash in SQLite caused by a NULL pointer dereference in the isLikeOrGlob() function. This function is part of SQLite’s internal machinery for handling pattern matching operations, specifically the LIKE and GLOB operators. The crash occurs when the function attempts to dereference a NULL pointer (pLeft->y.pTab) while evaluating a complex query involving generated columns, JOINs, and OR optimizations.

The problematic query that triggers the crash is as follows:

CREATE TABLE t0(a PRIMARY KEY, b CHAR(30) AS(1) UNIQUE) WITHOUT ROWID;
SELECT * FROM t0 JOIN t0 AS ra0 ON unlikely(ra0.a=t0.a) AND t0.b='iii' WHERE ra0.a=false OR t0.b LIKE '.' AND t0.a=ra0.b;

The crash is traced to the line in isLikeOrGlob() where the macro IsVirtual(X) is invoked:

#define IsVirtual(X) ((X)->eTabType == TABTYP_VTAB)

Here, pLeft->y.pTab is NULL, and the macro attempts to access the eTabType member of a NULL pointer, resulting in a segmentation fault.

The issue is deeply rooted in the interaction between three SQLite features:

  1. Generated Columns: The table t0 includes a generated column b, which is defined as CHAR(30) AS(1) UNIQUE. Generated columns are computed at runtime and can introduce complexities in query evaluation.
  2. OR Optimization: The query includes an OR condition (ra0.a=false OR t0.b LIKE '.' AND t0.a=ra0.b), which SQLite attempts to optimize. The OR optimizer can sometimes lead to unexpected interactions with other query components.
  3. LIKE Optimization: The LIKE operator is optimized internally by SQLite to improve performance. However, this optimization can fail when combined with other query features, such as generated columns and OR conditions.

The crash was introduced in a specific version of SQLite, as identified by bisecting the codebase to the check-in b99d570131. This check-in likely introduced a regression that did not account for the possibility of a NULL pointer in the isLikeOrGlob() function.


Possible Causes: Interactions Between Query Features and Optimizations

The crash is caused by a combination of factors, each of which contributes to the NULL pointer dereference in isLikeOrGlob(). Below, we explore these factors in detail:

1. Generated Columns and Virtual Tables

Generated columns, such as b CHAR(30) AS(1) UNIQUE, are computed at runtime and are treated similarly to virtual tables in SQLite’s internal representation. When the query planner encounters a generated column, it may attempt to evaluate it as part of a virtual table. However, in this case, the evaluation leads to a NULL pointer (pLeft->y.pTab) because the generated column does not have an associated virtual table structure.

The IsVirtual(X) macro assumes that X is a valid pointer to a table structure. When X is NULL, the macro dereferences the NULL pointer, causing a segmentation fault. This assumption is not always valid, especially when dealing with generated columns or other complex query constructs.

2. OR Optimization and Short-Circuit Evaluation

The OR condition in the query (ra0.a=false OR t0.b LIKE '.' AND t0.a=ra0.b) introduces additional complexity. SQLite’s query optimizer attempts to evaluate OR conditions efficiently by using short-circuit evaluation. If the first part of the OR condition (ra0.a=false) is true, the second part (t0.b LIKE '.' AND t0.a=ra0.b) is not evaluated. However, this optimization can lead to unexpected interactions with other query components, such as the LIKE operator and generated columns.

In this case, the OR optimizer may incorrectly assume that the LIKE operator can be safely evaluated without checking for NULL pointers. This assumption fails when the generated column b is involved, leading to the NULL pointer dereference in isLikeOrGlob().

3. LIKE Optimization and Pattern Matching

The LIKE operator is optimized internally by SQLite to improve performance. This optimization involves pre-processing the pattern and determining whether it can be evaluated using a simple comparison or whether a more complex pattern matching algorithm is required. However, this optimization can fail when combined with other query features, such as generated columns and OR conditions.

In the problematic query, the LIKE operator is applied to the generated column b. The optimizer attempts to evaluate the pattern '.' but encounters a NULL pointer when accessing the table structure associated with the generated column. This NULL pointer dereference triggers the crash.

4. Regression Introduced in Check-in b99d570131

The crash was introduced in a specific version of SQLite, as identified by bisecting the codebase to the check-in b99d570131. This check-in likely introduced a regression that did not account for the possibility of a NULL pointer in the isLikeOrGlob() function. The regression may have been caused by changes to the query optimizer or the handling of generated columns, which inadvertently introduced the NULL pointer dereference.


Troubleshooting Steps, Solutions & Fixes: Addressing the NULL Pointer Dereference

To resolve the crash in isLikeOrGlob(), we need to address the NULL pointer dereference and ensure that the function can handle cases where pLeft->y.pTab is NULL. Below, we outline the steps to troubleshoot and fix the issue:

1. Add NULL Pointer Checks in isLikeOrGlob()

The most straightforward solution is to add NULL pointer checks in the isLikeOrGlob() function before dereferencing pLeft->y.pTab. This ensures that the function does not attempt to access a NULL pointer, preventing the segmentation fault.

The modified code might look like this:

if (pLeft->y.pTab == NULL) {
    return 0; // or handle the NULL case appropriately
}
#define IsVirtual(X) ((X) != NULL && (X)->eTabType == TABTYP_VTAB)

This change ensures that IsVirtual(X) only dereferences X if it is not NULL.

2. Review the Handling of Generated Columns

The crash highlights a potential issue with how SQLite handles generated columns in the query optimizer. Specifically, the optimizer should ensure that generated columns are properly evaluated and that their associated table structures are valid before attempting to access them.

To address this, we can review the code responsible for evaluating generated columns and ensure that it correctly handles cases where the table structure is NULL. This may involve adding additional checks or modifying the evaluation logic to avoid NULL pointer dereferences.

3. Improve OR Optimization Logic

The OR optimization logic should be reviewed to ensure that it correctly handles cases where one or more operands involve generated columns or other complex query constructs. Specifically, the optimizer should ensure that all operands are valid before attempting to evaluate them.

One possible improvement is to add additional checks in the OR optimizer to verify that the operands are valid and that their associated table structures are not NULL. This would prevent the optimizer from attempting to evaluate invalid operands, reducing the risk of NULL pointer dereferences.

4. Enhance LIKE Optimization for Generated Columns

The LIKE optimization logic should be enhanced to handle cases where the operand is a generated column. Specifically, the optimizer should ensure that the generated column is properly evaluated and that its associated table structure is valid before attempting to apply the LIKE operator.

This may involve modifying the pattern matching logic to handle generated columns differently or adding additional checks to ensure that the operand is valid before applying the LIKE operator.

5. Regression Testing and Fuzzing

To prevent similar issues in the future, it is important to implement regression testing and fuzzing as part of the SQLite development process. Regression testing ensures that changes to the codebase do not introduce new bugs, while fuzzing helps identify edge cases and unexpected interactions between query features.

The test case provided in the discussion can be added to SQLite’s regression test suite to ensure that the crash does not reoccur in future versions. Additionally, fuzzing can be used to identify other potential issues involving generated columns, OR optimization, and LIKE optimization.

6. Simplify and Minimize Test Cases

As demonstrated by Richard Hipp, simplifying and minimizing test cases can help identify the root cause of the issue more quickly. By removing unnecessary components from the query, such as the WITHOUT ROWID clause and the unlikely() function, we can isolate the specific interaction that triggers the crash.

This approach can be applied to other bug reports to help developers identify and fix issues more efficiently. By providing a minimal test case, reporters can reduce the time and effort required to reproduce and diagnose the issue.

7. Document Best Practices for Query Optimization

Finally, it is important to document best practices for query optimization, particularly when dealing with generated columns, OR conditions, and pattern matching operators. This documentation can help developers avoid common pitfalls and ensure that their queries are optimized correctly.

For example, developers should be aware of the potential interactions between generated columns and query optimizations and should test their queries thoroughly to ensure that they do not trigger unexpected behavior.


By following these troubleshooting steps and implementing the suggested fixes, we can address the NULL pointer dereference in isLikeOrGlob() and prevent similar issues from occurring in the future. This approach ensures that SQLite remains a robust and reliable database engine, capable of handling complex queries and edge cases with ease.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *