Crash in sqlite3_create_function16 Due to Malformed UTF-16 Function Name Termination
Issue Overview: Out-of-Bounds Read During UTF-16 to UTF-8 Conversion in sqlite3_create_function16
The crash occurs when calling sqlite3_create_function16 after opening a database connection with sqlite3_open. The root cause is an out-of-bounds memory read during the conversion of a UTF-16 function name (zFunctionName) to UTF-8. This conversion is performed by sqlite3Utf16to8, which is called internally by sqlite3_create_function16.
The problematic code resides in the loop that calculates the length of the UTF-16 string:
for(nByte=0; nByte<=iLimit && (z[nByte] | z[nByte+1]); nByte+=2){}
Here, z is the input UTF-16 function name, and iLimit is derived from db->aLimit[SQLITE_LIMIT_LENGTH] (set during database initialization). The loop advances two bytes at a time and stops only at a zero code unit or at iLimit. If zFunctionName is not properly null-terminated (i.e., lacks two consecutive zero bytes at an even offset), the loop reads beyond the allocated buffer, causing undefined behavior, including segmentation faults.
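To make the overread concrete, the scan can be replayed against a buffer whose size is known. This is a minimal sketch, not SQLite code: first_overread_offset is a hypothetical helper, and the nBuf parameter is an assumption added for illustration, since the real loop has no size to check against.

```c
#include <stddef.h>

/* Hypothetical helper, not part of SQLite. Replays the length-scan loop
** against a buffer of nBuf bytes and returns the first byte offset the
** loop would read outside the buffer, or -1 if it stops in bounds
** (terminator found, or iLimit reached first). */
static long first_overread_offset(const unsigned char *z, size_t nBuf, long iLimit){
  long nByte;
  for(nByte=0; nByte<=iLimit; nByte+=2){
    /* The real loop reads z[nByte] and z[nByte+1] unconditionally. */
    if( (size_t)nByte >= nBuf ) return nByte;
    if( (size_t)nByte+1 >= nBuf ) return nByte+1;
    if( (z[nByte] | z[nByte+1])==0 ) return -1;  /* zero code unit: stop */
  }
  return -1;
}
```

For a 2-byte buffer holding one non-zero code unit, the helper reports offset 2, which is the first byte past the end of the buffer.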
Key Observations:
- Invalid UTF-16 Termination: The provided zFunctionName buffer in the test code ends with a single zero byte (0x00), violating UTF-16’s requirement for two consecutive zero bytes to denote string termination.
- Fuzzing-Induced Edge Case: The fuzzer-generated input bypasses proper UTF-16 validation, exposing a scenario where SQLite processes an unterminated string. The loop reads z[2] and z[3] (offsets 2 and 3) after the end of a 2-byte buffer, triggering the crash.
- API Contract Violation: SQLite’s C API assumes valid inputs for low-level functions. Passing malformed UTF-16 strings violates this contract, analogous to calling strlen() on a non-null-terminated C string.
Technical Breakdown:
- UTF-16 String Handling: UTF-16 strings use 2-byte code units. A valid null terminator requires two consecutive zero bytes (\x00\x00). The test code’s v6_tmp array ends with 0x00 alone, making SQLite’s parser overread.
- sqlite3Utf16to8 Workflow:
  1. Determine the length of the input UTF-16 string by scanning for the null terminator.
  2. Allocate memory for the UTF-8 equivalent.
  3. Convert each UTF-16 code point to UTF-8.
  4. Return the UTF-8 string for internal use.
- Crash Context: When the scan runs past the end of the buffer, it may reach unmapped memory or memory without read permission, crashing the process.
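The four workflow steps can be sketched as a bounded converter. This is a simplified illustration under stated assumptions, not SQLite's implementation: utf16le_to_utf8_bounded is a hypothetical name, the input is assumed to be little-endian, and the nBuf size parameter is added precisely because the real routine has none.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch only. Returns a malloc'd NUL-terminated UTF-8 string, or NULL if
** no terminator lies within nBuf bytes or allocation fails. */
static char *utf16le_to_utf8_bounded(const unsigned char *z, size_t nBuf){
  size_t n = 0, i, o = 0;
  char *out;
  /* Step 1: find the terminator without leaving the buffer. */
  while( n+1 < nBuf && (z[n] | z[n+1]) ) n += 2;
  if( n+1 >= nBuf ) return NULL;
  /* Step 2: a code unit needs at most 3 UTF-8 bytes; a surrogate pair
  ** (2 units) needs 4, so 3 bytes per unit is a safe upper bound. */
  out = malloc(n/2*3 + 1);
  if( out==NULL ) return NULL;
  /* Step 3: decode code units, join surrogate pairs, emit UTF-8. */
  for(i=0; i<n; i+=2){
    uint32_t c = z[i] | ((uint32_t)z[i+1] << 8);
    if( c>=0xD800 && c<0xDC00 && i+3<n ){
      uint32_t lo = z[i+2] | ((uint32_t)z[i+3] << 8);
      if( lo>=0xDC00 && lo<0xE000 ){
        c = 0x10000 + ((c-0xD800)<<10) + (lo-0xDC00);
        i += 2;
      }
    }
    if( c>=0xD800 && c<0xE000 ) c = 0xFFFD;  /* unpaired surrogate */
    if( c<0x80 ){
      out[o++] = (char)c;
    }else if( c<0x800 ){
      out[o++] = (char)(0xC0 | (c>>6));
      out[o++] = (char)(0x80 | (c & 0x3F));
    }else if( c<0x10000 ){
      out[o++] = (char)(0xE0 | (c>>12));
      out[o++] = (char)(0x80 | ((c>>6) & 0x3F));
      out[o++] = (char)(0x80 | (c & 0x3F));
    }else{
      out[o++] = (char)(0xF0 | (c>>18));
      out[o++] = (char)(0x80 | ((c>>12) & 0x3F));
      out[o++] = (char)(0x80 | ((c>>6) & 0x3F));
      out[o++] = (char)(0x80 | (c & 0x3F));
    }
  }
  out[o] = 0;
  return out;  /* Step 4: caller owns and frees the result. */
}
```

The only structural difference from the workflow described above is in step 1: the scan gives up with NULL when it reaches the end of the buffer instead of reading on.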
Possible Causes: Improper String Termination and API Misuse
1. Unterminated UTF-16 Function Name
The primary cause is passing a UTF-16 string without a valid null terminator. The test code’s v6_tmp array (representing zFunctionName) is declared as:
i8 v6_tmp[] = {75, -92, ..., 0 };
This array ends with a single 0x00 byte. Since UTF-16 uses 2-byte code units, SQLite expects zFunctionName to end with the code unit 0x0000. The absence of the second zero byte causes the scan to read beyond the buffer.
2. Fuzzer-Generated Inputs Bypassing Validation
Fuzzers often generate inputs that violate API preconditions. In this case:
- The fuzzer created a function name with incorrect termination.
- SQLite’s sqlite3_create_function16 does not perform exhaustive validation of input strings for performance reasons, assuming developers adhere to the API contract.
3. Misunderstanding SQLite’s String Handling Semantics
Developers might incorrectly assume that SQLite:
- Validates all input strings for proper encoding.
- Automatically truncates or sanitizes malformed strings.
In reality, SQLite’s C API functions mirror standard C library behavior: they trust the caller to provide valid inputs.
4. Incorrect Use of sqlite3_open with Corrupted Databases
The test code writes arbitrary bytes (from v1_tmp) into a file and attempts to open it as a database. While unrelated to the crash, this could lead to additional undefined behavior if the database header is corrupted. However, the immediate crash is due to the function name issue.
Troubleshooting Steps, Solutions & Fixes: Validating UTF-16 Inputs and Adhering to API Contracts
Step 1: Validate UTF-16 Function Name Termination
Before calling sqlite3_create_function16, ensure the function name is a valid UTF-16 string with proper null termination:
#include <stddef.h>

// Return 1 if a two-byte zero terminator lies within the first max_len bytes
int is_valid_utf16(const void *zStr, size_t max_len) {
    const unsigned char *p = (const unsigned char *)zStr;
    // i + 1 < max_len keeps both reads in bounds, even when max_len is odd
    for (size_t i = 0; i + 1 < max_len; i += 2) {
        if (p[i] == 0 && p[i+1] == 0) return 1;  // Valid terminator found
    }
    return 0;  // Unterminated string
}

// Usage:
if (!is_valid_utf16(v7, sizeof(v6_tmp))) {
    // Handle error
}
Step 2: Correct the Test Case’s Function Name Buffer
Modify the v6_tmp array to include two zero bytes at the end. The terminator must start at an even byte offset, so the bytes before it must come in pairs:
i8 v6_tmp[] = {75, -92, ..., 0, 0 }; // Proper null terminator
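Another way to avoid this kind of mistake is to declare the name as an array of 16-bit code units rather than raw bytes, so the terminator is a single element and cannot be half-written. A minimal sketch, assuming an ASCII-only name; kFuncName and "my_func" are hypothetical and not taken from the test case:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function name for illustration. Each element is one UTF-16
** code unit in native byte order; the final 0 is the 0x0000 terminator. */
static const uint16_t kFuncName[] = {'m', 'y', '_', 'f', 'u', 'n', 'c', 0};

/* Size in bytes, including the two-byte terminator. Always even. */
static size_t func_name_bytes(void){
  return sizeof(kFuncName);
}
```

Because the array element type is 2 bytes wide, the buffer length is always even and the terminator always sits at an even offset. The array can be passed as the zFunctionName argument as-is.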
Step 3: Use SQLite’s Built-in String Validation (Where Applicable)
For strings that come from untrusted sources, prefer SQLite APIs that take an explicit byte length, such as sqlite3_prepare_v2 with its nByte argument, or higher-level wrappers that manage string lengths for you. However, sqlite3_create_function16 is a low-level API that takes no length argument and does not perform such checks.
Step 4: Adjust Fuzzing Strategies to Respect API Preconditions
Fuzzers targeting SQLite’s C APIs must generate inputs that comply with:
- UTF-16 null termination rules.
- Valid function name characters (no embedded nulls except the terminator).
Example fuzzer fix:
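import random
import struct

def generate_valid_utf16_str():
    # Random non-zero UTF-16 code units (little-endian layout assumed),
    # so the only 0x0000 unit is the terminator appended at the end
    length = random.randint(1, 100)
    units = [random.randint(1, 0xFFFF) for _ in range(length)]
    return struct.pack(f'<{length}H', *units) + b'\x00\x00'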
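When the harness itself is written in C, the same precondition can be enforced on the harness side by copying the fuzz input and appending the terminator before the API call. This is a sketch under stated assumptions: terminate_utf16 is a hypothetical helper, and the data/size pair stands for whatever byte buffer the fuzzer supplies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical harness helper. Copies the fuzz input into a fresh buffer,
** truncated to a whole number of 2-byte code units, and appends 0x00 0x00.
** Stores the total size in *pnOut if pnOut is non-NULL. The caller frees
** the result. Returns NULL on allocation failure. */
static unsigned char *terminate_utf16(const uint8_t *data, size_t size, size_t *pnOut){
  size_t n = size & ~(size_t)1;          /* drop a trailing odd byte */
  unsigned char *buf = malloc(n + 2);
  if( buf==NULL ) return NULL;
  if( n ) memcpy(buf, data, n);
  buf[n] = 0;                            /* two-byte terminator at an */
  buf[n+1] = 0;                          /* even offset */
  if( pnOut ) *pnOut = n + 2;
  return buf;
}
```

A zero code unit inside the fuzz data simply ends the name early, which is still within the API contract. Copying into a malloc'd buffer also gives the name suitable alignment for 16-bit access.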
Step 5: Defensive Coding with Bounds Checking
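Wrap sqlite3_create_function16 in a helper function that performs bounds checking. The helper cannot discover the buffer size on its own, so the caller passes it in:
int safe_create_function16(
    sqlite3 *db,
    const void *zFunctionName,
    size_t nNameBytes,  // Size of the zFunctionName buffer, in bytes
    int nArg,
    int eTextRep,
    void *pApp,
    void (*xFunc)(sqlite3_context*,int,sqlite3_value**),
    void (*xStep)(sqlite3_context*,int,sqlite3_value**),
    void (*xFinal)(sqlite3_context*)
) {
    // Scan only the bytes the caller owns; a fixed bound such as 1024
    // could itself read past the end of a shorter buffer
    if (!zFunctionName || !is_valid_utf16(zFunctionName, nNameBytes)) {
        return SQLITE_MISUSE;
    }
    return sqlite3_create_function16(db, zFunctionName, nArg, eTextRep, pApp, xFunc, xStep, xFinal);
}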
Step 6: Analyze SQLite’s Limit Configuration
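The crash involves iLimit = db->aLimit[SQLITE_LIMIT_LENGTH], which is 0x7a12000 (128,000,000 bytes) in this case. Stock builds default to SQLITE_MAX_LENGTH, which is 1,000,000,000 bytes unless overridden at compile time. Lowering the limit bounds how far the scan can run and how much memory the conversion can allocate, but it does not prevent an overread of a buffer shorter than the limit. The limit also applies to every string and BLOB on the connection, so choose a value that fits the workload:
// Set a lower limit for string length
sqlite3_limit(db, SQLITE_LIMIT_LENGTH, 1024);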
Step 7: Address Database File Corruption (Secondary Issue)
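The test code writes invalid data to the database file. While unrelated to the crash, this can be detected early. A statement such as "SELECT 1" does not need to touch the file, so use a query that forces SQLite to read the schema:
// After sqlite3_open, force a read of the file header and schema
if (sqlite3_exec(db, "SELECT count(*) FROM sqlite_master", NULL, NULL, NULL) != SQLITE_OK) {
    // Handle corrupted database
}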
Step 8: Review SQLite’s Documentation on C API Usage
Key points to take from SQLite’s documentation (paraphrased, not verbatim):
- Pointers passed into the C APIs must be valid and properly aligned.
- UTF-16 strings must be zero-terminated using two consecutive zero bytes.
Final Solution Summary
- For Developers: Always null-terminate UTF-16 strings with two zero bytes.
- For Fuzzing Tools: Ensure generated inputs comply with API preconditions.
- For SQLite Itself: Document expectations for low-level APIs prominently. Consider adding debug-mode assertions for string validation.