Unexpected BLOB Hex Output in SQLite Shell with UTF-16 Encoding and Quote Mode
Issue Overview: UTF-16 Database Encoding Causes BLOB Corruption in Quote Mode
When working with SQLite databases configured to use UTF-16le encoding, BLOB columns may exhibit corrupted or unexpected hexadecimal output when queried using the SQLite shell’s .mode quote
setting. This issue arises specifically when the following conditions coincide:
- The database encoding is explicitly set to
UTF-16le
viaPRAGMA encoding
. - BLOB values are inserted into a table and subsequently retrieved using
.mode quote
. - The SQLite shell version predates the fix implemented in commit
728e9dcc6d211acd
(January 3, 2022).
The problem manifests as a mismatch between the original BLOB data and its displayed hexadecimal representation. For example, a BLOB containing the bytes x'0123456789ABCDEF'
may appear as X'e28c81e69d85eaae89eebf8d'
in quote mode. This discrepancy stems from improper handling of BLOB data during output formatting, particularly when the shell converts binary data into text under UTF-16 encoding. The corruption occurs due to an incorrect assumption that BLOBs can be safely treated as text strings, leading to unintended character encoding transformations.
Possible Causes: Misinterpretation of BLOB Data as Text in Output Modes
The core issue lies in how the SQLite shell processes BLOB data across different output modes (e.g., tabs, insert, quote). Three interrelated factors contribute to the problem:
1. UTF-16 Encoding and BLOB-to-Text Conversion
When the database encoding is set to UTF-16le, the SQLite shell internally represents text strings using 16-bit little-endian encoding. However, BLOB values are raw binary data and should not undergo character encoding transformations. The shell’s exec_prepared_stmt
function erroneously converts BLOB data to text using sqlite3_column_text()
, which interprets the BLOB bytes as UTF-16 text. This forces a re-encoding of the binary data into UTF-8 (the default encoding for the shell’s output), producing garbled hexadecimal values.
2. Mode-Specific Handling of BLOBs
The SQLite shell applies different formatting rules based on the .mode
setting. In insert
mode, BLOBs are formatted as hexadecimal literals (e.g., X'0123...'
), which is correct. However, in quote
mode, the shell’s logic fails to distinguish BLOBs from text strings. Instead of using sqlite3_column_blob()
to retrieve raw bytes, it uses sqlite3_column_text()
, triggering an implicit conversion. This conversion chain—BLOB → UTF-16 text → UTF-8 text → hexadecimal dump—corrupts the original data.
3. Conditional Logic in Shell Output Routines
The shell’s source code (prior to the fix) contained a conditional check to avoid text conversion for BLOBs in insert
mode. However, this check was not extended to quote
mode. The relevant code section in src/shell.c.in
treated BLOBs as text in all modes except insert
, leading to the observed issue. The absence of a special case for quote
mode allowed the faulty conversion to persist.
Troubleshooting Steps, Solutions & Fixes: Correcting BLOB Handling in Output Modes
Step 1: Verify SQLite Version and Apply the Official Fix
First, confirm the SQLite shell version using .version
or sqlite3 --version
. If the version predates January 3, 2022, update to a build that includes commit 728e9dcc6d211acd
. This commit modifies the conditional logic in shell.c.in
to handle BLOBs correctly in both insert
and quote
modes.
Step 2: Modify Shell Code to Avoid BLOB-to-Text Conversion
For users unable to update SQLite immediately, apply the following code patch to src/shell.c.in
:
--- src/shell.c.in
+++ src/shell.c.in
@@ -3364,7 +3364,8 @@
for(i=0; i<nCol; i++){
aiTypes[i] = x = sqlite3_column_type(pStmt, i);
- if( x==SQLITE_BLOB && pArg && pArg->cMode==MODE_Insert ){
+ if( x==SQLITE_BLOB && pArg &&
+ (pArg->cMode==MODE_Insert || pArg->cMode==MODE_Quote) ){
azVals[i] = "";
}else{
azVals[i] = (char*)sqlite3_column_text(pStmt, i);
This change ensures that BLOBs are not converted to text in quote
mode, preserving their raw byte representation.
Step 3: Use Explicit BLOB Retrieval in Custom Queries
When writing queries that involve BLOBs in UTF-16 databases, use sqlite3_column_blob()
instead of sqlite3_column_text()
in application code. For example, in a custom C application:
const void* blob_data = sqlite3_column_blob(stmt, 0);
int blob_size = sqlite3_column_bytes(stmt, 0);
This bypasses text encoding entirely, ensuring binary integrity.
Step 4: Validate Encoding Settings and Output Modes
Ensure that the database encoding matches the intended use case. If BLOBs are involved, prefer UTF-8
encoding unless UTF-16 is strictly required for text data. When using .mode quote
, verify that BLOB columns are not being inadvertently cast to text.
Step 5: Test with Hexadecimal Literals
After applying fixes, re-run the original test case:
PRAGMA encoding='UTF-16le';
CREATE TABLE foo(fooval BLOB NOT NULL);
INSERT INTO foo VALUES(x'0123456789ABCDEF');
.mode quote
SELECT quote(fooval) FROM foo;
The output should now consistently return X'0123456789ABCDEF'
across all modes, confirming correct BLOB handling.
Step 6: Audit Application Code for Implicit Conversions
Review any code that processes BLOBs for unintended text conversions, especially in environments mixing UTF-16 and UTF-8. Use SQLite’s typeof()
function to validate column data types during debugging:
SELECT typeof(fooval) FROM foo; -- Should return 'blob'
By systematically addressing the interaction between database encoding, output modes, and BLOB handling, users can eliminate data corruption and ensure reliable binary data management in SQLite.