Heap Buffer Overflow in zipfileColumn Function When Parsing Malformed ZIP Archives
Root Cause Analysis of zipfileColumn Heap Overflow via Corrupted ZIP Virtual Table
The heap buffer overflow vulnerability in SQLite’s zipfile extension module arises from insufficient validation of ZIP archive metadata during virtual table operations. When processing a malformed ZIP archive via the zipfile virtual table interface, SQLite’s zipfileColumn() function attempts to read beyond the allocated memory boundaries of a ZIP entry’s filename field. This occurs due to inconsistencies between declared filename lengths in ZIP central directory headers and actual payload availability, compounded by unsafe string formatting operations during path resolution.
Critical Failure Path: ZIP Entry Parsing and String Formatting
Malformed ZIP Central Directory Header Triggers Incorrect Filename Extraction
The core failure originates in SQLite’s handling of ZIP central directory entries, specifically in how filename lengths are interpreted and used to construct file paths. In the ZIP file format, each central directory entry contains a 2-byte "file name length" field (offset 28) specifying the number of bytes in the filename that follows. The zipfileGetEntry() function (called during virtual table cursor iteration) reads this value but fails to validate it against both the remaining buffer size and ZIP specification constraints.
When zipfileGetEntry() processes a central directory entry with an artificially inflated filename length value, it passes this unchecked length to sqlite3_mprintf() for path construction. This results in a heap-allocated buffer sized for the declared length, but populated with actual filename bytes that may be shorter than the allocated space. Subsequent processing in zipfileColumn() then attempts to read the full declared length from this buffer, overstepping valid memory regions.
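The missing check described above can be sketched in isolation. This is a minimal illustration, not code from zipfile.c: the helper names `read_le16` and `cd_name_fits` are hypothetical, and the only format facts assumed are the 46-byte fixed header and the 2-byte little-endian length at offset 28.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the fixed portion of a central directory file header. */
#define CD_FIXED_SIZE 46

/* Read a little-endian 16-bit value from p. */
static uint16_t read_le16(const uint8_t *p){
  return (uint16_t)(p[0] | (p[1] << 8));
}

/*
** Return 1 if the file name declared by the header at aHdr fits within
** the nAvail bytes available starting at aHdr, else 0. On success,
** *pnName receives the declared name length.
** (Hypothetical helper, not part of SQLite.)
*/
static int cd_name_fits(const uint8_t *aHdr, size_t nAvail, uint16_t *pnName){
  uint16_t nName;
  if( nAvail < CD_FIXED_SIZE ) return 0;        /* fixed fields truncated */
  nName = read_le16(&aHdr[28]);                 /* "file name length" field */
  if( (size_t)nName > nAvail - CD_FIXED_SIZE ) return 0;
  *pnName = nName;
  return 1;
}
```

Comparing the declared length against the remaining byte count, rather than adding it to a pointer first, avoids forming an out-of-range pointer when the length is hostile.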
Unsafe Buffer Access During Virtual Table Column Resolution
The zipfileColumn() function (responsible for returning values for virtual table columns) directly accesses the filename buffer using the unvalidated length from the central directory header. When the name column is requested (typically via SELECT * FROM zipfile_vtab), the code indexes into the filename string without verifying that the position exists within the actual allocated buffer. With a malformed filename length value exceeding the true payload size, this leads to a 1-byte read before the buffer’s start address (as seen in the ASAN report at 0x602000000a0f, just left of an 8-byte region).
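A read one byte before a buffer typically comes from an index computed as length minus one when the length is zero. The sketch below assumes, purely for illustration, that the code inspects the last byte of the name (for example to test for a trailing '/'); the source does not show the exact access, and `name_has_trailing_slash` is a hypothetical helper.

```c
#include <stddef.h>

/*
** Return 1 if the nName-byte name at zName ends in '/', else 0.
** The nName>0 test keeps zName[nName-1] from becoming zName[-1],
** which would be a 1-byte read before the start of the buffer.
** (Hypothetical helper, not part of SQLite.)
*/
static int name_has_trailing_slash(const char *zName, size_t nName){
  return nName > 0 && zName[nName - 1] == '/';
}
```

The length test comes first so that short-circuit evaluation never touches the buffer for an empty name.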
Comprehensive Validation and Memory Access Hardening Strategies
Structural Validation of ZIP Entries During Virtual Table Initialization
Modify zipfileGetEntry() to perform cross-field validation of central directory entries before processing:
- Header Signature Verification: Confirm the presence of the 4-byte central directory header signature (0x02014b50) before parsing subsequent fields.
- Filename Length Bounds Checking: Ensure the declared filename length does not exceed the remaining bytes in the central directory record after fixed-length fields.
- ZIP64 Compatibility Checks: Implement support for ZIP64 extended information fields when file sizes or offsets exceed 0xFFFFFFFF, preventing miscalculations with large malformed values.
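- Heap Buffer Guard Bytes: Allocate filename buffers with trailing padding that holds a known guard value, so that corruption can be detected at runtime. Note that sqlite3DbMallocRawNN() is internal to the SQLite core; a loadable extension such as zipfile would allocate with sqlite3_malloc64() instead. AddressSanitizer already surrounds every heap allocation with its own red zones, so the padding is a runtime check rather than a prerequisite for ASAN detection.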
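The first three checks in this list can be combined into one validation pass over a single record. The following is a sketch under stated assumptions rather than the extension's actual code: `cd_validate`, `cd_needs_zip64`, and the `CD_*` result codes are hypothetical names, and the field offsets (sizes at 20 and 24, variable-field lengths at 28, 30 and 32, local header offset at 42) are those of the standard central directory file header.

```c
#include <stddef.h>
#include <stdint.h>

#define CD_SIGNATURE  0x02014b50u   /* central directory file header */
#define CD_FIXED_SIZE 46            /* bytes before the variable fields */

enum { CD_OK = 0, CD_TRUNCATED, CD_BAD_SIGNATURE, CD_BAD_LENGTH };

static uint32_t get_le16(const uint8_t *p){
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}
static uint32_t get_le32(const uint8_t *p){
  return get_le16(p) | (get_le16(p + 2) << 16);
}

/*
** Validate one central directory record in aRec[0..nRec-1]. On CD_OK,
** *pnTotal is the full record size (fixed + name + extra + comment).
** (Hypothetical helper, not part of SQLite.)
*/
static int cd_validate(const uint8_t *aRec, size_t nRec, size_t *pnTotal){
  size_t nVar;
  if( nRec < CD_FIXED_SIZE ) return CD_TRUNCATED;
  if( get_le32(aRec) != CD_SIGNATURE ) return CD_BAD_SIGNATURE;
  nVar = (size_t)get_le16(aRec + 28)    /* file name length */
       + (size_t)get_le16(aRec + 30)    /* extra field length */
       + (size_t)get_le16(aRec + 32);   /* file comment length */
  if( nVar > nRec - CD_FIXED_SIZE ) return CD_BAD_LENGTH;
  *pnTotal = CD_FIXED_SIZE + nVar;
  return CD_OK;
}

/*
** Return 1 if any 32-bit size or offset field holds the 0xFFFFFFFF
** sentinel, meaning the real value lives in a ZIP64 extra field.
** Call only after cd_validate() has returned CD_OK.
*/
static int cd_needs_zip64(const uint8_t *aRec){
  return get_le32(aRec + 20) == 0xFFFFFFFFu    /* compressed size */
      || get_le32(aRec + 24) == 0xFFFFFFFFu    /* uncompressed size */
      || get_le32(aRec + 42) == 0xFFFFFFFFu;   /* local header offset */
}
```

Returning distinct codes for truncation, bad signature, and bad length makes it easier for a caller to produce a precise error message.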
Safe String Handling in zipfileColumn Path Resolution
Reimplement filename handling in zipfileColumn() using bounded string operations:
- Replace sqlite3_mprintf() with sqlite3_str_appendf() from the SQLite string accumulator interface, which provides explicit length tracking and automatic truncation protection.
- Add Runtime Buffer Length Tracking: Store both the allocated buffer size and actual valid data length when processing ZIP entries, using a structure like:
  typedef struct ZipfileEntry {
    char *zName;      /* File name (UTF-8) */
    int nNameAlloc;   /* Allocated buffer size for zName */
    int nNameValid;   /* Actual valid bytes in zName */
    /* ... other fields ... */
  } ZipfileEntry;
- Implement Mandatory NUL-Termination: Ensure all filename buffers receive explicit NUL-termination regardless of source data, preventing string function overruns.
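The length tracking and forced termination described above might look like the following. It is a standalone sketch that uses malloc() in place of SQLite's allocator; `NameBuf` and `namebuf_set` are hypothetical stand-ins for the name-related fields of ZipfileEntry and for whatever routine fills them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the name-related fields of ZipfileEntry. */
typedef struct NameBuf {
  char *zName;      /* NUL-terminated copy of the name */
  int nNameAlloc;   /* bytes allocated for zName */
  int nNameValid;   /* bytes of name data, excluding the terminator */
} NameBuf;

/*
** Copy an nDeclared-byte name from aSrc, of which only nAvail bytes are
** known to be readable. Returns 0 on success, or -1 if the declared
** length is negative, exceeds the available data, or allocation fails.
** The caller owns p->zName and releases it with free().
*/
static int namebuf_set(NameBuf *p, const unsigned char *aSrc,
                       int nDeclared, int nAvail){
  char *z;
  if( nDeclared < 0 || nDeclared > nAvail ) return -1;
  z = (char*)malloc((size_t)nDeclared + 1);
  if( z == NULL ) return -1;
  memcpy(z, aSrc, (size_t)nDeclared);
  z[nDeclared] = '\0';              /* terminate regardless of source data */
  p->zName = z;
  p->nNameAlloc = nDeclared + 1;
  p->nNameValid = nDeclared;
  return 0;
}
```

Keeping the allocated size and the valid length as separate fields lets later code assert that any index it computes is below nNameValid before dereferencing.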
Virtual Table Cursor Sanitization During Iteration
Enhance the zipfileNext() cursor advancement function with:
- Central Directory Boundary Checks: Verify that each parsed central directory entry’s starting position and size remain within the memory-mapped ZIP file bounds.
- Entry-to-Entry Offset Validation: Confirm that successive central directory entries do not overlap and maintain proper alignment.
- Maximum Filename Length Enforcement: Reject entries with filename lengths exceeding system-specific path limits (e.g., 4096 bytes for Linux) or ZIP specification maxima.
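The iteration checks above can be sketched as a bounded walk over an in-memory central directory. This is an illustration under assumptions, not the cursor code itself: `cd_walk` is a hypothetical function, and `MAX_NAME_LEN` is an assumed policy limit taken from the 4096-byte Linux example.

```c
#include <stddef.h>
#include <stdint.h>

#define CD_FIXED_SIZE 46     /* fixed portion of each record */
#define MAX_NAME_LEN  4096   /* assumed policy limit on name length */

static size_t le16(const uint8_t *p){
  return (size_t)p[0] | ((size_t)p[1] << 8);
}

/*
** Walk nEntry records of the central directory in aCd[0..nCd-1].
** Returns the number of records accepted, or -1 if any record is
** truncated, runs past the end of the buffer, or declares a name
** longer than MAX_NAME_LEN. (Hypothetical helper, not part of SQLite.)
*/
static int cd_walk(const uint8_t *aCd, size_t nCd, int nEntry){
  size_t iOff = 0;
  int i;
  for(i = 0; i < nEntry; i++){
    size_t nLeft = nCd - iOff;        /* iOff <= nCd holds on every pass */
    size_t nName, nVar;
    if( nLeft < CD_FIXED_SIZE ) return -1;
    nName = le16(aCd + iOff + 28);
    if( nName > MAX_NAME_LEN ) return -1;
    nVar = nName + le16(aCd + iOff + 30) + le16(aCd + iOff + 32);
    if( nVar > nLeft - CD_FIXED_SIZE ) return -1;
    iOff += CD_FIXED_SIZE + nVar;     /* next record starts where this ends */
  }
  return i;
}
```

Because each record's start is derived from the validated end of the previous one, records cannot overlap and the offset can never move past the end of the buffer.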
Patch Implementation and Testing Protocol
Backported Security Fix for ZIP Filename Handling
Apply the following modifications to zipfile.c (or equivalent shell extension implementation):
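/* In zipfileGetEntry(): */
static int zipfileGetEntry(ZipfileTab *pTab, ZipfileEntry *pEntry){
  /* ... existing parsing code ... */

  /* After reading the file name length (nName). Compare sizes rather than
  ** pointers so that pCd + nName is never formed past the end of the buffer. */
  if( nName > (pEnd - pCd) ){
    /* sqlite3ErrorMsg() is internal to the core and takes a Parse*, so an
    ** extension reports the error through the virtual table's zErrMsg. */
    sqlite3_free(pTab->base.zErrMsg);
    pTab->base.zErrMsg = sqlite3_mprintf(
        "ZIP filename extends beyond central directory");
    return SQLITE_CORRUPT;
  }

  /* Allocate with 2 extra bytes for NUL termination and guard byte.
  ** sqlite3DbMallocRawNN() is not available to extensions, so use the
  ** public allocator instead. */
  pEntry->zName = sqlite3_malloc64(nName + 2);
  if( !pEntry->zName ) return SQLITE_NOMEM;
  pEntry->nNameAlloc = nName + 2;
  memcpy(pEntry->zName, pCd, nName);
  pEntry->zName[nName] = '\0';      /* Enforce termination */
  pEntry->nNameValid = nName;

  /* Add guard byte to detect overflows */
  pEntry->zName[nName + 1] = 0x55;

  /* ... rest of function ... */
}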
/* In zipfileColumn(): */
case ZIPFILE_COL_NAME: {
  ZipfileEntry *pEntry = pCsr->pCurrent;
  if( pEntry->nNameValid <= 0 ){
    sqlite3_result_null(ctx);
  }else{
    /* Verify guard byte before access */
    if( pEntry->zName[pEntry->nNameValid + 1] != 0x55 ){
      sqlite3_result_error_code(ctx, SQLITE_CORRUPT);
      return SQLITE_CORRUPT;
    }
    sqlite3_result_text(ctx, pEntry->zName, pEntry->nNameValid,
                        SQLITE_TRANSIENT);
  }
  break;
}
Differential Fuzzing Test Harness
Implement an automated testing regimen to prevent regression:
- Corpus Generation: Create seed ZIP files with valid/invalid central directory entries using radamsa and zzuf.
- SQL Query Generator: Produce virtual table creation and query statements targeting different columns and filter conditions.
- ASAN Integration: Execute the test harness under AddressSanitizer with:
  export SQLITE_TEST_ZIPFILE=1
  ./configure CFLAGS="-fsanitize=address -DSQLITE_DEBUG" LDFLAGS="-fsanitize=address"
  make test
- Crash Triage Automation: Use llvm-symbolizer with ASAN_OPTIONS to automatically generate stack traces for any detected overflows.
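Alongside general-purpose mutators, a targeted mutation of the one field at issue can produce focused seeds for the corpus. The sketch below is an assumption about how such a seed generator could work, not part of any existing harness; `zip_patch_name_len` is a hypothetical helper that operates on a ZIP image already loaded into memory.

```c
#include <stddef.h>
#include <stdint.h>

/*
** Find the first central directory header ("PK\001\002") in
** aZip[0..nZip-1] and overwrite its 2-byte file name length with nNew.
** Returns the byte offset of the patched header, or -1 if no complete
** 46-byte header was found. (Hypothetical helper for seed generation.)
*/
static long zip_patch_name_len(uint8_t *aZip, size_t nZip, uint16_t nNew){
  size_t i;
  for(i = 0; i + 46 <= nZip; i++){
    if( aZip[i] == 0x50 && aZip[i+1] == 0x4b
     && aZip[i+2] == 0x01 && aZip[i+3] == 0x02 ){
      aZip[i + 28] = (uint8_t)(nNew & 0xff);   /* low byte */
      aZip[i + 29] = (uint8_t)(nNew >> 8);     /* high byte */
      return (long)i;
    }
  }
  return -1;
}
```

Writing each mutated image to its own file, with length values such as 0, 1, and 0xFFFF, gives the harness boundary cases that random mutation may take a long time to reach.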
Version-Specific Backporting Guidance
For SQLite versions between 3.22.0 (2018-01-22) and 3.42.0:
- Identify Vulnerable Code Path: Locate instances of sqlite3_mprintf() usage in ZIP file handling without prior bounds checks.
- Apply Memory Accounting: Integrate the ZipfileEntry structure changes across all cursor management functions.
- Central Directory Validation: Port the boundary checking logic to older versions’ zipfileNext() implementations.
This comprehensive approach addresses both the immediate heap overflow vulnerability and systemic weaknesses in SQLite’s ZIP virtual table implementation through layered validation, secure memory practices, and rigorous testing protocols.