Implementing Binary and Octal Literal Support in SQLite: Challenges and Solutions
Issue Overview: Patch Implementation Gaps in Binary/Octal Literal Syntax Support
The core objective of this patch is to extend SQLite’s numeric literal support to include binary (e.g., 0b1010
) and octal (e.g., 0o755
) formats, aligning their behavior with existing hexadecimal literal handling. While functional for basic use cases, several critical implementation gaps prevent this patch from meeting SQLite’s rigorous quality standards.
Key Features Implemented
- Literal Prefix Matching: Case-insensitive recognition of
0b
/0B
for binary and0o
/0O
for octal. - 64-Bit Integer Parsing: Conversion of literals to signed 64-bit integers with overflow detection.
- Compile-Time Flags:
SQLITE_OMIT_BINARY_INTEGER
andSQLITE_OMIT_OCTAL_INTEGER
to disable features. - Error Handling: Custom error messages for oversized literals (e.g., "binary literal too big").
Critical Shortcomings
Non-Compliant Character Classification:
- New macros
sqlite3Isbdigit
andsqlite3Isodigit
bypass SQLite’s centralizedsqlite3CtypeMap
system, risking inconsistency with other character-classification logic. - Example:
sqlite3Isbdigit
checks only for'0'
/'1'
via direct comparison, ignoring locale-aware ctype mappings.
- New macros
Function Naming and Scope Pollution:
- The renamed
sqlite3DecOrHexOrOctOrBinToI64()
function introduces verbosity and violates SQLite’s naming conventions for clarity. - Code duplication in
sqlite3GetInt32()
for binary/octal parsing creates maintenance risks.
- The renamed
Incomplete Test Coverage:
- New test files
binlit.test
andoctlit.test
lack edge cases covered inhexlit.test
, such as:- Underscore separators (e.g.,
0b1010_0101
). - Boundary values at
2^63-1
and-2^63
. - Invalid digit detection (e.g.,
0b12
or0o89
).
- Underscore separators (e.g.,
- Insufficient validation of
SQLITE_OMIT_*
flags.
- New test files
JSON5 Extension Incompatibility:
- The JSON5 extension’s numeric parser does not recognize binary/octal literals, breaking consistency with core SQLite.
Underscore Handling Inconsistency:
- The initial implementation allowed underscores in binary literals (e.g.,
0b1000_0001
) but later removed support, creating ambiguity.
- The initial implementation allowed underscores in binary literals (e.g.,
Possible Causes: Architectural and Implementation Misalignments
Cause 1: Ad Hoc Character Classification
The patch introduces standalone macros sqlite3Isbdigit
and sqlite3Isodigit
instead of integrating with SQLite’s sqlite3CtypeMap
bitmap. This likely stems from:
- Misunderstanding of
sqlite3CtypeMap
: The contributor may not have realized that SQLite uses a centralized ctype system to avoid locale dependencies and ensure cross-platform consistency. - Urgency to Avoid Bitmask Complexity: Direct character comparisons (
x == '0'
) are simpler but fail to account for SQLite’s strict definitions of valid digits.
Cause 2: Function Proliferation
The renaming of sqlite3DecOrHexToI64
to sqlite3DecOrHexOrOctOrBinToI64
reflects a tactical fix rather than a strategic redesign. Contributing factors include:
- Lack of Polymorphic Parsing Infrastructure: SQLite’s numeric parsing functions were not designed to handle multiple radices dynamically.
- Fear of Breaking Legacy Code: The contributor avoided refactoring existing hex/decimal logic to minimize regression risks.
Cause 3: Inadequate Test Design
The test suites mirror basic functionality but omit stress tests and error path validation because:
- Overemphasis on Happy Paths: The tests verify correct parsing but neglect invalid syntax (e.g.,
0b
,0o
without digits). - Assumption of Hex Test Completeness: The contributor likely assumed
hexlit.test
coverage was sufficient, missing radix-specific edge cases.
Cause 4: Extension Oversight
The JSON5 extension’s numeric parser was overlooked because:
- Compartmentalized Development: The contributor focused on core SQLite without auditing extensions that depend on shared parsing utilities.
- Lack of Integration Testing: No cross-check between the JSON5 extension and the new literal syntax.
Troubleshooting Steps, Solutions & Fixes
Step 1: Refactor Character Classification Macros
Problem: Custom sqlite3Isbdigit
/sqlite3Isodigit
macros bypass sqlite3CtypeMap
.
Solution: Integrate with SQLite’s existing ctype infrastructure.
- Extend
sqlite3CtypeMap
:
Modify thesqlite3CtypeMap
initialization to include binary and octal digit flags:// In sqlite3InitCtypeMap(): for(i=0; i<256; i++){ if( isdigit(i) ) ctypeMap[i] |= SQLITE_CTYPE_DIGIT; if( isxdigit(i) ) ctypeMap[i] |= SQLITE_CTYPE_XDIGIT; if( i=='0' || i=='1' ) ctypeMap[i] |= SQLITE_CTYPE_BDIGIT; // Binary if( i>='0' && i<='7' ) ctypeMap[i] |= SQLITE_CTYPE_ODIGIT; // Octal }
- Replace Macros with Ctype Checks:
#define sqlite3Isbdigit(x) (sqlite3CtypeMap[(unsigned char)(x)] & SQLITE_CTYPE_BDIGIT) #define sqlite3Isodigit(x) (sqlite3CtypeMap[(unsigned char)(x)] & SQLITE_CTYPE_ODIGIT)
Validation:
- Update test cases to include non-ASCII characters (e.g., full-width digits) and verify they are rejected.
Step 2: Optimize Numeric Parsing Function Architecture
Problem: The bloated sqlite3DecOrHexOrOctOrBinToI64()
function harms readability.
Solution: Implement a radix-agnostic parser with a lookup table.
- Introduce a Radix Descriptor Struct:
typedef struct RadixInfo { const char *prefix; // "0x", "0o", etc. int prefixLen; // 2 int maxDigits; // 64 for binary, 22 for octal int (*isDigit)(u8); // sqlite3Isbdigit, etc. int radix; // 2, 8, 16 } RadixInfo;
- Refactor Parsing Logic:
static int sqlite3GeneralBaseToI64( const char *z, i64 *pOut, const RadixInfo *info ){ if( sqlite3_strnicmp(z, info->prefix, info->prefixLen)!=0 ) return 1; z += info->prefixLen; for(i=0; info->isDigit(z[i]); i++){} if( i > info->maxDigits ) return 2; // Overflow // ... parsing logic ... }
- Unified Entry Point:
int sqlite3DecOrHexOrOctOrBinToI64(const char *z, i64 *pOut){ static const RadixInfo bases[] = { {"0x",2,16,sqlite3Isxdigit,16}, {"0o",2,22,sqlite3Isodigit,8}, {"0b",2,64,sqlite3Isbdigit,2}, {NULL,0,0,NULL,10} // Decimal terminator }; for(int i=0; bases[i].prefix; i++){ if( sqlite3GeneralBaseToI64(z, pOut, &bases[i])==0 ) return 0; } return sqlite3Atoi64(z, pOut, strlen(z), SQLITE_UTF8); }
Benefits:
- Eliminates code duplication.
- Simplates adding new radices in the future.
Step 3: Expand Test Coverage
Problem: Gaps in binlit.test
and octlit.test
.
Solution: Augment tests using hexlit.test
as a template.
- Boundary Value Tests:
-- In binlit.test SELECT 0b0111111111111111111111111111111111111111111111111111111111111111; -- 2^63-1 SELECT 0b1000000000000000000000000000000000000000000000000000000000000000; -- -2^63
- Invalid Syntax Detection:
-- Should raise "malformed binary literal" SELECT 0b2; SELECT 0o9; SELECT 0b; -- No digits
- Underscore Support:
SELECT 0b1000_0001; -- → 129 SELECT 0o7_7_7; -- → 511
- OMIT Flag Verification:
#ifdef SQLITE_OMIT_BINARY_INTEGER // Verify that 0b... literals trigger syntax errors do_test binlit-omit-1 { catchsql { SELECT 0b1010; } } {1 {near "0b1010": syntax error}} #endif
Step 4: Update JSON5 Extension
Problem: JSON5 does not parse binary/octal literals.
Solution: Modify jsonParseNumber()
in src/json.c
to use sqlite3DecOrHexOrOctOrBinToI64()
.
- Refactor Numeric Parsing:
static int jsonParseNumber( JsonParse *pParse, JsonNode *pNode ){ // ... if( zNum[0]=='0' && (zNum[1]=='x' || zNum[1]=='X') ){ // Existing hex logic } else if( zNum[0]=='0' && (zNum[1]=='b' || zNum[1]=='B') ){ rc = sqlite3DecOrHexOrOctOrBinToI64(zNum, &i); } else if( zNum[0]=='0' && (zNum[1]=='o' || zNum[1]=='O') ){ rc = sqlite3DecOrHexOrOctOrBinToI64(zNum, &i); } else { rc = sqlite3Atoi64(zNum, &i, strlen(zNum), SQLITE_UTF8); } // ... }
- Add JSON5-Specific Tests:
SELECT json('0b1010'); -- → 10 SELECT json('0o755'); -- → 493 SELECT json('{"x": 0b111}'); -- → '{"x":7}'
Step 5: Validate Compile-Time Flags
Problem: Inadequate testing of SQLITE_OMIT_BINARY_INTEGER
and SQLITE_OMIT_OCTAL_INTEGER
.
Solution: Extend the test harness.
Configure CI Pipeline:
- Compile SQLite with
-DSQLITE_OMIT_BINARY_INTEGER
and verify:SELECT 0b1010; -- Should throw syntax error
- Repeat for
SQLITE_OMIT_OCTAL_INTEGER
.
- Compile SQLite with
Conditional Compilation Guards:
Ensure all binary/octal logic is wrapped in#ifndef SQLITE_OMIT_...
blocks.
Step 6: Address Underscore Handling
Problem: Partial support for underscores in literals.
Solution: Implement consistent underscore stripping across all radices.
- Preprocess Input String:
static int sqlite3GeneralBaseToI64(/*...*/){ char *zStripped = sqlite3StrDup(z); int j = 0; for(int i=0; z[i]; i++){ if( z[i]!='_' ) zStripped[j++] = z[i]; } zStripped[j] = '\0'; // Proceed with parsing zStripped sqlite3DbFree(db, zStripped); }
- Update Tests:
SELECT 0b1_0_1_0; -- → 10 SELECT 0o1_7_7; -- → 127 SELECT 0x1_0_0; -- → 256 (existing hex)
Final Code Review Checklist
Naming Conventions:
- Rename
sqlite3DecOrHexOrOctOrBinToI64()
tosqlite3Int64FromLiteral()
. - Prefix radix-specific helpers (e.g.,
sqlite3IsRadixDigit()
).
- Rename
Error Message Consistency:
- Standardize messages: "malformed binary literal" vs. "binary literal too big".
Documentation Updates:
- Add binary/octal syntax to https://sqlite.org/lang_expr.html#literal_values.
By systematically addressing these areas, the patch can achieve SQLite’s robustness requirements while maintaining backward compatibility and performance.