Implementing Binary and Octal Literal Support in SQLite: Challenges and Solutions
Issue Overview: Patch Implementation Gaps in Binary/Octal Literal Syntax Support
The core objective of this patch is to extend SQLite’s numeric literal support to include binary (e.g., 0b1010) and octal (e.g., 0o755) formats, aligning their behavior with existing hexadecimal literal handling. While functional for basic use cases, several critical implementation gaps prevent this patch from meeting SQLite’s rigorous quality standards.
Key Features Implemented
- Literal Prefix Matching: Case-insensitive recognition of
0b/0Bfor binary and0o/0Ofor octal. - 64-Bit Integer Parsing: Conversion of literals to signed 64-bit integers with overflow detection.
- Compile-Time Flags:
SQLITE_OMIT_BINARY_INTEGERandSQLITE_OMIT_OCTAL_INTEGERto disable features. - Error Handling: Custom error messages for oversized literals (e.g., "binary literal too big").
Critical Shortcomings
-
Non-Compliant Character Classification:
- New macros
sqlite3Isbdigitandsqlite3Isodigitbypass SQLite’s centralizedsqlite3CtypeMapsystem, risking inconsistency with other character-classification logic. - Example:
sqlite3Isbdigitchecks only for'0'/'1'via direct comparison, ignoring locale-aware ctype mappings.
- New macros
-
Function Naming and Scope Pollution:
- The renamed
sqlite3DecOrHexOrOctOrBinToI64()function introduces verbosity and violates SQLite’s naming conventions for clarity. - Code duplication in
sqlite3GetInt32()for binary/octal parsing creates maintenance risks.
- The renamed
-
Incomplete Test Coverage:
- New test files
binlit.testandoctlit.testlack edge cases covered inhexlit.test, such as:- Underscore separators (e.g.,
0b1010_0101). - Boundary values at
2^63-1and-2^63. - Invalid digit detection (e.g.,
0b12or0o89).
- Underscore separators (e.g.,
- Insufficient validation of
SQLITE_OMIT_*flags.
- New test files
-
JSON5 Extension Incompatibility:
- The JSON5 extension’s numeric parser does not recognize binary/octal literals, breaking consistency with core SQLite.
-
Underscore Handling Inconsistency:
- The initial implementation allowed underscores in binary literals (e.g.,
0b1000_0001) but later removed support, creating ambiguity.
- The initial implementation allowed underscores in binary literals (e.g.,
Possible Causes: Architectural and Implementation Misalignments
Cause 1: Ad Hoc Character Classification
The patch introduces standalone macros sqlite3Isbdigit and sqlite3Isodigit instead of integrating with SQLite’s sqlite3CtypeMap bitmap. This likely stems from:
- Misunderstanding of
sqlite3CtypeMap: The contributor may not have realized that SQLite uses a centralized ctype system to avoid locale dependencies and ensure cross-platform consistency. - Urgency to Avoid Bitmask Complexity: Direct character comparisons (
x == '0') are simpler but fail to account for SQLite’s strict definitions of valid digits.
Cause 2: Function Proliferation
The renaming of sqlite3DecOrHexToI64 to sqlite3DecOrHexOrOctOrBinToI64 reflects a tactical fix rather than a strategic redesign. Contributing factors include:
- Lack of Polymorphic Parsing Infrastructure: SQLite’s numeric parsing functions were not designed to handle multiple radices dynamically.
- Fear of Breaking Legacy Code: The contributor avoided refactoring existing hex/decimal logic to minimize regression risks.
Cause 3: Inadequate Test Design
The test suites mirror basic functionality but omit stress tests and error path validation because:
- Overemphasis on Happy Paths: The tests verify correct parsing but neglect invalid syntax (e.g.,
0b,0owithout digits). - Assumption of Hex Test Completeness: The contributor likely assumed
hexlit.testcoverage was sufficient, missing radix-specific edge cases.
Cause 4: Extension Oversight
The JSON5 extension’s numeric parser was overlooked because:
- Compartmentalized Development: The contributor focused on core SQLite without auditing extensions that depend on shared parsing utilities.
- Lack of Integration Testing: No cross-check between the JSON5 extension and the new literal syntax.
Troubleshooting Steps, Solutions & Fixes
Step 1: Refactor Character Classification Macros
Problem: Custom sqlite3Isbdigit/sqlite3Isodigit macros bypass sqlite3CtypeMap.
Solution: Integrate with SQLite’s existing ctype infrastructure.
- Extend
sqlite3CtypeMap:
Modify thesqlite3CtypeMapinitialization to include binary and octal digit flags:// In sqlite3InitCtypeMap(): for(i=0; i<256; i++){ if( isdigit(i) ) ctypeMap[i] |= SQLITE_CTYPE_DIGIT; if( isxdigit(i) ) ctypeMap[i] |= SQLITE_CTYPE_XDIGIT; if( i=='0' || i=='1' ) ctypeMap[i] |= SQLITE_CTYPE_BDIGIT; // Binary if( i>='0' && i<='7' ) ctypeMap[i] |= SQLITE_CTYPE_ODIGIT; // Octal } - Replace Macros with Ctype Checks:
#define sqlite3Isbdigit(x) (sqlite3CtypeMap[(unsigned char)(x)] & SQLITE_CTYPE_BDIGIT) #define sqlite3Isodigit(x) (sqlite3CtypeMap[(unsigned char)(x)] & SQLITE_CTYPE_ODIGIT)
Validation:
- Update test cases to include non-ASCII characters (e.g., full-width digits) and verify they are rejected.
Step 2: Optimize Numeric Parsing Function Architecture
Problem: The bloated sqlite3DecOrHexOrOctOrBinToI64() function harms readability.
Solution: Implement a radix-agnostic parser with a lookup table.
- Introduce a Radix Descriptor Struct:
typedef struct RadixInfo { const char *prefix; // "0x", "0o", etc. int prefixLen; // 2 int maxDigits; // 64 for binary, 22 for octal int (*isDigit)(u8); // sqlite3Isbdigit, etc. int radix; // 2, 8, 16 } RadixInfo; - Refactor Parsing Logic:
static int sqlite3GeneralBaseToI64( const char *z, i64 *pOut, const RadixInfo *info ){ if( sqlite3_strnicmp(z, info->prefix, info->prefixLen)!=0 ) return 1; z += info->prefixLen; for(i=0; info->isDigit(z[i]); i++){} if( i > info->maxDigits ) return 2; // Overflow // ... parsing logic ... } - Unified Entry Point:
int sqlite3DecOrHexOrOctOrBinToI64(const char *z, i64 *pOut){ static const RadixInfo bases[] = { {"0x",2,16,sqlite3Isxdigit,16}, {"0o",2,22,sqlite3Isodigit,8}, {"0b",2,64,sqlite3Isbdigit,2}, {NULL,0,0,NULL,10} // Decimal terminator }; for(int i=0; bases[i].prefix; i++){ if( sqlite3GeneralBaseToI64(z, pOut, &bases[i])==0 ) return 0; } return sqlite3Atoi64(z, pOut, strlen(z), SQLITE_UTF8); }
Benefits:
- Eliminates code duplication.
- Simplates adding new radices in the future.
Step 3: Expand Test Coverage
Problem: Gaps in binlit.test and octlit.test.
Solution: Augment tests using hexlit.test as a template.
- Boundary Value Tests:
-- In binlit.test SELECT 0b0111111111111111111111111111111111111111111111111111111111111111; -- 2^63-1 SELECT 0b1000000000000000000000000000000000000000000000000000000000000000; -- -2^63 - Invalid Syntax Detection:
-- Should raise "malformed binary literal" SELECT 0b2; SELECT 0o9; SELECT 0b; -- No digits - Underscore Support:
SELECT 0b1000_0001; -- → 129 SELECT 0o7_7_7; -- → 511 - OMIT Flag Verification:
#ifdef SQLITE_OMIT_BINARY_INTEGER // Verify that 0b... literals trigger syntax errors do_test binlit-omit-1 { catchsql { SELECT 0b1010; } } {1 {near "0b1010": syntax error}} #endif
Step 4: Update JSON5 Extension
Problem: JSON5 does not parse binary/octal literals.
Solution: Modify jsonParseNumber() in src/json.c to use sqlite3DecOrHexOrOctOrBinToI64().
- Refactor Numeric Parsing:
static int jsonParseNumber( JsonParse *pParse, JsonNode *pNode ){ // ... if( zNum[0]=='0' && (zNum[1]=='x' || zNum[1]=='X') ){ // Existing hex logic } else if( zNum[0]=='0' && (zNum[1]=='b' || zNum[1]=='B') ){ rc = sqlite3DecOrHexOrOctOrBinToI64(zNum, &i); } else if( zNum[0]=='0' && (zNum[1]=='o' || zNum[1]=='O') ){ rc = sqlite3DecOrHexOrOctOrBinToI64(zNum, &i); } else { rc = sqlite3Atoi64(zNum, &i, strlen(zNum), SQLITE_UTF8); } // ... } - Add JSON5-Specific Tests:
SELECT json('0b1010'); -- → 10 SELECT json('0o755'); -- → 493 SELECT json('{"x": 0b111}'); -- → '{"x":7}'
Step 5: Validate Compile-Time Flags
Problem: Inadequate testing of SQLITE_OMIT_BINARY_INTEGER and SQLITE_OMIT_OCTAL_INTEGER.
Solution: Extend the test harness.
-
Configure CI Pipeline:
- Compile SQLite with
-DSQLITE_OMIT_BINARY_INTEGERand verify:SELECT 0b1010; -- Should throw syntax error - Repeat for
SQLITE_OMIT_OCTAL_INTEGER.
- Compile SQLite with
-
Conditional Compilation Guards:
Ensure all binary/octal logic is wrapped in#ifndef SQLITE_OMIT_...blocks.
Step 6: Address Underscore Handling
Problem: Partial support for underscores in literals.
Solution: Implement consistent underscore stripping across all radices.
- Preprocess Input String:
static int sqlite3GeneralBaseToI64(/*...*/){ char *zStripped = sqlite3StrDup(z); int j = 0; for(int i=0; z[i]; i++){ if( z[i]!='_' ) zStripped[j++] = z[i]; } zStripped[j] = '\0'; // Proceed with parsing zStripped sqlite3DbFree(db, zStripped); } - Update Tests:
SELECT 0b1_0_1_0; -- → 10 SELECT 0o1_7_7; -- → 127 SELECT 0x1_0_0; -- → 256 (existing hex)
Final Code Review Checklist
-
Naming Conventions:
- Rename
sqlite3DecOrHexOrOctOrBinToI64()tosqlite3Int64FromLiteral(). - Prefix radix-specific helpers (e.g.,
sqlite3IsRadixDigit()).
- Rename
-
Error Message Consistency:
- Standardize messages: "malformed binary literal" vs. "binary literal too big".
-
Documentation Updates:
- Add binary/octal syntax to https://sqlite.org/lang_expr.html#literal_values.
By systematically addressing these areas, the patch can achieve SQLite’s robustness requirements while maintaining backward compatibility and performance.