Implementing Binary and Octal Literal Support in SQLite: Challenges and Solutions

Issue Overview: Patch Implementation Gaps in Binary/Octal Literal Syntax Support

The core objective of this patch is to extend SQLite’s numeric literal support to include binary (e.g., 0b1010) and octal (e.g., 0o755) formats, aligning their behavior with existing hexadecimal literal handling. While functional for basic use cases, several critical implementation gaps prevent this patch from meeting SQLite’s rigorous quality standards.

Key Features Implemented

  1. Literal Prefix Matching: Case-insensitive recognition of 0b/0B for binary and 0o/0O for octal.
  2. 64-Bit Integer Parsing: Conversion of literals to signed 64-bit integers with overflow detection.
  3. Compile-Time Flags: SQLITE_OMIT_BINARY_INTEGER and SQLITE_OMIT_OCTAL_INTEGER to disable features.
  4. Error Handling: Custom error messages for oversized literals (e.g., "binary literal too big").

Critical Shortcomings

  1. Non-Compliant Character Classification:

    • New macros sqlite3Isbdigit and sqlite3Isodigit bypass SQLite’s centralized sqlite3CtypeMap system, risking inconsistency with other character-classification logic.
    • Example: sqlite3Isbdigit checks only for '0'/'1' via direct comparison, ignoring locale-aware ctype mappings.
  2. Function Naming and Scope Pollution:

    • The renamed sqlite3DecOrHexOrOctOrBinToI64() function introduces verbosity and violates SQLite’s naming conventions for clarity.
    • Code duplication in sqlite3GetInt32() for binary/octal parsing creates maintenance risks.
  3. Incomplete Test Coverage:

    • New test files binlit.test and octlit.test lack edge cases covered in hexlit.test, such as:
      • Underscore separators (e.g., 0b1010_0101).
      • Boundary values at 2^63-1 and -2^63.
      • Invalid digit detection (e.g., 0b12 or 0o89).
    • Insufficient validation of SQLITE_OMIT_* flags.
  4. JSON5 Extension Incompatibility:

    • The JSON5 extension’s numeric parser does not recognize binary/octal literals, breaking consistency with core SQLite.
  5. Underscore Handling Inconsistency:

    • The initial implementation allowed underscores in binary literals (e.g., 0b1000_0001) but later removed support, creating ambiguity.

Possible Causes: Architectural and Implementation Misalignments

Cause 1: Ad Hoc Character Classification

The patch introduces standalone macros sqlite3Isbdigit and sqlite3Isodigit instead of integrating with SQLite’s sqlite3CtypeMap bitmap. This likely stems from:

  • Misunderstanding of sqlite3CtypeMap: The contributor may not have realized that SQLite uses a centralized ctype system to avoid locale dependencies and ensure cross-platform consistency.
  • Urgency to Avoid Bitmask Complexity: Direct character comparisons (x == '0') are simpler but fail to account for SQLite’s strict definitions of valid digits.

Cause 2: Function Proliferation

The renaming of sqlite3DecOrHexToI64 to sqlite3DecOrHexOrOctOrBinToI64 reflects a tactical fix rather than a strategic redesign. Contributing factors include:

  • Lack of Polymorphic Parsing Infrastructure: SQLite’s numeric parsing functions were not designed to handle multiple radices dynamically.
  • Fear of Breaking Legacy Code: The contributor avoided refactoring existing hex/decimal logic to minimize regression risks.

Cause 3: Inadequate Test Design

The test suites mirror basic functionality but omit stress tests and error path validation because:

  • Overemphasis on Happy Paths: The tests verify correct parsing but neglect invalid syntax (e.g., 0b, 0o without digits).
  • Assumption of Hex Test Completeness: The contributor likely assumed hexlit.test coverage was sufficient, missing radix-specific edge cases.

Cause 4: Extension Oversight

The JSON5 extension’s numeric parser was overlooked because:

  • Compartmentalized Development: The contributor focused on core SQLite without auditing extensions that depend on shared parsing utilities.
  • Lack of Integration Testing: No cross-check between the JSON5 extension and the new literal syntax.

Troubleshooting Steps, Solutions & Fixes

Step 1: Refactor Character Classification Macros

Problem: Custom sqlite3Isbdigit/sqlite3Isodigit macros bypass sqlite3CtypeMap.
Solution: Integrate with SQLite’s existing ctype infrastructure.

  1. Extend sqlite3CtypeMap:
    Modify the sqlite3CtypeMap initialization to include binary and octal digit flags:

    // In sqlite3InitCtypeMap():
    for(i=0; i<256; i++){
      if( isdigit(i) ) ctypeMap[i] |= SQLITE_CTYPE_DIGIT;
      if( isxdigit(i) ) ctypeMap[i] |= SQLITE_CTYPE_XDIGIT;
      if( i=='0' || i=='1' ) ctypeMap[i] |= SQLITE_CTYPE_BDIGIT;  // Binary
      if( i>='0' && i<='7' ) ctypeMap[i] |= SQLITE_CTYPE_ODIGIT;  // Octal
    }
    
  2. Replace Macros with Ctype Checks:
    #define sqlite3Isbdigit(x) (sqlite3CtypeMap[(unsigned char)(x)] & SQLITE_CTYPE_BDIGIT)
    #define sqlite3Isodigit(x) (sqlite3CtypeMap[(unsigned char)(x)] & SQLITE_CTYPE_ODIGIT)
    

Validation:

  • Update test cases to include non-ASCII characters (e.g., full-width digits) and verify they are rejected.

Step 2: Optimize Numeric Parsing Function Architecture

Problem: The bloated sqlite3DecOrHexOrOctOrBinToI64() function harms readability.
Solution: Implement a radix-agnostic parser with a lookup table.

  1. Introduce a Radix Descriptor Struct:
    typedef struct RadixInfo {
      const char *prefix;      // "0x", "0o", etc.
      int prefixLen;           // 2
      int maxDigits;           // 64 for binary, 22 for octal
      int (*isDigit)(u8);      // sqlite3Isbdigit, etc.
      int radix;               // 2, 8, 16
    } RadixInfo;
    
  2. Refactor Parsing Logic:
    static int sqlite3GeneralBaseToI64(
      const char *z, 
      i64 *pOut, 
      const RadixInfo *info
    ){
      if( sqlite3_strnicmp(z, info->prefix, info->prefixLen)!=0 ) return 1;
      z += info->prefixLen;
      for(i=0; info->isDigit(z[i]); i++){}
      if( i > info->maxDigits ) return 2; // Overflow
      // ... parsing logic ...
    }
    
  3. Unified Entry Point:
    int sqlite3DecOrHexOrOctOrBinToI64(const char *z, i64 *pOut){
      static const RadixInfo bases[] = {
        {"0x",2,16,sqlite3Isxdigit,16}, 
        {"0o",2,22,sqlite3Isodigit,8},
        {"0b",2,64,sqlite3Isbdigit,2},
        {NULL,0,0,NULL,10} // Decimal terminator
      };
      for(int i=0; bases[i].prefix; i++){
        if( sqlite3GeneralBaseToI64(z, pOut, &bases[i])==0 ) return 0;
      }
      return sqlite3Atoi64(z, pOut, strlen(z), SQLITE_UTF8);
    }
    

Benefits:

  • Eliminates code duplication.
  • Simplates adding new radices in the future.

Step 3: Expand Test Coverage

Problem: Gaps in binlit.test and octlit.test.
Solution: Augment tests using hexlit.test as a template.

  1. Boundary Value Tests:
    -- In binlit.test
    SELECT 0b0111111111111111111111111111111111111111111111111111111111111111; -- 2^63-1
    SELECT 0b1000000000000000000000000000000000000000000000000000000000000000; -- -2^63
    
  2. Invalid Syntax Detection:
    -- Should raise "malformed binary literal"
    SELECT 0b2;
    SELECT 0o9;
    SELECT 0b; -- No digits
    
  3. Underscore Support:
    SELECT 0b1000_0001; -- → 129
    SELECT 0o7_7_7;     -- → 511
    
  4. OMIT Flag Verification:
    #ifdef SQLITE_OMIT_BINARY_INTEGER
      // Verify that 0b... literals trigger syntax errors
      do_test binlit-omit-1 {
        catchsql { SELECT 0b1010; }
      } {1 {near "0b1010": syntax error}}
    #endif
    

Step 4: Update JSON5 Extension

Problem: JSON5 does not parse binary/octal literals.
Solution: Modify jsonParseNumber() in src/json.c to use sqlite3DecOrHexOrOctOrBinToI64().

  1. Refactor Numeric Parsing:
    static int jsonParseNumber(
      JsonParse *pParse, 
      JsonNode *pNode
    ){
      // ...
      if( zNum[0]=='0' && (zNum[1]=='x' || zNum[1]=='X') ){
        // Existing hex logic
      } else if( zNum[0]=='0' && (zNum[1]=='b' || zNum[1]=='B') ){
        rc = sqlite3DecOrHexOrOctOrBinToI64(zNum, &i);
      } else if( zNum[0]=='0' && (zNum[1]=='o' || zNum[1]=='O') ){
        rc = sqlite3DecOrHexOrOctOrBinToI64(zNum, &i);
      } else {
        rc = sqlite3Atoi64(zNum, &i, strlen(zNum), SQLITE_UTF8);
      }
      // ...
    }
    
  2. Add JSON5-Specific Tests:
    SELECT json('0b1010');       -- → 10
    SELECT json('0o755');        -- → 493
    SELECT json('{"x": 0b111}'); -- → '{"x":7}'
    

Step 5: Validate Compile-Time Flags

Problem: Inadequate testing of SQLITE_OMIT_BINARY_INTEGER and SQLITE_OMIT_OCTAL_INTEGER.
Solution: Extend the test harness.

  1. Configure CI Pipeline:

    • Compile SQLite with -DSQLITE_OMIT_BINARY_INTEGER and verify:
      SELECT 0b1010; -- Should throw syntax error
      
    • Repeat for SQLITE_OMIT_OCTAL_INTEGER.
  2. Conditional Compilation Guards:
    Ensure all binary/octal logic is wrapped in #ifndef SQLITE_OMIT_... blocks.


Step 6: Address Underscore Handling

Problem: Partial support for underscores in literals.
Solution: Implement consistent underscore stripping across all radices.

  1. Preprocess Input String:
    static int sqlite3GeneralBaseToI64(/*...*/){
      char *zStripped = sqlite3StrDup(z);
      int j = 0;
      for(int i=0; z[i]; i++){
        if( z[i]!='_' ) zStripped[j++] = z[i];
      }
      zStripped[j] = '\0';
      // Proceed with parsing zStripped
      sqlite3DbFree(db, zStripped);
    }
    
  2. Update Tests:
    SELECT 0b1_0_1_0;    -- → 10
    SELECT 0o1_7_7;      -- → 127
    SELECT 0x1_0_0;      -- → 256 (existing hex)
    

Final Code Review Checklist

  1. Naming Conventions:

    • Rename sqlite3DecOrHexOrOctOrBinToI64() to sqlite3Int64FromLiteral().
    • Prefix radix-specific helpers (e.g., sqlite3IsRadixDigit()).
  2. Error Message Consistency:

    • Standardize messages: "malformed binary literal" vs. "binary literal too big".
  3. Documentation Updates:

    • Add binary/octal syntax to https://sqlite.org/lang_expr.html#literal_values.

By systematically addressing these areas, the patch can achieve SQLite’s robustness requirements while maintaining backward compatibility and performance.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *