Integrating SQLite dbhash as a C Library for Cross-Platform Database Comparison
Database Hash Computation Architecture in Embedded Systems
The core challenge involves implementing a reliable mechanism to compute identical hashes of SQLite databases across multiple platforms (GNU/Linux, iOS, Android) without relying on external executables. The solution must achieve binary equivalence in hash outputs between local and remote databases while maintaining thread safety, memory efficiency, and cross-platform compatibility. A critical constraint emerges from the need to replicate the specific hashing behavior of SQLite’s dbhash utility through direct C library integration rather than process invocation.
The implementation shown attempts to wrap dbhash functionality in a single WFDBH_HashTable() function, which feeds both table content and schema metadata into one SHA1 hash. Key architectural components include:
- Global State Management: Use of WFDBH_g singleton structure for hash context and database connection
- SHA1 Implementation: Custom cryptographic routines with endianness detection
- SQL Schema Extraction: Combined querying of sqlite_schema system table
- Type-Sensitive Data Hashing: Different handling for INTEGER, FLOAT, TEXT, BLOB, and NULL types
- Buffer Management: Fixed-size output buffer for hexadecimal hash strings
Critical Failure Modes in Embedded Hash Implementation
Thread-Unsafe Global State
The WFDBH_g global variable makes WFDBH_HashTable() non-reentrant: when multiple threads call it concurrently, they share the cx member and corrupt each other’s hash state. This is a realistic risk on mobile platforms, where asynchronous database operations are common.
Platform-Sensitive Endian Handling
The SHA1Transform() implementation contains architecture-dependent code paths through the Rl0/Rb0 macros, which test endianness at runtime via *(unsigned char*)&one. This produces different hash results on big-endian vs little-endian architectures when processing integer columns, violating cross-platform consistency requirements.
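One way to sidestep the runtime probe is to assemble and emit words byte by byte, so the same code path runs on every host. The sketch below is illustrative only; load_be32(), store_be32(), and host_is_little_endian() are hypothetical names, not part of the original implementation.

```c
#include <stdint.h>

/* Assemble a 32-bit word from four bytes in big-endian order.
   Shifts operate on values, not memory, so host byte order is irrelevant. */
uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit word as four big-endian bytes. */
void store_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* The kind of runtime probe described above, shown only for comparison.
   Code built on load_be32()/store_be32() never needs to call it. */
int host_is_little_endian(void) {
    const int one = 1;
    return *(const unsigned char *)&one == 1;
}
```

Because every block of input passes through the same shifts, a big-endian and a little-endian build cannot take different branches.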
Incomplete Schema Hashing
The current schema query only captures table, index, and view definitions but omits critical database artifacts:
SELECT type, name, tbl_name, sql FROM sqlite_schema
WHERE tbl_name LIKE '%s' ORDER BY name COLLATE nocase
This misses trigger associations, virtual table configurations, and collation sequences that affect data interpretation. The COLLATE nocase clause also makes the ordering case-insensitive where the original schema uses case-sensitive comparisons.
Type Serialization Inconsistencies
The hash_step() implementation shows divergent serialization strategies:
- Integers: Big-endian byte order via manual bit shifting
- Floats: Raw IEEE-754 bits with endian reversal
- Text: Direct byte inclusion without normalization
This creates platform-dependent results when databases contain floating-point NaN values or text columns with different normalization forms.
Buffer Overflow Vulnerabilities
The hash_finish() function copies the 41-byte result (40 hexadecimal characters plus the terminating NUL) into the caller-provided buffer via strcpy() without checking the result_buffer_size parameter. An undersized buffer therefore leads to stack or heap corruption; mitigations such as ASLR make exploitation harder but do not prevent the overflow itself.
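A bounded alternative checks the destination size before writing anything. The following is a minimal sketch; digest_to_hex() and the two size macros are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define SHA1_DIGEST_BYTES 20
#define SHA1_HEX_SIZE (2 * SHA1_DIGEST_BYTES + 1) /* 40 hex chars + NUL */

/* Convert a 20-byte digest to lowercase hex.
   Returns 0 on success, -1 if out is NULL or too small.
   On failure the destination buffer is left untouched. */
int digest_to_hex(const unsigned char digest[SHA1_DIGEST_BYTES],
                  char *out, size_t out_size) {
    static const char hex[] = "0123456789abcdef";
    if (out == NULL || out_size < SHA1_HEX_SIZE) return -1;
    for (size_t i = 0; i < SHA1_DIGEST_BYTES; i++) {
        out[2 * i]     = hex[digest[i] >> 4];
        out[2 * i + 1] = hex[digest[i] & 0x0f];
    }
    out[2 * SHA1_DIGEST_BYTES] = '\0';
    return 0;
}
```

Returning an error instead of truncating keeps a short buffer from silently producing a partial hash that could later compare equal to another partial hash.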
Schema Injection Vulnerabilities
The hash_one_query() function builds SQL statements through unchecked string concatenation:
hash_one_query("SELECT * FROM %s", table_name);
This allows SQL injection if table_name contains characters such as semicolons or quotes. No identifier quoting is applied, even though SQLite’s sqlite3_mprintf() family provides the %w and %Q conversions for escaping identifiers and string literals.
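The quoting rule for SQL identifiers is small enough to show in portable C: wrap the name in double quotes and double any embedded double quote, which is the same escaping that %w performs inside a double-quoted identifier. quote_identifier() below is a hypothetical helper written for illustration, not an SQLite API.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, double-quoted copy of name with embedded double
   quotes doubled. Returns NULL for NULL input or allocation failure.
   The caller frees the result. */
char *quote_identifier(const char *name) {
    if (name == NULL) return NULL;
    size_t len = strlen(name);
    size_t extra = 0;
    for (size_t i = 0; i < len; i++) {
        if (name[i] == '"') extra++;
    }
    /* 2 surrounding quotes + 1 terminator + one extra byte per doubled quote */
    char *out = malloc(len + extra + 3);
    if (out == NULL) return NULL;
    char *p = out;
    *p++ = '"';
    for (size_t i = 0; i < len; i++) {
        if (name[i] == '"') *p++ = '"';
        *p++ = name[i];
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

With this rule, a hostile name such as x"; DROP TABLE t; -- stays a single (nonexistent) identifier instead of terminating the statement.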
Comprehensive Implementation Strategy for Reliable Database Hashing
Global State Elimination and Thread Safety
Replace singleton with context-passing architecture:
typedef struct HashContext {
SHA1Context cx;
sqlite3* db;
ErrorState err;
} HashContext;
int WFDBH_HashTable(HashContext* ctx, const char* zTable, char* output) {
hash_init(&ctx->cx);
// ... rest of implementation ...
}
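To illustrate why context passing removes the shared-state problem, the sketch below uses a deliberately tiny stand-in hash (64-bit FNV-1a, not SHA1) so the example stays short. DemoHashContext and the demo_* functions are hypothetical; the point is only that all state lives in the struct the caller owns.

```c
#include <stddef.h>
#include <stdint.h>

/* All hashing state lives here; there are no globals. */
typedef struct DemoHashContext {
    uint64_t state; /* running FNV-1a value, a stand-in for SHA1Context */
} DemoHashContext;

void demo_init(DemoHashContext *ctx) {
    ctx->state = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
}

void demo_step(DemoHashContext *ctx, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= p[i];
        ctx->state *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
}

uint64_t demo_finish(const DemoHashContext *ctx) {
    return ctx->state;
}
```

Two callers that each own a DemoHashContext can interleave their demo_step() calls in any order and still obtain the same results as if they had run one after the other, which is exactly the property the global cx member cannot provide.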
Implement thread-local storage for platforms requiring shared contexts:
#if defined(__APPLE__)
#include <pthread.h>
static pthread_key_t ctx_key;
#elif defined(_WIN32)
static __declspec(thread) HashContext tls_ctx;
#endif
Platform-Neutral Data Serialization
Standardize column value encoding:
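- Integers: Read the value with sqlite3_column_int64() and serialize it as eight big-endian bytes (sqlite3_value_blob() would instead convert the integer to its text form):
sqlite3_uint64 u = (sqlite3_uint64)sqlite3_column_int64(pStmt, i);
unsigned char x[8];
for (int j = 7; j >= 0; j--) { x[j] = (unsigned char)(u & 0xff); u >>= 8; }
hash_step(ctx, x, sizeof(x));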
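- Floats: Convert to IEEE-754 big-endian representation without relying on the non-portable htobe64():
double d = sqlite3_column_double(pStmt, i);
uint64_t u;
unsigned char x[8];
memcpy(&u, &d, sizeof(u)); // Copy the raw IEEE-754 bits
for (int j = 7; j >= 0; j--) { x[j] = (unsigned char)(u & 0xff); u >>= 8; } // Most significant byte first
hash_step(ctx, x, sizeof(x));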
- Text: Apply Unicode NFC normalization:
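const unsigned char* text = sqlite3_column_text(pStmt, i);
size_t norm_len;
// normalize_nfc() is not an SQLite function; it stands for a helper you supply
char* norm = normalize_nfc((const char*)text, &norm_len);
hash_step(ctx, norm, norm_len);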
free(norm);
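The integer and float rules above can be combined into small encoders that also prefix a type tag and collapse every NaN bit pattern into one canonical value. This is a sketch under stated assumptions: the function names and the tag bytes '1' and '2' are chosen for illustration, and the canonical NaN is simply the standard quiet NaN pattern.

```c
#include <stdint.h>
#include <string.h>

/* Write v as eight bytes, most significant first. */
void put_be64(unsigned char out[8], uint64_t v) {
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Encode an integer as: tag byte '1' + 8 big-endian bytes. */
void encode_int64(unsigned char out[9], int64_t v) {
    out[0] = '1';
    put_be64(out + 1, (uint64_t)v);
}

/* Encode a double as: tag byte '2' + 8 big-endian IEEE-754 bytes.
   Every NaN is mapped to one bit pattern so payload and sign
   differences between platforms cannot change the hash. */
void encode_double(unsigned char out[9], double d) {
    uint64_t bits;
    if (d != d) {                       /* true only for NaN */
        bits = UINT64_C(0x7ff8000000000000);
    } else {
        memcpy(&bits, &d, sizeof bits); /* raw IEEE-754 bits */
    }
    out[0] = '2';
    put_be64(out + 1, bits);
}
```

The tag byte keeps the integer 1 and a float whose bit pattern happens to match from hashing identically, and 0.0 and -0.0 remain distinct because their sign bits differ.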
Complete Schema Capture
Expand schema query to include all dependent objects:
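WITH dependencies AS (
SELECT DISTINCT tbl_name FROM sqlite_schema
WHERE sql LIKE '%' || :table || '%' -- match the bare name; quote() would add single quotes
UNION
SELECT :table
)
-- rootpage is left out: it reflects physical layout and can differ between equivalent databases
SELECT type, name, tbl_name, sql
FROM sqlite_schema
WHERE tbl_name IN dependencies
ORDER BY name COLLATE BINARY;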
Include collation sequences and database configuration:
hash_one_query(ctx, "PRAGMA encoding");
hash_one_query(ctx, "PRAGMA foreign_key_list(%Q)", zTable);
Cryptographic Implementation Robustness
Replace the custom SHA1 with a widely reviewed implementation:
- Use OpenSSL’s EVP interface when available:
#include <openssl/evp.h>
void hash_init(EVP_MD_CTX* ctx) {
EVP_DigestInit_ex(ctx, EVP_sha1(), NULL);
}
- Fallback to RFC 3174-compliant SHA1 for restricted environments:
// Implement RFC 3174 exactly without platform-specific optimizations
- Add hash algorithm negotiation:
enum HashAlgo { ALGO_SHA1, ALGO_XXH3, ALGO_SHA3_256 };
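For the RFC 3174 fallback mentioned above, a straightforward implementation with no endianness branches fits in well under a hundred lines. The version below is a sketch for restricted environments, not the original project's code; the Sha1 type and sha1_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Sha1 {
    uint32_t h[5];          /* chaining state */
    uint64_t total_len;     /* message bytes absorbed so far */
    unsigned char buf[64];  /* partially filled block */
    size_t buf_len;
} Sha1;

static uint32_t rol32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Process one 64-byte block. Words are built with shifts, so the
   result is the same on little-endian and big-endian hosts. */
static void sha1_block(Sha1 *s, const unsigned char *p) {
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = ((uint32_t)p[4*i] << 24) | ((uint32_t)p[4*i+1] << 16) |
               ((uint32_t)p[4*i+2] << 8) | (uint32_t)p[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = rol32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    uint32_t a = s->h[0], b = s->h[1], c = s->h[2], d = s->h[3], e = s->h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        uint32_t t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol32(b, 30); b = a; a = t;
    }
    s->h[0] += a; s->h[1] += b; s->h[2] += c; s->h[3] += d; s->h[4] += e;
}

void sha1_init(Sha1 *s) {
    s->h[0] = 0x67452301u; s->h[1] = 0xEFCDAB89u; s->h[2] = 0x98BADCFEu;
    s->h[3] = 0x10325476u; s->h[4] = 0xC3D2E1F0u;
    s->total_len = 0;
    s->buf_len = 0;
}

void sha1_update(Sha1 *s, const void *data, size_t len) {
    const unsigned char *p = data;
    s->total_len += len;
    while (len > 0) {
        size_t n = 64 - s->buf_len;
        if (n > len) n = len;
        memcpy(s->buf + s->buf_len, p, n);
        s->buf_len += n; p += n; len -= n;
        if (s->buf_len == 64) { sha1_block(s, s->buf); s->buf_len = 0; }
    }
}

void sha1_final(Sha1 *s, unsigned char out[20]) {
    uint64_t bits = s->total_len * 8;  /* capture length before padding */
    unsigned char pad = 0x80, zero = 0, lenbuf[8];
    sha1_update(s, &pad, 1);
    while (s->buf_len != 56) sha1_update(s, &zero, 1);
    for (int i = 7; i >= 0; i--) { lenbuf[i] = (unsigned char)(bits & 0xff); bits >>= 8; }
    sha1_update(s, lenbuf, 8);         /* completes the final block */
    for (int i = 0; i < 5; i++) {
        out[4*i]   = (unsigned char)(s->h[i] >> 24);
        out[4*i+1] = (unsigned char)(s->h[i] >> 16);
        out[4*i+2] = (unsigned char)(s->h[i] >> 8);
        out[4*i+3] = (unsigned char)s->h[i];
    }
}
```

Checking the output against the published test vectors for "abc" and the empty string on every target platform is a cheap way to confirm that each build agrees before any database content is hashed.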
Memory Safety Enforcement
Implement strict buffer contracts:
#define HASH_OUTPUT_SIZE 41 // 40 chars + null
int WFDBH_HashTable(..., char output[HASH_OUTPUT_SIZE]) {
if (output == NULL) return SQLITE_MISUSE;
memset(output, 0, HASH_OUTPUT_SIZE);
// ... computation ...
memcpy(output, zOut, HASH_OUTPUT_SIZE - 1); // 40 hex chars; the terminator was set by memset
return SQLITE_OK;
}
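Add argument checks in hash_step() (SHA1 itself accepts input of any length, so no upper bound is needed):
void hash_step(HashContext* ctx, const void* data, size_t len) {
if (ctx == NULL || (data == NULL && len > 0)) abort(); // Reject invalid arguments
// ... processing ...
}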
SQL Injection Prevention
Use parameterized schema queries with SQLite’s type-safe functions:
sqlite3_stmt* pStmt;
sqlite3_prepare_v2(db, "SELECT * FROM main.sqlite_schema WHERE tbl_name=?", -1, &pStmt, 0);
sqlite3_bind_text(pStmt, 1, zTable, -1, SQLITE_STATIC);
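Implement identifier quoting for dynamic SQL. Use %w inside double quotes for identifiers; %Q produces a single-quoted string literal, not an identifier:
char* zSafeTable = sqlite3_mprintf("\"%w\"", zTable); // %w doubles embedded double quotes
hash_one_query(ctx, "SELECT * FROM %s", zSafeTable);
sqlite3_free(zSafeTable);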
Cross-Platform Validation Suite
Develop test harness covering:
- Endianness Matrix
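# s390x is big-endian; x86-64 and aarch64 are little-endian, so they exercise only one side
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker run --rm --platform linux/s390x -v "$PWD":/src -w /src debian:stable ./hash_test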
- Floating Point Consistency
TEST_ASSERT_EQUAL_HASH(1.0, 1.0);
TEST_ASSERT_DIFFERENT_HASH(0.0, -0.0);
TEST_ASSERT_EQUAL_HASH(NAN, NAN); // Requires NaN canonicalization
- Concurrency Stress Test
#pragma omp parallel for
for (int i=0; i<1000; i++) {
HashContext ctx;
compute_hash(&ctx, "test_table");
}
Performance Optimization
Implement streaming hash updates for large BLOBs:
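sqlite3_blob* pBlob;
int rc = sqlite3_blob_open(db, "main", zTable, "data", rowid, 0, &pBlob);
if (rc != SQLITE_OK) return rc;
int total_size = sqlite3_blob_bytes(pBlob);
unsigned char* buf = malloc(BUF_SIZE);
for (int offset = 0; buf != NULL && offset < total_size; offset += BUF_SIZE) {
int n = total_size - offset < BUF_SIZE ? total_size - offset : BUF_SIZE; // Last chunk may be short
if (sqlite3_blob_read(pBlob, buf, n, offset) != SQLITE_OK) break;
hash_step(ctx, buf, (size_t)n);
}
free(buf);
sqlite3_blob_close(pBlob);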
Add hash caching that is invalidated through SQLite’s update hook, which reports row-level changes per table:
sqlite3_update_hook(db, on_table_update, hash_cache);
Together, these changes address the failure modes described above while keeping the original goal of portable, self-contained database hashing. Where a change alters the bytes fed to the hash (text normalization, extra schema queries, a different algorithm), apply it identically on every platform being compared, since the resulting hashes will no longer match those of an unmodified implementation.