Handling SIGBUS Errors in SQLite Due to Concurrent WAL File Access
Understanding SIGBUS Crashes in Multi-Process SQLite WAL Mode Operations
Issue Overview: SIGBUS During WAL Index Operations
The reported SIGBUS error occurs when SQLite attempts to access memory-mapped regions of the Write-Ahead Log (WAL) index (*-shm file) after another process or thread truncates or modifies the underlying file. This manifests as a crash in functions like walIndexAppend
, walIndexReadHdr
, or memset
when accessing invalid memory addresses. The problem arises in multi-process or multi-threaded Android environments where SQLite databases are stored on external storage and accessed concurrently. Key symptoms include:
- Stack traces involving WAL index operations (e.g.,
walIndexAppend
,walTryBeginRead
). - Crashes at specific memory offsets (e.g., 4096-byte boundaries).
- Use of non-SQLite file operations (e.g., Java
File
API) to delete or modify database files. - Intermittent occurrence due to race conditions between processes closing connections and truncating shared files.
The WAL mechanism relies on memory-mapped regions of the *-shm file to coordinate concurrent access. When a process closes its database connection, SQLite may truncate the *-shm file if it believes no other processes are using it. If another process still has an active memory mapping to the truncated region, accessing that memory triggers a SIGBUS. This violates SQLite’s assumption that all processes coordinate through advisory locks and file descriptors.
Root Causes: POSIX Locks, File Descriptor Leaks, and Manual File Deletion
Three primary factors contribute to this issue:
1. POSIX Advisory Lock Misbehavior in Multi-Threaded Environments
SQLite uses POSIX advisory locks to coordinate access to the database. On Linux/Android, these locks are associated with file descriptors, not processes or threads. If a thread closes a file descriptor (e.g., via close()
or sqlite3_close()
), all locks held by that descriptor are released, even if other threads in the same process still expect them. This leads to:
- Premature lock release: Thread A closes a connection, releasing locks that Thread B still relies on.
- Truncation of shared memory files: A new connection in Thread C may truncate the *-shm file, invalidating Thread B’s memory mappings.
This violates SQLite’s thread-safety guarantees when using the same database handle across threads.
2. Direct File Manipulation Bypassing SQLite APIs
Deleting or modifying SQLite files (e.g., *.db, *.wal, *.shm) using external tools (e.g., Java File.delete()
) while connections are open creates inconsistencies:
- Orphaned memory mappings: If the *-shm file is deleted while a process has it memory-mapped, subsequent accesses to the mapping cause SIGBUS.
- Lock state corruption: SQLite relies on the presence of the *-shm file to detect active connections. Deleting it manually breaks this coordination.
3. File Descriptor Proliferation in Multi-Process Scenarios
Each process opening the same database creates its own file descriptors for the *-shm file. If these descriptors are not managed properly (e.g., not closed before process termination), subsequent truncation by another process can invalidate active mappings.
Resolving SIGBUS Errors: File Handling Discipline and Lock Coordination
Step 1: Enforce Strict SQLite File Management
- Avoid manual file operations: Never delete or truncate SQLite files directly. Use SQLite APIs like
sqlite3_close()
,VACUUM
, orPRAGMA wal_checkpoint
to manage files. - Use
SQLITE_FCNTL_PERSIST_WAL
pragma: Prevent SQLite from truncating the *-shm file on close by setting:sqlite3_file_control(db, SQLITE_FCNTL_PERSIST_WAL, 1);
This keeps the *-shm file intact until the last connection closes.
Step 2: Fix POSIX Lock Coordination Across Threads
- Isolate database connections per thread: Each thread should open and close its own
sqlite3*
handle. Avoid sharing handles between threads. - Audit file descriptor usage: Ensure no thread closes a database connection while others are actively using it. Use reference counting if necessary.
- Enable SQLite’s “unix-excl” VFS: This forces exclusive locking mode, bypassing POSIX advisory locks. Add
?mode=rwc
to the URI when opening the database:SQLiteDatabase.openDatabase("file:/mnt/external_storage/db.db?mode=rwc", ...);
Step 3: Diagnose Memory Mapping Corruption
- *Inspect -shm file state: After a crash, check the size of the *-shm file. If it’s smaller than expected (e.g., 3 bytes), a truncation race occurred.
- Monitor file descriptors: Use
lsof
or/proc/<pid>/fd
to identify leaked descriptors holding locks. - Enable SQLite’s defensive mode: Set
PRAGMA cell_size_check=ON
andPRAGMA page_size=4096
to catch inconsistencies early.
Step 4: Implement Atomic File Replacement
If you must replace a database (e.g., during updates), use SQLite’s ATTACH
/DETACH
mechanism or atomic renames:
- Create a new database with a temporary name.
- Populate it with the desired data.
- Use
REPLACE
orVACUUM INTO
to atomically replace the old database:VACUUM INTO '/mnt/external_storage/db_new.db';
- Rename the new file over the old one using
Files.move()
withStandardCopyOption.ATOMIC_MOVE
.
Step 5: Validate Multi-Process Coordination
- Use
SQLITE_OPEN_WAL
mode: Ensure all processes use the same journal mode. Mixed modes (e.g., WAL in one process, DELETE in another) corrupt the database. - Check
SQLITE_BUSY
handling: Implement retry logic with exponential backoff when encounteringSQLITE_BUSY
errors.
Step 6: Address Android-Specific Edge Cases
- External storage limitations: Android external storage (e.g., SD cards) may not fully support POSIX locks. Prefer internal storage for critical databases.
- MediaStore interactions: If the database is exposed via
MediaStore
, other apps may open it unexpectedly. UsenoMedia
files or hidden directories to prevent this.
Final Recommendations for Stability
- Avoid multi-process access: Redesign the app to use a single process or a centralized service for database access.
- Migrate to Room with caution: Room’s automatic WAL management can help, but ensure all
@Dao
methods properly close cursors and transactions. - Enable SQLite’s debug logs: Compile SQLite with
-DSQLITE_DEBUG
and monitorsqlite3_log()
output for lock/unlock events.
By adhering to these practices, the risk of SIGBUS errors due to WAL file truncation and lock mismanagement is minimized. Persistent issues warrant deeper inspection of the SQLite build configuration and kernel-level file system behavior on the target Android devices.