Database Corruption in Multi-Process WAL Mode on Linux
Concurrency in WAL Mode and Unexpected File Lock Interactions
The core issue revolves around database corruption observed when two or more processes interact with an SQLite database in Write-Ahead Logging (WAL) mode on Linux. The corruption manifests when one process (Process A) writes data to the database and then closes it while another process (Process B) is actively appending new data to the WAL file. This scenario does not reproduce on Windows and is mitigated by enabling the SQLITE_SBCONFIG_NO_CKPT_ON_CLOSE
option, which skips the checkpoint operation during connection closure. The root cause lies in the interplay between SQLite’s WAL checkpointing logic, POSIX advisory locks, and filesystem-specific behaviors (e.g., F2FS). The corruption occurs when Process A truncates the WAL file after a checkpoint, while Process B continues writing to the original (now truncated) WAL file, creating a "blank space" in the WAL structure. This mismatch between the WAL index and the actual WAL file contents leads to SQLITE_CORRUPT
errors during subsequent operations.
File Locking Semantics and Checkpoint Timing During Connection Closure
The problem arises from three primary factors:
- POSIX Advisory Lock Release via
close()
: When an application directly invokesclose()
on a database file descriptor (e.g., using POSIX file operations), all advisory locks held by the process are released, regardless of which file descriptor or thread acquired them. This violates SQLite’s assumption that locks are maintained until explicitly released viasqlite3_close_v2()
. - Checkpoint-on-Close Behavior: By default, SQLite performs a passive checkpoint during
sqlite3_close_v2()
if the connection is in WAL mode. This checkpoint attempts to synchronize the WAL file with the main database. If another process is actively writing to the WAL during this checkpoint, the truncation of the WAL file by Process A can leave Process B writing to an invalid file offset. - Filesystem-Specific WAL Handling: The F2FS filesystem’s behavior with atomic writes (
SQLITE_ENABLE_BATCH_ATOMIC_WRITE
) or write reordering may exacerbate the issue. While not directly causative, filesystem characteristics influence how WAL truncation and appends are interleaved.
A critical code path is sqlite3WalClose()
, which transitions the WAL handle to exclusive mode and invokes sqlite3WalCheckpoint()
. If the checkpoint completes successfully, SQLite truncates the WAL file unless the "persist WAL" option is enabled. However, if Process B has already opened a new transaction and begun writing to the WAL, the truncation creates a discontinuity in the WAL file. Process B’s subsequent writes reference outdated WAL index metadata, leading to corruption.
Mitigating Corruption via Configuration, Lock Management, and Safe Closure Practices
Step 1: Eliminate Direct File Descriptor Manipulation
- Avoid POSIX
open()
/close()
on Database Files: SQLite manages file descriptors internally. Manually closing descriptors viaclose()
invalidates locks and bypasses SQLite’s internal state tracking. Replace direct file operations with SQLite APIs (e.g.,sqlite3_open_v2()
,sqlite3_close_v2()
). - Validate Locking Mechanisms: Use
PRAGMA locking_mode=EXCLUSIVE
if feasible, though this reduces concurrency. For multi-process setups, ensure all processes use the same SQLite connection lifecycle management.
Step 2: Configure Checkpoint Behavior on Closure
- Enable
SQLITE_SBCONFIG_NO_CKPT_ON_CLOSE
: This option skips the checkpoint during connection closure, preventing WAL truncation while other processes are active. Configure it viasqlite3_db_config(db, SQLITE_DBCONFIG_NO_CKPT_ON_CLOSE, 1, 0)
. - Manual Checkpoint Scheduling: If checkpoints are necessary, execute them during application idle periods using
PRAGMA wal_checkpoint(TRUNCATE)
orsqlite3_wal_checkpoint_v2()
, ensuring no concurrent writes are ongoing.
Step 3: Filesystem and Compile-Time Option Validation
- Disable
SQLITE_ENABLE_BATCH_ATOMIC_WRITE
on F2FS: This option optimizes write operations but may conflict with WAL’s append-only semantics on certain filesystems. Test with this option disabled. - Verify Filesystem Coherency: Use
PRAGMA quick_check
to detect low-level corruption. Consider switching to a filesystem with stricter POSIX compliance (e.g., ext4) for comparison.
Step 4: Implement Process Synchronization
- Cross-Process Notification: Use IPC mechanisms (e.g., named pipes, signals) to notify other processes before closing a database connection. This allows active processes to finalize transactions and release WAL resources.
- Retry Logic for Busy Conditions: Wrap write operations in retry loops with
sqlite3_busy_handler()
orsqlite3_busy_timeout()
to handle transient lock conflicts during checkpoint/close sequences.
Step 5: Audit Connection Handling Across Forks
- Isolate Connections Post-
fork()
: Database connections must not be shared across processes created viafork()
. Ensure each child process opens its own connection after forking.
Step 6: Enable Debugging and Diagnostics
- Log WAL Index State: Instrument the application to log
sqlite3_wal_checkpoint()
results and WAL file sizes before/after close operations. - Trace File Descriptor Lifecycles: Use tools like
strace
orlsof
to monitor which processes hold database file descriptors and locks.
By systematically addressing file descriptor management, checkpoint timing, and filesystem interactions, the corruption scenario can be resolved. The key is to align SQLite’s WAL mechanics with the application’s concurrency model while respecting POSIX locking semantics.