Database Corruption in Multi-Process WAL Mode on Linux


Concurrency in WAL Mode and Unexpected File Lock Interactions

The core issue revolves around database corruption observed when two or more processes interact with an SQLite database in Write-Ahead Logging (WAL) mode on Linux. The corruption manifests when one process (Process A) writes data to the database and then closes it while another process (Process B) is actively appending new data to the WAL file. This scenario does not reproduce on Windows and is mitigated by enabling the SQLITE_SBCONFIG_NO_CKPT_ON_CLOSE option, which skips the checkpoint operation during connection closure. The root cause lies in the interplay between SQLite’s WAL checkpointing logic, POSIX advisory locks, and filesystem-specific behaviors (e.g., F2FS). The corruption occurs when Process A truncates the WAL file after a checkpoint, while Process B continues writing to the original (now truncated) WAL file, creating a "blank space" in the WAL structure. This mismatch between the WAL index and the actual WAL file contents leads to SQLITE_CORRUPT errors during subsequent operations.


File Locking Semantics and Checkpoint Timing During Connection Closure

The problem arises from three primary factors:

  1. POSIX Advisory Lock Release via close(): When an application directly invokes close() on a database file descriptor (e.g., using POSIX file operations), all advisory locks held by the process are released, regardless of which file descriptor or thread acquired them. This violates SQLite’s assumption that locks are maintained until explicitly released via sqlite3_close_v2().
  2. Checkpoint-on-Close Behavior: By default, SQLite performs a passive checkpoint during sqlite3_close_v2() if the connection is in WAL mode. This checkpoint attempts to synchronize the WAL file with the main database. If another process is actively writing to the WAL during this checkpoint, the truncation of the WAL file by Process A can leave Process B writing to an invalid file offset.
  3. Filesystem-Specific WAL Handling: The F2FS filesystem’s behavior with atomic writes (SQLITE_ENABLE_BATCH_ATOMIC_WRITE) or write reordering may exacerbate the issue. While not directly causative, filesystem characteristics influence how WAL truncation and appends are interleaved.

A critical code path is sqlite3WalClose(), which transitions the WAL handle to exclusive mode and invokes sqlite3WalCheckpoint(). If the checkpoint completes successfully, SQLite truncates the WAL file unless the "persist WAL" option is enabled. However, if Process B has already opened a new transaction and begun writing to the WAL, the truncation creates a discontinuity in the WAL file. Process B’s subsequent writes reference outdated WAL index metadata, leading to corruption.


Mitigating Corruption via Configuration, Lock Management, and Safe Closure Practices

Step 1: Eliminate Direct File Descriptor Manipulation

  • Avoid POSIX open()/close() on Database Files: SQLite manages file descriptors internally. Manually closing descriptors via close() invalidates locks and bypasses SQLite’s internal state tracking. Replace direct file operations with SQLite APIs (e.g., sqlite3_open_v2(), sqlite3_close_v2()).
  • Validate Locking Mechanisms: Use PRAGMA locking_mode=EXCLUSIVE if feasible, though this reduces concurrency. For multi-process setups, ensure all processes use the same SQLite connection lifecycle management.

Step 2: Configure Checkpoint Behavior on Closure

  • Enable SQLITE_SBCONFIG_NO_CKPT_ON_CLOSE: This option skips the checkpoint during connection closure, preventing WAL truncation while other processes are active. Configure it via sqlite3_db_config(db, SQLITE_DBCONFIG_NO_CKPT_ON_CLOSE, 1, 0).
  • Manual Checkpoint Scheduling: If checkpoints are necessary, execute them during application idle periods using PRAGMA wal_checkpoint(TRUNCATE) or sqlite3_wal_checkpoint_v2(), ensuring no concurrent writes are ongoing.

Step 3: Filesystem and Compile-Time Option Validation

  • Disable SQLITE_ENABLE_BATCH_ATOMIC_WRITE on F2FS: This option optimizes write operations but may conflict with WAL’s append-only semantics on certain filesystems. Test with this option disabled.
  • Verify Filesystem Coherency: Use PRAGMA quick_check to detect low-level corruption. Consider switching to a filesystem with stricter POSIX compliance (e.g., ext4) for comparison.

Step 4: Implement Process Synchronization

  • Cross-Process Notification: Use IPC mechanisms (e.g., named pipes, signals) to notify other processes before closing a database connection. This allows active processes to finalize transactions and release WAL resources.
  • Retry Logic for Busy Conditions: Wrap write operations in retry loops with sqlite3_busy_handler() or sqlite3_busy_timeout() to handle transient lock conflicts during checkpoint/close sequences.

Step 5: Audit Connection Handling Across Forks

  • Isolate Connections Post-fork(): Database connections must not be shared across processes created via fork(). Ensure each child process opens its own connection after forking.

Step 6: Enable Debugging and Diagnostics

  • Log WAL Index State: Instrument the application to log sqlite3_wal_checkpoint() results and WAL file sizes before/after close operations.
  • Trace File Descriptor Lifecycles: Use tools like strace or lsof to monitor which processes hold database file descriptors and locks.

By systematically addressing file descriptor management, checkpoint timing, and filesystem interactions, the corruption scenario can be resolved. The key is to align SQLite’s WAL mechanics with the application’s concurrency model while respecting POSIX locking semantics.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *