Resolving SQLITE_LOCKED Errors After Checkpoint and VACUUM Operations in Multi-Database WAL Environments
Understanding the Contention Between WAL Checkpointing, VACUUM, and Connection Pooling
Issue Overview
The SQLITE_LOCKED error arises when a database connection attempts to acquire a lock on a database file that is already held by another process or connection in a conflicting mode. In this scenario, the error occurs during the seeding phase (Step 10) after a sequence of operations involving WAL checkpoints, VACUUM commands, and connection pool management across three attached databases (main
, config
, and raw
). The critical observations are:
- WAL Mode Configuration: The databases use Write-Ahead Logging (WAL) with
journal_mode = WAL
andsynchronous = NORMAL
. This allows concurrent reads and writes but introduces complexities during checkpointing. - Checkpoint and VACUUM Sequence: After migrations, the application explicitly triggers
wal_checkpoint(TRUNCATE)
on all databases and runsVACUUM
onmain
andconfig
. - Connection Pool Dynamics: A connection pool of size 10 is initialized, with connections acquired and released during migrations, checkpoints, and vacuum operations.
- Locking Hierarchy: The
POOL lock
is held during pool initialization, migrations, checkpoints, and vacuum, but released before seeding.
The core conflict lies in how SQLite manages locks during maintenance operations (checkpointing and vacuuming) and how connections are reused or retained within the pool. The VACUUM
command rebuilds the entire database file, requiring an exclusive lock. Meanwhile, checkpointing in WAL mode involves transferring data from the WAL file to the main database. If connections from the pool retain references to outdated WAL states or fail to relinquish locks properly, subsequent operations (e.g., seeding) will contend for access.
Diagnosing Lock Retention During Maintenance and Pool Recycling
Possible Causes
Unfinalized Prepared Statements:
SQLite connections retain locks if prepared statements are not finalized. During migrations, checkpoints, or vacuuming, if a statement handle (e.g., fromsql_query
) is not explicitly closed, the connection may retain a lock even after the operation appears complete. This is exacerbated in pooled environments where connections are reused.VACUUM Locking Semantics:
TheVACUUM
command requires an exclusive lock on the entire database. If any connection (even from the same process) holds a read lock or has an open transaction,VACUUM
will fail or block until locks are released. Thebusy_timeout
setting (30 seconds here) may delay but not resolve contention if locks persist.Checkpoint Aggressiveness:
Usingwal_checkpoint(TRUNCATE)
truncates the WAL file after checkpointing. However, if other connections are in read mode, the checkpoint may not fully complete, leaving the WAL file partially untruncated. Subsequent operations may encounter unexpected WAL state.Connection Pool Contamination:
After runningVACUUM
, the database file is rewritten, invalidating any existing page caches in other connections. If the pool reuses connections that have cached pages from the pre-VACUUM state, those connections may hold references to obsolete data structures, leading to lock conflicts.Transaction Boundaries:
Implicit transactions (e.g., auto-commit mode) may not be properly closed after migrations or checkpoints. SQLite’s locking hierarchy (UNLOCKED → SHARED → RESERVED → EXCLUSIVE) requires that all read locks are released before a write lock can be acquired. Stray transactions can retain locks across operations.Attached Database Lock Coordination:
When multiple databases are attached (main
,config
,raw
), operations on one database may implicitly lock others if transactions span multiple databases. For example, aVACUUM
onmain
could trigger locks onconfig
if foreign keys or shared cache modes are involved.
Resolving Lock Contention Through Connection Hygiene and WAL Management
Troubleshooting Steps, Solutions & Fixes
1. Finalize All Prepared Statements Explicitly
- Problem: ORM frameworks or raw SQL execution may leave prepared statements unresolved, retaining locks.
- Solution: Audit the migration and vacuum code to ensure all statement handles are finalized. In Rust, using
diesel::sql_query
requires explicit cleanup. Wrap operations in blocks that enforce statement finalization:pub fn checkpoint(conn: &mut SqliteConnection, db: &str) -> Result<(), MigrationError> { let query = format!("PRAGMA {}.wal_checkpoint(TRUNCATE);", db); let statement = conn.prepare(&query)?; statement.execute()?; // Explicitly finalize statement.finalize()?; Ok(()) }
- Verification: Enable SQLite’s
SQLITE_DEBUG
logging to track unfinalized statements.
2. Isolate VACUUM Operations on Dedicated Connections
- Problem:
VACUUM
rewrites the database file, invalidating existing connections’ page caches. - Solution: Run
VACUUM
on a standalone connection (not from the pool) and close it afterward. This prevents pool connections from referencing obsolete database states:pub fn vacuum(db_path: &str, db: &str) -> Result<(), MigrationError> { let mut dedicated_conn = SqliteConnection::establish(db_path)?; dedicated_conn.execute(&format!("VACUUM {}", db))?; // Explicitly close the connection drop(dedicated_conn); Ok(()) }
- Verification: After vacuuming, check
sqlite3_db_status
for active connections to the database.
3. Stagger Checkpointing and VACUUM Execution
- Problem: Concurrent checkpoints across attached databases may create transient locks.
- Solution: Checkpoint databases sequentially and verify completion using
PRAGMA wal_checkpoint(PASSIVE)
to ensure no active readers are blocking truncation:let (log, ckpt) = conn.query_row( &format!("PRAGMA {}.wal_checkpoint(PASSIVE);", db), [], |row| Ok((row.get::<_, i32>(0)?, row.get::<_, i32>(1)?)) )?; if log == 0 && ckpt == 0 { // Safe to truncate conn.execute(&format!("PRAGMA {}.wal_checkpoint(TRUNCATE);", db))?; }
- Verification: Monitor the
sqlite3_wal_checkpoint
return codes to confirm full checkpoints.
4. Reset Pool Connections After Maintenance
- Problem: Pooled connections may retain outdated page caches or locks post-VACUUM.
- Solution: After vacuuming, reset or recycle all pool connections. For example, using
r2d2
in Rust:pool.reset()?; // Reinitializes all connections
- Verification: Check connection statuses using
PRAGMA schema_version
to ensure they reflect post-VACUUM state.
5. Enforce Synchronization Between Attached Databases
- Problem: Locks on one attached database may propagate to others.
- Solution: Execute
VACUUM
and checkpoints within transactions that span all attached databases, ensuring atomic lock acquisition:conn.execute("BEGIN IMMEDIATE;")?; vacuum(&mut conn, "main")?; vacuum(&mut conn, "config")?; conn.execute("COMMIT;")?;
- Verification: Use
PRAGMA lock_status
to confirm no attached databases retain locks.
6. Audit Foreign Key and Trigger Interactions
- Problem: Foreign key constraints (
foreign_keys = ON
) may trigger implicit transactions across databases. - Solution: Disable foreign keys during maintenance operations if feasible:
conn.execute("PRAGMA foreign_keys = OFF;")?; vacuum(&mut conn, "main")?; conn.execute("PRAGMA foreign_keys = ON;")?;
- Verification: Test the workflow with foreign keys disabled to isolate constraint-related locks.
7. Monitor SQLite’s Locking Hierarchy with Debug Tools
- Problem: Ambiguity in which operation holds the lock.
- Solution: Compile SQLite with
-DSQLITE_DEBUG
and usesqlite3_db_status()
to track lock states. Alternatively, usesqlite3_unlock_notify
to debug contention. - Verification: Log lock transitions during migrations and seeding to identify the exact contention point.
8. Adjust WAL Configuration for High-Concurrency Workloads
- Problem: Large
journal_size_limit
(1GB) may delay checkpointing, increasing lock retention time. - Solution: Reduce
journal_size_limit
to force more frequent checkpoints:PRAGMA main.journal_size_limit = 67108864; -- 64MB
- Verification: Monitor WAL file size and checkpoint frequency during migrations.
Final Recommendation:
Isolate VACUUM
operations on dedicated connections, enforce statement finalization, and reset the connection pool after maintenance. This eliminates cache invalidation and lock retention issues. Additionally, verify that all attached databases release locks using PRAGMA lock_status
and stagger checkpointing to avoid cross-database contention.