Read-Only SQLite Connections and Database Copy Safety: Transaction Risks Explained

Understanding Read-Only Connections and Database Copy Safety During Transactions

Issue Overview: The Intersection of Read-Only Transactions and File State Consistency

SQLite databases are widely used for their portability and reliability, but certain operational nuances can lead to data corruption if misunderstood. A critical question arises when combining read-only connections with file-copy operations: Does an active transaction on a read-only connection leave the database file in a "not-safe-to-copy" state?

Core Concepts: Safe-to-Copy States and Transaction Mechanics

A SQLite database enters a "safe-to-copy" state when:

  1. No transactions are active.
  2. The write-ahead log (WAL) is fully checkpointed into the main database file.
  3. No process holds exclusive locks (e.g., during writes).

The SQLite documentation explicitly warns that copying a database during an active transaction risks corruption. This applies to write transactions by default, as they modify the database or WAL file. However, the behavior of read-only transactions (e.g., SELECT queries wrapped in BEGIN...COMMIT) is less clear.

The Read-Only Connection Paradox

A read-only connection opened via mode=ro URI parameter or SQLITE_OPEN_READONLY flag cannot modify the database. However, SQLite’s locking and transaction system operates independently of file permissions:

  • Filesystem-level read-only: The OS prevents writes to the file. SQLite may still attempt to create/shared locks.
  • SQLite mode=ro: The engine enforces read-only behavior at the application layer, bypassing filesystem restrictions.

Key observations from the original discussion:

  1. Read-only connections do not alter the database file’s modification timestamp during transactions.
  2. Despite this, the documentation does not explicitly exempt read-only transactions from creating "not-safe-to-copy" states.

Hidden Risks with WAL Mode

In WAL mode, even read-only transactions interact with auxiliary files (-wal and -shm). A read-only connection may:

  • Read uncheckpointed data from the WAL file.
  • Hold shared locks to maintain transaction isolation.
  • Prevent checkpointing if other connections are active.

Copying only the main database file while ignoring WAL/shared-memory files in this state creates an inconsistent snapshot. This risk persists even if the connection is read-only.


Possible Causes: Why Read-Only Transactions Might Compromise Copy Safety

1. Lock Acquisition and File Descriptor Behavior

SQLite uses a locking protocol to manage concurrency. Read-only connections acquire shared locks during transactions:

  • Shared lock (SQLITE_LOCK_SHARED): Allows multiple readers to coexist.
  • Reserved lock (SQLITE_LOCK_RESERVED): Held by writers preparing to commit.

When a read-only transaction starts:

  • The connection obtains a shared lock on the database file.
  • In WAL mode, it may access the WAL index in the -shm file.
  • The OS retains an open file descriptor to the database, WAL, and SHM files.

Implication: A file copy operation (e.g., cp, rsync) executed while these locks/file descriptors are active may capture partially written pages or orphaned WAL entries.

2. Filesystem Permissions vs. SQLite’s Internal Logic

A common misconception is that filesystem-level read-only permissions eliminate copy risks. This is false because:

  • WAL/shared-memory files are separate: Even if the main database file is read-only, SQLite may write to -wal/-shm files if they reside in a writable directory.
  • Locking is independent of file modification: Locks are managed via the filesystem’s advisory locking mechanism, not file content changes.

Example Scenario:

  • Database app.db is marked read-only at the OS level.
  • A read-only connection opens app.db in WAL mode.
  • The connection starts a transaction, reading data from app.db-wal.
  • Another user copies app.db but not app.db-wal.
  • The copied database is missing uncheckpointed data, rendering it corrupt.

3. Transaction Isolation and Snapshot Validity

SQLite’s read transactions operate under "repeatable read" isolation by default. To achieve this:

  • The connection maintains a snapshot of the database state at the transaction’s start.
  • In WAL mode, this snapshot relies on the WAL file’s contents.

Critical Insight: The snapshot is logical, not physical. Copying the database file does not replicate the transaction’s snapshot context. If the original database’s WAL is modified post-copy, the copied file becomes inconsistent.


Resolving the Dilemma: Safe Copy Strategies and Mitigation Techniques

Step 1: Use the SQLite Online Backup API

The sqlite3_backup_* API is the only method guaranteed to produce a consistent copy, regardless of connection type. It works by:

  1. Reading database pages directly from the source connection.
  2. Bypassing filesystem-level locks and WAL complexities.
  3. Operating incrementally to avoid blocking other connections.

Implementation Example (C API):

sqlite3 *src_db;  
sqlite3_open_v2("file:app.db?mode=ro", &src_db, SQLITE_OPEN_READONLY, NULL);  
sqlite3 *dst_db;  
sqlite3_open("backup.db", &dst_db);  

sqlite3_backup *backup = sqlite3_backup_init(dst_db, "main", src_db, "main");  
if (backup) {  
    sqlite3_backup_step(backup, -1); // Copy all pages  
    sqlite3_backup_finish(backup);  
}  

Step 2: Enforce Checkpointing Before Manual Copies

If the backup API is unavailable (e.g., in shell scripts), force a WAL checkpoint:

-- From a read-write connection:  
PRAGMA wal_checkpoint(FULL);  

Post-Checkpoint Verification:

  • Confirm app.db-wal is 0 bytes.
  • Ensure no transactions are active (check sqlite3_db_status() or PRAGMA lock_status).

Step 3: Validate Filesystem-Level Copy Procedures

When manual copying is unavoidable:

  1. Include all auxiliary files:
    cp app.db app.db-wal app.db-shm backup/  
    
  2. Use atomic filesystem operations:
    sqlite3 app.db ".backup backup.db"  
    
  3. Leverage VACUUM INTO for self-contained copies:
    VACUUM INTO 'backup.db';  
    

Step 4: Monitor Active Transactions and Locks

Implement pre-copy checks to avoid unsafe states:

  • Query sqlite3_temp_directory: Ensure no transient objects exist.
  • Check PRAGMA lock_status:
    -- Returns held locks for all attached databases  
    PRAGMA lock_status;  
    
  • Audit open file descriptors (Linux):
    lsof +D /path/to/database  
    

Step 5: Test with Long-Running Simulations

Replicate the original user’s testing strategy:

  1. Spawn multiple read-only connections executing SELECT transactions.
  2. Run concurrent copy processes using cp, rsync, and backup APIs.
  3. Validate copied databases via integrity checks:
    PRAGMA quick_check;  
    

Sample Test Framework (Python):

import sqlite3  
import threading  
import shutil  

def run_read_transaction(db_path):  
    conn = sqlite3.connect(f'file:{db_path}?mode=ro', uri=True)  
    cursor = conn.cursor()  
    cursor.execute("BEGIN")  
    cursor.execute("SELECT COUNT(*) FROM some_table")  
    cursor.fetchall()  
    cursor.execute("COMMIT")  
    conn.close()  

def copy_db(src, dst):  
    shutil.copy(src, dst)  
    # Verify integrity  
    verify_conn = sqlite3.connect(dst)  
    verify_conn.execute("PRAGMA quick_check").fetchone()  
    verify_conn.close()  

# Repeat these in parallel threads  

Final Recommendations for Developers

  1. Default to the backup API for all production-critical copies.
  2. Document user responsibilities when allowing manual file operations.
  3. Prefer VACUUM INTO over filesystem copies for simplicity.
  4. Monitor SQLite’s error logs for SQLITE_CORRUPT or SQLITE_READONLY warnings.

By addressing the interplay of transactions, locking, and file states, developers can mitigate corruption risks while accommodating legitimate use cases for read-only connections.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *