File Descriptor Leak in SQLite Read-Only Fallback with WAL Mode
Issue Overview: File Descriptor Accumulation During Read-Only Fallback in WAL-Mode Databases
SQLite employs a file descriptor management strategy to maintain locking integrity when operating in Write-Ahead Logging (WAL) mode. In WAL mode, database connections hold shared locks via fcntl()
to coordinate concurrent access. These locks are tied to file descriptors (FDs), and SQLite retains FDs even after closing database connections if locks remain active. This mechanism prevents unintended lock release due to FD closure, ensuring transaction consistency.
The leak occurs under a specific combination of conditions:
- Persistent WAL-Mode Connections: Long-lived connections maintain shared locks, keeping FDs cached.
- Short-Lived Connections with Permissions Mismatch: Applications attempt to open the database in read-write (RW) mode but lack write permissions, forcing SQLite to fall back to read-only (RO) mode.
- FD Reuse Bypass in Fallback: SQLite skips FD reuse during RO fallback, creating new FDs instead of leveraging cached ones.
This creates a file descriptor leak when transient connections repeatedly open/close the database in RW mode (with insufficient permissions) while persistent connections hold locks. Each fallback to RO mode spawns new FDs, which accumulate over time. The issue is exacerbated in applications like PowerDNS (PDNS), where backend instances create/destroy connections frequently.
Possible Causes: Lock Retention, Fallback Logic, and FD Management
1. WAL-Mode Lock Retention and FD Caching
In WAL mode, SQLite retains a shared lock on the database file for the lifetime of the connection. This lock is implemented via fcntl(F_SETLK)
, which associates the lock with the process, not the FD. If an FD pointing to the database is closed, the lock is released even if another FD for the same file exists. To avoid accidental lock release, SQLite caches FDs when closing connections with active locks. These cached FDs are reused for subsequent connections to the same database if the open mode (RW/RO) matches.
Problem: The FD cache is not consulted when a connection falls back from RW to RO mode due to permission issues. This forces SQLite to open a new FD instead of reusing a cached one.
2. Read-Write Open Fallback to Read-Only
When an application attempts to open a database in RW mode without sufficient permissions, SQLite automatically retries in RO mode. However, this fallback logic does not check the FD cache for reusable descriptors. Instead, it opens a new FD for the RO connection. If the application frequently opens/closes connections in RW mode (without permissions), each fallback generates a new FD, leading to leakage.
3. Interaction Between Persistent and Transient Connections
Persistent connections (e.g., PDNS backend instances) maintain cached FDs to preserve locks. Transient connections that trigger RO fallback create new FDs, which are not added to the cache. Over time, the process accumulates unrecycled FDs, exhausting system limits.
Troubleshooting Steps, Solutions & Fixes: Diagnosing and Resolving FD Leaks
Step 1: Confirm FD Leak Symptoms
- Monitor FD Usage: Use OS tools (
lsof -p <PID>
,/proc/<PID>/fd/
) to track FD count for the process. A steady increase during RW open attempts (with fallback) confirms the leak. - Reproduce the Scenario:
- Open a persistent connection to a WAL-mode database.
- Repeatedly open/close connections in RW mode without write permissions.
- Observe FD count growth.
Step 2: Apply the Official SQLite Patch
The root cause—FD cache bypass during fallback—is addressed in SQLite commit a678e85402af08c1. The fix modifies unixOpen()
to check the FD cache even during RO fallback.
Implementation Details:
- The
openMode
parameter now prioritizes cache reuse when falling back to RO mode. - Cached FDs are validated for compatibility with the effective open mode (RO).
Integration:
- Rebuild SQLite with the patched code.
- Ensure applications link against the updated library.
Step 3: Adjust Application Logic
- Explicit Read-Only Opens: If RW access is unnecessary, open databases in RO mode (
SQLITE_OPEN_READONLY
) to bypass fallback logic. - Permissions Hardening: Ensure processes have write access before attempting RW opens.
Step 4: Journal Mode Considerations
- Switch to DELETE/TRUNCATE Journal Modes: These modes do not require persistent locks, eliminating FD caching. However, this sacrifices WAL’s concurrency benefits.
- Weigh Tradeoffs: Use WAL only if high concurrency is critical.
Step 5: Connection Pooling
- Reuse Connections: Minimize transient connections by pooling long-lived handles.
- Tune Pool Size: Balance concurrency needs against FD limits.
Step 6: System Configuration Adjustments
- Increase FD Limits: Temporarily mitigate leaks via
ulimit -n
(Unix) or registry edits (Windows). - Monitor with Automation: Deploy tools like
systemd
orcron
jobs to restart processes nearing FD limits.
This guide provides a comprehensive pathway to diagnose, resolve, and prevent FD leaks in SQLite WAL-mode deployments. By addressing cache behavior, application patterns, and system constraints, users can maintain stable database operations even under high concurrency.