SQLite .backup vs .clone: Corruption Recovery Differences
Core Functional Differences Between SQLite .backup and .clone Commands
The SQLite command-line shell (CLI) provides two commands, .backup
and .clone
, for creating copies of databases. While both commands appear similar in their objective—to duplicate a database—their underlying mechanisms, use cases, and outcomes differ significantly. The critical distinction lies in how they handle database corruption and the strategies they employ to read data. Misunderstanding these differences can lead to incomplete backups, failed recovery attempts, or unintended data loss. This guide explores the technical nuances of .backup
and .clone
, their failure modes, and best practices for leveraging them effectively.
Mechanisms of Data Duplication: Backup API vs. Corruption Recovery Strategies
1. The SQLite Backup API and .backup Command
The .backup
command relies on the SQLite Backup API, a robust and transactional method for creating live backups. This API operates by iterating through the database pages in a structured manner, ensuring that the backup reflects a consistent snapshot of the database at the moment the backup begins. The Backup API locks the database in WAL
(Write-Ahead Logging) mode or uses a shared lock in ROLLBACK
mode, ensuring no writes are lost during the backup process.
For a healthy database, .backup
is the preferred method due to its efficiency and atomicity. However, if the source database is corrupted, the Backup API will fail because it cannot read damaged pages or invalid structures. The API prioritizes data integrity over recovery, meaning it will abort the backup upon encountering corruption.
2. The .clone Command and Its Recovery-Oriented Design
The .clone
command is designed explicitly for data recovery scenarios. Instead of using the Backup API, .clone
employs a series of heuristics and low-level data extraction techniques to salvage as much data as possible from a corrupted database. These strategies include:
- Reading database pages directly, bypassing SQLite’s consistency checks.
- Reconstructing records from fragmented or partially overwritten pages.
- Ignoring schema validation errors to extract raw table data.
This approach makes .clone
more resilient to corruption but less reliable for producing a fully consistent database. The cloned database may contain orphaned records, invalid indexes, or incomplete transactions. However, it is invaluable for extracting data from a damaged database that would otherwise be inaccessible.
3. Documentation Ambiguity and Source Code Insights
A key point of confusion arises from the lack of explicit documentation for .clone
’s recovery behavior. The official CLI documentation describes .clone
tersely as "Clone data into NEWDB from the existing database," with no mention of corruption handling. In contrast, the .backup
command is well-documented as using the Backup API.
The recovery logic of .clone
is buried in SQLite’s source code, specifically in the tryToCloneData()
function within shell.c.in
. This function includes comments indicating its intent to "recover as much content as possible" from a corrupted database. Developers relying solely on CLI documentation may overlook this critical distinction, leading to misuse of the commands.
Scenarios Leading to Divergent Outcomes: Corruption, API Limitations, and User Misconceptions
1. Database Corruption and Inaccessible Pages
The most common scenario where .backup
and .clone
produce different results is when the source database has physical corruption, such as:
- Invalid page headers or checksums.
- Unreadable sectors due to storage media errors.
- Partially written pages from abrupt shutdowns.
In such cases, .backup
will terminate with an error (e.g., SQLITE_CORRUPT
), while .clone
will proceed, extracting readable data and skipping corrupted regions.
2. Schema Validation Failures
SQLite’s Backup API validates the database schema before initiating a backup. If the sqlite_master
table (which stores schema metadata) is corrupted, .backup
will fail immediately. .clone
, however, bypasses schema validation, allowing it to recover table data even if the schema is partially lost.
3. User Misconceptions About Command Behavior
Many users assume .clone
is merely a synonym for .backup
with different syntax. This misconception leads to:
- Using
.clone
for routine backups, unaware that it may produce inconsistent copies. - Relying on
.backup
for disaster recovery, only to find it fails on corrupted databases.
Resolving Backup Failures and Maximizing Data Recovery Success
1. Diagnosing Database Corruption
Before choosing between .backup
and .clone
, verify the integrity of the source database:
PRAGMA integrity_check;
If this returns errors, the database is corrupt. Proceed with .clone
for recovery.
2. Using .clone for Corrupted Databases
To recover data from a corrupted database:
.clone recovered.db
After cloning, inspect recovered.db
for completeness:
- Run
PRAGMA integrity_check;
to identify remaining issues. - Use
.dump
to export SQL statements, which may reveal missing tables or indexes.
3. Combining .clone with .recover for Maximum Salvage
For severe corruption, pair .clone
with the .recover
command, which reconstructs the database by scanning all pages:
.recover > dump.sql
.read dump.sql
This generates a SQL script of recoverable data, which can be imported into a new database.
4. Routine Backups with .backup
For healthy databases, use .backup
to ensure transactional consistency:
.backup backup.db
Schedule regular backups and validate them periodically using PRAGMA integrity_check;
.
5. Consulting Source Code for Advanced Scenarios
When documentation is insufficient, refer to SQLite’s source code to understand command behavior:
- The
tryToCloneData()
function inshell.c.in
reveals.clone
’s recovery logic. - The Backup API implementation in
backup.c
clarifies.backup
’s transactional guarantees.
6. Mitigating Corruption Risks
Prevent corruption by:
- Avoiding unsafe shutdowns during writes.
- Using battery-backed storage or UPS systems.
- Enabling SQLite’s
PRAGMA journal_mode=WAL;
to reduce contention.
By understanding the distinct roles of .backup
and .clone
, developers can choose the right tool for data duplication, ensuring both reliability in routine backups and resilience in disaster recovery.