SQLite .backup vs .clone: Corruption Recovery Differences


Core Functional Differences Between SQLite .backup and .clone Commands

The SQLite command-line shell (CLI) provides two commands, .backup and .clone, for creating copies of databases. While both commands appear similar in their objective—to duplicate a database—their underlying mechanisms, use cases, and outcomes differ significantly. The critical distinction lies in how they handle database corruption and the strategies they employ to read data. Misunderstanding these differences can lead to incomplete backups, failed recovery attempts, or unintended data loss. This guide explores the technical nuances of .backup and .clone, their failure modes, and best practices for leveraging them effectively.


Mechanisms of Data Duplication: Backup API vs. Corruption Recovery Strategies

1. The SQLite Backup API and .backup Command

The .backup command relies on the SQLite Backup API, a robust and transactional method for creating live backups. This API operates by iterating through the database pages in a structured manner, ensuring that the backup reflects a consistent snapshot of the database at the moment the backup begins. The Backup API locks the database in WAL (Write-Ahead Logging) mode or uses a shared lock in ROLLBACK mode, ensuring no writes are lost during the backup process.

For a healthy database, .backup is the preferred method due to its efficiency and atomicity. However, if the source database is corrupted, the Backup API will fail because it cannot read damaged pages or invalid structures. The API prioritizes data integrity over recovery, meaning it will abort the backup upon encountering corruption.

2. The .clone Command and Its Recovery-Oriented Design

The .clone command is designed explicitly for data recovery scenarios. Instead of using the Backup API, .clone employs a series of heuristics and low-level data extraction techniques to salvage as much data as possible from a corrupted database. These strategies include:

  • Reading database pages directly, bypassing SQLite’s consistency checks.
  • Reconstructing records from fragmented or partially overwritten pages.
  • Ignoring schema validation errors to extract raw table data.

This approach makes .clone more resilient to corruption but less reliable for producing a fully consistent database. The cloned database may contain orphaned records, invalid indexes, or incomplete transactions. However, it is invaluable for extracting data from a damaged database that would otherwise be inaccessible.

3. Documentation Ambiguity and Source Code Insights

A key point of confusion arises from the lack of explicit documentation for .clone’s recovery behavior. The official CLI documentation describes .clone tersely as "Clone data into NEWDB from the existing database," with no mention of corruption handling. In contrast, the .backup command is well-documented as using the Backup API.

The recovery logic of .clone is buried in SQLite’s source code, specifically in the tryToCloneData() function within shell.c.in. This function includes comments indicating its intent to "recover as much content as possible" from a corrupted database. Developers relying solely on CLI documentation may overlook this critical distinction, leading to misuse of the commands.


Scenarios Leading to Divergent Outcomes: Corruption, API Limitations, and User Misconceptions

1. Database Corruption and Inaccessible Pages

The most common scenario where .backup and .clone produce different results is when the source database has physical corruption, such as:

  • Invalid page headers or checksums.
  • Unreadable sectors due to storage media errors.
  • Partially written pages from abrupt shutdowns.

In such cases, .backup will terminate with an error (e.g., SQLITE_CORRUPT), while .clone will proceed, extracting readable data and skipping corrupted regions.

2. Schema Validation Failures

SQLite’s Backup API validates the database schema before initiating a backup. If the sqlite_master table (which stores schema metadata) is corrupted, .backup will fail immediately. .clone, however, bypasses schema validation, allowing it to recover table data even if the schema is partially lost.

3. User Misconceptions About Command Behavior

Many users assume .clone is merely a synonym for .backup with different syntax. This misconception leads to:

  • Using .clone for routine backups, unaware that it may produce inconsistent copies.
  • Relying on .backup for disaster recovery, only to find it fails on corrupted databases.

Resolving Backup Failures and Maximizing Data Recovery Success

1. Diagnosing Database Corruption

Before choosing between .backup and .clone, verify the integrity of the source database:

PRAGMA integrity_check;

If this returns errors, the database is corrupt. Proceed with .clone for recovery.

2. Using .clone for Corrupted Databases

To recover data from a corrupted database:

.clone recovered.db

After cloning, inspect recovered.db for completeness:

  • Run PRAGMA integrity_check; to identify remaining issues.
  • Use .dump to export SQL statements, which may reveal missing tables or indexes.

3. Combining .clone with .recover for Maximum Salvage

For severe corruption, pair .clone with the .recover command, which reconstructs the database by scanning all pages:

.recover > dump.sql
.read dump.sql

This generates a SQL script of recoverable data, which can be imported into a new database.

4. Routine Backups with .backup

For healthy databases, use .backup to ensure transactional consistency:

.backup backup.db

Schedule regular backups and validate them periodically using PRAGMA integrity_check;.

5. Consulting Source Code for Advanced Scenarios

When documentation is insufficient, refer to SQLite’s source code to understand command behavior:

  • The tryToCloneData() function in shell.c.in reveals .clone’s recovery logic.
  • The Backup API implementation in backup.c clarifies .backup’s transactional guarantees.

6. Mitigating Corruption Risks

Prevent corruption by:

  • Avoiding unsafe shutdowns during writes.
  • Using battery-backed storage or UPS systems.
  • Enabling SQLite’s PRAGMA journal_mode=WAL; to reduce contention.

By understanding the distinct roles of .backup and .clone, developers can choose the right tool for data duplication, ensuring both reliability in routine backups and resilience in disaster recovery.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *