Recovering Data from Generated Columns in SQLite Ends Up in Lost_and_Found Table

Issue Overview: Recovery Process Fails to Reconstruct Generated Columns

When attempting to recover a SQLite database containing tables with generated columns using the .recover command, the reconstructed data appears in a lost_and_found table rather than restoring the original table structure with computed values. This occurs because generated columns (both VIRTUAL and STORED types) present unique challenges during database recovery operations. The core issue stems from SQLite’s recovery mechanism treating generated columns as regular persistent data columns rather than understanding their computational nature.

In the demonstrated scenario, the t1 table contains:

  • Three ordinary (non-generated) columns (a, b, c)
  • Two generated columns:
    • d: Virtual column calculating a*abs(b)
    • e: Stored column computing substr(c,b,b+1)

After a recovery attempt, the .recover output creates a lost_and_found table holding the raw record values (including the persisted values of stored generated columns) but fails to:

  1. Rebuild the original table schema with column generation logic
  2. Recalculate virtual column values during data insertion
  3. Preserve the relationship between base columns and their generated counterparts

The immediate result is that virtual column values are missing entirely (they can only be recovered by recomputation), while stored columns survive only as fragments without their computational context. The recovery process essentially converts generated columns into static data fields, stripping away their defining expressions and generation types.
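For reference, the behaviour is typically reproduced with the standard two-step .recover workflow (file names are illustrative); when the schema cannot be matched to the recovered pages, the t1 rows surface only as lost_and_found INSERTs in the generated script:

sqlite3 gencol2.db ".recover" > recovered.sql
sqlite3 rebuilt.db < recovered.sql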

Possible Causes: Schema Disconnect in Recovery Workflow

1. Metadata Decoupling During Recovery

SQLite’s .recover command operates at the storage layer rather than the schema layer. While it successfully extracts raw database page contents, it cannot:

  • Reconstruct column generation expressions
  • Differentiate between stored and virtual generated columns
  • Preserve constraints or computed relationships

The recovery process treats all columns as regular data columns, ignoring the GENERATED ALWAYS AS clauses in the original table definition. This metadata-storage decoupling is particularly problematic for generated columns because their values exist in different states:

  • Virtual columns: Never stored (computed on-the-fly)
  • Stored columns: Persisted but dependent on other columns
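The decoupling is easy to confirm on an intact copy of the database: the generation expressions live only in the CREATE TABLE text recorded in sqlite_schema, so once that text is lost nothing in the data pages can reconstruct them.

SELECT sql FROM sqlite_schema
WHERE type = 'table' AND name = 't1';
-- returns the full CREATE TABLE statement, including the
-- GENERATED ALWAYS AS (...) VIRTUAL / STORED clauses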

2. Page-Level Data Extraction Limitations

The recovery mechanism works by scanning database pages for recognizable data patterns. For generated columns:

  • Virtual columns: Leave no persistent storage footprint, making them unrecoverable through page scanning
  • Stored columns: Contain persisted values but without their generation context

In the sample recovery output:

INSERT INTO "lost_and_found" VALUES(2, 2, 4, 1, NULL, 2, 'Hello SQLite3', 'ell')

The NULL in the first data field (c0) is the placeholder for a, the INTEGER PRIMARY KEY whose actual value travels in the id field as the rowid. The virtual column (d) occupies no space in the record at all, while the stored column (e) retains its last computed value ('ell') but loses the underlying substr() expression.
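Assuming the default lost_and_found layout that .recover emits (rootpgno, pgno, nfield, id, c0, c1, …), the sample row reads back as follows; the mapping is inferred from the values above and should be verified against your own data:

SELECT id AS original_rowid,  -- 1: the value of a (INTEGER PRIMARY KEY / rowid alias)
       c0 AS a_placeholder,   -- NULL: rowid aliases are stored as NULL in the record
       c1 AS b,               -- 2
       c2 AS c,               -- 'Hello SQLite3'
       c3 AS e                -- 'ell': last computed value of the STORED column
FROM lost_and_found;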

3. Insertion Logic Shortcomings

The recovery script generates standard INSERT statements that:

  • Attempt to insert raw values directly
  • Cannot execute expressions during data restoration
  • Lack context about column generation rules

This results in two critical failures, illustrated by the sketch below:

  1. Virtual column values are simply absent from the recovered records (they were never stored and can only be recomputed)
  2. Stored columns retain their last computed value but lose the expression needed to regenerate it
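A minimal demonstration, assuming the t1 schema shown in the overview: SQLite rejects any INSERT that assigns a generated column, which is why the recovery script cannot simply replay the rows into the original table.

INSERT INTO t1(a, b, c, d, e)
VALUES (1, 2, 'Hello SQLite3', 2, 'ell');
-- fails with: cannot INSERT into generated column "d"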

Troubleshooting Steps: Manual Schema Reconstruction with Value Regeneration

1. Full Schema Extraction and Analysis

Recover the original table definition from available sources:

CREATE TABLE t1(
  a INTEGER PRIMARY KEY,
  b INT,
  c TEXT,
  d INT GENERATED ALWAYS AS (a*abs(b)) VIRTUAL,
  e TEXT GENERATED ALWAYS AS (substr(c,b,b+1)) STORED
);

Preserve this schema separately before attempting any recovery operations. For databases where the original schema is unavailable:

Step 1.1: Attempt schema extraction from the damaged database

sqlite3 damaged.db .schema 2>/dev/null > recovered_schema.sql

Step 1.2: Search backup files for CREATE TABLE statements

grep -i 'CREATE TABLE t1' *.bak *.sql
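If neither route turns up the definition, the CREATE TABLE text is stored verbatim on the schema page and can sometimes still be pulled out of the damaged file with a plain strings scan (best-effort; the file name is taken from the example):

strings gencol2.db | grep -i 'CREATE TABLE t1'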

2. Controlled Data Migration Process

Step 2.1: Create new table with original generated columns

CREATE TABLE recovered_t1(
  a INTEGER PRIMARY KEY,
  b INT,
  c TEXT,
  d INT GENERATED ALWAYS AS (a*abs(b)) VIRTUAL,
  e TEXT GENERATED ALWAYS AS (substr(c,b,b+1)) STORED
);

Step 2.2: Import base columns from lost_and_found

INSERT INTO recovered_t1(a, b, c)
SELECT id, c1, c2 FROM lost_and_found;
-- id carries the original rowid (a); c1 and c2 hold b and c,
-- per the column mapping of the sample row above

Step 2.3: Confirm the stored column was regenerated

Because e is declared GENERATED ALWAYS AS ... STORED in recovered_t1, SQLite recomputes it for every row inserted in Step 2.2; it cannot (and need not) be assigned with UPDATE:

SELECT a, c, e FROM recovered_t1 LIMIT 5;
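A quick sanity check after the import, assuming every lost_and_found row originally belonged to t1 (filter the SELECT if other tables were recovered into it): the two counts should match.

SELECT (SELECT count(*) FROM lost_and_found) AS recovered_rows,
       (SELECT count(*) FROM recovered_t1)   AS imported_rows;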

3. Virtual Column Value Restoration

Virtual column values leave no persistent trace and can only be recomputed. The d column of recovered_t1 already does this automatically on every SELECT, but an independent manual computation using the original formula provides a useful cross-check:

Step 3.1: Add temporary column for computation

ALTER TABLE recovered_t1 ADD COLUMN d_computed INT;

Step 3.2: Populate using generation logic

UPDATE recovered_t1 SET d_computed = a * abs(b);

Step 3.3: Cross-check the manual computation against the regenerated virtual column

SELECT a, d, d_computed
FROM recovered_t1
WHERE d IS NOT d_computed;
-- should return no rows; virtual values were never stored, so the
-- regenerated column is the only reference available

4. Stored Column Integrity Validation

Because e in recovered_t1 is recomputed from the imported base columns, comparing it to its own expression is always true. The only independent reference is the raw value preserved in lost_and_found (c3 under the sample layout above):

Step 4.1: Pair regenerated and recovered values

SELECT r.a, r.e AS regenerated_e, l.c3 AS recovered_e
FROM recovered_t1 r
JOIN lost_and_found l ON r.a = l.id;

Step 4.2: Identify discrepancies

SELECT r.a, r.e AS regenerated_e, l.c3 AS recovered_e
FROM recovered_t1 r
JOIN lost_and_found l ON r.a = l.id
WHERE r.e IS NOT l.c3;

5. Prevention Strategies for Future Recovery

Strategy 5.1: Regular schema exports

sqlite3 production.db .schema > schema_backup_$(date +%s).sql

Strategy 5.2: Dump only base columns so generated columns rebuild on import

sqlite3 production.db <<EOF
.output data_dump.sql
.mode insert t1
SELECT a, b, c FROM t1;
EOF

Restricting the dump to base columns keeps the script replayable; a dump that includes d and e cannot be inserted back into a table that declares them as generated.

Strategy 5.3: Use triggers as a persistent alternative to VIRTUAL columns

If d is declared as an ordinary column instead of GENERATED ALWAYS AS ... VIRTUAL, a trigger can keep it populated, and its values then survive page-level recovery like any other stored data (the table variant this assumes is sketched below):

-- assumes d is a plain INT column in this variant of t1, not a generated column
CREATE TRIGGER t1_d_calc
AFTER INSERT ON t1
BEGIN
  UPDATE t1 SET d = NEW.a * abs(NEW.b) WHERE a = NEW.a;
END;
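For completeness, a possible shape for the table variant the trigger assumes, with d as an ordinary column (illustrative only, not a drop-in replacement for the original schema):

CREATE TABLE t1(
  a INTEGER PRIMARY KEY,
  b INT,
  c TEXT,
  d INT,  -- ordinary column, maintained by the t1_d_calc trigger
  e TEXT GENERATED ALWAYS AS (substr(c,b,b+1)) STORED
);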

6. Advanced Recovery Using Write-Ahead Log (WAL)

Step 6.1: Locate the WAL file (it sits next to the database with a -wal suffix)

ls -l /path/to/gencol2.db-wal

Step 6.2: Let SQLite replay committed WAL frames

A -wal file is not a standalone database and cannot be ATTACHed on its own. Keep it (and any -shm file) next to the damaged database, then open that database normally so committed frames are checkpointed back into the main file:

sqlite3 damaged.db
PRAGMA wal_checkpoint(TRUNCATE);
PRAGMA journal_mode = DELETE;

Step 6.3: Copy committed rows into the rebuilt table

-- run from the database that holds recovered_t1
ATTACH 'damaged.db' AS damaged;
INSERT INTO main.recovered_t1(a, b, c)
SELECT a, b, c FROM damaged.t1;

Only the base columns are copied; d and e are regenerated automatically on insert.

7. Binary Recovery Techniques

For severely damaged databases:

Step 7.1: Extract raw pages

# assumes the default 4096-byte page size; the actual size is stored
# big-endian at byte offset 16 of the database header
dd if=gencol2.db of=page2.raw bs=4096 count=1 skip=1

Step 7.2: Analyze the overall database structure

sqlite3_analyzer works on a complete database file rather than a single extracted page:

sqlite3_analyzer gencol2.db

Step 7.3: Manual record reconstruction

import struct

# page2.raw holds a single b-tree page; its 8-byte page header starts at
# offset 0 (only page 1 begins with the 100-byte database file header)
with open('page2.raw', 'rb') as f:
    header = f.read(8)

# byte 0: page type (13 = table leaf), bytes 1-2: first freeblock offset,
# bytes 3-4: number of cells on the page
page_type, first_freeblock, cell_count = struct.unpack('>BHH', header[:5])

8. SQLite Internals Considerations

Understanding how SQLite stores generated columns:

Column Type   Storage Location   Recovery Potential
VIRTUAL       Not stored         Requires recompute
STORED        Internal B-tree    Raw value recovery

Key implementation details:

  • Virtual columns use no storage space (computed at SELECT time)
  • Stored columns occupy physical storage like regular columns
  • Both types record their generation expressions only in the CREATE TABLE text stored in the sqlite_schema table
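PRAGMA table_xinfo makes this classification visible: its hidden column reports 0 for ordinary columns, 2 for VIRTUAL generated columns, and 3 for STORED generated columns.

PRAGMA table_xinfo(t1);
-- a | INTEGER | ... | hidden = 0
-- b | INT     | ... | hidden = 0
-- c | TEXT    | ... | hidden = 0
-- d | INT     | ... | hidden = 2  (VIRTUAL)
-- e | TEXT    | ... | hidden = 3  (STORED)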

9. Alternative Recovery Pathways

Path 9.1: Use PRAGMA integrity_check before recovery

PRAGMA integrity_check;

Path 9.2: Leverage the sqlite3_db_config() defensive option (C API, set by the application)

sqlite3_db_config(db, SQLITE_DBCONFIG_DEFENSIVE, 1, 0);

Path 9.3: Maintain a redundant FTS5 copy of the base columns as auxiliary data

CREATE VIRTUAL TABLE t1_fts USING fts5(a, b, c);

10. Post-Recovery Validation Framework

Check 10.1: Schema consistency verification

SELECT name, sql FROM sqlite_schema 
WHERE type='table' AND name='t1';

Check 10.2: Data integrity assessment

SELECT count(*) FROM t1 
WHERE d != a * abs(b) 
   OR e != substr(c, b, b+1);

Check 10.3: Index consistency checks

PRAGMA quick_check;
PRAGMA foreign_key_check;

This comprehensive approach addresses both immediate data recovery needs and implements safeguards against future generated column recovery failures. The techniques combine SQLite’s native capabilities with low-level data manipulation strategies, ensuring maximum recoverability while maintaining the computational integrity of generated columns.
