Unexpected Database File Size Increase After VACUUM Following ALTER TABLE DROP COLUMN

Issue Overview: ALTER TABLE DROP COLUMN + VACUUM Fails to Reduce File Size

A recurring pattern emerges when developers attempt to reclaim storage space after schema modifications in SQLite. The specific scenario involves executing ALTER TABLE ... DROP COLUMN followed by VACUUM, expecting database file size reduction proportional to removed data. However, empirical observations reveal paradoxical behavior where file size either remains unchanged or increases. This contradicts conventional SQLite storage management expectations, as vacuuming normally rebuilds database files to eliminate fragmentation and free unused pages.

The phenomenon manifests most acutely when tables contain columns with REAL or NUMERIC affinity storing integer values. Databases whose columns are declared REAL but hold whole numbers exhibit suboptimal space reclamation after column removal and vacuuming. Dumping the database via .dump and reimporting produces significantly smaller files than a standard VACUUM. This indicates hidden storage inefficiencies persist through schema changes and vacuuming but resolve during a full dump/reload cycle.

Key technical factors at play include:

  1. SQLite’s type affinity system and numeric value storage optimization
  2. Page-level storage mechanics during ALTER TABLE operations
  3. VACUUM’s rebuilding strategy versus .dump/.import serialization
  4. Column removal implementation details in SQLite’s ALTER TABLE subsystem

The storage anomaly arises from how SQLite encodes numeric values at insertion time versus during schema modification workflows. SQLite’s record format can store a REAL value that is exactly representable as an integer using a compact integer serial type (0, 1, 2, 3, 4, 6, or 8 bytes) instead of a full 8-byte float. However, during ALTER TABLE DROP COLUMN + VACUUM operations, these values can be re-encoded as 8-byte IEEE 754 floats despite holding integral values. This encoding mismatch creates storage bloat that persists through vacuuming but disappears during full database reconstruction via dump/reload.
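A minimal sketch of the affinity behavior, using Python’s built-in sqlite3 module (table and column names are illustrative):

```python
import sqlite3

# REAL affinity converts inserted integers to floating point at the SQL
# level: typeof() reports 'real' and the value reads back as 5.0, even
# though SQLite's record format may encode the exactly-integral float
# compactly on disk until the record is rewritten.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x REAL, y INTEGER)")
con.execute("INSERT INTO t VALUES (5, 5)")

row = con.execute("SELECT typeof(x), typeof(y), x, y FROM t").fetchone()
print(row)  # ('real', 'integer', 5.0, 5)
con.close()
```

Note that the SQL-level type (`real`) says nothing about the on-disk encoding, which is exactly why the bloat is invisible to ordinary queries.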

Possible Causes: REAL Column Storage Degradation During Schema Modifications

Three primary factors contribute to unexpected file size expansion after DROP COLUMN + VACUUM sequences:

1. Type Affinity vs. Storage Class Mismatch

  • Columns declared with REAL affinity store values using SQLite’s dynamic typing system
  • INTEGER values inserted into REAL columns initially receive optimized storage (1-8 byte integers)
  • Schema modifications bypass value revalidation, preserving original storage classes
  • VACUUM rewrites pages without re-evaluating value storage optimization criteria

2. ALTER TABLE Implementation Constraints

  • SQLite’s ALTER TABLE DROP COLUMN creates new table with modified schema
  • Data copying occurs at storage class level rather than value re-evaluation
  • Original numeric storage formats persist through table reconstruction
  • No automatic conversion from integer-optimized storage to proper REAL encoding

3. VACUUM’s Page Rebuilding Limitations

  • VACUUM rebuilds the database by copying existing record encodings into a fresh file
  • Value encodings transfer verbatim; type affinity is not re-evaluated during the copy
  • Free pages are reclaimed, but oversized value encodings introduced by the schema change are not
  • REAL columns keep whatever encoding (compact integer or 8-byte float) their records already carry

A critical implementation detail emerges: columns declared with REAL affinity that store integer values maintain their original compact integer storage format until modified. ALTER TABLE operations copy these values verbatim during schema changes, preserving their storage class. Subsequent VACUUM operations rebuild pages without reassessing whether values could be stored more efficiently given current schema constraints.

The storage degradation becomes measurable when dropping columns from tables containing:

  • REAL-typed columns storing integer values
  • Large datasets (10k+ rows)
  • Multiple numeric columns with mixed affinities
  • No subsequent UPDATE operations to trigger value re-encoding

In contrast, dumping and reloading forces complete value re-serialization. During .dump, every value is output as a SQL literal. Re-importing parses these literals afresh, allowing SQLite to apply the current schema’s type affinities and choose optimal storage classes during insertion. This process eliminates legacy storage artifacts from previous schema versions.
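A sketch of the dump/reload effect with Python’s sqlite3 module, whose iterdump() plays the role of the shell’s .dump (names here are illustrative):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t(x REAL)")
src.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
src.commit()

# Equivalent of the shell's .dump: every value becomes a SQL literal.
dump_sql = "\n".join(src.iterdump())

# Re-importing parses the literals afresh, so values are re-encoded
# under the destination schema's type affinities.
dst = sqlite3.connect(":memory:")
dst.executescript(dump_sql)
print(dst.execute("SELECT sum(x), typeof(sum(x)) FROM t").fetchone())  # (6.0, 'real')
```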

Troubleshooting Steps, Solutions & Fixes: Resolving Storage Bloat Post Schema Changes

Diagnostic Protocol

  1. Verify Actual Schema Changes

    • Execute .schema <table> before/after ALTER TABLE
    • Confirm dropped column disappears from schema
    • Check for lingering indexes, triggers, or views referencing removed column
  2. Analyze Storage Efficiency

    • Compare PRAGMA page_count; before/after operations
    • Estimate storage per value with:
      SELECT 
        count(*) AS rows,
        sum(length(hex(<column>))/2) AS approx_total_bytes,
        avg(length(hex(<column>))/2) AS approx_bytes_per_value
      FROM <table>;
      
    • Note: hex() measures each value’s text/blob rendering rather than its exact record encoding, so treat these figures as a rough proxy
    • Identify columns with suboptimal byte-per-value ratios
  3. Profile Numeric Storage Formats

    • Detect REAL columns holding integral values (typeof() reports 'real' for every numeric value in a REAL-affinity column, so compare against the integer cast instead):
      SELECT 
        count(*) AS integral_values,
        count(*) * 100.0 / (SELECT count(*) FROM <table>) AS percent
      FROM <table>
      WHERE <column> = CAST(<column> AS INTEGER);
      
    • High percentages indicate potential optimization opportunities
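The diagnostic above can be wrapped in a small helper; this is a sketch using Python’s sqlite3 module (the function name and table are ours, not part of any API):

```python
import sqlite3

def integral_fraction(con, table, column):
    """Percentage of values in `column` that are exactly integral.
    Uses a CAST comparison rather than typeof(), because a REAL-affinity
    column reports typeof() = 'real' for every numeric value regardless
    of how it is encoded on disk."""
    # NOTE: table/column are interpolated for illustration only; they
    # must be trusted identifiers, never user input.
    sql = f"""
        SELECT count(*) * 100.0 / (SELECT count(*) FROM {table})
        FROM {table}
        WHERE {column} = CAST({column} AS INTEGER)
    """
    return con.execute(sql).fetchone()[0]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE m(v REAL)")
con.executemany("INSERT INTO m VALUES (?)", [(1,), (2,), (2.5,)])
print(integral_fraction(con, "m", "v"))  # ~66.7: two of three values integral
```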

Corrective Measures

  1. Schema Affinity Adjustment

    • Change column affinity from REAL to NUMERIC/NONE:
      CREATE TABLE new_table(... <column> NUMERIC ...);
      INSERT INTO new_table SELECT * FROM old_table;
      DROP TABLE old_table;
      ALTER TABLE new_table RENAME TO old_table;
      
    • NUMERIC affinity allows flexible integer/float storage
    • Forces value re-evaluation during data migration
  2. Value Storage Refresh

    • Trigger storage re-optimization without changing data:
      UPDATE <table> SET <column> = <column>;
      VACUUM;
      
    • UPDATE forces value rewrite with current type affinity rules
    • Especially effective for REAL→NUMERIC converted columns
  3. Controlled Vacuum Sequencing

    • Perform interim VACUUM before column removal:
      VACUUM;
      ALTER TABLE ... DROP COLUMN ...;
      VACUUM;
      
    • Isolates storage effects of schema changes
    • Prevents page fragmentation accumulation
  4. Selective Dump/Reload

    • Export specific tables from the sqlite3 shell:
      .mode insert <new_table>
      .once <file.sql>
      SELECT * FROM <old_table>;
      
    • Replay the generated file against the target database with .read <file.sql>
    • More efficient than full database dump
    • Maintains transaction integrity for partial reloads
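The refresh-and-rebuild sequence from step 2 can be scripted end to end; a hedged sketch using Python’s sqlite3 module (file path and table are illustrative, and whether the UPDATE actually shrinks the file depends on the SQLite version in use):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x NUMERIC)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
con.commit()

# Step 2 (value storage refresh): rewrite each value in place so it is
# re-encoded under the column's current type affinity, then rebuild.
con.execute("UPDATE t SET x = x")
con.commit()
con.execute("VACUUM")

print(os.path.getsize(path))  # post-VACUUM file size in bytes
con.close()
```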

Preventative Best Practices

  1. Type Affinity Design Guidelines

    • Prefer NUMERIC over REAL for columns storing mixed integers/floats
    • Reserve REAL affinity for strictly floating-point data
    • Use INTEGER affinity for whole number storage
  2. Schema Change Workflow

    • Always VACUUM before structural modifications
    • Follow ALTER TABLE with:
      UPDATE <table> SET <modified_columns> = <modified_columns>;
      
    • Rebuild indexes after column removal:
      REINDEX;
      
  3. Storage Optimization Monitoring

    • Track page_count growth ratios after DML operations
    • Establish baseline storage efficiency metrics:
      CREATE TABLE storage_metrics AS
      SELECT 
        name AS table_name,
        sum(pgsize) AS total_bytes,
        count(*) AS pages_used
      FROM dbstat
      GROUP BY 1;
      
    • Compare metrics across schema versions
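Where dbstat is not compiled into the build, the core pragmas give a coarser but universally available baseline; a sketch in Python (the helper name is ours):

```python
import sqlite3

def storage_baseline(con):
    # page_size * page_count equals the main database file size;
    # freelist_count shows pages a VACUUM could reclaim.
    page_size = con.execute("PRAGMA page_size").fetchone()[0]
    page_count = con.execute("PRAGMA page_count").fetchone()[0]
    freelist = con.execute("PRAGMA freelist_count").fetchone()[0]
    return {
        "page_size": page_size,
        "page_count": page_count,
        "freelist_pages": freelist,
        "file_bytes": page_size * page_count,
    }

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
con.commit()
print(storage_baseline(con))
```

Snapshotting this dictionary before and after each schema change gives the same trend data as the dbstat query, at page rather than cell granularity.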

Advanced Techniques

  1. Incremental Vacuuming

    • Configure PRAGMA auto_vacuum = FULL or INCREMENTAL (on an existing database the setting takes effect only after a full VACUUM)
    • In INCREMENTAL mode, release free pages on demand:
      PRAGMA incremental_vacuum(<pages>);
      
    • Reduces single vacuum operation overhead
  2. Page Size Optimization

    • Align page size with storage patterns:
      PRAGMA page_size = 8192;
      VACUUM;
      
    • The new size only takes effect at the VACUUM, and cannot be changed while the database is in WAL mode
    • Larger pages improve storage efficiency for big BLOBs
    • Smaller pages benefit row-oriented access patterns
  3. WAL Mode Considerations

    • Checkpoint WAL before vacuuming:
      PRAGMA wal_checkpoint(TRUNCATE);
      
    • Prevents WAL file interference with size measurements
    • Ensures accurate post-VACUUM size calculations
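A sketch of the checkpoint-before-measuring step with Python’s sqlite3 module (paths are illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit mode
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")

# Flush pending pages into the main file and truncate the WAL, so the
# -wal file cannot skew a before/after size comparison.
con.execute("PRAGMA wal_checkpoint(TRUNCATE)")

print(os.path.getsize(path + "-wal"))  # 0: WAL truncated
print(os.path.getsize(path))           # main file now holds all pages
```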

Debugging Framework

Implement this verification sequence after schema changes:

  1. Pre-operation baseline (md5sum() is not a built-in SQL function; the sqlite3 CLI’s .sha3sum dot-command hashes database content instead):

    PRAGMA page_count;
    PRAGMA schema_version;
    .sha3sum
    
  2. Post-ALTER TABLE check:

    PRAGMA integrity_check;
    .sha3sum
    
  3. Post-VACUUM validation:

    ANALYZE;
    EXPLAIN QUERY PLAN SELECT * FROM <modified_table>;
    
  4. Storage efficiency audit (requires a build with the dbstat virtual table enabled, as in the standard sqlite3 CLI):

    SELECT 
      name, 
      sum(payload) AS payload_bytes,
      sum(pgsize) AS total_bytes,
      sum(pgsize - payload - unused) AS overhead_bytes
    FROM dbstat
    GROUP BY name;
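The four-step sequence can be automated; a hedged sketch in Python (ALTER TABLE ... DROP COLUMN needs SQLite 3.35+, so the schema change is guarded):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INTEGER, b REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, i) for i in range(500)])
con.commit()

# 1. Pre-operation baseline
before = con.execute("PRAGMA page_count").fetchone()[0]
assert con.execute("PRAGMA integrity_check").fetchone()[0] == "ok"

# 2. Schema change (DROP COLUMN is only available in SQLite 3.35+)
if sqlite3.sqlite_version_info >= (3, 35, 0):
    con.execute("ALTER TABLE t DROP COLUMN b")

# 3. Rebuild and re-validate
con.execute("VACUUM")
assert con.execute("PRAGMA integrity_check").fetchone()[0] == "ok"

after = con.execute("PRAGMA page_count").fetchone()[0]
print(before, after)  # compare page counts across the change
```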
    

This comprehensive approach addresses both the immediate storage anomaly and establishes preventative guardrails against similar issues. The core resolution involves recognizing SQLite’s type affinity storage optimizations require explicit value rewrites when modifying schemas containing columns with suboptimal affinity declarations. By combining schema design best practices with strategic data refresh operations, developers maintain storage efficiency through database evolution cycles.
