Missing Rows on Integer PK Queries Due to Corrupted Index

Primary Key Index Mismatch and Query Plan Inconsistency in SQLite

Issue Overview: Unexpected Missing Rows When Filtering by Integer Primary Key

The core issue involves a scenario where rows present in a SQLite table are not returned when queried via an equality filter (WHERE PK = <value>) that leverages a user-created unique index. However, the same rows are returned when using a range-based query (BETWEEN) that directly accesses the primary key. The table in question uses an INTEGER primary key column (tijdstip), declared with WITHOUT ROWID, and includes a redundant unique index on the same column.

Key observations from the problem description:

  1. Schema Design:

    • The standen table uses tijdstip (epoch timestamp) as its primary key.
    • The table is declared WITHOUT ROWID, forcing the primary key to act as the physical storage key.
    • A redundant unique index (standen_pk) is explicitly created on tijdstip, despite SQLite automatically creating an index for primary keys.
  2. Query Behavior:

    • Filtering with tijdstip = 1671276603 returns no results when the query planner uses the standen_pk index.
    • Filtering with tijdstip BETWEEN 1671276603 AND 1671276603 (equivalent to tijdstip = 1671276603) returns the expected row when the query planner uses the primary key directly.
  3. Data Integrity Check:

    • Running PRAGMA integrity_check; reveals inconsistencies in the standen_pk index, including missing rows and incorrect entry counts.
  4. Resolution:

    • Recreating the standen_pk index resolves the missing rows issue, confirming index corruption.

This problem highlights critical interactions between SQLite’s index management, schema design choices (e.g., WITHOUT ROWID), and the risks of redundant indexing.

Possible Causes: Index Corruption, Redundant Indexing, and Storage Engine Nuances

1. Index Corruption Due to Redundant Indexing

SQLite automatically creates an index for primary keys. Explicitly creating a unique index on the same column introduces redundancy, doubling the maintenance overhead for write operations (inserts, updates, deletes). If the database or application crashes during a write operation, or if there’s a filesystem error, one index might update successfully while the other does not. This creates an inconsistency between the primary key’s implicit index and the user-created index.

2. Query Planner Index Selection Pitfalls

SQLite’s query planner selects the most "efficient" index based on statistics and constraints. When multiple indexes exist on the same column, the planner might choose a corrupted index over the primary key’s implicit index. This leads to incorrect query results because the corrupted index lacks entries for specific rows.

3. WITHOUT ROWID Storage Implications

Tables declared WITHOUT ROWID store data clustered by the primary key. While this improves query performance for primary key lookups, it introduces complexity when secondary indexes reference the primary key. Secondary indexes in WITHOUT ROWID tables store the full primary key value, which can lead to larger index sizes and increased opportunities for corruption if not managed properly.

4. Implicit Data Type Conversion in Default Values

The tijdstip column is declared as INTEGER but uses a default value of strftime('%s','now'), which returns a text string. SQLite typically converts this text to an integer, but implicit conversions during insertion could theoretically lead to inconsistencies if the conversion fails or is interrupted.

5. Vacuum Operations and Index Stability

The user expressed concern about VACUUM operations altering primary key values in rowid tables. While unfounded (SQLite does not modify INTEGER PRIMARY KEY values during VACUUM), this misconception led to the use of WITHOUT ROWID, which inadvertently increased the risk of index corruption due to redundant indexing.

Troubleshooting Steps, Solutions, and Fixes

Step 1: Confirm Index Corruption with PRAGMA integrity_check

Run the following command to check for database inconsistencies:

PRAGMA integrity_check;  

If the output reports missing rows or mismatched entry counts in the standen_pk index, this confirms index corruption.

Example Output Analysis:

row 3187 missing from index standen_pk  
row 3188 missing from index standen_pk  
row 3189 missing from index standen_pk  
wrong # of entries in index standen_pk  

This indicates that the standen_pk index lacks entries for specific rows, causing queries relying on it to miss data.

Step 2: Remove Redundant Indexes

Drop the redundant standen_pk index to eliminate unnecessary write overhead and reduce corruption risks:

DROP INDEX standen_pk;  

The primary key’s implicit index will handle all lookups.

Step 3: Rebuild Corrupted Indexes

If the index is required for specific query patterns, rebuild it:

REINDEX standen_pk;  
-- Or  
DROP INDEX standen_pk;  
CREATE INDEX standen_pk ON standen(tijdstip);  

Rebuilding repopulates the index from the current table data, resolving missing entries.

Step 4: Evaluate WITHOUT ROWID Necessity

Unless the table has a composite primary key or requires clustering for performance, avoid WITHOUT ROWID. Convert to a rowid table:

CREATE TABLE standen_new (  
  tijdstip INTEGER PRIMARY KEY DEFAULT (unixepoch()),  
  ...  
);  
INSERT INTO standen_new SELECT * FROM standen;  
DROP TABLE standen;  
ALTER TABLE standen_new RENAME TO standen;  

This simplifies storage and reduces index management complexity.

Step 5: Use INTEGER PRIMARY KEY Correctly

Ensure primary keys are declared as INTEGER PRIMARY KEY to leverage SQLite’s rowid optimizations. Avoid default values that return text, as implicit conversions can introduce edge-case errors.

Step 6: Monitor for Corruption Triggers

  • Filesystem Stability: Ensure the database file resides on a stable filesystem with no unexpected interruptions.
  • Application Crashes: Implement proper transaction handling and error recovery to avoid partial writes.
  • Backup Strategy: Regularly back up the database and consider using SQLite’s backup API for live backups.

Step 7: Analyze Query Plans with .eqp

Use SQLite’s EXPLAIN QUERY PLAN to verify index usage:

.eqp on  
SELECT * FROM standen WHERE tijdstip = 1671276603;  

Ensure critical queries use the primary key index instead of secondary indexes.

Step 8: Upgrade SQLite Version

The user mentioned using SQLite 3.40.0 (2022-11-16). Newer versions include optimizations and corruption recovery enhancements. If possible, upgrade to at least SQLite 3.41.0 (2023-02-21), which introduced improvements to index integrity checks.

Step 9: Implement Preventive Maintenance

Schedule periodic integrity checks and index rebuilds:

-- Weekly check  
PRAGMA integrity_check;  

-- Monthly reindex  
REINDEX;  

Step 10: Avoid Implicit Data Type Conversions

Explicitly cast default values to integers to prevent conversion errors:

CREATE TABLE standen (  
  tijdstip INTEGER PRIMARY KEY DEFAULT (CAST(strftime('%s','now') AS INTEGER))  
) WITHOUT ROWID;  

Conclusion

The missing rows issue stemmed from a corrupted secondary index created redundantly on a primary key column. By eliminating redundant indexes, adhering to SQLite best practices for primary keys, and performing regular integrity checks, users can prevent such issues. Corrupted indexes require rebuilding, while schema design choices like WITHOUT ROWID should be reserved for specific performance-critical scenarios.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *