Missing Rows on Integer PK Queries Due to Corrupted Index
Primary Key Index Mismatch and Query Plan Inconsistency in SQLite
Issue Overview: Unexpected Missing Rows When Filtering by Integer Primary Key
The core issue involves a scenario where rows present in a SQLite table are not returned when queried via an equality filter (WHERE PK = <value>
) that leverages a user-created unique index. However, the same rows are returned when using a range-based query (BETWEEN
) that directly accesses the primary key. The table in question uses an INTEGER
primary key column (tijdstip
), declared with WITHOUT ROWID
, and includes a redundant unique index on the same column.
Key observations from the problem description:
Schema Design:
- The
standen
table usestijdstip
(epoch timestamp) as its primary key. - The table is declared
WITHOUT ROWID
, forcing the primary key to act as the physical storage key. - A redundant unique index (
standen_pk
) is explicitly created ontijdstip
, despite SQLite automatically creating an index for primary keys.
- The
Query Behavior:
- Filtering with
tijdstip = 1671276603
returns no results when the query planner uses thestanden_pk
index. - Filtering with
tijdstip BETWEEN 1671276603 AND 1671276603
(equivalent totijdstip = 1671276603
) returns the expected row when the query planner uses the primary key directly.
- Filtering with
Data Integrity Check:
- Running
PRAGMA integrity_check;
reveals inconsistencies in thestanden_pk
index, including missing rows and incorrect entry counts.
- Running
Resolution:
- Recreating the
standen_pk
index resolves the missing rows issue, confirming index corruption.
- Recreating the
This problem highlights critical interactions between SQLite’s index management, schema design choices (e.g., WITHOUT ROWID
), and the risks of redundant indexing.
Possible Causes: Index Corruption, Redundant Indexing, and Storage Engine Nuances
1. Index Corruption Due to Redundant Indexing
SQLite automatically creates an index for primary keys. Explicitly creating a unique index on the same column introduces redundancy, doubling the maintenance overhead for write operations (inserts, updates, deletes). If the database or application crashes during a write operation, or if there’s a filesystem error, one index might update successfully while the other does not. This creates an inconsistency between the primary key’s implicit index and the user-created index.
2. Query Planner Index Selection Pitfalls
SQLite’s query planner selects the most "efficient" index based on statistics and constraints. When multiple indexes exist on the same column, the planner might choose a corrupted index over the primary key’s implicit index. This leads to incorrect query results because the corrupted index lacks entries for specific rows.
3. WITHOUT ROWID
Storage Implications
Tables declared WITHOUT ROWID
store data clustered by the primary key. While this improves query performance for primary key lookups, it introduces complexity when secondary indexes reference the primary key. Secondary indexes in WITHOUT ROWID
tables store the full primary key value, which can lead to larger index sizes and increased opportunities for corruption if not managed properly.
4. Implicit Data Type Conversion in Default Values
The tijdstip
column is declared as INTEGER
but uses a default value of strftime('%s','now')
, which returns a text string. SQLite typically converts this text to an integer, but implicit conversions during insertion could theoretically lead to inconsistencies if the conversion fails or is interrupted.
5. Vacuum Operations and Index Stability
The user expressed concern about VACUUM
operations altering primary key values in rowid tables. While unfounded (SQLite does not modify INTEGER PRIMARY KEY
values during VACUUM
), this misconception led to the use of WITHOUT ROWID
, which inadvertently increased the risk of index corruption due to redundant indexing.
Troubleshooting Steps, Solutions, and Fixes
Step 1: Confirm Index Corruption with PRAGMA integrity_check
Run the following command to check for database inconsistencies:
PRAGMA integrity_check;
If the output reports missing rows or mismatched entry counts in the standen_pk
index, this confirms index corruption.
Example Output Analysis:
row 3187 missing from index standen_pk
row 3188 missing from index standen_pk
row 3189 missing from index standen_pk
wrong # of entries in index standen_pk
This indicates that the standen_pk
index lacks entries for specific rows, causing queries relying on it to miss data.
Step 2: Remove Redundant Indexes
Drop the redundant standen_pk
index to eliminate unnecessary write overhead and reduce corruption risks:
DROP INDEX standen_pk;
The primary key’s implicit index will handle all lookups.
Step 3: Rebuild Corrupted Indexes
If the index is required for specific query patterns, rebuild it:
REINDEX standen_pk;
-- Or
DROP INDEX standen_pk;
CREATE INDEX standen_pk ON standen(tijdstip);
Rebuilding repopulates the index from the current table data, resolving missing entries.
Step 4: Evaluate WITHOUT ROWID
Necessity
Unless the table has a composite primary key or requires clustering for performance, avoid WITHOUT ROWID
. Convert to a rowid table:
CREATE TABLE standen_new (
tijdstip INTEGER PRIMARY KEY DEFAULT (unixepoch()),
...
);
INSERT INTO standen_new SELECT * FROM standen;
DROP TABLE standen;
ALTER TABLE standen_new RENAME TO standen;
This simplifies storage and reduces index management complexity.
Step 5: Use INTEGER PRIMARY KEY
Correctly
Ensure primary keys are declared as INTEGER PRIMARY KEY
to leverage SQLite’s rowid optimizations. Avoid default values that return text, as implicit conversions can introduce edge-case errors.
Step 6: Monitor for Corruption Triggers
- Filesystem Stability: Ensure the database file resides on a stable filesystem with no unexpected interruptions.
- Application Crashes: Implement proper transaction handling and error recovery to avoid partial writes.
- Backup Strategy: Regularly back up the database and consider using SQLite’s backup API for live backups.
Step 7: Analyze Query Plans with .eqp
Use SQLite’s EXPLAIN QUERY PLAN
to verify index usage:
.eqp on
SELECT * FROM standen WHERE tijdstip = 1671276603;
Ensure critical queries use the primary key index instead of secondary indexes.
Step 8: Upgrade SQLite Version
The user mentioned using SQLite 3.40.0 (2022-11-16). Newer versions include optimizations and corruption recovery enhancements. If possible, upgrade to at least SQLite 3.41.0 (2023-02-21), which introduced improvements to index integrity checks.
Step 9: Implement Preventive Maintenance
Schedule periodic integrity checks and index rebuilds:
-- Weekly check
PRAGMA integrity_check;
-- Monthly reindex
REINDEX;
Step 10: Avoid Implicit Data Type Conversions
Explicitly cast default values to integers to prevent conversion errors:
CREATE TABLE standen (
tijdstip INTEGER PRIMARY KEY DEFAULT (CAST(strftime('%s','now') AS INTEGER))
) WITHOUT ROWID;
Conclusion
The missing rows issue stemmed from a corrupted secondary index created redundantly on a primary key column. By eliminating redundant indexes, adhering to SQLite best practices for primary keys, and performing regular integrity checks, users can prevent such issues. Corrupted indexes require rebuilding, while schema design choices like WITHOUT ROWID
should be reserved for specific performance-critical scenarios.