Unexpected Non-NULL Results from LIMIT OFFSET with Large Offsets in SQLite
Understanding Subquery Behavior with Excessive OFFSET Values
This issue occurs when a subquery using LIMIT 1 OFFSET N returns a non-NULL value even when N exceeds the number of available rows. The problem exhibits inconsistent behavior based on three critical factors:
- Sort Direction: Queries with `ORDER BY ... DESC` return unexpected non-NULL values, while `ORDER BY ... ASC` returns NULL
- Distinct Value Count: Adding distinct values to the dataset resolves the issue
- Row Duplication: Duplicate values in the dataset trigger the anomalous behavior
Technical Manifestation
Given a table with 4 rows containing two distinct dates (2021-01-02 and 2021-01-03), the following query returns 2021-01-03 instead of NULL when using OFFSET 400:
SELECT (
SELECT DISTINCT date(acqtime)
FROM series
ORDER BY date(acqtime) DESC
LIMIT 1 OFFSET 400
) AS subquery_result
FROM series;
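The scenario is easy to reproduce from application code. A minimal sketch using Python's built-in `sqlite3` module, with the table and column names from the example above (the timestamps are illustrative; on a patched SQLite build the `DESC` subquery correctly yields NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series(acqtime TEXT)")
# Four rows, two distinct dates -- the shape described in the report
conn.executemany("INSERT INTO series VALUES (?)",
                 [("2021-01-02 08:00:00",), ("2021-01-02 09:00:00",),
                  ("2021-01-03 08:00:00",), ("2021-01-03 09:00:00",)])

desc = conn.execute("""
    SELECT (SELECT DISTINCT date(acqtime) FROM series
            ORDER BY date(acqtime) DESC LIMIT 1 OFFSET 400)
""").fetchone()[0]
asc = conn.execute("""
    SELECT (SELECT DISTINCT date(acqtime) FROM series
            ORDER BY date(acqtime) ASC LIMIT 1 OFFSET 400)
""").fetchone()[0]

print("DESC:", desc)  # '2021-01-03' on affected builds, None once fixed
print("ASC: ", asc)   # None on every build
```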
Contradictory Behaviors
- Changing `DESC` to `ASC` returns NULL
- Adding a third distinct date (e.g., `2021-01-04`) makes the subquery return NULL
- Using non-distinct values (e.g., all identical dates) triggers the anomaly
Root Causes of Erroneous LIMIT-OFFSET Execution
1. Query Optimizer Short-Circuiting with Duplicate Values
SQLite’s query planner uses an optimization called "flattening" for subqueries with ORDER BY and LIMIT. When all rows have identical values after DISTINCT, the optimizer incorrectly assumes:
- A single unique value exists
- `OFFSET` becomes irrelevant (since there’s "only one row")
- The first available value is returned regardless of `OFFSET` magnitude
This optimization bypasses the normal row-counting logic that would detect oversized offsets.
Proof of Concept
CREATE TABLE t1(x);
INSERT INTO t1 VALUES(1),(1),(1); -- Three identical values
SELECT (SELECT DISTINCT x FROM t1 ORDER BY +x LIMIT 1 OFFSET 100) FROM t1;
-- On affected builds: returns three 1s instead of three NULLs
2. Sort Direction Impact on Index Selection
The DESC sort order triggers different index traversal logic compared to ASC. When duplicate values exist:
- `ASC` sorts may use a B-tree index’s natural ascending order
- `DESC` sorts might use an optimized backward scan
- Backward scans interact unpredictably with `DISTINCT` and `LIMIT` clauses
This explains why changing sort direction alters the outcome.
3. DISTINCT + ORDER BY Execution Pipeline
SQLite processes DISTINCT before applying LIMIT/OFFSET. The sequence is:
1. Compute all distinct values
2. Sort them per `ORDER BY`
3. Apply `LIMIT` and `OFFSET`
When duplicates exist:
- Step 1 reduces the dataset to fewer rows than `OFFSET`
- The engine should return NULL but fails due to flawed row counting
Resolving LIMIT-OFFSET Anomalies and Workarounds
Step 1: Verify SQLite Version and Apply Patches
The bug was fixed in SQLite 3.37.2 (2022-01-12) via check-in 9282bcde301cee2a. Confirm your version:
SELECT sqlite_version();
If Unpatched:
- Upgrade to 3.37.2+
- For embedded systems: Recompile with the latest amalgamation
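Applications can check the linked library version programmatically rather than shelling out. A sketch using Python's `sqlite3` module, with the 3.37.2 threshold taken from the fix note above:

```python
import sqlite3

# Version of the SQLite library this Python process is linked against
print("SQLite:", sqlite3.sqlite_version)

patched = sqlite3.sqlite_version_info >= (3, 37, 2)
if not patched:
    print("Warning: this build may exhibit the LIMIT/OFFSET anomaly")
```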
Step 2: Rewrite Queries to Bypass the Optimizer
Approach A: Prevent DISTINCT-ORDER BY Optimization
Add a no-op expression to disable short-circuiting:
SELECT (
SELECT DISTINCT date(acqtime) || ''
FROM series
ORDER BY date(acqtime) DESC
LIMIT 1 OFFSET 400
) FROM series;
The no-op `|| ''` concatenation changes the result expression just enough to keep the optimizer from taking the faulty shortcut, forcing distinctness to be re-evaluated.
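Approach A can be exercised against a live connection to confirm that the rewrite leaves in-range results unchanged. A sketch assuming the example schema (the timestamps are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series(acqtime TEXT)")
conn.executemany("INSERT INTO series VALUES (?)",
                 [("2021-01-02 08:00:00",), ("2021-01-03 08:00:00",)])

# With an in-range offset, the || '' no-op leaves the result unchanged
latest = conn.execute("""
    SELECT (SELECT DISTINCT date(acqtime) || '' FROM series
            ORDER BY date(acqtime) DESC LIMIT 1 OFFSET 0)
""").fetchone()[0]
print(latest)  # 2021-01-03
```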
Approach B: Use a Wrapper Subquery
SELECT (
SELECT val FROM (
SELECT DISTINCT date(acqtime) AS val
FROM series
ORDER BY val DESC
) LIMIT 1 OFFSET 400
) FROM series;
Nesting subqueries disables premature optimization.
Step 3: Implement Edge-Case Handling
Solution 1: Pre-Calculate Row Counts
Use a CTE to verify offset validity:
WITH dates AS (
  SELECT DISTINCT date(acqtime) AS d
  FROM series
),
total AS (
  SELECT COUNT(*) AS cnt FROM dates
)
SELECT CASE
  WHEN (SELECT cnt FROM total) > 400
  THEN (SELECT d FROM dates ORDER BY d DESC LIMIT 1 OFFSET 400)
END AS subquery_result;
-- Returns NULL (rather than zero rows) when the offset is out of range
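The same guard is straightforward in application code with a parameterized offset. A sketch assuming the example schema (the helper name is illustrative):

```python
import sqlite3

def nth_distinct_date(conn, offset):
    """Return the offset-th distinct date (descending), or None when out of range."""
    (cnt,) = conn.execute(
        "SELECT COUNT(DISTINCT date(acqtime)) FROM series").fetchone()
    if offset >= cnt:  # guard: never send an out-of-range OFFSET to the engine
        return None
    return conn.execute(
        """SELECT DISTINCT date(acqtime) FROM series
           ORDER BY date(acqtime) DESC LIMIT 1 OFFSET ?""",
        (offset,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series(acqtime TEXT)")
conn.executemany("INSERT INTO series VALUES (?)",
                 [("2021-01-02 08:00:00",), ("2021-01-02 09:00:00",),
                  ("2021-01-03 08:00:00",)])
print(nth_distinct_date(conn, 400))  # None
print(nth_distinct_date(conn, 1))    # 2021-01-02
```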
Solution 2: COALESCE with Fallback
SELECT COALESCE(
  (SELECT ... LIMIT 1 OFFSET 400),
  'n/a'  -- explicit fallback substituted when the subquery yields NULL
);
Note that COALESCE can only substitute a non-NULL fallback; to force a NULL result for an out-of-range offset, guard the subquery with a CASE expression as in Solution 1.
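A fallback of this kind can also be applied at the call site. A sketch assuming the example schema (the `'n/a'` sentinel is illustrative; the ASC direction is used because the source reports it reliably yields NULL for oversized offsets):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series(acqtime TEXT)")
conn.execute("INSERT INTO series VALUES ('2021-01-03 08:00:00')")

# The oversized offset yields no row, so the scalar subquery is NULL
# and COALESCE substitutes the sentinel.
result = conn.execute("""
    SELECT COALESCE(
        (SELECT DISTINCT date(acqtime) FROM series
         ORDER BY date(acqtime) ASC LIMIT 1 OFFSET 400),
        'n/a')
""").fetchone()[0]
print(result)  # n/a
```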
Step 4: Indexing Strategies to Avoid Duplicate Issues
Create a covering index to influence the optimizer’s behavior:
CREATE INDEX idx_series_acqtime_date
ON series(date(acqtime) DESC, acqtime);
This forces the engine to:
- Materialize computed dates
- Process `DISTINCT` more accurately
- Apply `LIMIT`/`OFFSET` correctly
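Indexes on expressions such as `date(acqtime)` require SQLite 3.9.0 or later, and their creation can be verified programmatically. A sketch assuming the example schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series(acqtime TEXT)")
conn.execute("""CREATE INDEX idx_series_acqtime_date
                ON series(date(acqtime) DESC, acqtime)""")

# Confirm the index was registered in the schema table
row = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name = ?",
    ("idx_series_acqtime_date",)).fetchone()
print(row)

# EXPLAIN QUERY PLAN reveals whether the optimizer actually uses it
for plan_row in conn.execute("""EXPLAIN QUERY PLAN
    SELECT DISTINCT date(acqtime) FROM series
    ORDER BY date(acqtime) DESC"""):
    print(plan_row)
```

Checking the query plan matters here: an index only "influences the optimizer's behavior" if the planner actually selects it for this query shape.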
Step 5: Permanent Schema Design Adjustments
Normalize Date Values
Create a separate dates table to eliminate duplicates:
CREATE TABLE acquisition_dates (
date_id INTEGER PRIMARY KEY,
acq_date TEXT UNIQUE
);
INSERT INTO acquisition_dates (acq_date)
SELECT DISTINCT date(acqtime) FROM series;
ALTER TABLE series ADD COLUMN date_id REFERENCES acquisition_dates(date_id);
Query using the normalized table:
SELECT (
SELECT acq_date
FROM acquisition_dates
ORDER BY acq_date DESC
LIMIT 1 OFFSET 400
) FROM series;
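The migration and the rewritten lookup can be exercised end to end. A sketch assuming the example schema (timestamps illustrative; note the normalized query needs no DISTINCT, so it avoids the problematic code path entirely):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series(acqtime TEXT);
    INSERT INTO series VALUES
        ('2021-01-02 08:00:00'), ('2021-01-02 09:00:00'),
        ('2021-01-03 08:00:00'), ('2021-01-03 09:00:00');

    CREATE TABLE acquisition_dates (
        date_id INTEGER PRIMARY KEY,
        acq_date TEXT UNIQUE
    );
    INSERT INTO acquisition_dates (acq_date)
    SELECT DISTINCT date(acqtime) FROM series;
""")

(cnt,) = conn.execute("SELECT COUNT(*) FROM acquisition_dates").fetchone()
print(cnt)  # 2 distinct dates migrated from 4 source rows

# Plain LIMIT/OFFSET on unique rows: an oversized offset simply finds no row
row = conn.execute("""SELECT acq_date FROM acquisition_dates
                      ORDER BY acq_date DESC LIMIT 1 OFFSET 400""").fetchone()
print(row)  # None
```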
Step 6: Regression Testing for LIMIT-OFFSET Logic
Implement regression tests as a plain SQL script executed through the sqlite3 shell:
-- File: limit_offset.test
CREATE TABLE t1(x);
INSERT INTO t1 VALUES(1),(1),(1);
SELECT CASE WHEN (
SELECT DISTINCT x FROM t1 ORDER BY x DESC LIMIT 1 OFFSET 100
) IS NULL THEN 'Pass' ELSE 'FAIL' END;
-- Expected output: Pass
Run tests via:
sqlite3 < limit_offset.test
Final Recommendation: Upgrade + Defensive Coding
- Mandatory Upgrade: Apply SQLite 3.37.2+
- Query Sanitization: Use `COALESCE`/`CASE` to handle NULL expectations
- Index Optimization: Precompute frequently used expressions like `date(acqtime)`
- Monitoring: Log queries with large `OFFSET` values for audit
This comprehensive approach addresses both the immediate bug and underlying design flaws that expose systems to similar edge cases.