Unexpected Non-NULL Results from LIMIT OFFSET with Large Offsets in SQLite
Understanding Subquery Behavior with Excessive OFFSET Values
This issue occurs when a subquery using LIMIT 1 OFFSET N returns a non-NULL value even though N exceeds the number of available rows. The problem exhibits inconsistent behavior based on three critical factors:
- Sort Direction: queries with ORDER BY ... DESC return unexpected non-NULL values, while ORDER BY ... ASC returns NULL
- Distinct Value Count: adding more distinct values to the dataset resolves the issue
- Row Duplication: duplicate values in the dataset trigger the anomalous behavior
Technical Manifestation
Given a table with 4 rows containing two distinct dates (2021-01-02 and 2021-01-03), the following query returns 2021-01-03 instead of NULL when using OFFSET 400:
SELECT (
SELECT DISTINCT date(acqtime)
FROM series
ORDER BY date(acqtime) DESC
LIMIT 1 OFFSET 400
) AS subquery_result
FROM series;
Contradictory Behaviors
- Changing DESC to ASC returns NULL
- Adding a third distinct date (e.g., 2021-01-04) makes the subquery return NULL
- Using non-distinct values (e.g., all identical dates) triggers the anomaly
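The sketch below reproduces these contradictions end to end. It assumes only the schema implied above (a series table with a text acqtime column) and uses hypothetical timestamp values; the exact results depend on whether your SQLite build is affected.
-- Minimal reproduction sketch (hypothetical acqtime values)
CREATE TABLE series(acqtime TEXT);
INSERT INTO series VALUES
  ('2021-01-02 08:00:00'), ('2021-01-02 09:00:00'),
  ('2021-01-03 08:00:00'), ('2021-01-03 09:00:00');

-- DESC: affected builds return '2021-01-03' instead of NULL
SELECT (SELECT DISTINCT date(acqtime) FROM series
        ORDER BY date(acqtime) DESC LIMIT 1 OFFSET 400) FROM series;

-- ASC: returns NULL as expected
SELECT (SELECT DISTINCT date(acqtime) FROM series
        ORDER BY date(acqtime) ASC LIMIT 1 OFFSET 400) FROM series;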
Root Causes of Erroneous LIMIT-OFFSET Execution
1. Query Optimizer Short-Circuiting with Duplicate Values
SQLite’s query planner uses an optimization called "flattening" for subqueries with ORDER BY and LIMIT. When all rows carry identical values after DISTINCT, the optimizer incorrectly assumes:
- a single unique value exists
- OFFSET becomes irrelevant (since there is "only one row")
- the first available value can be returned regardless of the OFFSET magnitude
This optimization bypasses the normal row-counting logic that would detect oversized offsets.
Proof of Concept
CREATE TABLE t1(x);
INSERT INTO t1 VALUES(1),(1),(1); -- Three identical values
SELECT (SELECT DISTINCT x FROM t1 ORDER BY +x LIMIT 1 OFFSET 100) FROM t1;
-- Affected builds return three rows of 1 instead of three NULLs
2. Sort Direction Impact on Index Selection
The DESC sort order triggers different index-traversal logic than ASC. When duplicate values exist:
- ASC sorts may follow a B-tree index's natural ascending order
- DESC sorts may use an optimized backward scan
- backward scans interact unpredictably with the DISTINCT and LIMIT clauses
This explains why changing sort direction alters the outcome.
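One way to observe the different access paths on your own build is to compare query plans with EXPLAIN QUERY PLAN (the output format varies by SQLite version); a quick check:
-- Compare the plans chosen for the two sort directions
EXPLAIN QUERY PLAN
SELECT DISTINCT date(acqtime) FROM series
ORDER BY date(acqtime) ASC LIMIT 1 OFFSET 400;

EXPLAIN QUERY PLAN
SELECT DISTINCT date(acqtime) FROM series
ORDER BY date(acqtime) DESC LIMIT 1 OFFSET 400;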
3. DISTINCT + ORDER BY Execution Pipeline
SQLite processes DISTINCT before applying LIMIT/OFFSET. The sequence is:
- Compute all distinct values
- Sort them per ORDER BY
- Apply LIMIT and OFFSET
When duplicates exist:
- Step 1 reduces the dataset to fewer rows than the OFFSET
- The engine should return NULL but fails to because of the flawed row counting described above (the sketch after this list writes the pipeline out explicitly)
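A minimal sketch of the same pipeline, written out as an explicit CTE against the series table used above:
-- Step 1: distinct values; Step 2: sort; Step 3: limit/offset
WITH distinct_dates AS (
  SELECT DISTINCT date(acqtime) AS d FROM series
)
SELECT d
FROM distinct_dates
ORDER BY d DESC
LIMIT 1 OFFSET 400;  -- offset past the end: no row (NULL in a scalar subquery context)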
Resolving LIMIT-OFFSET Anomalies and Workarounds
Step 1: Verify SQLite Version and Apply Patches
The bug was fixed in SQLite 3.37.2 (2022-01-12) via check-in 9282bcde301cee2a. Confirm your version:
SELECT sqlite_version();
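Beyond the version string, sqlite_source_id() reports the check-in hash and date of the source tree the library was built from, which can help confirm whether a particular fix is present in a custom build:
SELECT sqlite_version()   AS version,
       sqlite_source_id() AS source_checkin;  -- includes the build's check-in date and hash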
If Unpatched:
- Upgrade to 3.37.2+
- For embedded systems: Recompile with the latest amalgamation
Step 2: Rewrite Queries to Bypass the Optimizer
Approach A: Prevent DISTINCT-ORDER BY Optimization
Add a no-op expression to disable short-circuiting:
SELECT (
SELECT DISTINCT date(acqtime) || ''
FROM series
ORDER BY date(acqtime) DESC
LIMIT 1 OFFSET 400
) FROM series;
The || '' concatenation is a no-op that forces re-evaluation of distinctness.
Approach B: Use a Wrapper Subquery
SELECT (
  SELECT val FROM (
    SELECT DISTINCT date(acqtime) AS val
    FROM series
  )
  ORDER BY val DESC
  LIMIT 1 OFFSET 400
) FROM series;
Nesting the DISTINCT in its own subquery disables the premature optimization. Note that ORDER BY belongs on the outer query together with LIMIT/OFFSET: an ORDER BY inside a derived table does not guarantee the ordering of the enclosing query.
Step 3: Implement Edge-Case Handling
Solution 1: Pre-Calculate Row Counts
Use a CTE to verify offset validity:
WITH dates AS (
  SELECT DISTINCT date(acqtime) AS d
  FROM series
)
SELECT CASE
  WHEN (SELECT COUNT(*) FROM dates) > 400
  THEN (SELECT d FROM dates ORDER BY d DESC LIMIT 1 OFFSET 400)
  ELSE NULL
END AS subquery_result;
Solution 2: Explicit Count Check
COALESCE cannot suppress a spurious non-NULL result, so guard the subquery with an explicit distinct-count check instead:
SELECT CASE
  WHEN 400 >= (SELECT COUNT(DISTINCT date(acqtime)) FROM series) THEN NULL
  ELSE (SELECT ... LIMIT 1 OFFSET 400)
END;
Step 4: Indexing Strategies to Avoid Duplicate Issues
Create a covering index to influence the optimizer’s behavior:
CREATE INDEX idx_series_acqtime_date
ON series(date(acqtime) DESC, acqtime);
This can help the engine to:
- read precomputed date(acqtime) values from the index instead of recomputing them per row
- resolve DISTINCT against already-ordered values
- apply LIMIT/OFFSET against the intended ordering
Note that the index improves plan stability and performance; it is not a substitute for upgrading past the bug.
Step 5: Permanent Schema Design Adjustments
Normalize Date Values
Create a separate dates table to eliminate duplicates:
CREATE TABLE acquisition_dates (
date_id INTEGER PRIMARY KEY,
acq_date TEXT UNIQUE
);
INSERT INTO acquisition_dates (acq_date)
SELECT DISTINCT date(acqtime) FROM series;
ALTER TABLE series ADD COLUMN date_id REFERENCES acquisition_dates(date_id);

-- Backfill the new column from the existing timestamps
UPDATE series
SET date_id = (SELECT date_id
               FROM acquisition_dates
               WHERE acq_date = date(series.acqtime));
Query using the normalized table:
SELECT (
SELECT acq_date
FROM acquisition_dates
ORDER BY acq_date DESC
LIMIT 1 OFFSET 400
) FROM series;
Step 6: Regression Testing for LIMIT-OFFSET Logic
Implement a simple regression test as a plain SQL script run through the sqlite3 shell:
-- File: limit_offset.test
CREATE TABLE t1(x);
INSERT INTO t1 VALUES(1),(1),(1);
SELECT CASE WHEN (
SELECT DISTINCT x FROM t1 ORDER BY x DESC LIMIT 1 OFFSET 100
) IS NULL THEN 'PASS' ELSE 'FAIL' END;
-- Expected output: PASS
Run tests via:
sqlite3 < limit_offset.test
Final Recommendation: Upgrade + Defensive Coding
- Mandatory Upgrade: apply SQLite 3.37.2+
- Query Sanitization: use COALESCE/CASE to handle NULL expectations
- Index Optimization: precompute frequently used expressions like date(acqtime)
- Monitoring: log queries with large OFFSET values for audit
This comprehensive approach addresses both the immediate bug and underlying design flaws that expose systems to similar edge cases.