Unexpected Non-NULL Results from LIMIT OFFSET with Large Offsets in SQLite


Understanding Subquery Behavior with Excessive OFFSET Values

Unexpected Non-NULL Returns in Subqueries with Large OFFSET

This issue occurs when a subquery using LIMIT 1 OFFSET N returns a non-NULL value even when N exceeds the number of available rows. The problem exhibits inconsistent behavior based on three critical factors:

  1. Sort Direction: Queries with ORDER BY ... DESC return unexpected non-NULL values, while ORDER BY ... ASC returns NULL
  2. Distinct Value Count: Adding distinct values to the dataset resolves the issue
  3. Row Duplication: Duplicate values in the dataset trigger the anomalous behavior

Technical Manifestation
Given a table with 4 rows containing two distinct dates (2021-01-02 and 2021-01-03), the following query returns 2021-01-03 instead of NULL when using OFFSET 400:

SELECT (
  SELECT DISTINCT date(acqtime) 
  FROM series 
  ORDER BY date(acqtime) DESC 
  LIMIT 1 OFFSET 400
) AS subquery_result
FROM series;

Contradictory Behaviors

  • Changing DESC to ASC returns NULL
  • Adding a third distinct date (e.g., 2021-01-04) makes the subquery return NULL
  • Using non-distinct values (e.g., all identical dates) triggers the anomaly

Root Causes of Erroneous LIMIT-OFFSET Execution

1. Query Optimizer Short-Circuiting with Duplicate Values

SQLite’s query planner uses an optimization called "flattening" for subqueries with ORDER BY and LIMIT. When all rows have identical values after DISTINCT, the optimizer incorrectly assumes:

  • A single unique value exists
  • OFFSET becomes irrelevant (since there’s "only one row")
  • Returns the first available value regardless of OFFSET magnitude

This optimization bypasses the normal row-counting logic that would detect oversized offsets.

Proof of Concept

CREATE TABLE t1(x);
INSERT INTO t1 VALUES(1),(1),(1); -- Three identical values
SELECT (SELECT DISTINCT x FROM t1 ORDER BY +x LIMIT 1 OFFSET 100) FROM t1;
-- Returns three 1's instead of NULL

2. Sort Direction Impact on Index Selection

The DESC sort order triggers different index traversal logic compared to ASC. When duplicate values exist:

  • ASC sorts may use a B-tree index’s natural ascending order
  • DESC sorts might utilize an optimized backward scan
  • Backward scans interact unpredictably with DISTINCT and LIMIT clauses

This explains why changing sort direction alters the outcome.

3. DISTINCT + ORDER BY Execution Pipeline

SQLite processes DISTINCT before applying LIMIT/OFFSET. The sequence is:

  1. Compute all distinct values
  2. Sort them per ORDER BY
  3. Apply LIMIT and OFFSET

When duplicates exist:

  • Step 1 reduces the dataset to fewer rows than OFFSET
  • The engine should return NULL but fails due to flawed row counting

Resolving LIMIT-OFFSET Anomalies and Workarounds

Step 1: Verify SQLite Version and Apply Patches

The bug was fixed in SQLite 3.37.2 (2022-01-12) via check-in 9282bcde301cee2a. Confirm your version:

SELECT sqlite_version();

If Unpatched:

  • Upgrade to 3.37.2+
  • For embedded systems: Recompile with the latest amalgamation

Step 2: Rewrite Queries to Bypass the Optimizer

Approach A: Prevent DISTINCT-ORDER BY Optimization
Add a no-op expression to disable short-circuiting:

SELECT (
  SELECT DISTINCT date(acqtime) || '' 
  FROM series 
  ORDER BY date(acqtime) DESC 
  LIMIT 1 OFFSET 400
) FROM series;

The || '' forces re-evaluation of distinctness.

Approach B: Use a Wrapper Subquery

SELECT (
  SELECT val FROM (
    SELECT DISTINCT date(acqtime) AS val 
    FROM series 
    ORDER BY val DESC
  ) LIMIT 1 OFFSET 400
) FROM series;

Nesting subqueries disables premature optimization.


Step 3: Implement Edge-Case Handling

Solution 1: Pre-Calculate Row Counts
Use a CTE to verify offset validity:

WITH dates AS (
  SELECT DISTINCT date(acqtime) AS d 
  FROM series 
  ORDER BY d DESC
), 
total AS (
  SELECT COUNT(*) AS cnt FROM dates
)
SELECT 
  CASE WHEN 400 < cnt THEN d ELSE NULL END 
FROM dates, total 
LIMIT 1 OFFSET 400;

Solution 2: COALESCE with Fallback

SELECT COALESCE(
  (SELECT ... LIMIT 1 OFFSET 400),
  (SELECT NULL WHERE 400 >= (SELECT COUNT(DISTINCT date(acqtime)) FROM series))
);

Step 4: Indexing Strategies to Avoid Duplicate Issues

Create a covering index to influence the optimizer’s behavior:

CREATE INDEX idx_series_acqtime_date 
ON series(date(acqtime) DESC, acqtime);

This forces the engine to:

  1. Materialize computed dates
  2. Process DISTINCT more accurately
  3. Apply LIMIT/OFFSET correctly

Step 5: Permanent Schema Design Adjustments

Normalize Date Values
Create a separate dates table to eliminate duplicates:

CREATE TABLE acquisition_dates (
  date_id INTEGER PRIMARY KEY,
  acq_date TEXT UNIQUE
);

INSERT INTO acquisition_dates (acq_date)
SELECT DISTINCT date(acqtime) FROM series;

ALTER TABLE series ADD COLUMN date_id REFERENCES acquisition_dates(date_id);

Query using the normalized table:

SELECT (
  SELECT acq_date 
  FROM acquisition_dates 
  ORDER BY acq_date DESC 
  LIMIT 1 OFFSET 400
) FROM series;

Step 6: Regression Testing for LIMIT-OFFSET Logic

Implement unit tests using SQLite’s .test commands:

-- File: limit_offset.test
CREATE TABLE t1(x);
INSERT INTO t1 VALUES(1),(1),(1);

SELECT CASE WHEN (
  SELECT DISTINCT x FROM t1 ORDER BY x DESC LIMIT 1 OFFSET 100
) IS NULL THEN 'Pass' ELSE 'FAIL' END;

-- Expected output: Pass

Run tests via:

sqlite3 < limit_offset.test

Final Recommendation: Upgrade + Defensive Coding

  1. Mandatory Upgrade: Apply SQLite 3.37.2+
  2. Query Sanitization: Use COALESCE/CASE to handle NULL expectations
  3. Index Optimization: Precompute frequently used expressions like date(acqtime)
  4. Monitoring: Log queries with large OFFSET values for audit

This comprehensive approach addresses both the immediate bug and underlying design flaws that expose systems to similar edge cases.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *