Unexpected MAX() Window Function Results Due to Range Preceding Edge Case


Unexpected MAX() Window Function Behavior with Specific Range Preceding Values

The core issue involves unexpected results from the MAX() window function when applied to a dataset with a specific range-based window frame in SQLite. A user observed that a window defined with RANGE 365 PRECEDING produced a lower value than a window defined with RANGE 364 PRECEDING for the same row, despite the larger window logically including the same maximum value (1.0) present in the smaller window. This contradicts the expectation that expanding the window size should not reduce the computed maximum value unless the additional rows introduced by the larger window contain smaller values. In this case, all additional rows introduced by the larger window had a value of 0.0, which should not override the existing 1.0 maximum.

The dataset in question contains two columns:

  • c1: A strictly monotonically increasing integer with gaps (e.g., 4, 5, 6, …, 544).
  • c2: A real number column containing 0.0 for all rows except one (c1=537, c2=1.0).

The query computes two windowed maximums over c2:

  1. MAX(c2) OVER (ORDER BY c1 RANGE 364 PRECEDING)
  2. MAX(c2) OVER (ORDER BY c1 RANGE 365 PRECEDING)

For the row with c1=544, the first window correctly returns 1.0, while the second returns 0.0. This anomaly occurs despite both windows including the row c1=537 where c2=1.0. The root cause lies in how SQLite calculates the lower bound of the window frame for RANGE PRECEDING clauses under specific edge conditions, leading to incorrect exclusion of the critical row containing c2=1.0 in the larger window.


Boundary Calculation Errors in RANGE-Based Window Frames

The unexpected behavior stems from a miscalculation in determining the lower bound of the window frame when using RANGE PRECEDING with integer-ordered data. SQLite’s window function engine uses a binary search to identify the start of the window frame for RANGE clauses. This search relies on the ORDER BY column’s data type (integer in this case) and the range value to compute the lower bound. A bug in the boundary calculation caused the larger range (365 PRECEDING) to exclude the row c1=537 when computing the window for c1=544, even though 544 - 365 = 179, and the row c1=537 (which is greater than 179) should logically be included.

Key technical details:

  1. Range Preceding Logic:
    For a row with c1=X, the window frame includes all rows where c1 is between X - N and X, where N is the range value. For X=544, the lower bound should be 544 - 365 = 179. All rows with c1 >= 179 and c1 <= 544 should be included. The row c1=537 (which is within this range) contains c2=1.0 and should influence the maximum value.

  2. Binary Search Precision:
    SQLite uses a binary search on the ORDER BY column (c1) to find the first row greater than or equal to the lower bound. The bug caused the search to incorrectly identify the starting row, skipping c1=537 in the 365 PRECEDING window. This was due to an off-by-one error in the comparison logic when handling integer ranges.

  3. Data Type Handling:
    The c1 column is an integer, but the range calculation involves subtracting a value (365) from the current row’s c1. If the subtraction operation is not precisely bounded, it can lead to incorrect comparisons. For example, 544 - 365 = 179, but due to integer division or truncation in intermediate steps, the actual lower bound might have been computed as 180, excluding c1=179 (if present) and shifting the window frame.

  4. Assertion Failures in Debug Builds:
    The bug triggered an assertion failure in debug builds of SQLite, indicating an internal inconsistency in the window frame boundaries. In release builds (with assertions disabled), the miscalculation persisted silently, returning incorrect results.


Resolving Boundary Miscalculations in RANGE Windows

Step 1: Verify SQLite Version and Apply the Fix
The issue was patched in SQLite’s source code repository (see commit e042eb024738a83e). Users experiencing this problem should:

  1. Check their SQLite version using SELECT sqlite_version();.
  2. Upgrade to a version containing the fix (versions released after August 29, 2024).

Step 2: Workarounds for Unpatched Versions
If upgrading is not immediately feasible, apply these workarounds:

  • Use ROWS Instead of RANGE:
    Replace RANGE PRECEDING with ROWS PRECEDING if the ordering column (c1) has no duplicates and the goal is to include a fixed number of rows. For example:

    MAX(c2) OVER (ORDER BY c1 ROWS 365 PRECEDING)
    

    This ensures the window includes exactly 365 preceding rows, avoiding boundary calculation issues.

  • Adjust the Range Value:
    Increase the range slightly to ensure the critical row is included. For example, using RANGE 366 PRECEDING instead of 365 might bypass the miscalculation.

  • Materialize the Window Frame:
    Explicitly define the window frame using a subquery to compute the lower and upper bounds:

    SELECT 
      c1, 
      c2, 
      (SELECT MAX(c2) FROM weird WHERE c1 BETWEEN current.c1 - 364 AND current.c1) AS max_364,
      (SELECT MAX(c2) FROM weird WHERE c1 BETWEEN current.c1 - 365 AND current.c1) AS max_365
    FROM weird AS current;
    

Step 3: Validate Window Frame Boundaries
To debug window frames, manually compute the expected bounds and verify which rows are included:

-- For the row with c1=544:
SELECT 
  MIN(c1) AS lower_bound, 
  MAX(c1) AS upper_bound 
FROM weird 
WHERE c1 BETWEEN 544 - 365 AND 544;

If the result does not include c1=537, this indicates a deeper issue with data import or schema definitions.

Step 4: Audit Data Types and Indexes
Ensure the ORDER BY column (c1) is indexed and explicitly typed as INTEGER. SQLite may treat untyped columns as BLOB in some import scenarios, leading to unexpected comparisons. Recreate the table with strict typing:

CREATE TABLE weird (c1 INTEGER PRIMARY KEY, c2 REAL) STRICT;

Step 5: Test Edge Cases with Minimal Datasets
Reproduce the issue with a custom dataset to isolate boundary conditions:

INSERT INTO weird (c1, c2) VALUES
  (179, 0.0),
  (537, 1.0),
  (544, 0.0);

Run the window function query to see if the maximum value is correctly computed for c1=544.

Step 6: Monitor Debug Build Assertions
Compile SQLite with debugging enabled (-DSQLITE_DEBUG) to catch assertion failures during query execution. This helps identify unresolved boundary calculation issues.


By addressing the boundary calculation logic, ensuring proper data typing, and leveraging workarounds, users can mitigate this issue until a patched SQLite version is deployed. The fix underscores the importance of rigorous testing for window function edge cases, particularly with integer-ordered data and range-based frames.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *