SUM Function Returns NULL Instead of Infinity When Filter Applied in SQLite 3.43+


Floating-Point Overflow Handling Discrepancy in Aggregation with Conditional Filters

Issue Overview

The problem arises when using the SUM aggregate function in SQLite 3.43.0 and later versions under specific conditions involving floating-point overflow. The core issue manifests as an unexpected NULL result instead of an infinity (Inf) value when summing extremely large floating-point numbers if a conditional filter (e.g., WHERE clause) is applied to the aggregation. This behavior contrasts with unfiltered queries, where the same summation correctly returns Inf to indicate overflow.

The discrepancy is tied to the Kahan-Babushka-Neumaier summation algorithm introduced in SQLite 3.43.0 to improve numerical accuracy for SUM and AVG operations. The algorithm’s internal error compensation mechanism fails to handle overflow scenarios gracefully when the filtered dataset includes valid rows but the summation process encounters an intermediate overflow during compensation steps.

For example, consider a table t0 with a DOUBLE column c0 containing values 1, 2, and 3. Executing SELECT SUM(1.7976931348623157E308) FROM t0 (an unfiltered query) returns Inf, as the constant value 1.7976931348623157E308 (very close to the maximum representable 64-bit floating-point value) is summed three times, exceeding the IEEE 754 double-precision limit. However, adding a filter like WHERE c0 > 1 causes the same summation to return NULL, even though two rows meet the condition. This inconsistency violates the expectation that filtered and unfiltered aggregations should behave identically when the filter does not exclude all rows.

The root cause lies in how the Kahan-Babushka-Neumaier algorithm initializes and updates its internal state during summation. When a WHERE clause is present, the algorithm’s error compensation logic may prematurely reset or miscompute intermediate values during overflow, leading to an internal state that SQLite interprets as undefined (hence NULL). This occurs despite the filtered rows being valid contributors to the sum.


Algorithmic Edge Case in Kahan-Babushka-Neumaier Summation with Filtered Datasets

Possible Causes

  1. Intermediate Overflow Handling in Kahan-Babushka-Neumaier Algorithm
    The Kahan-Babushka-Neumaier summation algorithm tracks two variables: a running sum (sum) and a compensation term (err) that captures low-order bits lost due to rounding errors. For each value val added to the sum, the algorithm computes:
    temp = sum + val
    If |sum| >= |val|, then err += (sum - temp) + val
    Else, err += (val - temp) + sum
    sum = temp

    When summing values at the extreme upper limit of the floating-point range (e.g., 1.7976931348623157E308), adding even one such value to a zero-initialized sum immediately sets sum to 1.7976931348623157E308. Adding a second identical value overflows to infinity (Inf). However, the algorithm’s compensation logic may fail to propagate this overflow correctly when the WHERE clause alters the order of operations or skips rows. Specifically, if the first row processed by the filtered query triggers an overflow, subsequent rows may not contribute to the err term as expected, leaving the final sum in an undefined state.

  2. Interaction Between Filtered Rows and Aggregation Initialization
    SQLite’s SUM function initializes the aggregation buffer to NULL and updates it iteratively. When a WHERE clause filters out rows, the initial state of the aggregation buffer depends on whether any rows pass the filter. In the example, two rows pass the c0 > 1 filter. However, the Kahan-Babushka-Neumaier algorithm’s compensation logic may treat the first valid row as causing an overflow, which interacts poorly with the aggregation buffer’s initialization. The overflow state is not carried forward correctly, leading to an internal inconsistency where the algorithm resets the sum to NULL instead of propagating Inf.

  3. Type Affinity and Overflow Detection in SQLite
    SQLite uses dynamic typing, but columns declared as DOUBLE have REAL affinity. When summing literals or expressions, SQLite may treat values as integers or floating-point numbers based on context. The constant 1.7976931348623157E308 is explicitly a floating-point literal, but the aggregation logic must still detect overflow during arithmetic operations. The Kahan-Babushka-Neumaier algorithm’s compensation steps might bypass SQLite’s usual overflow detection mechanisms, leading to an undefined result instead of a forced conversion to Inf.


Resolving Incorrect NULL Results in SUM Queries with Floating-Point Overflow

Troubleshooting Steps, Solutions & Fixes

Step 1: Confirm SQLite Version and Reproduction Steps
Verify that the issue occurs in SQLite 3.43.0 or later. Execute the following queries to confirm the behavior:

-- Create test table
CREATE TABLE t0 (c0 DOUBLE);
INSERT INTO t0(c0) VALUES (1), (2), (3);

-- Unfiltered SUM (returns Inf)
SELECT SUM(1.7976931348623157E308) as aggr FROM t0;

-- Filtered SUM (returns NULL)
SELECT SUM(1.7976931348623157E308) as aggr FROM t0 WHERE c0 > 1;

If the filtered query returns NULL, the issue is present.

Step 2: Understand the Role of the WHERE Clause
The WHERE clause does not merely filter rows; it alters the sequence in which rows are processed by the aggregation algorithm. In the unfiltered case, all three rows are summed, causing an immediate overflow. In the filtered case, only two rows are processed. The Kahan-Babushka-Neumaier algorithm’s error compensation logic may mishandle the transition from the first to the second row when the first addition already overflows to Inf.

Step 3: Update SQLite to a Patched Version
The SQLite development team has addressed this issue in the trunk and branch versions following the bug report. Upgrade to a version where the fix is included. Check the SQLite changelog for fixes related to the Kahan-Babushka-Neumaier algorithm in SUM/AVG.

Step 4: Modify the Query to Avoid Intermediate Overflow
If upgrading is not feasible, rewrite the query to prevent the summation from hitting the overflow threshold. For example, scale down the values before summing and adjust the result accordingly:

SELECT SUM(1.7976931348623157E308 / 2) * 2 as aggr FROM t0 WHERE c0 > 1;

This distributes the large value across smaller terms, reducing the risk of overflow. However, this approach sacrifices precision and may not be viable for all use cases.

Step 5: Use CAST to Enforce Numeric Handling
Explicitly cast the summed expression to REAL to ensure consistent floating-point handling:

SELECT SUM(CAST(1.7976931348623157E308 AS REAL)) as aggr FROM t0 WHERE c0 > 1;

While this may not resolve the overflow issue, it ensures SQLite treats the value as a floating-point number throughout the aggregation.

Step 6: Implement a Custom Summation UDF
For critical applications requiring precise overflow handling, create a user-defined function (UDF) in SQLite using the C API. This UDF can implement a simpler summation logic that propagates Inf correctly:

#include <sqlite3ext.h>
SQLITE_EXTENSION_INIT1

static void udf_sum_inf(sqlite3_context *context, int argc, sqlite3_value **argv) {
    double sum = 0.0;
    for (int i = 0; i < argc; i++) {
        sum += sqlite3_value_double(argv[i]);
        if (isinf(sum)) {
            sqlite3_result_double(context, sum);
            return;
        }
    }
    sqlite3_result_double(context, sum);
}

// Register the UDF in extension loading
int sqlite3_extension_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
    SQLITE_EXTENSION_INIT2(pApi);
    sqlite3_create_function(db, "udf_sum_inf", 1, SQLITE_UTF8, NULL, udf_sum_inf, NULL, NULL);
    return SQLITE_OK;
}

Compile this as a loadable extension and use it in place of SUM:

SELECT udf_sum_inf(1.7976931348623157E308) as aggr FROM t0 WHERE c0 > 1;

Step 7: Adjust Data Types or Constraints
If the large values are not essential to the dataset, consider storing scaled-down values or using TEXT to represent them as strings, converting to REAL only when necessary. This avoids triggering the overflow during aggregation.

Step 8: Monitor Future SQLite Updates
Subscribe to SQLite release announcements and test new versions in a staging environment to confirm whether the fix for this edge case has been backported to stable releases.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *