SUM Function Returns NULL Instead of Infinity When Filter Applied in SQLite 3.43+
Floating-Point Overflow Handling Discrepancy in Aggregation with Conditional Filters
Issue Overview
The problem arises when using the SUM
aggregate function in SQLite 3.43.0 and later versions under specific conditions involving floating-point overflow. The core issue manifests as an unexpected NULL
result instead of an infinity (Inf
) value when summing extremely large floating-point numbers if a conditional filter (e.g., WHERE
clause) is applied to the aggregation. This behavior contrasts with unfiltered queries, where the same summation correctly returns Inf
to indicate overflow.
The discrepancy is tied to the Kahan-Babushka-Neumaier summation algorithm introduced in SQLite 3.43.0 to improve numerical accuracy for SUM
and AVG
operations. The algorithm’s internal error compensation mechanism fails to handle overflow scenarios gracefully when the filtered dataset includes valid rows but the summation process encounters an intermediate overflow during compensation steps.
For example, consider a table t0
with a DOUBLE
column c0
containing values 1
, 2
, and 3
. Executing SELECT SUM(1.7976931348623157E308) FROM t0
(an unfiltered query) returns Inf
, as the constant value 1.7976931348623157E308
(very close to the maximum representable 64-bit floating-point value) is summed three times, exceeding the IEEE 754 double-precision limit. However, adding a filter like WHERE c0 > 1
causes the same summation to return NULL
, even though two rows meet the condition. This inconsistency violates the expectation that filtered and unfiltered aggregations should behave identically when the filter does not exclude all rows.
The root cause lies in how the Kahan-Babushka-Neumaier algorithm initializes and updates its internal state during summation. When a WHERE
clause is present, the algorithm’s error compensation logic may prematurely reset or miscompute intermediate values during overflow, leading to an internal state that SQLite interprets as undefined (hence NULL
). This occurs despite the filtered rows being valid contributors to the sum.
Algorithmic Edge Case in Kahan-Babushka-Neumaier Summation with Filtered Datasets
Possible Causes
Intermediate Overflow Handling in Kahan-Babushka-Neumaier Algorithm
The Kahan-Babushka-Neumaier summation algorithm tracks two variables: a running sum (sum
) and a compensation term (err
) that captures low-order bits lost due to rounding errors. For each valueval
added to the sum, the algorithm computes:
temp = sum + val
If|sum| >= |val|
, thenerr += (sum - temp) + val
Else,err += (val - temp) + sum
sum = temp
When summing values at the extreme upper limit of the floating-point range (e.g.,
1.7976931348623157E308
), adding even one such value to a zero-initializedsum
immediately setssum
to1.7976931348623157E308
. Adding a second identical value overflows to infinity (Inf
). However, the algorithm’s compensation logic may fail to propagate this overflow correctly when theWHERE
clause alters the order of operations or skips rows. Specifically, if the first row processed by the filtered query triggers an overflow, subsequent rows may not contribute to theerr
term as expected, leaving the final sum in an undefined state.Interaction Between Filtered Rows and Aggregation Initialization
SQLite’sSUM
function initializes the aggregation buffer toNULL
and updates it iteratively. When aWHERE
clause filters out rows, the initial state of the aggregation buffer depends on whether any rows pass the filter. In the example, two rows pass thec0 > 1
filter. However, the Kahan-Babushka-Neumaier algorithm’s compensation logic may treat the first valid row as causing an overflow, which interacts poorly with the aggregation buffer’s initialization. The overflow state is not carried forward correctly, leading to an internal inconsistency where the algorithm resets thesum
toNULL
instead of propagatingInf
.Type Affinity and Overflow Detection in SQLite
SQLite uses dynamic typing, but columns declared asDOUBLE
haveREAL
affinity. When summing literals or expressions, SQLite may treat values as integers or floating-point numbers based on context. The constant1.7976931348623157E308
is explicitly a floating-point literal, but the aggregation logic must still detect overflow during arithmetic operations. The Kahan-Babushka-Neumaier algorithm’s compensation steps might bypass SQLite’s usual overflow detection mechanisms, leading to an undefined result instead of a forced conversion toInf
.
Resolving Incorrect NULL Results in SUM Queries with Floating-Point Overflow
Troubleshooting Steps, Solutions & Fixes
Step 1: Confirm SQLite Version and Reproduction Steps
Verify that the issue occurs in SQLite 3.43.0 or later. Execute the following queries to confirm the behavior:
-- Create test table
CREATE TABLE t0 (c0 DOUBLE);
INSERT INTO t0(c0) VALUES (1), (2), (3);
-- Unfiltered SUM (returns Inf)
SELECT SUM(1.7976931348623157E308) as aggr FROM t0;
-- Filtered SUM (returns NULL)
SELECT SUM(1.7976931348623157E308) as aggr FROM t0 WHERE c0 > 1;
If the filtered query returns NULL
, the issue is present.
Step 2: Understand the Role of the WHERE Clause
The WHERE
clause does not merely filter rows; it alters the sequence in which rows are processed by the aggregation algorithm. In the unfiltered case, all three rows are summed, causing an immediate overflow. In the filtered case, only two rows are processed. The Kahan-Babushka-Neumaier algorithm’s error compensation logic may mishandle the transition from the first to the second row when the first addition already overflows to Inf
.
Step 3: Update SQLite to a Patched Version
The SQLite development team has addressed this issue in the trunk and branch versions following the bug report. Upgrade to a version where the fix is included. Check the SQLite changelog for fixes related to the Kahan-Babushka-Neumaier algorithm in SUM
/AVG
.
Step 4: Modify the Query to Avoid Intermediate Overflow
If upgrading is not feasible, rewrite the query to prevent the summation from hitting the overflow threshold. For example, scale down the values before summing and adjust the result accordingly:
SELECT SUM(1.7976931348623157E308 / 2) * 2 as aggr FROM t0 WHERE c0 > 1;
This distributes the large value across smaller terms, reducing the risk of overflow. However, this approach sacrifices precision and may not be viable for all use cases.
Step 5: Use CAST to Enforce Numeric Handling
Explicitly cast the summed expression to REAL
to ensure consistent floating-point handling:
SELECT SUM(CAST(1.7976931348623157E308 AS REAL)) as aggr FROM t0 WHERE c0 > 1;
While this may not resolve the overflow issue, it ensures SQLite treats the value as a floating-point number throughout the aggregation.
Step 6: Implement a Custom Summation UDF
For critical applications requiring precise overflow handling, create a user-defined function (UDF) in SQLite using the C API. This UDF can implement a simpler summation logic that propagates Inf
correctly:
#include <sqlite3ext.h>
SQLITE_EXTENSION_INIT1
static void udf_sum_inf(sqlite3_context *context, int argc, sqlite3_value **argv) {
double sum = 0.0;
for (int i = 0; i < argc; i++) {
sum += sqlite3_value_double(argv[i]);
if (isinf(sum)) {
sqlite3_result_double(context, sum);
return;
}
}
sqlite3_result_double(context, sum);
}
// Register the UDF in extension loading
int sqlite3_extension_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
SQLITE_EXTENSION_INIT2(pApi);
sqlite3_create_function(db, "udf_sum_inf", 1, SQLITE_UTF8, NULL, udf_sum_inf, NULL, NULL);
return SQLITE_OK;
}
Compile this as a loadable extension and use it in place of SUM
:
SELECT udf_sum_inf(1.7976931348623157E308) as aggr FROM t0 WHERE c0 > 1;
Step 7: Adjust Data Types or Constraints
If the large values are not essential to the dataset, consider storing scaled-down values or using TEXT
to represent them as strings, converting to REAL
only when necessary. This avoids triggering the overflow during aggregation.
Step 8: Monitor Future SQLite Updates
Subscribe to SQLite release announcements and test new versions in a staging environment to confirm whether the fix for this edge case has been backported to stable releases.