AVG() and SUM() Handling Infinity in SQLite 3.43.0: Behavior Changes and Fixes


Issue Overview: AVG() and SUM() Functions Returning NULL for Infinity in SQLite 3.43.0

In SQLite version 3.43.0, a significant change was introduced in the behavior of the AVG() and SUM() aggregate functions when dealing with infinity (Infinity or -Infinity) in floating-point calculations. This change was a result of an experimental enhancement to the accuracy of the SUM() function using the Kahan-Babuska-Neumaier algorithm, which inadvertently altered how infinity values are processed. Specifically, the AVG() function began returning NULL when encountering infinity, whereas in previous versions (e.g., 3.42.0), it would return a valid floating-point result, such as 9.0e+999.

The issue was first reported when a user observed that the following SQL query produced different results between SQLite 3.42.0 and 3.43.0:

CREATE TABLE foo (bar FLOAT);
INSERT INTO foo VALUES (37.5);
INSERT INTO foo VALUES (1e10000); -- This value is treated as infinity
SELECT AVG(bar) FROM foo;

In SQLite 3.42.0, the query returned 9.0e+999, while in SQLite 3.43.0, it returned NULL. This behavior was unexpected and raised concerns about backward compatibility and the mathematical correctness of aggregate functions in SQLite.

Further investigation revealed that the issue was not limited to AVG(). The SUM() function also exhibited similar behavior when dealing with infinity in certain contexts, particularly in window functions. For example, the following query demonstrated incorrect results across multiple versions of SQLite:

WITH t1(x) AS (VALUES(1.0),(2.0),(3.0),(+9.0e+999),(5.0),(6.0),(7.0))
SELECT SUM(x) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM t1;

The root cause of the issue lies in how SQLite handles floating-point arithmetic and the representation of infinity. When summing values that include both positive and negative infinity, the result can become NaN (Not a Number), which SQLite translates to NULL. This behavior is consistent with the IEEE 754 floating-point standard but can be problematic for users who rely on SQLite to handle large numbers gracefully.


Possible Causes: Kahan-Babuska-Neumaier Algorithm and Infinity Handling

The change in behavior was introduced in a commit aimed at improving the accuracy of the SUM() function using the Kahan-Babuska-Neumaier algorithm. This algorithm is designed to reduce numerical errors in floating-point summation by maintaining a compensation variable that accounts for lost precision during addition. While this enhancement improved the accuracy of SUM() for many use cases, it inadvertently affected how infinity values are processed.

In SQLite, infinity is represented as a special floating-point value that exceeds the maximum representable finite number. When the SUM() or AVG() function encounters such values, it must decide how to handle them. The Kahan-Babuska-Neumaier algorithm, while effective for finite values, does not inherently account for the special properties of infinity. As a result, the algorithm can produce unexpected results, such as returning NULL when infinity is involved.

Another contributing factor is the way SQLite handles the combination of positive and negative infinity. When adding +Infinity and -Infinity, the result is mathematically undefined, leading to NaN. SQLite translates NaN to NULL, which is consistent with its handling of other undefined or invalid operations. However, this behavior can be counterintuitive for users who expect SQLite to return a valid result, such as +Infinity or -Infinity, depending on the context.

The issue is further complicated by the use of window functions, where the SUM() function operates over a sliding window of rows. In such cases, the presence of infinity can lead to incorrect results due to the way intermediate sums are computed and propagated. This problem has existed in SQLite for a long time and is not easily resolved without significant changes to the underlying arithmetic logic.


Troubleshooting Steps, Solutions & Fixes: Addressing Infinity Handling in SQLite

To address the issue of AVG() and SUM() returning NULL for infinity in SQLite 3.43.0, several steps can be taken. These include updating to the latest version of SQLite, modifying queries to handle infinity explicitly, and understanding the limitations of floating-point arithmetic in SQLite.

1. Update to the Latest Version of SQLite

The first and most straightforward solution is to update to the latest version of SQLite. The issue was fixed in the trunk version and the branch-3.43 branch following the bug report. Users can verify the fix by running the following queries:

CREATE TABLE foo (bar FLOAT);
INSERT INTO foo VALUES (37.5);
INSERT INTO foo VALUES (1e10000);
SELECT SUM(bar) FROM foo;
SELECT AVG(bar) FROM foo;

In the fixed version, both SUM() and AVG() should return valid results, such as 9.0e+999, instead of NULL. Users can download the latest version from the SQLite website or build it from source.

2. Handle Infinity Explicitly in Queries

If updating SQLite is not an option, users can modify their queries to handle infinity explicitly. This involves checking for infinity values before applying aggregate functions and adjusting the logic accordingly. For example, the following query calculates the average while ignoring infinity values:

SELECT AVG(CASE WHEN bar = 1e10000 THEN NULL ELSE bar END) FROM foo;

This approach ensures that infinity values do not interfere with the calculation of the average. However, it requires careful consideration of the data and the specific requirements of the application.

3. Understand the Limitations of Floating-Point Arithmetic

Users should be aware of the limitations of floating-point arithmetic in SQLite, particularly when dealing with very large numbers or infinity. The IEEE 754 standard, which SQLite follows, defines specific rules for handling infinity and NaN, but these rules may not always align with user expectations. For example, adding +Infinity and -Infinity results in NaN, which SQLite translates to NULL.

To avoid unexpected results, users can normalize their data to avoid infinity values or use alternative data types, such as integers or decimals, where appropriate. Additionally, users can implement custom logic to handle edge cases, such as checking for infinity before performing arithmetic operations.

4. Use Window Functions with Caution

When using window functions, such as SUM() OVER, users should be cautious when dealing with infinity. The sliding window mechanism can amplify the impact of infinity values, leading to incorrect results. To mitigate this, users can filter out infinity values or use conditional logic to handle them appropriately. For example:

WITH t1(x) AS (VALUES(1.0),(2.0),(3.0),(+9.0e+999),(5.0),(6.0),(7.0))
SELECT SUM(CASE WHEN x = +9.0e+999 THEN 0 ELSE x END) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM t1;

This query replaces infinity values with 0 before applying the window function, ensuring that the result is not affected by undefined arithmetic.

5. Monitor Future Updates and Documentation

SQLite is actively maintained, and future updates may introduce further improvements to the handling of infinity and floating-point arithmetic. Users should monitor the SQLite changelog and documentation for updates related to aggregate functions and arithmetic behavior. The SQLite documentation now includes a section on handling infinities in SUM(), which provides additional context and guidance.

By following these steps, users can address the issue of AVG() and SUM() returning NULL for infinity in SQLite 3.43.0 and ensure that their queries produce accurate and reliable results. Understanding the underlying causes and limitations of floating-point arithmetic is key to avoiding similar issues in the future.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *