GCC Floating-Point Precision Loss in 32-bit Optimized Builds Affecting Dekker Algorithm

Floating-Point Calculation Discrepancies in x86-32 GCC-Optimized Code Using Extended Precision Registers

The core issue involves unexpected numerical inaccuracies in SQLite’s floating-point operations when compiled for 32-bit x86 targets using GCC with optimization levels ≥O1. This manifests specifically in the dekkerMul2 function, which implements Dekker’s multiplication algorithm for precise floating-point error compensation. Under optimization, the computed r[1] value deviates from the expected result by ~1.6 trillion due to excess intermediate precision retained in x87 FPU registers. The problem vanishes when:

  • Compiling for x86-64 (SSE2-based floating-point by default)
  • Disabling optimization (-O0)
  • Forcing intermediate value spills to memory (-ffloat-store)
  • Adding debug instrumentation (e.g., printf) that implicitly limits register reuse

This behavior is rooted in historical x87 FPU architecture and GCC’s adherence to C99/C11 floating-point contraction rules. The SQLite project encountered this during test suite validation, where minor floating-point errors caused assertion failures in algorithms requiring high precision. The problem persists across GCC versions (confirmed in 5.4.0–11.3.0) and is classified as a known compiler behavior rather than a defect, as per GCC Bugzilla #323.

Technical Context: Dekker’s Algorithm and Precision Requirements

Dekker’s algorithm splits floating-point values into high/low components to compute exact error terms. For 64-bit double types, it assumes all operations are performed in standard double-precision (53-bit mantissa). However, x87 FPUs perform arithmetic in 80-bit extended precision by default, retaining extra bits across operations unless explicitly rounded. GCC’s optimizer exploits this by keeping intermediate values in registers, violating the algorithm’s precision assumptions.

Example failure mode in dekkerMul2:

p = hx * hy;          // Product computed with 80-bit precision  
q = hx * ty + tx * hy;  
c = p + q;            // Sum retains 80-bit precision  
cc = p - c + q;       // Error term miscalculated due to unrounded c  

When c remains in an 80-bit register, cc loses lower-order bits critical for error correction, propagating inaccuracies to x[1].

Root Causes: x87 FPU Behavior, GCC Optimization Policies, and Algorithmic Assumptions

Three interrelated factors contribute to the issue:

1. x87 FPU Extended Precision in 32-bit Mode

The x87 FPU uses 80-bit registers for all floating-point operations, preserving 64 bits of mantissa versus 53 bits for IEEE 754 double. When targeting 32-bit x86, GCC defaults to x87 instructions for floating-point math. Intermediate values remain in registers with extended precision unless explicitly stored to memory (truncating to 64 bits). This violates Dekker’s algorithm’s requirement for strict double-precision rounding at each step.

2. GCC’s Floating-Point Contraction and Register Allocation

GCC’s optimizer (-O1 and above) merges operations and minimizes memory spills. Under C99/C11, it may contract expressions like a*b + c into fused multiply-add (FMA) operations, though x87 lacks native FMA support. More critically, it retains intermediate results in extended-precision registers across multiple statements, deferring rounding indefinitely. This directly conflicts with algorithms expecting per-operation rounding.

3. Inadequate Compiler Directives for Precision Control

While C99 provides #pragma STDC FP_CONTRACT to control expression contraction, GCC does not fully honor this for x87 targets. The -fexcess-precision=standard flag forces rounding at sequence points (as mandated by C99), but this is not the default for historical compatibility. SQLite’s cross-platform requirements preclude mandating non-default compiler flags, necessitating source-level workarounds.

Resolution Strategies: Enforcing Double-Precision Semantics in x87 Code Paths

1. Forcing Intermediate Value Spills with volatile

Marking variables as volatile forces the compiler to spill them to memory after each write, truncating to 64-bit precision. This mimics the -ffloat-store behavior without global compiler flag dependencies.

Modified Code from Discussion:

void dekkerMul2(volatile double *x, double y, double yy) {
  volatile double hx, tx, hy, ty, p, q, c, cc;
  // ... rest of algorithm unchanged ...
}  

Advantages:

  • No compiler-specific pragmas or flags required
  • Applies only to performance-critical floating-point code
  • Portable across compilers (Clang, ICC, etc.)

Disadvantages:

  • Slightly larger codegen due to redundant loads/stores
  • Overuse risks performance degradation in non-FPU-bound code

Validation:
Verify r[1] matches the expected +1.618612575499918457031250000000e+12 in the test harness.

2. Compiler Flags for Standards-Compliant Rounding

Explicitly enable C99 rounding rules via command-line flags:

gcc bug.c -m32 -O1 -fexcess-precision=standard -std=c99

Mechanism:

  • -fexcess-precision=standard: Round intermediate results to semantic type precision at sequence points (statement boundaries).
  • -std=c99: Enforce C99 semantics, which implicitly disables unsafe contractions.

Deployment Considerations:

  • SQLite cannot mandate custom flags for amalgamated sqlite3.c.
  • Effective for local builds but unsuitable for distributed source code.

3. Targeting SSE2 for 32-bit Floating-Point

Force GCC to use SSE instructions instead of x87 for 32-bit builds:

gcc bug.c -m32 -O1 -mfpmath=sse -msse2

Requirements:

  • CPU must support SSE2 (universal on x86 CPUs since ~2004).
  • -mfpmath=sse sets SSE as default FPU; -msse2 enables instruction set.

Benefits:

  • SSE uses fixed 64-bit precision for double, eliminating x87 excess bits.
  • No source changes required.

Drawbacks:

  • Not applicable to legacy x86 CPUs without SSE2 (rare in 2024).
  • May alter ABI for floating-point parameters.

4. Selective FPU Control with #pragma GCC optimize

Apply optimization settings to specific functions via pragmas:

#pragma GCC push_options  
#pragma GCC optimize("float-store")  
void dekkerMul2(double *x, double y, double yy) { ... }  
#pragma GCC pop_options  

Effect:

  • Disables register retention for floating-point variables in dekkerMul2.
  • Less intrusive than function-wide volatile.

Limitations:

  • GCC-specific; breaks portability to MSVC, ICC, etc.
  • May interact unpredictably with LTO (Link-Time Optimization).

5. Algorithmic Adjustment for Extended Precision

Redesign Dekker’s algorithm to tolerate 80-bit intermediates by recalculating error terms with compensated arithmetic. This is non-trivial and increases computational overhead, but avoids compiler-specific fixes.

Example Adjustment:

// Recompute c using only double-precision values  
c = (volatile double)(p + q);  
cc = (volatile double)(p - c + q);  

Challenges:

  • Requires rigorous error analysis for each operation.
  • Potentially undermines algorithm’s original purpose.

6. Run-Time FPU Control Word Modification

Adjust the x87 FPU precision at runtime via fpu_control_t:

#include <fpu_control.h>  
void dekkerMul2(...) {
  fpu_control_t cw;  
  _FPU_GETCW(cw);  
  cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  
  _FPU_SETCW(cw);  
  // ... algorithm ...  
  _FPU_SETCW(_FPU_EXTENDED);  // Restore if needed  
}  

Caveats:

  • Thread-unsafe: FPU state is per-thread.
  • Overhead from repeated FPU writes.
  • Incompatible with SIMD code expecting default state.

Decision Matrix for Mitigation Strategies

StrategyPortabilityPerformance ImpactCode Changes Required
volatile VariablesHighModerateLocalized
Compiler FlagsLowNoneNone
SSE TargetingMediumNoneNone
#pragma optimizeLowModerateLocalized
Algorithmic AdjustmentHighHighExtensive
Runtime FPU ControlMediumHighLocalized

SQLite’s Adopted Solution:
The check-in 5d9e9364 uses a hybrid approach:

  • volatile qualifiers on critical variables.
  • Platform-specific #pragma GCC optimize("float-store") guarded by #ifdef __GNUC__.

This balances portability with minimal performance impact, assuming modern GCC versions dominate 32-bit x86 deployments.

Long-Term Considerations for Floating-Point Reliability

  1. SSE Adoption: x86-64 mandates SSE2, making 32-bit x87 increasingly obsolete. Prioritize SSE-enabled toolchains for legacy 32-bit targets.
  2. Compiler Warnings: Enable -Wfloat-conversion, -Wdouble-promotion to catch implicit precision changes.
  3. Unit Testing: Augment SQLite’s test suite with hardware-agnostic floating-point validation using integer checksums of mantissa bits.
  4. Alternative Algorithms: Investigate IEEE 754-2008-compliant methods like error-free transforms that are less sensitive to intermediate precision.
// Example IEEE 754-2008 TwoSum algorithm (requires strict rounding)
void twoSum(double a, double b, double *s, double *t) {
  double s1 = a + b;
  double c = s1 - a;
  *t = (a - (s1 - c)) + (b - c);
  *s = s1;
}

By combining immediate rounding (volatile, -ffloat-store) with forward-looking hardware targeting, SQLite mitigates the GCC/x87 precision discrepancy while maintaining its zero-configuration ethos.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *