Precision Loss in geopoly_json Affects geopoly_contains_point Results
Geopoly Coordinate Representation Discrepancy: JSON Serialization vs. Internal BLOB Storage
Issue Overview: Mismatch Between geopoly_json Output and Internal Polygon Representation
The core problem arises from discrepancies in how the SQLite Geopoly extension handles coordinate precision during serialization versus internal computations. When using the geopoly_contains_point()
function with polygon coordinates stored directly as text (e.g., '[[0,0],[1,0],[1,1.000001],[0,0]]'
), the calculation preserves sufficient precision to return a containment result of 2 (indicating the point lies inside the polygon). However, when first converting the polygon to JSON via geopoly_json()
, the same query returns 0 (false), demonstrating a false negative due to precision loss.
This occurs because geopoly_json()
truncates coordinates to 6 significant digits during serialization, while the internal BLOB storage format preserves coordinates as 32-bit IEEE-754 floating-point numbers. The discrepancy between these representations becomes critical when polygon boundaries contain coordinates with small fractional offsets (e.g., 1.000001) that straddle decision boundaries in geometric calculations. The problem manifests most acutely when:
- Polygon vertices require sub-millimeter precision in geographic coordinates
- Applications use serialized JSON output for subsequent spatial operations
- Polygons contain vertices at extreme latitudes where degree-to-meter ratios amplify rounding errors
The internal BLOB format stores coordinates as binary 32-bit floats with ~7 decimal digits of precision, but preserves the exact bitwise representation used during computations. By contrast, geopoly_json()
converts these floats to JSON strings with forced rounding, creating irreversible precision loss. This violates the principle of idempotency: converting a polygon to JSON and back to BLOB does not yield the original spatial representation.
Root Causes: Serialization Rounding vs. Computational Precision
Three interrelated factors create this mismatch:
1. IEEE-754 Single-Precision Float Limitations
The Geopoly extension uses 32-bit floats for all coordinate storage and calculations to minimize memory footprint. While this provides ~7 significant decimal digits of precision, it introduces rounding errors during conversions between decimal text representations and binary floats. For example, the value 1.000001 cannot be represented exactly as a 32-bit float. The closest representable value is 1.00000095367431640625, which introduces a 0.00000095367431640625 error. When geopoly_json()
serializes this value, it rounds to 6 decimal places (1.000001), masking the true stored value and creating false equivalence.
2. geopoly_json() Forced Decimal Truncation
The JSON serialization function in Geopoly explicitly reduces coordinate precision to 6 decimal places through fixed-point formatting. This decision prioritizes human readability over computational accuracy, as 6 decimal places (~0.11m precision at equator) were deemed sufficient for most use cases. However, this truncation occurs after coordinates have already been converted to 32-bit floats, compounding rounding errors:
Original text input: 1.000001
→ Internal 32-bit float: 1.00000095367431640625
→ geopoly_json() output: 1.000001
The JSON output suggests higher precision than actually stored, creating an illusion of accuracy. When this JSON is reused in geopoly_contains_point()
, it gets reconverted to a 32-bit float, which may differ from the original internal value due to double-rounding errors.
3. Algorithm Sensitivity in geopoly_contains_point()
The point-in-polygon algorithm relies on exact coordinate comparisons when determining edge crossings. Small perturbations in vertex positions (as caused by JSON serialization) can flip the winding number calculation, especially near polygon boundaries. In the provided example:
Original polygon: [[0,0],[1,0],[1,1.000001],[0,0]]
JSON-serialized: [[0,0],[1,0],[1,1],[0,0]]
The Y-coordinate of the third vertex drops from ~1.00000095367431640625 to exactly 1.0. This transforms a quadrilateral into a triangle, fundamentally altering the polygon’s geometry and causing the test point (1/3, 1/3) to fall outside the revised shape.
Resolution Strategy: Bypassing JSON Serialization for Precision-Critical Workflows
1. Direct BLOB Storage and Retrieval
The Geopoly extension stores polygons internally in a binary BLOB format that preserves the exact 32-bit float representation used during computations. By working directly with these BLOBs instead of JSON-serialized text, applications avoid the precision loss introduced by geopoly_json()
.
Implementation Steps:
- Store polygons using
geopoly_blob()
instead of text representations:INSERT INTO geodata (id, polygon) VALUES (1, geopoly_blob('[[0,0],[1,0],[1,1.000001],[0,0]]'));
- Retrieve the BLOB for spatial queries:
SELECT geopoly_contains_point(polygon, 1.0/3, 1.0/3) FROM geodata WHERE id = 1;
This approach maintains parity between storage format and computational representation, eliminating serialization-induced errors.
2. Application-Side BLOB Parsing
When JSON output is required for interoperability (e.g., web APIs), parse the BLOB directly in application code to reconstruct coordinates with full 32-bit precision. The BLOB format consists of:
- 4-byte header:
- Byte 0: Flags (bit 0 = endianness: 0=big, 1=little)
- Bytes 1-3: Reserved (zero-filled)
- Coordinate pairs as 32-bit floats: X1, Y1, X2, Y2, …, Xn, Yn
Example TypeScript parser (extended from Adam Miller’s solution):
function geopolyBlobToCoords(buffer: ArrayBuffer): [number, number][] {
const view = new DataView(buffer);
const littleEndian = !!(view.getUint8(0) & 0x01);
const coords: [number, number][] = [];
// Start at byte 4 (after header)
for (let offset = 4; offset < buffer.byteLength; offset += 8) {
const x = view.getFloat32(offset, littleEndian);
const y = view.getFloat32(offset + 4, littleEndian);
coords.push([x, y]);
}
// Close polygon if not already closed
if (coords.length > 0 && (coords[0][0] !== coords[coords.length-1][0] ||
coords[0][1] !== coords[coords.length-1][1])) {
coords.push([coords[0][0], coords[0][1]]);
}
return coords;
}
This preserves the full precision of the 32-bit floats during conversion to JSON or other formats.
3. Precision-Aware JSON Serialization
If modifying application code isn’t feasible, implement a SQLite user-defined function (UDF) that serializes Geopoly BLOBs to JSON with configurable precision:
#include <sqlite3ext.h>
SQLITE_EXTENSION_INIT1
static void geopoly_json_prec(sqlite3_context *context, int argc, sqlite3_value **argv) {
const int prec = sqlite3_value_int(argv[1]);
// ... parse BLOB and serialize coordinates with 'prec' decimal digits ...
}
// Register UDF during extension loading:
sqlite3_create_function(db, "geopoly_json_prec", 2, SQLITE_UTF8, 0, geopoly_json_prec, 0, 0);
This allows developers to choose between human readability (lower precision) and computational accuracy (higher precision) based on use case requirements.
4. Schema Design Considerations
Prevent accidental precision loss through schema constraints:
- Store polygons exclusively as BLOBs using
geopoly_blob()
- Add CHECK constraints to prevent text-based polygon insertion:
CREATE TABLE geodata ( id INTEGER PRIMARY KEY, polygon BLOB CHECK (geopoly_valid(polygon)) );
- Use generated columns for JSON representations if needed:
ALTER TABLE geodata ADD COLUMN polygon_json TEXT GENERATED ALWAYS AS (geopoly_json(polygon)) VIRTUAL;
This ensures all spatial operations use the high-precision BLOB while allowing controlled access to JSON views.
5. Edge Case Mitigation
For polygons requiring sub-32-bit-float precision:
- Store critical coordinates as integers scaled by a precision factor (e.g., micrometers):
-- Store 1.000001 as 1000001 micrometers INSERT INTO geodata (polygon) VALUES (geopoly_blob('[[0,0],[1000000,0],[1000001,1000001],[0,0]]'));
- Apply scaling during queries:
SELECT geopoly_contains_point(polygon, :x*1000000, :y*1000000) FROM geodata;
This bypasses floating-point limitations entirely but requires application-level scaling coordination.
By prioritizing BLOB storage, implementing careful serialization, and designing schemas to enforce precision integrity, developers can eliminate discrepancies between Geopoly’s JSON output and internal calculations. These strategies align with SQLite’s philosophy of flexibility while mitigating the risks of silent precision loss in spatial computations.