Extracting Original JSON Strings from json_each/json_tree in SQLite
Understanding JSON Parsing Behavior in SQLite’s json_each/json_tree Functions
The challenge of preserving the original JSON string representation of values processed through SQLite’s json_each
or json_tree
functions arises from the inherent design of these functions. When JSON data is parsed using these table-valued functions, SQLite converts JSON values into their corresponding SQLite storage classes (e.g., integers, reals, text). This conversion process discards the original formatting and literal representations present in the raw JSON string. For example, JSON boolean values (true
/false
) are converted to integers (1
/0
), and floating-point numbers may lose precision if their string representation in the JSON document contains more significant digits than SQLite’s REAL
type can store.
The json
column returned by json_each
or json_tree
provides the raw JSON string for the entire document, not individual values within it. This means that while the entire JSON structure is accessible as a string, extracting the exact substring corresponding to a specific value (e.g., a boolean, number, or nested object) requires additional processing. The absence of a built-in mechanism to retrieve the original string representation of individual values forces developers to reconstruct these values manually, which introduces complexity when attempting to preserve formatting, avoid precision loss, or maintain boolean literals.
Root Causes of Formatting Loss in JSON Value Extraction
The core issue stems from SQLite’s JSON parsing logic, which prioritizes efficient querying over preserving literal representations. When json_each
or json_tree
traverses a JSON document, it decodes each value into an SQLite datatype. For instance:
- JSON
true
/false
becomes integer1
/0
. - JSON numbers are parsed into SQLite
INTEGER
orREAL
types, potentially truncating precision. - JSON strings are unescaped and returned as
TEXT
without surrounding quotes.
This behavior is intentional, as it allows JSON values to participate in SQL operations (e.g., comparisons, arithmetic). However, it becomes problematic when the goal is to reconstruct JSON with the original formatting. The json_quote
function can reintroduce quotes around strings, but it does not address booleans, numbers, or nested structures. Furthermore, floating-point values converted to SQLite’s REAL
type may not retain the exact decimal representation from the original JSON due to limitations in binary floating-point storage.
Another contributing factor is the lack of positional metadata. The json_each
and json_tree
functions do not provide byte offsets or substring ranges for values within the original JSON string. Without this information, it is impossible to directly extract the raw substring corresponding to a parsed value. Developers must instead rely on the parsed type
and value
columns to infer how to reconstruct the original JSON fragment.
Reconstructing Original JSON Values Using Type-Guided Formatting
To preserve the original JSON string representations, a combination of conditional logic and formatting functions must be applied to the type
and value
columns returned by json_each
or json_tree
. The following approach addresses each JSON data type systematically:
Step 1: Handle Boolean Values Explicitly
JSON booleans (true
/false
) are parsed as integers (1
/0
). Use the type
column to detect these cases and output the literal strings 'true'
or 'false'
:
CASE
WHEN type = 'true' THEN 'true'
WHEN type = 'false' THEN 'false'
...
END
Step 2: Preserve Floating-Point Precision
JSON numbers parsed as SQLite REAL
may lose precision. Use printf
with a format specifier that guarantees lossless string conversion. The format '%!.17g'
ensures that all significant digits of a 64-bit floating-point number are retained:
WHEN type = 'real' THEN printf('%!.17g', value)
Step 3: Quote Strings and Escape Special Characters
JSON strings are unescaped and returned without quotes. Use json_quote
to reapply quotes and escape characters like "
and \
:
WHEN type = 'text' THEN json_quote(value)
Step 4: Directly Output Nulls and Integers
JSON null
and integer values do not require special formatting. Output 'null'
for nulls and the integer’s string representation:
WHEN type = 'null' THEN 'null'
WHEN type = 'integer' THEN CAST(value AS TEXT)
Step 5: Reconstruct Nested Objects and Arrays
For JSON objects and arrays, recursively apply the same formatting logic to each child element. Use json_tree
with a recursive common table expression (CTE) to traverse nested structures:
WITH RECURSIVE rebuild_json(path, value_str) AS (
SELECT
key,
CASE type
WHEN 'true' THEN 'true'
...
END
FROM json_tree(?)
WHERE key IS NOT NULL
UNION ALL
SELECT
json_tree.key,
CASE json_tree.type
WHEN 'true' THEN 'true'
...
END
FROM json_tree, rebuild_json
WHERE json_tree.parent = rebuild_json.path
)
SELECT json_group_object(key, value_str) FROM rebuild_json;
Step 6: Aggregate Reconstructed Values
Use json_group_object
or json_group_array
to combine individually formatted values into a complete JSON document. Ensure commas and brackets are correctly placed when reconstructing arrays or objects manually.
Limitations and Considerations
- Performance: Recursive CTEs and per-row formatting can be slow for large JSON documents.
- Compatibility: The
printf
function’s behavior may vary across SQLite builds. Verify that'%!.17g'
produces the expected output. - Numeric Detection: SQLite does not distinguish between integers and floats in JSON. Use the
type
column fromjson_each
/json_tree
to determine the original JSON type.
By systematically applying these steps, developers can reconstruct JSON values that match the original document’s formatting, avoiding pitfalls related to type conversion and precision loss.