Updating Multiple Rows via JSON Array in SQLite: Handling NULL Values
Issue Overview: JSON-Based Bulk Update Producing NULL Values Instead of Expected Numbers
The core challenge involves attempting to update multiple rows in an SQLite table using values extracted from a JSON array. The table t contains columns f_id (foreign key), t_id (primary key), and amount (numeric value). A JSON object contains an array of objects under $.obj.arr, each specifying t_id and a new amount. The goal is to update the amount column in all rows matching the t_id values in the JSON array.
The original approach used nested json_extract() and json_each() calls to parse the JSON structure and correlate t_id values between the table and JSON data. However, this resulted in all amount values being set to NULL instead of the expected numbers. Subsequent analysis revealed that intermediate steps in the JSON parsing logic inadvertently stripped key names from the JSON objects, making it impossible to reference $.amount correctly.
This issue highlights three critical aspects of SQLite’s JSON functions:
- The behavior of
json_extract()when extracting multiple fields. - The structure of data returned by
json_each()in nested subqueries. - The importance of preserving JSON object key names during extraction to enable subsequent field access.
Possible Causes: Misalignment Between JSON Parsing Logic and SQLite Function Semantics
Cause 1: Incorrect Use of json_extract() for Multiple Field Extraction
The original query used json_extract(a.value, '$.amount', '$.t_id') as arr, assuming this would return a JSON object with both fields. However, json_extract() returns a concatenated array of values without keys when multiple paths are specified. For example, extracting $.amount and $.t_id from {"t_id":1,"amount":444} returns [444,1] instead of {"amount":444,"t_id":1}. This stripped the key names, making it impossible to reference $.amount in later steps.
Cause 2: Loss of JSON Object Structure in Subquery Aliasing
The subquery json_each(arr) a in the original statement operated on the array [444,1] (from Cause 1), which lacks key-value pairs. The value column in json_each() for such an array contains scalar values (e.g., 444, 1), not JSON objects. Consequently, json_extract(a.value, '$.amount') had no amount key to extract, resulting in NULL.
Cause 3: Absence of Correlation Between Subquery and Outer Table
The WHERE t_id = json_extract(arr, '$[1]') clause attempted to match the table’s t_id with the second element of the extracted array (1 or 2). However, because the subquery was not explicitly correlated to the outer UPDATE statement’s current row, this condition either matched incorrectly or not at all, leading to unintended NULL assignments.
Troubleshooting Steps, Solutions & Fixes: Correctly Mapping JSON Data to Table Updates
Step 1: Preserve JSON Object Structure During Initial Extraction
Instead of extracting multiple fields into an array, extract the entire JSON object for each array element. Modify the subquery to use json_extract(a.value, '$') to retain key-value pairs:
SELECT json_extract(a.value, '$') AS arr
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a
This returns JSON objects like {"t_id":1,"amount":444} in the arr column, preserving key names.
Step 2: Use json_each() on Full JSON Objects, Not Scalar Arrays
With the full JSON object preserved, apply json_each() to the arr column to iterate over its key-value pairs:
SELECT json_extract(a.value, '$.amount') AS amount
FROM (
SELECT json_extract(a.value, '$') AS arr
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a
) q, json_each(q.arr) AS a
WHERE json_extract(q.arr, '$.t_id') = t.t_id
Here, json_each(q.arr) iterates over the keys (t_id, amount) of the JSON object. The WHERE clause correlates the subquery with the outer table’s t_id.
Step 3: Simplify the Update Logic by Directly Joining JSON Data
A more efficient approach avoids nested subqueries by directly joining the table with the JSON array:
UPDATE t
SET amount = (
SELECT json_extract(a.value, '$.amount')
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a
WHERE json_extract(a.value, '$.t_id') = t.t_id
)
WHERE f_id = json_extract(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.f_id'
);
Key Improvements:
- The subquery in
SET amount = (...)directly correlatest.t_idwithjson_extract(a.value, '$.t_id'). json_each()iterates over the$.obj.arrarray, with eacha.valuebeing a JSON object ({"t_id":1,"amount":444}).- The outer
WHEREclause restricts updates to rows matching the JSON’sf_id.
Step 4: Validate JSON Paths and Intermediate Results
Use SELECT statements to test each component of the query:
-- Verify JSON array extraction
SELECT json_extract(a.value, '$') AS obj
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a;
-- Check field extraction from JSON objects
SELECT
json_extract(a.value, '$.t_id') AS t_id,
json_extract(a.value, '$.amount') AS amount
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a;
This ensures that t_id and amount are correctly parsed before integrating them into the UPDATE statement.
Step 5: Handle Edge Cases and Performance Considerations
- Missing
t_idin JSON: Rows with no matchingt_idin the JSON array will setamounttoNULL. UseCOALESCE(amount, t.amount)to retain existing values if needed. - Duplicate
t_idin JSON: The subquery returns the first matchingamount. UseLIMIT 1or aggregate functions if duplicates are possible. - Indexing: Ensure
t.t_idis indexed for faster correlation in large datasets.
Final Solution: Optimized and Robust Update Statement
UPDATE t
SET amount = (
SELECT json_extract(a.value, '$.amount')
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a
WHERE json_extract(a.value, '$.t_id') = t.t_id
)
WHERE EXISTS (
SELECT 1
FROM json_each(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.obj.arr'
) AS a
WHERE json_extract(a.value, '$.t_id') = t.t_id
)
AND f_id = json_extract(
'{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
'$.f_id'
);
Explanation:
- The
EXISTSclause ensures only rows with a matchingt_idin the JSON array are updated. - Direct correlation in the
SETsubquery avoids unnecessary joins. - Filtering by
f_idrestricts updates to the correct foreign key scope.
By systematically addressing the structural missteps in JSON parsing and aligning the subqueries with SQLite’s function semantics, this approach guarantees accurate bulk updates from JSON data while avoiding NULL assignment pitfalls.