Updating Multiple Rows via JSON Array in SQLite: Handling NULL Values

Issue Overview: JSON-Based Bulk Update Producing NULL Values Instead of Expected Numbers

The core challenge involves attempting to update multiple rows in an SQLite table using values extracted from a JSON array. The table t contains columns f_id (foreign key), t_id (primary key), and amount (numeric value). A JSON object contains an array of objects under $.obj.arr, each specifying t_id and a new amount. The goal is to update the amount column in all rows matching the t_id values in the JSON array.

The original approach used nested json_extract() and json_each() calls to parse the JSON structure and correlate t_id values between the table and JSON data. However, this resulted in all amount values being set to NULL instead of the expected numbers. Subsequent analysis revealed that intermediate steps in the JSON parsing logic inadvertently stripped key names from the JSON objects, making it impossible to reference $.amount correctly.

This issue highlights three critical aspects of SQLite’s JSON functions:

  1. The behavior of json_extract() when extracting multiple fields.
  2. The structure of data returned by json_each() in nested subqueries.
  3. The importance of preserving JSON object key names during extraction to enable subsequent field access.

Possible Causes: Misalignment Between JSON Parsing Logic and SQLite Function Semantics

Cause 1: Incorrect Use of json_extract() for Multiple Field Extraction

The original query used json_extract(a.value, '$.amount', '$.t_id') as arr, assuming this would return a JSON object with both fields. However, json_extract() returns a concatenated array of values without keys when multiple paths are specified. For example, extracting $.amount and $.t_id from {"t_id":1,"amount":444} returns [444,1] instead of {"amount":444,"t_id":1}. This stripped the key names, making it impossible to reference $.amount in later steps.

Cause 2: Loss of JSON Object Structure in Subquery Aliasing

The subquery json_each(arr) a in the original statement operated on the array [444,1] (from Cause 1), which lacks key-value pairs. The value column in json_each() for such an array contains scalar values (e.g., 444, 1), not JSON objects. Consequently, json_extract(a.value, '$.amount') had no amount key to extract, resulting in NULL.

Cause 3: Absence of Correlation Between Subquery and Outer Table

The WHERE t_id = json_extract(arr, '$[1]') clause attempted to match the table’s t_id with the second element of the extracted array (1 or 2). However, because the subquery was not explicitly correlated to the outer UPDATE statement’s current row, this condition either matched incorrectly or not at all, leading to unintended NULL assignments.

Troubleshooting Steps, Solutions & Fixes: Correctly Mapping JSON Data to Table Updates

Step 1: Preserve JSON Object Structure During Initial Extraction

Instead of extracting multiple fields into an array, extract the entire JSON object for each array element. Modify the subquery to use json_extract(a.value, '$') to retain key-value pairs:

SELECT json_extract(a.value, '$') AS arr  
FROM json_each(
  '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
  '$.obj.arr'
) AS a  

This returns JSON objects like {"t_id":1,"amount":444} in the arr column, preserving key names.

Step 2: Use json_each() on Full JSON Objects, Not Scalar Arrays

With the full JSON object preserved, apply json_each() to the arr column to iterate over its key-value pairs:

SELECT json_extract(a.value, '$.amount') AS amount  
FROM (
  SELECT json_extract(a.value, '$') AS arr  
  FROM json_each(
    '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
    '$.obj.arr'
  ) AS a  
) q, json_each(q.arr) AS a  
WHERE json_extract(q.arr, '$.t_id') = t.t_id  

Here, json_each(q.arr) iterates over the keys (t_id, amount) of the JSON object. The WHERE clause correlates the subquery with the outer table’s t_id.

Step 3: Simplify the Update Logic by Directly Joining JSON Data

A more efficient approach avoids nested subqueries by directly joining the table with the JSON array:

UPDATE t  
SET amount = (
  SELECT json_extract(a.value, '$.amount')  
  FROM json_each(
    '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
    '$.obj.arr'
  ) AS a  
  WHERE json_extract(a.value, '$.t_id') = t.t_id  
)  
WHERE f_id = json_extract(
  '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
  '$.f_id'
);  

Key Improvements:

  • The subquery in SET amount = (...) directly correlates t.t_id with json_extract(a.value, '$.t_id').
  • json_each() iterates over the $.obj.arr array, with each a.value being a JSON object ({"t_id":1,"amount":444}).
  • The outer WHERE clause restricts updates to rows matching the JSON’s f_id.

Step 4: Validate JSON Paths and Intermediate Results

Use SELECT statements to test each component of the query:

-- Verify JSON array extraction  
SELECT json_extract(a.value, '$') AS obj  
FROM json_each(
  '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
  '$.obj.arr'
) AS a;  

-- Check field extraction from JSON objects  
SELECT 
  json_extract(a.value, '$.t_id') AS t_id,  
  json_extract(a.value, '$.amount') AS amount  
FROM json_each(
  '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',
  '$.obj.arr'
) AS a;  

This ensures that t_id and amount are correctly parsed before integrating them into the UPDATE statement.

Step 5: Handle Edge Cases and Performance Considerations

  1. Missing t_id in JSON: Rows with no matching t_id in the JSON array will set amount to NULL. Use COALESCE(amount, t.amount) to retain existing values if needed.
  2. Duplicate t_id in JSON: The subquery returns the first matching amount. Use LIMIT 1 or aggregate functions if duplicates are possible.
  3. Indexing: Ensure t.t_id is indexed for faster correlation in large datasets.

Final Solution: Optimized and Robust Update Statement

UPDATE t  
SET amount = (  
  SELECT json_extract(a.value, '$.amount')  
  FROM json_each(  
    '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',  
    '$.obj.arr'  
  ) AS a  
  WHERE json_extract(a.value, '$.t_id') = t.t_id  
)  
WHERE EXISTS (  
  SELECT 1  
  FROM json_each(  
    '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',  
    '$.obj.arr'  
  ) AS a  
  WHERE json_extract(a.value, '$.t_id') = t.t_id  
)  
AND f_id = json_extract(  
  '{"f_id":1,"obj":{"arr":[{"t_id":1,"amount":444},{"t_id":2,"amount":443}]}}',  
  '$.f_id'  
);  

Explanation:

  • The EXISTS clause ensures only rows with a matching t_id in the JSON array are updated.
  • Direct correlation in the SET subquery avoids unnecessary joins.
  • Filtering by f_id restricts updates to the correct foreign key scope.

By systematically addressing the structural missteps in JSON parsing and aligning the subqueries with SQLite’s function semantics, this approach guarantees accurate bulk updates from JSON data while avoiding NULL assignment pitfalls.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *