Conditional UPSERT Handling in SQLite: Avoiding Unnecessary Updates and NULL Overwrites

UPSERT Operations with Selective Updates and Insert Tracking

The core challenge revolves around performing UPSERT (INSERT OR UPDATE) operations in SQLite where updates should only occur if specific column values differ between the incoming data and existing records. Additionally, the requirement is to track which rows were actually inserted or updated, excluding those where no change occurred. This becomes particularly complex when dealing with NULL values in the incoming data, as SQL’s default behavior might unintentionally overwrite existing non-NULL values with NULLs. The problem is compounded when using Object-Relational Mapping (ORM) tools like SQLAlchemy, which may lack native support for SQLite’s RETURNING clause – a feature critical for identifying affected rows post-operation.

In the provided scenario, the new_test table uses fld_text as a unique key. The goal is to:

Insert new rows for unmatched fld_text values
Update existing rows only when:
- Numeric columns (fld_int, fld_int1, fld_int2) differ from new values
- NULL values in incoming data don’t override existing non-NULL values
Retrieve clear differentiation between inserted and modified records

A naive implementation using basic ON CONFLICT DO UPDATE would cause unnecessary writes when values are identical or NULLs are present, leading to inflated write operations and difficulty in tracking actual changes. The solution requires precise control over conflict resolution logic and NULL handling, coupled with a reliable method to capture operation outcomes.

NULL Propagation Risks and ORM Limitations in Change Tracking

Three primary obstacles create the described behavior:

Default UPSERT Value Overwriting
SQLite’s ON CONFLICT DO UPDATE clause unconditionally applies all specified changes unless explicitly filtered via a WHERE clause. When incoming data contains NULL values, this results in undesired overwriting of existing non-NULL values unless mitigated through conditional expressions.
NULL Comparison Semantics
SQL’s three-valued logic treats NULL as unknown, causing direct comparisons (excluded.fld_int1 <> fld_int1) to return NULL (effectively false) when either operand is NULL. This prevents proper detection of value differences when NULLs are involved, requiring special handling through functions like COALESCE or IFNULL.
ORM Feature Gaps with SQLite
SQLAlchemy versions prior to 1.4 lack full support for SQLite’s RETURNING clause implementation (added in SQLite 3.35). Even when available, ORM abstractions may not cleanly map returned tuples to model objects, forcing developers to choose between ORM convenience and operational visibility.

The interaction between these factors creates a perfect storm:

Unfiltered UPSERT operations generate false-positive updates
NULL values in input data corrupt existing records
Application layers struggle to discern actual data changes

Strategic UPSERT Implementation with NULL Safety and ORM Integration

Step 1: Conditional Update with NULL Protection

Modify the UPDATE portion of the UPSERT to preserve existing values when incoming data contains NULLs:

INSERT INTO new_test (fld_text, fld_int1, fld_int2)
VALUES ('B', NULL, 20), ('C', 300, NULL), ('D', 4, 4)
ON CONFLICT(fld_text) DO UPDATE SET
  fld_int1 = COALESCE(excluded.fld_int1, fld_int1),
  fld_int2 = COALESCE(excluded.fld_int2, fld_int2)
WHERE 
  COALESCE(excluded.fld_int1, fld_int1) != fld_int1 OR
  COALESCE(excluded.fld_int2, fld_int2) != fld_int2
RETURNING fld_text, fld_int1, fld_int2;

Key Components:

COALESCE(excluded.column, existing.column) ensures NULL inputs don’t overwrite existing values
The WHERE clause compares post-coalesce values to detect actual changes
RETURNING clause identifies modified rows (SQLite 3.35+ required)

Step 2: Handling NULL-Specific Comparisons

For scenarios requiring different NULL handling (e.g., allowing explicit NULL overwrites), use explicit NULL checks:

ON CONFLICT(fld_text) DO UPDATE SET
  fld_int1 = CASE 
    WHEN excluded.fld_int1 IS NOT NULL THEN excluded.fld_int1 
    ELSE fld_int1 
  END
WHERE 
  (excluded.fld_int1 IS NOT NULL AND excluded.fld_int1 != fld_int1) OR
  (excluded.fld_int1 IS NULL AND fld_int1 IS NOT NULL)

This approach:

Only updates fld_int1 when the new value is explicitly non-NULL and different
Allows NULL to overwrite existing values if the input NULL is intentional

Step 3: ORM Workarounds for RETURNING Clause

When using SQLAlchemy with SQLite versions supporting RETURNING (3.35+), bypass ORM limitations by executing raw SQL while maintaining connection integrity:

from sqlalchemy import text

upsert_query = text("""
    INSERT INTO new_test (...) VALUES (...)
    ON CONFLICT (...) DO UPDATE ...
    RETURNING id, fld_text
""")

result = session.execute(upsert_query).fetchall()
# Convert tuples to dictionaries for easier handling
upserted_rows = [dict(row) for row in result]

Hybrid Approach:

Use ORM for standard CRUD operations
Switch to raw SQL for complex UPSERTs requiring RETURNING
Map returned tuples to application objects manually

Step 4: Version-Safe SQL Construction

For environments with variable SQLite versions, implement conditional SQL generation:

def upsert_new_test(items):
    base_sql = """
        INSERT INTO new_test (fld_text, fld_int1, fld_int2)
        VALUES {values}
        ON CONFLICT(fld_text) DO UPDATE SET
          fld_int1 = COALESCE(excluded.fld_int1, new_test.fld_int1),
          fld_int2 = COALESCE(excluded.fld_int2, new_test.fld_int2)
        {returning}
    """
    
    values = ", ".join(["(:text_{i}, :int1_{i}, :int2_{i})" for i in range(len(items))])
    params = {}
    for i, item in enumerate(items):
        params.update({
            f"text_{i}": item["text"],
            f"int1_{i}": item["int1"],
            f"int2_{i}": item["int2"]
        })
    
    # Check for RETURNING support
    if sqlite_version >= (3, 35):
        base_sql = base_sql.format(values=values, returning="RETURNING *")
        result = session.execute(text(base_sql), params)
        return result.fetchall()
    else:
        base_sql = base_sql.format(values=values, returning="")
        session.execute(text(base_sql), params)
        # Alternative change detection via last_insert_rowid() and updated counts
        return manual_change_detection(items)

Step 5: Manual Change Detection (Pre-RETURNING Fallback)

When stuck with SQLite <3.35, track changes through temporary tables:

-- Create temp table for incoming data
CREATE TEMP TABLE incoming_data(
  fld_text TEXT PRIMARY KEY,
  fld_int1 INTEGER,
  fld_int2 INTEGER
);

-- Insert/update main table
INSERT INTO new_test
SELECT * FROM incoming_data
ON CONFLICT(fld_text) DO UPDATE SET ...;

-- Detect inserts
SELECT * FROM new_test
WHERE fld_text IN (SELECT fld_text FROM incoming_data)
  AND id > (SELECT MAX(id) FROM new_test BEFORE UPSERT);

-- Detect updates
SELECT * FROM new_test
WHERE fld_text IN (SELECT fld_text FROM incoming_data)
  AND (fld_int1, fld_int2) NOT IN (
    SELECT fld_int1, fld_int2 FROM incoming_data
  );

While less efficient, this provides a version-agnostic method to identify changes.

Step 6: Transactional Integrity in Bulk Operations

Wrap complex UPSERT logic in transactions to maintain consistency:

try:
    session.begin()
    # Execute raw SQL UPSERT
    upsert_result = session.execute(upsert_sql)
    # Manual object refresh if needed
    for row in upsert_result:
        obj = session.query(NewTest).filter_by(fld_text=row.fld_text).first()
        if obj:
            session.refresh(obj)
    session.commit()
except:
    session.rollback()
    raise

This ensures:

Atomic application of all changes
Fresh object state in the ORM post-commit
Safe rollback on errors

Step 7: Index Optimization for UPSERT Performance

Enhance UPSERT efficiency by aligning indexes with conflict targets:

-- Original index
CREATE UNIQUE INDEX new_test_idx ON new_test(fld_text);

-- Add covering index for frequently updated columns
CREATE INDEX new_test_update_cover ON new_test(fld_text)
INCLUDE (fld_int1, fld_int2);

Covering indexes reduce disk I/O during UPDATE operations by keeping frequently accessed columns in the index structure.

Step 8: Monitoring Update Triggers

Add triggers to track changes separately from the application layer:

CREATE TABLE new_test_audit(
  id INTEGER PRIMARY KEY,
  operation TEXT,
  fld_text TEXT,
  old_int1 INTEGER,
  new_int1 INTEGER,
  change_time DATETIME
);

CREATE TRIGGER after_new_test_update
AFTER UPDATE ON new_test
FOR EACH ROW
BEGIN
  INSERT INTO new_test_audit(operation, fld_text, old_int1, new_int1, change_time)
  VALUES ('UPDATE', NEW.fld_text, OLD.fld_int1, NEW.fld_int1, CURRENT_TIMESTAMP);
END;

This provides an audit trail independent of the application’s RETURNING clause handling.

Step 9: JSON Extension Integration

For SQLite versions with JSON support (3.38+), enhance change tracking through JSON diffs:

INSERT INTO new_test ...
ON CONFLICT(fld_text) DO UPDATE SET ...
RETURNING 
  json_patch(
    json_object(
      'fld_int1', OLD.fld_int1,
      'fld_int2', OLD.fld_int2
    ),
    json_object(
      'fld_int1', NEW.fld_int1,
      'fld_int2', NEW.fld_int2
    )
  ) AS changes;

This returns a JSON object detailing exactly which fields changed during the update.

Step 10: Application-Level Caching

Reduce database roundtrips by caching existing values before UPSERT operations:

# Pre-cache existing values
existing_items = session.query(NewTest).filter(
    NewTest.fld_text.in_([item['text'] for item in upsert_items])
).all()

existing_map = {item.fld_text: item for item in existing_items}

# Perform UPSERT
upsert_result = session.execute(upsert_sql)

# Manual change detection
changed_items = []
for row in upsert_result:
    existing = existing_map.get(row.fld_text)
    if existing:
        if any(getattr(existing, col) != row[col] for col in ['fld_int1', 'fld_int2']):
            changed_items.append(row)
    else:
        changed_items.append(row)

This provides a fallback change detection mechanism independent of database features.

Final Recommendations

SQLite Version Alignment
Target SQLite 3.35+ in environments requiring precise UPSERT control with RETURNING clause support. Use PRAGMA user_version to enforce version checks during application startup.
Hybrid ORM/Raw SQL Approach
Maintain ORM usage for standard operations while strategically employing raw SQL for complex UPSERT patterns. Encapsulate these raw operations in dedicated repository classes to preserve code organization.
Comprehensive NULL Handling
Always use COALESCE or IFNULL in UPSERT SET clauses when NULL values might appear in input data, unless explicit NULL overwrites are part of business requirements.
Change Auditing
Implement database-level audit triggers alongside application-level change tracking to create redundant verification systems.
Performance Monitoring
Regularly analyze UPSERT query plans using EXPLAIN QUERY PLAN and adjust indexes accordingly. Watch for full table scans during conflict resolution.

By combining these SQL-level optimizations with careful application architecture, developers can achieve robust UPSERT functionality in SQLite that precisely controls update conditions, properly handles NULL values, and provides clear visibility into data changes – even when working within ORM constraints.

Conditional UPSERT Handling in SQLite: Avoiding Unnecessary Updates and NULL Overwrites

UPSERT Operations with Selective Updates and Insert Tracking

NULL Propagation Risks and ORM Limitations in Change Tracking

Strategic UPSERT Implementation with NULL Safety and ORM Integration

Step 1: Conditional Update with NULL Protection

Step 2: Handling NULL-Specific Comparisons

Step 3: ORM Workarounds for RETURNING Clause

Step 4: Version-Safe SQL Construction

Step 5: Manual Change Detection (Pre-RETURNING Fallback)

Step 6: Transactional Integrity in Bulk Operations

Step 7: Index Optimization for UPSERT Performance

Step 8: Monitoring Update Triggers

Step 9: JSON Extension Integration

Step 10: Application-Level Caching

Final Recommendations

Handling BLOBs in SQLite: Best Practices and Troubleshooting Guide

SQLite UPDATE Statement Not Updating Numeric Values Correctly

Resolving Syntax Errors When Assigning Column Values in SQLite INSERT Triggers

SQLite Error: ‘all VALUES must have the same number of terms’ in INSERT Statement

Trigger Implementation for Odometer Insertion in SQLite Rental Table

Storing and Retrieving Images in SQLite Using PHP: A Comprehensive Guide

Leave a Reply Cancel reply

UPSERT Operations with Selective Updates and Insert Tracking

NULL Propagation Risks and ORM Limitations in Change Tracking

Strategic UPSERT Implementation with NULL Safety and ORM Integration

Step 1: Conditional Update with NULL Protection

Step 2: Handling NULL-Specific Comparisons

Step 3: ORM Workarounds for RETURNING Clause

Step 4: Version-Safe SQL Construction

Step 5: Manual Change Detection (Pre-RETURNING Fallback)

Step 6: Transactional Integrity in Bulk Operations

Step 7: Index Optimization for UPSERT Performance

Step 8: Monitoring Update Triggers

Step 9: JSON Extension Integration

Step 10: Application-Level Caching

Final Recommendations

Related Guides

Leave a Reply Cancel reply