Handling Concurrent Edits in Wiki-Style Revision Systems with SQLite

Issue Overview: Race Conditions in Revision-Based Editing Systems

The core challenge in implementing a wiki-style editing system with SQLite lies in preventing concurrent edits from overwriting each other’s changes. When two users submit edits to the same page nearly simultaneously, the database must ensure that only one revision is accepted as the latest, while the other user is notified of a conflict. The naive approach of checking the current revision before inserting a new one fails because the check and insertion are non-atomic operations, creating a race condition.

In this scenario, the application’s write function performs two steps:

  1. Check if the latest revision ID matches the one the user based their edit on.
  2. Insert a new revision if the check passes.

These steps are vulnerable to interleaving execution threads. For example:

  • User A reads revision #5 as the latest.
  • User B reads revision #5 as the latest.
  • User A inserts revision #6.
  • User B’s check still passes (since their local copy of the latest revision is #5), and they insert revision #7, overwriting User A’s changes.

This violates the desired behavior where User B should be notified of the conflict after User A’s edit is committed. The problem is exacerbated in web applications where HTTP requests are handled concurrently by multiple workers (e.g., Gunicorn with threaded or multiprocess workers).

Possible Causes: Non-Atomic Operations and Isolation Level Misconfigurations

1. Non-Atomic Check-and-Insert Workflow

The write function’s logic is split into separate database operations:

  • A SELECT query to fetch the latest revision.
  • An INSERT query to add a new revision.

Between these operations, other transactions can modify the Revision table, rendering the initial check invalid. SQLite’s default autocommit mode treats each statement as a transaction, so the SELECT and INSERT run in separate transactions unless explicitly wrapped in a transaction block.

2. Lack of Row-Level Locking in SQLite

SQLite uses a file-level lock for writes, which serializes all write transactions. However, this does not prevent race conditions in application logic where a transaction reads stale data. For example:

  • Transaction A reads revision #5.
  • Transaction B commits revision #6.
  • Transaction A proceeds to insert revision #7, unaware of Transaction B’s changes.

This occurs because SQLite’s SERIALIZABLE isolation level ensures transactional consistency but does not implicitly enforce application-level constraints like revision lineage.

3. Unique Constraint Limitations for Revision Lineage

A proposed solution involves adding a parent_id column to enforce a unique chain of revisions. However, the first revision for any page would have a NULL parent ID, violating uniqueness if multiple pages exist. Composite unique constraints (e.g., UNIQUE(page_id, parent_id)) partially address this but require careful handling of NULL values, which SQLite treats as distinct.

4. Misunderstanding of SELECT FOR UPDATE Semantics

SQLite does not support SELECT FOR UPDATE, which in other databases locks rows to prevent concurrent modifications. Developers often attempt to emulate this behavior with exclusive transactions, but without proper isolation, these attempts fail to prevent phantom reads or lost updates.

Troubleshooting Steps, Solutions & Fixes

Solution 1: Atomic Transactions with Pessimistic Locking

Step 1: Enclose Operations in an Explicit Transaction

Wrap the revision check and insertion in a single transaction using SQLAlchemy’s transaction management:

from sqlalchemy import text  

def write(page_slug, user_revision_id, new_content):  
    with session.begin():  # Starts a transaction  
        # Fetch the latest revision WITH exclusive lock (emulated)  
        latest_rev = session.execute(  
            text("SELECT id FROM revision WHERE page_id = "  
                 "(SELECT id FROM page WHERE slug = :slug) "  
                 "ORDER BY created_at DESC LIMIT 1"),  
            {"slug": page_slug}  
        ).scalar()  

        if latest_rev != user_revision_id:  
            raise ConflictError("Page has been modified by another user.")  

        # Insert new revision  
        new_rev = Revision(  
            page_id=page.id,  
            content=new_content,  
            parent_id=latest_rev  
        )  
        session.add(new_rev)  
    # Transaction commits here  

Key Details:

  • The WITHOUT ROWID clause is unnecessary here, but the explicit transaction ensures that the SELECT and INSERT are atomic.
  • SQLite’s file-level lock guarantees that no two write transactions occur concurrently, but read operations in other transactions may still see stale data.

Step 2: Use BEGIN EXCLUSIVE for Critical Sections

To emulate row-level locking, start an exclusive transaction before reading the latest revision:

session.execute(text("BEGIN EXCLUSIVE"))  
try:  
    latest_rev = get_latest_revision(page_slug)  
    if latest_rev != user_revision_id:  
        raise ConflictError()  
    create_new_revision(latest_rev)  
    session.commit()  
except:  
    session.rollback()  
    raise  

Trade-offs:

  • Pros: Eliminates race conditions by blocking all other database accesses during the transaction.
  • Cons: Reduces concurrency and may lead to timeouts or deadlocks under high load.

Solution 2: Optimistic Concurrency Control with Version Numbers

Step 1: Add a Version Column to the Page Table

Add an integer version column to the Page table, incremented on each revision:

ALTER TABLE page ADD COLUMN version INTEGER NOT NULL DEFAULT 0;  

Step 2: Update the Version During Revision Insertion

Modify the write function to check the version and increment it atomically:

def write(page_slug, user_version, new_content):  
    with session.begin():  
        page = session.execute(  
            text("SELECT id, version FROM page WHERE slug = :slug"),  
            {"slug": page_slug}  
        ).fetchone()  

        if page.version != user_version:  
            raise ConflictError()  

        # Insert new revision  
        new_rev = Revision(page_id=page.id, content=new_content)  
        session.add(new_rev)  

        # Increment page version  
        session.execute(  
            text("UPDATE page SET version = version + 1 WHERE id = :id"),  
            {"id": page.id}  
        )  

Key Details:

  • The version column acts as a guard against concurrent updates.
  • The UPDATE statement increments the version atomically, ensuring that only one transaction can succeed per version.

Solution 3: Linear Revision History with Unique Constraints

Step 1: Add a parent_id Column with Composite Uniqueness

Modify the Revision table to include parent_id and enforce uniqueness per page:

CREATE TABLE revision (  
    id INTEGER PRIMARY KEY,  
    page_id INTEGER NOT NULL,  
    parent_id INTEGER UNIQUE,  -- Allows one child per parent  
    created_at TIMESTAMP,  
    content TEXT,  
    FOREIGN KEY (page_id) REFERENCES page(id),  
    FOREIGN KEY (parent_id) REFERENCES revision(id),  
    -- Ensure only one NULL parent per page (see Step 2)  
    UNIQUE (page_id, parent_id)  
);  

Step 2: Handle Initial Revisions with a Sentinel Value

Replace NULL with a sentinel value (e.g., -1) for the first revision’s parent_id:

def create_page(slug, title, initial_content):  
    with session.begin():  
        page = Page(slug=slug, title=title)  
        session.add(page)  
        session.flush()  # Assigns page.id  

        initial_rev = Revision(  
            page_id=page.id,  
            content=initial_content,  
            parent_id=-1  # Sentinel  
        )  
        session.add(initial_rev)  

Key Details:

  • The UNIQUE (page_id, parent_id) constraint ensures linear history.
  • The sentinel value avoids NULL uniqueness issues.

Solution 4: Conflict Detection with Triggers

Step 1: Create a Trigger to Validate Parent Revisions

Add a trigger to ensure that a new revision’s parent is the current latest:

CREATE TRIGGER validate_revision_parent  
BEFORE INSERT ON revision  
FOR EACH ROW  
BEGIN  
    SELECT CASE  
        WHEN NEW.parent_id != (  
            SELECT id FROM revision  
            WHERE page_id = NEW.page_id  
            ORDER BY created_at DESC  
            LIMIT 1  
        )  
        THEN RAISE(ABORT, 'Parent revision is not the latest')  
    END;  
END;  

Key Details:

  • The trigger runs within the inserting transaction, ensuring atomicity.
  • Conflicts raise an error that rolls back the transaction, which the application must handle.

Solution 5: Application-Level Locking with Mutexes

Step 1: Use a Global Lock for Page Writes

Implement a per-page mutex to serialize writes:

from threading import Lock  

page_locks = defaultdict(Lock)  

def write(page_slug, ...):  
    with page_locks[page_slug]:  
        latest_rev = get_latest_revision(page_slug)  
        if latest_rev != user_revision_id:  
            raise ConflictError()  
        create_new_revision(...)  

Trade-offs:

  • Pros: Simple to implement in single-process environments.
  • Cons: Fails with multiple Gunicorn workers or distributed systems.

Step 2: Use Advisory Locks in SQLite

Utilize SQLite’s sqlite3_update_hook or PRAGMA locking_mode=EXCLUSIVE for cross-process locks.

Final Recommendations

  1. Optimistic Concurrency Control is ideal for low-contention scenarios.
  2. Exclusive Transactions suit high-contention environments but reduce throughput.
  3. Composite Unique Constraints with sentinel values enforce revision lineage without triggers.

By combining atomic transactions, version columns, and careful constraint design, developers can implement robust wiki-style editing in SQLite while avoiding race conditions.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *