Resolving SQLite Parser Ambiguity in INSERT..SELECT Queries with ON CONFLICT Clause


Unexpected Syntax Error Near "DO" in INSERT..SELECT..ON CONFLICT Statements

Issue Overview: Parsing Ambiguity Between JOIN ON and UPSERT ON CONFLICT Clauses

The core issue arises when combining an INSERT INTO ... SELECT ... FROM statement with the ON CONFLICT clause (UPSERT) in SQLite. The parser struggles to distinguish whether the ON keyword belongs to a JOIN operation within the SELECT query or marks the start of the UPSERT clause. This ambiguity triggers a misleading syntax error near the DO keyword, even though the problem originates from the placement of ON.

Example Scenario
Given a table:

CREATE TABLE t (i INT PRIMARY KEY, j INT);

The following valid-looking query fails:

INSERT INTO t (i, j)
SELECT 1, 2
FROM (VALUES (1))  -- Optional derived table
ON CONFLICT DO NOTHING;

Error Message:

SQL Error [1]: [SQLITE_ERROR] SQL error or missing database (near "do": syntax error)

Why This Happens
SQLite uses an LALR(1) parser, which processes tokens sequentially without extensive lookahead. When parsing the SELECT portion of the INSERT statement, the parser encounters FROM (VALUES (1)) followed by ON. At this point, it assumes ON introduces a JOIN condition (e.g., JOIN ... ON), even if no JOIN exists. Since no JOIN is present, the parser reaches CONFLICT DO NOTHING, interprets CONFLICT as an invalid identifier for a JOIN condition, and throws an error at DO.

Key Observations

  1. Superfluous FROM Clause: The FROM (VALUES (1)) is technically unnecessary here but demonstrates how even simple derived tables trigger the ambiguity.
  2. Workaround Effectiveness: Adding a WHERE true clause or removing the FROM clause forces the parser to treat ON as part of the UPSERT clause. For example:
    -- Works: WHERE true clarifies parser state
    INSERT INTO t (i, j)
    SELECT 1, 2
    FROM (VALUES (1))
    WHERE true
    ON CONFLICT DO NOTHING;
    

Root Causes: SQLite Parser Limitations and JOIN Syntax Flexibility

1. LALR(1) Parser Limitations

SQLite’s parser cannot backtrack or reevaluate token sequences after making decisions. When parsing INSERT..SELECT, it assumes the SELECT statement’s syntax is complete before reaching ON CONFLICT. If the SELECT includes a FROM clause without a WHERE, the parser expects ON to belong to a JOIN, even if none exists.

Example of Genuine Ambiguity
Consider a hypothetical query with a JOIN:

INSERT INTO t1
SELECT * FROM t2 JOIN t3
ON t2.x = t3.y  -- Ambiguous: Is this the JOIN condition or the UPSERT clause?
ON CONFLICT DO UPDATE SET ...;

The parser cannot determine whether the first ON belongs to the JOIN or the UPSERT. However, even in non-ambiguous cases (like the original example with no JOIN), the parser’s rigid state machine causes false positives.

2. SQLite’s Non-Standard JOIN Syntax

SQLite allows JOIN operations without explicit ON clauses and treats commas (,) as CROSS JOIN operators. This flexibility creates edge cases where ON might appear in unexpected contexts. For example:

-- Valid in SQLite but invalid in standard SQL
SELECT * FROM t1, t2 ON t1.x = t2.y;

Here, ON is interpreted as part of the implicit CROSS JOIN between t1 and t2. This non-standard behavior increases parsing complexity, making it harder to distinguish ON for UPSERT from ON for JOIN.

3. Misleading Error Messages

The error message near "do": syntax error stems from the parser’s internal state:

  1. After processing FROM (VALUES (1)), it expects a JOIN or WHERE clause.
  2. ON is interpreted as the start of a JOIN condition.
  3. CONFLICT is treated as a table or column alias, which is invalid without AS.
  4. DO then appears out of context, triggering the error.

The parser does not account for the possibility that ON could belong to the outer INSERT statement’s UPSERT clause.


Solutions: Resolving Parser Ambiguity and Writing Robust Queries

1. Add a Dummy WHERE Clause

Force the parser to finalize the SELECT statement’s structure by including WHERE true (or any tautology):

INSERT INTO t (i, j)
SELECT 1, 2
FROM (VALUES (1))
WHERE true  -- Resolves parsing ambiguity
ON CONFLICT DO NOTHING;

Why This Works
The WHERE clause signals the end of the SELECT’s FROM section, ensuring ON is parsed as part of the UPSERT clause.

2. Simplify the SELECT Statement

Remove unnecessary FROM clauses when possible:

INSERT INTO t (i, j)
SELECT 1, 2  -- No FROM clause
ON CONFLICT DO NOTHING;

3. Use Explicit JOIN Syntax

If the SELECT requires JOIN, add unambiguous ON conditions:

INSERT INTO t (i, j)
SELECT t1.a, t2.b
FROM t1
INNER JOIN t2 ON t1.id = t2.id  -- Explicit JOIN condition
WHERE true
ON CONFLICT DO NOTHING;

4. Avoid Reserved Keywords in Schema Designs

While not directly related to this issue, avoid using reserved keywords like CONFLICT as column or table names to reduce parsing conflicts.

5. Consult SQLite Documentation

Refer to SQLite’s UPSERT and INSERT documentation, which explicitly warn about this ambiguity:

"To avoid parsing ambiguity, the SELECT statement should always include a WHERE clause if the upsert-clause is present."

6. Upgrade SQLite Versions

Newer SQLite versions may improve error messages or parsing logic. Test with the latest version to see if the issue persists.

7. Use Alternative Tools for Validation

Tools like Facebook’s CG-SQL provide enhanced error checking and can highlight parser ambiguities during development.


Final Recommendations

  • Always include WHERE true in INSERT..SELECT statements with ON CONFLICT to avoid parser ambiguity.
  • Simplify queries by removing unnecessary FROM clauses or derived tables.
  • Review JOIN logic to ensure ON clauses are explicitly tied to their respective JOIN operations.

By understanding SQLite’s parser limitations and adhering to documented workarounds, developers can avoid unexpected syntax errors and write robust UPSERT queries.

Related Guides

Leave a Reply

Your email address will not be published. Required fields are marked *