Resolving SQLite Parser Ambiguity in INSERT..SELECT Queries with ON CONFLICT Clause
Unexpected Syntax Error Near "DO" in INSERT..SELECT..ON CONFLICT Statements
Issue Overview: Parsing Ambiguity Between JOIN ON and UPSERT ON CONFLICT Clauses
The core issue arises when combining an INSERT INTO ... SELECT ... FROM statement with the ON CONFLICT clause (UPSERT) in SQLite. The parser struggles to distinguish whether the ON keyword belongs to a JOIN operation within the SELECT query or marks the start of the UPSERT clause. This ambiguity triggers a misleading syntax error near the DO keyword, even though the problem originates from the placement of ON.
Example Scenario
Given a table:
CREATE TABLE t (i INT PRIMARY KEY, j INT);
The following valid-looking query fails:
INSERT INTO t (i, j)
SELECT 1, 2
FROM (VALUES (1)) -- Optional derived table
ON CONFLICT DO NOTHING;
Error Message:
SQL Error [1]: [SQLITE_ERROR] SQL error or missing database (near "do": syntax error)
Why This Happens
SQLite uses an LALR(1) parser, which processes tokens sequentially without extensive lookahead. When parsing the SELECT portion of the INSERT statement, the parser encounters FROM (VALUES (1)) followed by ON. At this point, it assumes ON introduces a JOIN condition (e.g., JOIN ... ON), even if no JOIN exists. Since no JOIN is present, the parser reaches CONFLICT DO NOTHING, interprets CONFLICT as an invalid identifier for a JOIN condition, and throws an error at DO.
Key Observations
- Superfluous
FROMClause: TheFROM (VALUES (1))is technically unnecessary here but demonstrates how even simple derived tables trigger the ambiguity. - Workaround Effectiveness: Adding a
WHERE trueclause or removing theFROMclause forces the parser to treatONas part of the UPSERT clause. For example:-- Works: WHERE true clarifies parser state INSERT INTO t (i, j) SELECT 1, 2 FROM (VALUES (1)) WHERE true ON CONFLICT DO NOTHING;
Root Causes: SQLite Parser Limitations and JOIN Syntax Flexibility
1. LALR(1) Parser Limitations
SQLite’s parser cannot backtrack or reevaluate token sequences after making decisions. When parsing INSERT..SELECT, it assumes the SELECT statement’s syntax is complete before reaching ON CONFLICT. If the SELECT includes a FROM clause without a WHERE, the parser expects ON to belong to a JOIN, even if none exists.
Example of Genuine Ambiguity
Consider a hypothetical query with a JOIN:
INSERT INTO t1
SELECT * FROM t2 JOIN t3
ON t2.x = t3.y -- Ambiguous: Is this the JOIN condition or the UPSERT clause?
ON CONFLICT DO UPDATE SET ...;
The parser cannot determine whether the first ON belongs to the JOIN or the UPSERT. However, even in non-ambiguous cases (like the original example with no JOIN), the parser’s rigid state machine causes false positives.
2. SQLite’s Non-Standard JOIN Syntax
SQLite allows JOIN operations without explicit ON clauses and treats commas (,) as CROSS JOIN operators. This flexibility creates edge cases where ON might appear in unexpected contexts. For example:
-- Valid in SQLite but invalid in standard SQL
SELECT * FROM t1, t2 ON t1.x = t2.y;
Here, ON is interpreted as part of the implicit CROSS JOIN between t1 and t2. This non-standard behavior increases parsing complexity, making it harder to distinguish ON for UPSERT from ON for JOIN.
3. Misleading Error Messages
The error message near "do": syntax error stems from the parser’s internal state:
- After processing
FROM (VALUES (1)), it expects aJOINorWHEREclause. ONis interpreted as the start of aJOINcondition.CONFLICTis treated as a table or column alias, which is invalid withoutAS.DOthen appears out of context, triggering the error.
The parser does not account for the possibility that ON could belong to the outer INSERT statement’s UPSERT clause.
Solutions: Resolving Parser Ambiguity and Writing Robust Queries
1. Add a Dummy WHERE Clause
Force the parser to finalize the SELECT statement’s structure by including WHERE true (or any tautology):
INSERT INTO t (i, j)
SELECT 1, 2
FROM (VALUES (1))
WHERE true -- Resolves parsing ambiguity
ON CONFLICT DO NOTHING;
Why This Works
The WHERE clause signals the end of the SELECT’s FROM section, ensuring ON is parsed as part of the UPSERT clause.
2. Simplify the SELECT Statement
Remove unnecessary FROM clauses when possible:
INSERT INTO t (i, j)
SELECT 1, 2 -- No FROM clause
ON CONFLICT DO NOTHING;
3. Use Explicit JOIN Syntax
If the SELECT requires JOIN, add unambiguous ON conditions:
INSERT INTO t (i, j)
SELECT t1.a, t2.b
FROM t1
INNER JOIN t2 ON t1.id = t2.id -- Explicit JOIN condition
WHERE true
ON CONFLICT DO NOTHING;
4. Avoid Reserved Keywords in Schema Designs
While not directly related to this issue, avoid using reserved keywords like CONFLICT as column or table names to reduce parsing conflicts.
5. Consult SQLite Documentation
Refer to SQLite’s UPSERT and INSERT documentation, which explicitly warn about this ambiguity:
"To avoid parsing ambiguity, the SELECT statement should always include a WHERE clause if the upsert-clause is present."
6. Upgrade SQLite Versions
Newer SQLite versions may improve error messages or parsing logic. Test with the latest version to see if the issue persists.
7. Use Alternative Tools for Validation
Tools like Facebook’s CG-SQL provide enhanced error checking and can highlight parser ambiguities during development.
Final Recommendations
- Always include
WHERE trueinINSERT..SELECTstatements withON CONFLICTto avoid parser ambiguity. - Simplify queries by removing unnecessary
FROMclauses or derived tables. - Review JOIN logic to ensure
ONclauses are explicitly tied to their respectiveJOINoperations.
By understanding SQLite’s parser limitations and adhering to documented workarounds, developers can avoid unexpected syntax errors and write robust UPSERT queries.