Resolving SQLite Parser Ambiguity in INSERT..SELECT Queries with ON CONFLICT Clause
Unexpected Syntax Error Near "DO" in INSERT..SELECT..ON CONFLICT Statements
Issue Overview: Parsing Ambiguity Between JOIN ON and UPSERT ON CONFLICT Clauses
The core issue arises when combining an INSERT INTO ... SELECT ... FROM
statement with the ON CONFLICT
clause (UPSERT) in SQLite. The parser struggles to distinguish whether the ON
keyword belongs to a JOIN
operation within the SELECT
query or marks the start of the UPSERT clause. This ambiguity triggers a misleading syntax error near the DO
keyword, even though the problem originates from the placement of ON
.
Example Scenario
Given a table:
CREATE TABLE t (i INT PRIMARY KEY, j INT);
The following valid-looking query fails:
INSERT INTO t (i, j)
SELECT 1, 2
FROM (VALUES (1)) -- Optional derived table
ON CONFLICT DO NOTHING;
Error Message:
SQL Error [1]: [SQLITE_ERROR] SQL error or missing database (near "do": syntax error)
Why This Happens
SQLite uses an LALR(1) parser, which processes tokens sequentially without extensive lookahead. When parsing the SELECT
portion of the INSERT
statement, the parser encounters FROM (VALUES (1))
followed by ON
. At this point, it assumes ON
introduces a JOIN
condition (e.g., JOIN ... ON
), even if no JOIN
exists. Since no JOIN
is present, the parser reaches CONFLICT DO NOTHING
, interprets CONFLICT
as an invalid identifier for a JOIN
condition, and throws an error at DO
.
Key Observations
- Superfluous
FROM
Clause: TheFROM (VALUES (1))
is technically unnecessary here but demonstrates how even simple derived tables trigger the ambiguity. - Workaround Effectiveness: Adding a
WHERE true
clause or removing theFROM
clause forces the parser to treatON
as part of the UPSERT clause. For example:-- Works: WHERE true clarifies parser state INSERT INTO t (i, j) SELECT 1, 2 FROM (VALUES (1)) WHERE true ON CONFLICT DO NOTHING;
Root Causes: SQLite Parser Limitations and JOIN Syntax Flexibility
1. LALR(1) Parser Limitations
SQLite’s parser cannot backtrack or reevaluate token sequences after making decisions. When parsing INSERT..SELECT
, it assumes the SELECT
statement’s syntax is complete before reaching ON CONFLICT
. If the SELECT
includes a FROM
clause without a WHERE
, the parser expects ON
to belong to a JOIN
, even if none exists.
Example of Genuine Ambiguity
Consider a hypothetical query with a JOIN
:
INSERT INTO t1
SELECT * FROM t2 JOIN t3
ON t2.x = t3.y -- Ambiguous: Is this the JOIN condition or the UPSERT clause?
ON CONFLICT DO UPDATE SET ...;
The parser cannot determine whether the first ON
belongs to the JOIN
or the UPSERT. However, even in non-ambiguous cases (like the original example with no JOIN
), the parser’s rigid state machine causes false positives.
2. SQLite’s Non-Standard JOIN Syntax
SQLite allows JOIN
operations without explicit ON
clauses and treats commas (,
) as CROSS JOIN
operators. This flexibility creates edge cases where ON
might appear in unexpected contexts. For example:
-- Valid in SQLite but invalid in standard SQL
SELECT * FROM t1, t2 ON t1.x = t2.y;
Here, ON
is interpreted as part of the implicit CROSS JOIN
between t1
and t2
. This non-standard behavior increases parsing complexity, making it harder to distinguish ON
for UPSERT from ON
for JOIN
.
3. Misleading Error Messages
The error message near "do": syntax error
stems from the parser’s internal state:
- After processing
FROM (VALUES (1))
, it expects aJOIN
orWHERE
clause. ON
is interpreted as the start of aJOIN
condition.CONFLICT
is treated as a table or column alias, which is invalid withoutAS
.DO
then appears out of context, triggering the error.
The parser does not account for the possibility that ON
could belong to the outer INSERT
statement’s UPSERT clause.
Solutions: Resolving Parser Ambiguity and Writing Robust Queries
1. Add a Dummy WHERE Clause
Force the parser to finalize the SELECT
statement’s structure by including WHERE true
(or any tautology):
INSERT INTO t (i, j)
SELECT 1, 2
FROM (VALUES (1))
WHERE true -- Resolves parsing ambiguity
ON CONFLICT DO NOTHING;
Why This Works
The WHERE
clause signals the end of the SELECT
’s FROM
section, ensuring ON
is parsed as part of the UPSERT clause.
2. Simplify the SELECT Statement
Remove unnecessary FROM
clauses when possible:
INSERT INTO t (i, j)
SELECT 1, 2 -- No FROM clause
ON CONFLICT DO NOTHING;
3. Use Explicit JOIN Syntax
If the SELECT
requires JOIN
, add unambiguous ON
conditions:
INSERT INTO t (i, j)
SELECT t1.a, t2.b
FROM t1
INNER JOIN t2 ON t1.id = t2.id -- Explicit JOIN condition
WHERE true
ON CONFLICT DO NOTHING;
4. Avoid Reserved Keywords in Schema Designs
While not directly related to this issue, avoid using reserved keywords like CONFLICT
as column or table names to reduce parsing conflicts.
5. Consult SQLite Documentation
Refer to SQLite’s UPSERT and INSERT documentation, which explicitly warn about this ambiguity:
"To avoid parsing ambiguity, the SELECT statement should always include a WHERE clause if the upsert-clause is present."
6. Upgrade SQLite Versions
Newer SQLite versions may improve error messages or parsing logic. Test with the latest version to see if the issue persists.
7. Use Alternative Tools for Validation
Tools like Facebook’s CG-SQL provide enhanced error checking and can highlight parser ambiguities during development.
Final Recommendations
- Always include
WHERE true
inINSERT..SELECT
statements withON CONFLICT
to avoid parser ambiguity. - Simplify queries by removing unnecessary
FROM
clauses or derived tables. - Review JOIN logic to ensure
ON
clauses are explicitly tied to their respectiveJOIN
operations.
By understanding SQLite’s parser limitations and adhering to documented workarounds, developers can avoid unexpected syntax errors and write robust UPSERT queries.