[SPARK-56100][SQL] Fail the query when IN list has no left-hand column or expression in the where clause by mani2296 · Pull Request #54927 · apache/spark

mani2296 · 2026-03-20T19:36:42Z

What changes were proposed in this pull request?

This PR rejects invalid IN predicates where the parser never attaches a left-hand value expression (column or expression) before the IN list, and surfaces a single clear analysis error instead of allowing the query to run in a misleading way.

error-conditions.json: add INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN
FunctionRegistry: register a builtin function named in, backed by InPredicateExpressionBuilder, so that shapes which parse as a function call to in (e.g. WHERE in (1), WHERE IN (1, 2)) fail with that condition.
CheckAnalysis: handle Filter(In(UnresolvedAttribute("in"), …)) when in is an unquoted identifier used as the left side of IN (...).
QueryCompilationErrors: missingColumnBeforeInError(origin) throwing AnalysisException with the new condition and SQL origin.
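
For illustration only, here is a rough, self-contained sketch of the two plan shapes the changes above target. It uses a toy AST, not Spark's Catalyst classes: the case classes, the quoted flag, ToyAnalysisException, and the case-insensitive name match are all assumptions made for this example, and the actual patch may be structured differently.

```scala
// Toy model only: these classes mimic the shape of Catalyst nodes
// but are NOT Spark's API. All names here are hypothetical.
sealed trait Expr
final case class Literal(value: Int) extends Expr
// `quoted` is an assumed flag standing in for "was the identifier backquoted?"
final case class UnresolvedAttribute(name: String, quoted: Boolean = false) extends Expr
final case class UnresolvedFunction(name: String, args: Seq[Expr]) extends Expr
final case class In(value: Expr, list: Seq[Expr]) extends Expr

final case class Filter(condition: Expr)

// Stand-in for AnalysisException carrying an error condition name.
final case class ToyAnalysisException(condition: String) extends Exception(condition)

object MissingColumnBeforeInCheck {
  val Condition = "INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN"

  // Throws when the filter condition looks like an IN list with nothing on its left.
  def check(plan: Filter): Unit = plan.condition match {
    // Shape 1: `WHERE in (1)` parsed as a call to a function named "in".
    case UnresolvedFunction(name, _) if name.equalsIgnoreCase("in") =>
      throw ToyAnalysisException(Condition)
    // Shape 2: an unquoted identifier `in` used as the left side of IN (...).
    case In(UnresolvedAttribute(name, false), _) if name.equalsIgnoreCase("in") =>
      throw ToyAnalysisException(Condition)
    // Anything else is left for the rest of analysis to handle.
    case _ => ()
  }

  // Convenience wrapper: returns the error condition if the check fails.
  def conditionFor(plan: Filter): Option[String] =
    try { check(plan); None }
    catch { case e: ToyAnalysisException => Some(e.condition) }
}
```

In this toy model, Filter(In(UnresolvedAttribute("id"), Seq(Literal(1), Literal(2)))) passes the check, while Filter(UnresolvedFunction("in", Seq(Literal(1)))) and Filter(In(UnresolvedAttribute("in"), Seq(Literal(1)))) are both rejected with the new condition. A backquoted column that is really named in (modelled here as quoted = true) is deliberately left alone.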

Closes: https://issues.apache.org/jira/browse/SPARK-56100

Why are the changes needed?

WHERE clauses that omit the column or expression before IN (for example, treating IN (1, 2) as a standalone predicate) are not valid SQL. In Spark, such queries could still run successfully without applying the filter the user thought they had written. That is easy to misread as "the query worked", when in fact the IN list was not doing what the author assumed: a silent logical error rather than a loud failure. This PR fails analysis with a dedicated INVALID_SQL_SYNTAX subcondition so that users immediately see that the predicate is malformed and how to write it correctly (e.g. WHERE id IN (1, 2)).

Does this PR introduce any user-facing change?

Yes

Before: For invalid IN usage with no column or expression on the left (e.g. WHERE IN (1, 2)), the query could finish without error, so it looked as if the IN condition had been applied when it had not.
After: The same invalid pattern fails at analysis time with an AnalysisException and condition INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN, plus a short explanation and examples.
Example:
SELECT * FROM t WHERE IN (1, 2);
-- After this PR: analysis fails with INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN

-- Intended form:
SELECT * FROM t WHERE id IN (1, 2);

How was this patch tested?

New unit tests in AnalysisErrorSuite: three cases using assertAnalysisErrorCondition / checkError so analyzer.checkAnalysis(analyzer.execute(plan)) fails with condition INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN and empty parameters (covers In(UnresolvedAttribute("in"), …), UnresolvedFunction("in", Seq(Literal)), and UnresolvedFunction("in", literals only)).

Was this patch authored or co-authored using generative AI tooling?

Yes, Generated-by: Cursor 2.6.18

mani2296 added 2 commits March 21, 2026 01:12

Change to throw error when column name is missed in where condition b…

e4f2a04

…efore IN clause

modified error clause as per guidelines

a3e232a

mani2296 force-pushed the master branch from 24ea408 to a3e232a Compare March 20, 2026 19:43

mani2296 wants to merge 2 commits into apache:master from mani2296:master
