[SPARK-56100][SQL] Fail the query when IN list has no left-hand column or expression in the where clause #54927

Open
mani2296 wants to merge 2 commits into apache:master from mani2296:master

What changes were proposed in this pull request?

This PR rejects invalid IN predicates where the parser never attaches a left-hand value expression (column or expression) before the IN list, and surfaces a single clear analysis error instead of allowing the query to run in a misleading way.

  • error-conditions.json: add INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN.
  • FunctionRegistry: register a built-in function named in, backed by InPredicateExpressionBuilder, so that shapes that parse as a function call in(...) or IN(...) (e.g. WHERE in (1), WHERE IN (1, 2)) fail with that condition.
  • CheckAnalysis: handle Filter(In(UnresolvedAttribute("in"), …)), which arises when in is an unquoted identifier used as the left side of IN (...).
  • QueryCompilationErrors: add missingColumnBeforeInError(origin), which throws an AnalysisException with the new condition and the SQL origin.
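To make the two rejected shapes concrete, here is a minimal sketch in Python. It is a toy model, not the Catalyst code in this PR: the class names mirror the ones mentioned above, but the dataclasses, the AnalysisError type, the check_filter_condition helper, and the case-insensitive name match are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Condition name taken from the PR description.
CONDITION = "INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN"


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""

    def __init__(self, condition: str):
        super().__init__(condition)
        self.condition = condition


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class UnresolvedAttribute:
    name: str


@dataclass(frozen=True)
class UnresolvedFunction:
    name: str
    args: Tuple[object, ...]


@dataclass(frozen=True)
class In:
    value: object
    items: Tuple[object, ...]


def check_filter_condition(cond: object) -> None:
    """Raise AnalysisError if a filter condition has one of the two shapes.

    Assumption: names are compared case-insensitively.
    """
    # Shape 1: the predicate parsed as a call to a function named "in",
    # e.g. WHERE IN (1, 2).
    if isinstance(cond, UnresolvedFunction) and cond.name.lower() == "in":
        raise AnalysisError(CONDITION)
    # Shape 2: an unquoted identifier "in" ended up as the left side of IN (...).
    if (
        isinstance(cond, In)
        and isinstance(cond.value, UnresolvedAttribute)
        and cond.value.name.lower() == "in"
    ):
        raise AnalysisError(CONDITION)


def is_rejected(cond: object) -> bool:
    """Convenience wrapper: True if the check raises the new condition."""
    try:
        check_filter_condition(cond)
    except AnalysisError as err:
        return err.condition == CONDITION
    return False
```

In this model, a well-formed predicate such as In(UnresolvedAttribute("id"), (Literal(1), Literal(2))) passes the check, while both malformed shapes raise with the new condition.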

Closes: https://issues.apache.org/jira/browse/SPARK-56100

Why are the changes needed?

WHERE clauses that omit the column or expression before IN (for example, treating IN (1, 2) as if it were a standalone predicate) are not valid SQL. In Spark, such queries could still run successfully without applying the filter the user thought they had written. That is easy to misread as “the query worked” when the IN list was not doing what the author assumed: a silent logical error rather than a loud failure. This PR fails analysis with a dedicated INVALID_SQL_SYNTAX subcondition so users immediately see that the predicate is malformed and how to write it correctly (e.g. WHERE id IN (1, 2)).

Does this PR introduce any user-facing change?

Yes

Before: For invalid IN usage with no column or expression on the left (e.g. WHERE IN (1, 2)), the query could finish without error, so it looked as if the IN condition had been applied when it had not.
After: The same invalid pattern fails at analysis time with an AnalysisException and condition INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN, plus a short explanation and examples.
Example:
SELECT * FROM t WHERE IN (1, 2);
-- After this PR: analysis fails with INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN

-- Intended form:
SELECT * FROM t WHERE id IN (1, 2);

How was this patch tested?

New unit tests in AnalysisErrorSuite: three cases using assertAnalysisErrorCondition / checkError so analyzer.checkAnalysis(analyzer.execute(plan)) fails with condition INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN and empty parameters (covers In(UnresolvedAttribute("in"), …), UnresolvedFunction("in", Seq(Literal)), and UnresolvedFunction("in", literals only)).
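The structure of these tests can be sketched as a small table-driven check. This is a Python toy, not the Scala in AnalysisErrorSuite: analyze is a hypothetical stand-in for analyzer.checkAnalysis(analyzer.execute(plan)), check_error only imitates the idea of checkError, plans are reduced to plain tuples, and reading the third case as "several literal arguments" is an assumption.

```python
CONDITION = "INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN"


class AnalysisError(Exception):
    """Hypothetical stand-in for AnalysisException with a condition and parameters."""

    def __init__(self, condition, parameters=None):
        super().__init__(condition)
        self.condition = condition
        self.parameters = dict(parameters or {})


def analyze(cond):
    """Toy analyzer over tuple-encoded filter conditions.

    Encodings (all assumed for this sketch):
      ("attr", name)            unresolved attribute
      ("lit", value)            literal
      ("call", name, args)      unresolved function call
      ("in", value, items)      IN predicate
    """
    kind = cond[0]
    if kind == "call" and cond[1] == "in":
        raise AnalysisError(CONDITION)
    if kind == "in" and cond[1] == ("attr", "in"):
        raise AnalysisError(CONDITION)


def check_error(cond, condition, parameters):
    """Assert that analysis fails with the expected condition and parameters."""
    try:
        analyze(cond)
    except AnalysisError as err:
        assert err.condition == condition, err.condition
        assert err.parameters == parameters, err.parameters
        return True
    raise AssertionError("expected analysis to fail")


# The three shapes named in the test description.
CASES = [
    ("in", ("attr", "in"), [("lit", 1), ("lit", 2)]),  # In(UnresolvedAttribute("in"), ...)
    ("call", "in", [("lit", 1)]),                      # in(...) with a single literal
    ("call", "in", [("lit", 1), ("lit", 2)]),          # in(...) with literals only
]

# Each case must fail with the new condition and empty parameters.
results = [check_error(case, CONDITION, {}) for case in CASES]
```

Keeping the expected condition and the empty parameter map in one helper means each case is a single line, so adding another malformed shape later only extends the table.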

Was this patch authored or co-authored using generative AI tooling?

Yes, Generated-by: Cursor 2.6.18
