[SPARK-56100][SQL] Fail the query when IN list has no left-hand column or expression in the WHERE clause #54927
Open
mani2296 wants to merge 2 commits into apache:master
What changes were proposed in this pull request?
This PR rejects invalid IN predicates where the parser never attaches a left-hand value expression (column or expression) before the IN list, and surfaces a single clear analysis error instead of allowing the query to run in a misleading way.
Closes: https://issues.apache.org/jira/browse/SPARK-56100
Why are the changes needed?
A WHERE clause that omits the column or expression before IN, for example one that treats IN (1, 2) as a standalone predicate, is not valid SQL. Spark could still run such a query successfully without applying the filter the user thought they had written. That outcome is easy to misread as "the query worked", which makes it a silent logical error rather than a loud failure. This PR fails analysis with a dedicated INVALID_SQL_SYNTAX subcondition, so users see immediately that the predicate is malformed and how to write it correctly (e.g. WHERE id IN (1, 2)).
Does this PR introduce any user-facing change?
Yes
Before: For invalid IN usage with no column or expression on the left (e.g. WHERE IN (1, 2)), the query could finish without error, so the IN condition appeared to be applied when it was not.
After: The same invalid pattern fails at analysis time with an AnalysisException and condition INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN, plus a short explanation and examples.
Example:
SELECT * FROM t WHERE IN (1, 2);
-- After this PR: analysis fails with INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN
-- Intended form:
SELECT * FROM t WHERE id IN (1, 2);
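The check described above can be pictured as a walk over the filter condition that looks for an IN list with nothing on its left. The sketch below is a toy model and not the code in this PR: `Expr`, `Attr`, `Lit`, `InList`, `FuncCall`, `And`, and `MissingInLhsCheck` are hypothetical names standing in for Catalyst nodes, and the matching rules are an assumption based on the shapes named in the testing section.

```scala
// Toy expression tree. These case classes are hypothetical stand-ins for
// Catalyst nodes and are not part of Spark.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class InList(value: Expr, list: Seq[Expr]) extends Expr
final case class FuncCall(name: String, args: Seq[Expr]) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object MissingInLhsCheck {
  val Condition = "INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN"

  private def isInName(name: String): Boolean = name.equalsIgnoreCase("in")

  // Assumed rule: a node is malformed when the keyword IN itself ended up as
  // the left-hand attribute, or as a function name applied only to literals.
  private def isMalformed(e: Expr): Boolean = e match {
    case InList(Attr(n), _) if isInName(n) => true
    case FuncCall(n, args) if isInName(n) && args.forall(_.isInstanceOf[Lit]) => true
    case _ => false
  }

  private def children(e: Expr): Seq[Expr] = e match {
    case InList(v, list)   => v +: list
    case FuncCall(_, args) => args
    case And(l, r)         => Seq(l, r)
    case _                 => Nil
  }

  /** Returns the error condition if any node in the tree is a malformed IN. */
  def check(e: Expr): Option[String] =
    if (isMalformed(e)) Some(Condition)
    else children(e).iterator.map(check).collectFirst { case Some(c) => c }
}
```

In this model, `WHERE IN (1, 2)` corresponds to `FuncCall("in", Seq(Lit(1), Lit(2)))` and is rejected, while `WHERE id IN (1, 2)` corresponds to `InList(Attr("id"), Seq(Lit(1), Lit(2)))` and passes.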
How was this patch tested?
New unit tests in AnalysisErrorSuite: three cases use assertAnalysisErrorCondition / checkError to verify that analyzer.checkAnalysis(analyzer.execute(plan)) fails with condition INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN and empty parameters. The cases cover In(UnresolvedAttribute("in"), …), UnresolvedFunction("in", Seq(Literal)), and UnresolvedFunction("in", literals only).
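The shape of such an assertion (expected condition plus an empty parameter map) can be illustrated without Spark on the classpath. This is a minimal sketch under stated assumptions: `ToyAnalysisError`, `ToyAnalyzer`, and `ToyCheck.expectError` are hypothetical names invented for illustration and do not reproduce the helpers in AnalysisErrorSuite.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for an analysis error that carries a condition name
// and message parameters. Not Spark's AnalysisException.
final case class ToyAnalysisError(condition: String, parameters: Map[String, String])
  extends Exception(condition)

object ToyAnalyzer {
  // Hypothetical predicate shape: a function name applied to literal arguments.
  final case class Call(name: String, literalArgs: Seq[Int])

  // Assumed behavior: a call named "in" means the IN list had no left-hand value.
  def analyze(predicate: Call): Unit =
    if (predicate.name.equalsIgnoreCase("in")) {
      throw ToyAnalysisError("INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN", Map.empty)
    }
}

object ToyCheck {
  /** Asserts that `body` fails with the expected condition and parameters. */
  def expectError(condition: String, parameters: Map[String, String])(body: => Unit): Unit =
    Try(body) match {
      case Failure(e: ToyAnalysisError) =>
        assert(e.condition == condition, s"unexpected condition: ${e.condition}")
        assert(e.parameters == parameters, s"unexpected parameters: ${e.parameters}")
      case Failure(other) =>
        throw other
      case Success(_) =>
        throw new AssertionError(s"expected $condition, but analysis succeeded")
    }
}
```

A test case then reads as `ToyCheck.expectError("INVALID_SQL_SYNTAX.MISSING_COLUMN_BEFORE_IN", Map.empty) { ToyAnalyzer.analyze(ToyAnalyzer.Call("in", Seq(1, 2))) }`. Checking the parameter map as well as the condition name catches a regression where the error starts carrying unexpected placeholders.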
Was this patch authored or co-authored using generative AI tooling?
Yes, Generated-by: Cursor 2.6.18