[SPARK-56030] [SQL] Normalize inner Project when below Project -> Filter by mihailoale-db · Pull Request #54859 · apache/spark

mihailoale-db · 2026-03-17T10:21:25Z

What changes were proposed in this pull request?

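In this PR I propose normalizing the order of the inner Project's project list when that Project sits directly below a Project -> Filter pair (i.e. Project -> Filter -> Project). This is analogous to how NormalizePlan already normalizes the aggregate list order of an Aggregate directly below a Project.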

Why are the changes needed?

Right now, the following query

SELECT a.key, b.key AS key_b
FROM t1 b
FULL JOIN t2 a USING (key)
WHERE b.key IS NULL OR a.key IS NULL

throws LOGICAL_PLAN_COMPARISON_MISMATCH in analyzer dual runs; normalizing the inner Project fixes it.
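To make the idea concrete, here is a minimal Scala sketch on a hypothetical, simplified plan model. The Plan, Relation, Project, Filter and Sort case classes below are toy stand-ins, not Catalyst's logical operators, and column names stand in for attribute references. It also assumes that "normalizing" a project list means putting it into a canonical order, here alphabetical; the real NormalizePlan helper may use a different canonical order. The reasoning it encodes: the outer Project fixes the final column order and Filter passes its child's output through unchanged, so the inner Project's list order cannot affect the result and can safely be canonicalized.

```scala
// Toy plan model for illustration only; these are NOT Spark's Catalyst classes.
// Column names stand in for attribute references.
sealed trait Plan { def output: Seq[String] }
final case class Relation(output: Seq[String]) extends Plan
final case class Project(projectList: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = projectList
}
final case class Filter(condition: String, child: Plan) extends Plan {
  def output: Seq[String] = child.output // Filter never changes column order
}
final case class Sort(order: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = child.output // neither does Sort
}

object NarrowNormalize {
  // Canonicalizes the inner project list only in the exact shape
  // Project -> Filter -> Project, mirroring the narrow rule in this PR.
  // Alphabetical sorting is an assumed canonical order.
  def normalize(plan: Plan): Plan = plan match {
    case Project(outer, Filter(cond, Project(inner, grandChild))) =>
      // Recurse on the rewritten inner Project so that deeper
      // Project -> Filter -> Project chains are handled as well.
      Project(outer, Filter(cond, normalize(Project(inner.sorted, grandChild))))
    case Project(list, child) => Project(list, normalize(child))
    case Filter(cond, child)  => Filter(cond, normalize(child))
    case Sort(order, child)   => Sort(order, normalize(child))
    case r: Relation          => r
  }
}
```

Note that the rule matches only a Filter between the two Projects, so a Project -> Sort -> Project plan passes through unchanged; that narrowness is what the review discussion is about.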

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new tests and ran the existing ones.

Was this patch authored or co-authored using generative AI tooling?

Yes.

mihailoale-db · 2026-03-17T22:18:46Z

@dtenedor PTAL when you find time. Thanks!

mihailoale-db · 2026-03-18T12:03:47Z

@cloud-fan PTAL when you find time. Thanks!

cloud-fan · 2026-03-18T13:46:44Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/NormalizePlan.scala

      case project @ Project(_, innerAggregate: Aggregate) =>
        project.copy(child = normalizeAggregateListOrder(innerAggregate))

+      case project @ Project(_, filter @ Filter(_, innerProject: Project)) =>


Can we make it more general? We can turn transformUpWithSubqueries into a manual top-down recursion. When we hit a Project, we set a boolean flag projectListOrderSensitive to false when invoking the recursive method on its child. For an allowlist of plan nodes (Filter, Sample, Sort, etc.) we keep the flag as is when recursing further; every other node sets projectListOrderSensitive back to true.

mihailoale-db · 2026-03-18

I'm not sure about that. It's not just Filter, Sort, Sample and a small set of other operators; it's every operator that has override def output: Seq[Attribute] = child.output. I don't think an allowlist is robust: it's easy to miss some of them, and anyone who adds a new operator will very likely forget to add it to the list. So I propose going with this approach for now and thinking about a better solution in the background, to avoid polluting NormalizePlan too much.

cloud-fan · 2026-03-20

An allowlist solution is better than the current hack that hardcodes a single Filter case.
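The allowlist approach suggested in the review can be sketched on a similar toy model. Again, every class here is hypothetical and not Catalyst's; the allowlist contents (only Filter and Sort; Sample is left out of the toy), the Limit and Union operators, and the alphabetical canonical order are all assumptions for illustration. The sketch replaces the bottom-up transform with a manual top-down recursion carrying a projectListOrderSensitive flag: a Project turns the flag off for its subtree, allowlisted operators keep it as is, and any other operator turns it back on. Union, which lines up its children's columns by position, stands in for an operator that must reset the flag.

```scala
// Toy plan model for illustration only; these are NOT Spark's Catalyst classes.
sealed trait Plan { def output: Seq[String] }
final case class Relation(output: Seq[String]) extends Plan
final case class Project(projectList: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = projectList
}
final case class Filter(condition: String, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}
final case class Sort(order: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = child.output
}
final case class Limit(n: Int, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}
// Union matches its children's columns by position, so order matters below it.
final case class Union(left: Plan, right: Plan) extends Plan {
  def output: Seq[String] = left.output
}

object AllowlistNormalize {
  // Hypothetical allowlist of operators that pass child.output through unchanged.
  // Limit is deliberately missing to show what happens when one is forgotten.
  private def isAllowlisted(plan: Plan): Boolean = plan match {
    case _: Filter | _: Sort => true
    case _                   => false
  }

  def normalize(plan: Plan, projectListOrderSensitive: Boolean = true): Plan = {
    // Allowlisted operators keep the flag; every other operator resets it to true.
    val passDown = if (isAllowlisted(plan)) projectListOrderSensitive else true
    plan match {
      case Project(list, child) =>
        val canonical = if (projectListOrderSensitive) list else list.sorted
        // A Project re-establishes column order, so its subtree is insensitive.
        Project(canonical, normalize(child, projectListOrderSensitive = false))
      case Filter(cond, child) => Filter(cond, normalize(child, passDown))
      case Sort(order, child)  => Sort(order, normalize(child, passDown))
      case Limit(n, child)     => Limit(n, normalize(child, passDown))
      case Union(left, right)  => Union(normalize(left, passDown), normalize(right, passDown))
      case r: Relation         => r
    }
  }
}
```

In this sketch, the Limit case illustrates the robustness concern from the thread: an operator missing from the allowlist is not rewritten incorrectly, it just falls back to the order-sensitive default, so an incomplete list can still leave a plan-comparison mismatch unresolved rather than produce a wrong plan.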

initial commit

15fd5c2

cloud-fan reviewed Mar 18, 2026

mihailoale-db wants to merge 1 commit into apache:master from mihailoale-db:normalizeproject1


Review thread: cloud-fan, Mar 18, 2026 (edited); mihailoale-db, Mar 18, 2026; cloud-fan, Mar 20, 2026


2 participants
