[SPARK-56030] [SQL] Normalize inner Project when below Project -> Filter#54859
[SPARK-56030] [SQL] Normalize inner Project when below Project -> Filter#54859mihailoale-db wants to merge 1 commit intoapache:masterfrom
Conversation
|
@dtenedor PTAL when you find time. Thanks! |
|
@cloud-fan PTAL when you find time. Thanks! |
| case project @ Project(_, innerAggregate: Aggregate) => | ||
| project.copy(child = normalizeAggregateListOrder(innerAggregate)) | ||
|
|
||
| case project @ Project(_, filter @ Filter(_, innerProject: Project)) => |
There was a problem hiding this comment.
can we make it more general? we can turn transformUpWithSubqueries into a manual top-down recursion. When we hit a project, we set a bool flag projectListOrderSensitive to false when invoking the recursive method, and we do not set projectListOrderSensitive back to true for a whitelist of plan nodes: Filter, Sample, Sort, etc.
There was a problem hiding this comment.
I'm not sure about that. It's not just Filter, Sort, Sample and some small set of operators but all the operators that have override def output: Seq[Attribute] = child.output. I don't think that it's robust in a way that you can easily miss some and besides that if someone adds another operator there is 99% chance that they will forget to add it to the list. So I propose to go with this approach and think of a better solution in a background in order to avoid polluting NormalizePlan too much.
There was a problem hiding this comment.
An allowlist solution is better than the current hardcoded single filter hack.
What changes were proposed in this pull request?
In this PR I propose to normalize inner Project when below Project -> Filter.
Why are the changes needed?
Right now
throws
LOGICAL_PLAN_COMPARISON_MISMATCHfor analyzer dual-runs and normalizing would fix it.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added + existing tests.
Was this patch authored or co-authored using generative AI tooling?
Yes.