Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new window_funnel_v2 aggregate function implementation and switches Nereids’ window_funnel(...) resolution to V2 by default, while keeping the legacy implementation available as window_funnel_v1. It also adds BE unit tests and Nereids regression coverage (plus updated golden outputs) for the new semantics.
Changes:
- Add FE/BE plumbing for
window_funnel_v2and map SQL namewindow_funnelto V2 (legacy exposed aswindow_funnel_v1). - Implement BE V2 aggregation state that stores only matched events and add BE unit tests.
- Add a Nereids regression suite for V2 and update affected regression golden outputs.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/nereids_p0/aggregate/window_funnel_v2.groovy | New Nereids regression suite covering V2 modes/behavior. |
| regression-test/data/nereids_p0/aggregate/window_funnel_v2.out | Golden output for the new V2 regression suite. |
| regression-test/data/nereids_p0/aggregate/window_funnel.out | Updated expected results due to window_funnel now resolving to V2. |
| regression-test/data/nereids_p0/sql_functions/aggregate_functions/test_aggregate_window_functions.out | Updated expected window-function output for window_funnel behavior change. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/AggStateType.java | Add mapping so window_funnel agg-state resolves to V2. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/visitor/AggregateFunctionVisitor.java | Add visitor hook for WindowFunnelV2. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/WindowFunnelV2.java | New Nereids expression class for window_funnel_v2. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/WindowFunnel.java | Rename legacy Nereids function name to window_funnel_v1. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinAggregateFunctions.java | Register window_funnel_v1 and map window_funnel to window_funnel_v2. |
| be/src/exprs/aggregate/aggregate_function_window_funnel_v2.h | New BE V2 implementation and state storage format. |
| be/src/exprs/aggregate/aggregate_function_window_funnel_v2.cpp | Register BE window_funnel_v2 in the aggregate factory. |
| be/src/exprs/aggregate/aggregate_function_window_funnel.h | Make string_to_window_funnel_mode inline (ODR-safe for new header includes). |
| be/src/exprs/aggregate/aggregate_function_window_funnel.cpp | Register legacy implementation as window_funnel_v1 and alias window_funnel → window_funnel_v1. |
| be/src/exprs/aggregate/aggregate_function_simple_factory.cpp | Register new BE window_funnel_v2 function. |
| be/test/exprs/aggregate/vec_window_funnel_v2_test.cpp | New BE unit tests for the V2 aggregate function. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| void sort() { | ||
| if (!sorted) { | ||
| std::stable_sort(std::begin(events_list), std::end(events_list)); | ||
| sorted = true; | ||
| } |
There was a problem hiding this comment.
WindowFunnelStateV2::sort() uses std::stable_sort on std::pair<UInt64, UInt8> which orders by (timestamp, event_index) ascending. This defeats the intended “same timestamp in descending event_index order” (see the comment in add()), and can let a single input row that matches multiple conditions advance multiple funnel levels at the same timestamp (different from V1 row-based semantics). Use an explicit comparator that sorts by timestamp ascending and event_index descending (and update the sorted monotonicity check to use the same ordering).
| event_count = event_count > 0 ? event_count : other.event_count; | ||
| window = window > 0 ? window : other.window; | ||
| window_funnel_mode = window_funnel_mode == WindowFunnelMode::INVALID | ||
| ? other.window_funnel_mode | ||
| : window_funnel_mode; |
There was a problem hiding this comment.
WindowFunnelStateV2::merge() treats window as “unset” unless it’s > 0 (window = window > 0 ? window : other.window). A window value of 0 is valid (meaning events must occur at the same timestamp) but will be overwritten during merges, producing incorrect results for distributed/partial aggregation. Track whether window is initialized separately (or use a sentinel like window = -1 for unset) so that 0 is preserved.
| if (argument_types.size() < 3) { | ||
| LOG(WARNING) << "window_funnel_v2's argument less than 3."; | ||
| return nullptr; | ||
| } | ||
| if (argument_types[2]->get_primitive_type() == TYPE_DATETIMEV2) { | ||
| return creator_without_type::create<AggregateFunctionWindowFunnelV2>( | ||
| argument_types, result_is_nullable, attr); | ||
| } else { | ||
| LOG(WARNING) << "Only support DateTime type as window argument!"; | ||
| return nullptr; |
There was a problem hiding this comment.
create_aggregate_function_window_funnel_v2 allows argument_types.size() == 3, but the function requires at least 4 arguments (window, mode, timestamp, and >=1 boolean condition). With 3 args, event_count becomes 0 and the function silently returns 0 instead of rejecting invalid SQL. Update the arity check (and warning message) to require >=4 arguments.
| agg(WindowFunnel.class, "window_funnel_v1"), | ||
| agg(WindowFunnelV2.class, "window_funnel_v2", "window_funnel") |
There was a problem hiding this comment.
This change makes window_funnel resolve to window_funnel_v2 in Nereids (agg(WindowFunnelV2.class, ..., "window_funnel")) while keeping window_funnel_v1 available. Since V2 has different semantics (regression outputs change), this is a user-visible behavior change for existing queries using window_funnel. Consider gating the alias switch behind a session variable / config, or keeping window_funnel mapped to V1 until a deprecation window is completed.
| agg(WindowFunnel.class, "window_funnel_v1"), | |
| agg(WindowFunnelV2.class, "window_funnel_v2", "window_funnel") | |
| agg(WindowFunnel.class, "window_funnel_v1", "window_funnel"), | |
| agg(WindowFunnelV2.class, "window_funnel_v2") |
| agg(VarianceSamp.class, "var_samp", "variance_samp"), | ||
| agg(WindowFunnel.class, "window_funnel") | ||
| agg(WindowFunnel.class, "window_funnel_v1"), | ||
| agg(WindowFunnelV2.class, "window_funnel_v2", "window_funnel") |
There was a problem hiding this comment.
By remapping SQL name window_funnel to V2 in Nereids, the existing nereids_p0/aggregate/window_funnel regression suite now effectively tests V2 semantics and no longer provides coverage for V1 behavior. Consider adding regression coverage that explicitly calls window_funnel_v1 (at least for DEFAULT/DEDUP/FIXED/INCREASE) to ensure V1 remains stable during the transition.
FE UT Coverage ReportIncrement line coverage |
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)