Description
Running TPC-DS queries (q60–q69) with Spark 4.0 / JDK 21 triggers a native engine panic when WrappedSender::send attempts to send data into a closed channel. This panic escalates into a JVM crash with exit code 134.
Environment
- Spark Version: 4.0
- Scala Version: 2.13
- JDK Version: 21 (Temurin 21.0.10+7)
- Workload: TPC-DS q60–q69
- Date: 2026-02-18
Observed Behavior
Multiple threads panic at:
2026-02-18T06:17:59.2883677Z thread 'auron-native-stage-112-part-2-tid-136' panicked at native-engine/datafusion-ext-plans/src/common/execution_context.rs:723:35:
2026-02-18T06:17:59.2884876Z output_with_sender: send error: channel closed
2026-02-18T06:17:59.2885627Z stack backtrace:
2026-02-18T06:17:59.2886022Z 0: __rustc::rust_begin_unwind
2026-02-18T06:17:59.2886773Z at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/std/src/panicking.rs:697:5
2026-02-18T06:17:59.2887628Z 1: core::panicking::panic_fmt
2026-02-18T06:17:59.2888362Z at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/panicking.rs:75:14
2026-02-18T06:17:59.2889530Z 2: datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}::{{closure}}
2026-02-18T06:17:59.2890682Z 3: datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}
2026-02-18T06:17:59.2894606Z 4: datafusion_ext_plans::sort_exec::send_output_batch::{{closure}}
2026-02-18T06:17:59.2895716Z 5: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll
2026-02-18T06:17:59.2907421Z 6: datafusion_ext_plans::common::execution_context::ExecutionContext::output_with_sender_impl::{{closure}}
2026-02-18T06:17:59.2908462Z 7: tokio::runtime::task::core::Core<T,S>::poll
2026-02-18T06:17:59.2909093Z 8: tokio::runtime::task::harness::poll_future
2026-02-18T06:17:59.2910064Z 9: tokio::runtime::task::harness::Harness<T,S>::poll_inner
2026-02-18T06:17:59.2910769Z 10: tokio::runtime::task::harness::Harness<T,S>::poll
2026-02-18T06:17:59.2911509Z 11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
2026-02-18T06:17:59.2912330Z 12: tokio::runtime::scheduler::multi_thread::worker::Context::run
2026-02-18T06:17:59.2913064Z 13: tokio::runtime::context::scoped::Scoped<T>::set
2026-02-18T06:17:59.2913652Z 14: tokio::runtime::context::set_scheduler
2026-02-18T06:17:59.2914722Z 15: tokio::runtime::context::runtime::enter_runtime
2026-02-18T06:17:59.2920955Z 16: tokio::runtime::scheduler::multi_thread::worker::run
2026-02-18T06:17:59.2923760Z 17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
2026-02-18T06:17:59.2925198Z 18: tokio::runtime::task::core::Core<T,S>::poll
2026-02-18T06:17:59.2925774Z 19: tokio::runtime::task::harness::poll_future
2026-02-18T06:17:59.2926385Z 20: tokio::runtime::task::harness::Harness<T,S>::poll_inner
2026-02-18T06:17:59.2927010Z 21: tokio::runtime::task::raw::poll
2026-02-18T06:17:59.2927558Z 22: tokio::runtime::blocking::pool::Inner::run
2026-02-18T06:17:59.2928362Z note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Expected Behavior
When the downstream channel is closed (due to task cancellation or early completion), the native engine should gracefully drop the batch without panicking or crashing the JVM.
Root Cause
Race condition: producer task continues sending data while the receiver has already been closed/canceled. The current implementation panics on send failure instead of handling it gracefully.
Solution
Modified WrappedSender::send in execution_context.rs to gracefully handle channel closure:
- Check
send().await.is_err() instead of panicking
- Log debug message with context (partition_id, task_id, session_id)
- Return early without updating metrics when channel is closed
Description
Running TPC-DS queries (q60–q69) with Spark 4.0 / JDK 21 triggers a native engine panic when
WrappedSender::sendattempts to send data into a closed channel. This panic escalates into a JVM crash with exit code 134.Environment
Observed Behavior
Multiple threads panic at:
Expected Behavior
When the downstream channel is closed (due to task cancellation or early completion), the native engine should gracefully drop the batch without panicking or crashing the JVM.
Root Cause
Race condition: producer task continues sending data while the receiver has already been closed/canceled. The current implementation panics on send failure instead of handling it gracefully.
Solution
Modified
WrappedSender::sendinexecution_context.rsto gracefully handle channel closure:send().await.is_err()instead of panicking