Skip to content

[BUG] Native engine panic on closed channel causes JVM crash #2022

@slfan1989

Description

@slfan1989

Description

Running TPC-DS queries (q60–q69) with Spark 4.0 / JDK 21 triggers a native engine panic when WrappedSender::send attempts to send data into a closed channel. This panic escalates into a JVM crash with exit code 134.

Environment

  • Spark Version: 4.0
  • Scala Version: 2.13
  • JDK Version: 21 (Temurin 21.0.10+7)
  • Workload: TPC-DS q60–q69
  • Date: 2026-02-18

Observed Behavior

Multiple threads panic at:

2026-02-18T06:17:59.2883677Z thread 'auron-native-stage-112-part-2-tid-136' panicked at native-engine/datafusion-ext-plans/src/common/execution_context.rs:723:35:
2026-02-18T06:17:59.2884876Z output_with_sender: send error: channel closed
2026-02-18T06:17:59.2885627Z stack backtrace:
2026-02-18T06:17:59.2886022Z    0: __rustc::rust_begin_unwind
2026-02-18T06:17:59.2886773Z              at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/std/src/panicking.rs:697:5
2026-02-18T06:17:59.2887628Z    1: core::panicking::panic_fmt
2026-02-18T06:17:59.2888362Z              at /rustc/50aa04180709189a03dde5fd1c05751b2625ed37/library/core/src/panicking.rs:75:14
2026-02-18T06:17:59.2889530Z    2: datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}::{{closure}}
2026-02-18T06:17:59.2890682Z    3: datafusion_ext_plans::common::execution_context::WrappedSender<T>::send::{{closure}}
2026-02-18T06:17:59.2894606Z    4: datafusion_ext_plans::sort_exec::send_output_batch::{{closure}}
2026-02-18T06:17:59.2895716Z    5: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll
2026-02-18T06:17:59.2907421Z    6: datafusion_ext_plans::common::execution_context::ExecutionContext::output_with_sender_impl::{{closure}}
2026-02-18T06:17:59.2908462Z    7: tokio::runtime::task::core::Core<T,S>::poll
2026-02-18T06:17:59.2909093Z    8: tokio::runtime::task::harness::poll_future
2026-02-18T06:17:59.2910064Z    9: tokio::runtime::task::harness::Harness<T,S>::poll_inner
2026-02-18T06:17:59.2910769Z   10: tokio::runtime::task::harness::Harness<T,S>::poll
2026-02-18T06:17:59.2911509Z   11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
2026-02-18T06:17:59.2912330Z   12: tokio::runtime::scheduler::multi_thread::worker::Context::run
2026-02-18T06:17:59.2913064Z   13: tokio::runtime::context::scoped::Scoped<T>::set
2026-02-18T06:17:59.2913652Z   14: tokio::runtime::context::set_scheduler
2026-02-18T06:17:59.2914722Z   15: tokio::runtime::context::runtime::enter_runtime
2026-02-18T06:17:59.2920955Z   16: tokio::runtime::scheduler::multi_thread::worker::run
2026-02-18T06:17:59.2923760Z   17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
2026-02-18T06:17:59.2925198Z   18: tokio::runtime::task::core::Core<T,S>::poll
2026-02-18T06:17:59.2925774Z   19: tokio::runtime::task::harness::poll_future
2026-02-18T06:17:59.2926385Z   20: tokio::runtime::task::harness::Harness<T,S>::poll_inner
2026-02-18T06:17:59.2927010Z   21: tokio::runtime::task::raw::poll
2026-02-18T06:17:59.2927558Z   22: tokio::runtime::blocking::pool::Inner::run
2026-02-18T06:17:59.2928362Z note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Expected Behavior

When the downstream channel is closed (due to task cancellation or early completion), the native engine should gracefully drop the batch without panicking or crashing the JVM.

Root Cause

Race condition: producer task continues sending data while the receiver has already been closed/canceled. The current implementation panics on send failure instead of handling it gracefully.

Solution

Modified WrappedSender::send in execution_context.rs to gracefully handle channel closure:

  • Check send().await.is_err() instead of panicking
  • Log debug message with context (partition_id, task_id, session_id)
  • Return early without updating metrics when channel is closed

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions