[AURON #2183] Implement native support for ORC InsertIntoHiveTable writes #2191
Open
weimingdiit wants to merge 2 commits into apache:master
4682ae0 to 65f7ae7
cxzl25 reviewed Apr 13, 2026
65f7ae7 to bc8c720
…ble writes Signed-off-by: weimingdiit <weimingdiit@gmail.com>
bc8c720 to 28946b0
Pull request overview
Implements native-engine execution for Hive InsertIntoHiveTable writes targeting ORC tables, reducing fallbacks to Spark’s non-native write path.
Changes:
- Added Spark-side ORC native write physical operators (native ORC insert + ORC sink) and planner conversion support.
- Added native-engine ORC sink execution plan plus proto/planner/runtime/JNI wiring.
- Added Hive-focused execution tests to cover ORC insert conversion and execution (including dynamic partitions).
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeOrcSinkBase.scala | Spark-side native ORC sink node building a native plan with ORC properties + schema. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeOrcInsertIntoHiveTableBase.scala | Base Spark exec for ORC InsertIntoHiveTable rewrite using a dummy output format and per-task context. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeBase.scala | Allows converting columnar children via executeColumnar() to avoid casting failures. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/arrowio/ArrowFFIExporter.scala | Accepts mixed InternalRow / ColumnarBatch input and normalizes to row iteration. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/Shims.scala | Adds shims hooks for native ORC insert + ORC sink creation. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Adds ORC write plan conversion, ORC write schema/type support checks, and per-format write toggles. |
| spark-extension/src/main/java/org/apache/spark/sql/execution/auron/plan/NativeOrcSinkUtils.java | JNI entrypoints for task output-path handoff and completion stats reporting. |
| spark-extension/src/main/java/org/apache/auron/spark/configuration/SparkAuronConfiguration.java | Adds config toggles for Parquet/ORC data writing conversions. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/exec/AuronHiveExecSuite.scala | Adds conversion + execution tests for native ORC InsertIntoHiveTable (dynamic/static partitions). |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/exec/AuronExecSuite.scala | Minor formatting-only change. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/BaseAuronHiveSQLSuite.scala | Introduces Hive-enabled base test suite configuration. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeOrcSinkExec.scala | Spark shims exec wrapper for NativeOrcSinkBase. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeOrcInsertIntoHiveTableExec.scala | Spark shims exec wrapper for ORC InsertIntoHiveTable across Spark versions. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/ShimsImpl.scala | Wires shims implementations for native ORC insert + ORC sink. |
| native-engine/datafusion-ext-plans/src/orc_sink_exec.rs | Implements native ORC sink execution (dynamic partitioning, schema adaptation, metrics, JNI hooks). |
| native-engine/datafusion-ext-plans/src/lib.rs | Exposes orc_sink_exec module. |
| native-engine/auron/src/rt.rs | Excludes ORC sink exec from output stream coalescing. |
| native-engine/auron-planner/src/planner.rs | Adds planner mapping from proto to OrcSinkExec. |
| native-engine/auron-planner/proto/auron.proto | Adds OrcSinkExecNode + OrcProp proto definitions. |
| native-engine/auron-jni-bridge/src/jni_bridge.rs | Registers Java class/method bindings for NativeOrcSinkUtils. |
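
The table mentions dynamic partitioning in the native ORC sink. As a rough illustration only, and not the code in `orc_sink_exec.rs`, the sketch below shows how rows might be routed to Hive-style `key=value` partition directories. The function names `partition_dir` and `route_rows` are hypothetical, partition values are simplified to optional strings, and the escaping of special characters that Hive applies to path components is left out. NULL values are mapped to Hive's default partition name `__HIVE_DEFAULT_PARTITION__`.

```rust
use std::collections::BTreeMap;

/// Hive's default directory name for a NULL partition value.
const DEFAULT_PARTITION: &str = "__HIVE_DEFAULT_PARTITION__";

/// Hypothetical helper: builds a relative Hive-style partition directory,
/// e.g. `dt=2026-04-13/region=eu`. Path escaping is intentionally omitted.
fn partition_dir(cols: &[&str], values: &[Option<&str>]) -> String {
    cols.iter()
        .zip(values.iter())
        .map(|(col, value)| format!("{}={}", col, value.unwrap_or(DEFAULT_PARTITION)))
        .collect::<Vec<_>>()
        .join("/")
}

/// Hypothetical helper: groups row indices by their target partition
/// directory, so each group could be handed to its own output file writer.
fn route_rows(cols: &[&str], rows: &[Vec<Option<&str>>]) -> BTreeMap<String, Vec<usize>> {
    let mut routed: BTreeMap<String, Vec<usize>> = BTreeMap::new();
    for (idx, row) in rows.iter().enumerate() {
        routed.entry(partition_dir(cols, row)).or_default().push(idx);
    }
    routed
}
```

Calling `route_rows(&["dt", "region"], &rows)` returns one map entry per distinct partition directory. A real sink would work on Arrow record batches rather than string rows.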
Signed-off-by: weimingdiit <weimingdiit@gmail.com>
Which issue does this PR close?
Closes #2183
Rationale for this change
Auron already supports native Parquet InsertIntoHiveTable writes, but ORC Hive writes still fall back to Spark’s regular execution path. This leaves native write coverage incomplete for a common Hive storage format.
This PR adds native support for ORC InsertIntoHiveTable writes so eligible Hive ORC write workloads can stay on the native path instead of falling back.
What changes are included in this PR?
This PR:
Are there any user-facing changes?
Yes.
Hive table writes using ORC may now remain on the native execution path when they match the supported InsertIntoHiveTable write pattern, instead of falling back to Spark’s regular write execution.
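
The decision between the native path and the fallback can be pictured with a small sketch. It is an assumption-laden illustration, not the logic in `AuronConverters.scala`: the types `FileFormat`, `WriteToggles`, and `WritePath` and the function `choose_write_path` are hypothetical names. It only models the two conditions the review summary mentions, a per-format write toggle and a schema/type support check.

```rust
/// Hypothetical storage formats a Hive insert might target.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileFormat {
    Parquet,
    Orc,
    Other,
}

/// Where the write ends up executing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WritePath {
    Native,
    SparkFallback,
}

/// Hypothetical per-format toggles; the real configuration keys are not shown here.
struct WriteToggles {
    parquet_enabled: bool,
    orc_enabled: bool,
}

/// Stays native only if the format's toggle is on and the schema passed the
/// (assumed) type-support check; everything else falls back to Spark.
fn choose_write_path(format: FileFormat, toggles: &WriteToggles, schema_supported: bool) -> WritePath {
    let format_enabled = match format {
        FileFormat::Parquet => toggles.parquet_enabled,
        FileFormat::Orc => toggles.orc_enabled,
        FileFormat::Other => false,
    };
    if format_enabled && schema_supported {
        WritePath::Native
    } else {
        WritePath::SparkFallback
    }
}
```

Under this model, an ORC insert with the toggle on and a supported schema stays native. Turning the toggle off, or using an unsupported column type, sends the same insert back to Spark's regular write execution.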
How was this patch tested?
CI.