[AURON #2030] Add Native Scan Support for Apache Hudi Copy-On-Write Tables. by slfan1989 · Pull Request #2031 · apache/auron

slfan1989 · 2026-02-19T09:16:49Z

Which issue does this PR close?

Closes #2030

Rationale for this change

This PR adds native scan support for Hudi Copy-On-Write (COW) tables, enabling Auron to accelerate Hudi table reads by converting FileSourceScanExec operations to native Parquet/ORC scan implementations.

What changes are included in this PR?

1. New Module: `thirdparty/auron-hudi`

- HudiConvertProvider: Implements AuronConvertProvider SPI to intercept and convert Hudi FileSourceScanExec to native scans
  - Detects Hudi file formats (HoodieParquetFileFormat, HoodieOrcFileFormat)
  - Converts to NativeParquetScanExec or NativeOrcScanExec
  - Handles timestamp fallback logic automatically
- HudiScanSupport: Core detection and validation logic
  - File format recognition with NewHoodie* format rejection
  - Table type resolution via multi-source metadata fallback: Options → Catalog → .hoodie/hoodie.properties
  - MOR table detection and rejection
  - Time travel query detection (via as.of.instant, as.of.timestamp options)
  - FileIndex class hierarchy verification
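
The table-type fallback chain and the time travel check described above can be illustrated with a minimal, Spark-free sketch. This is not the code in this PR: `HudiScanChecksSketch`, `TableMetadata`, and the method names are hypothetical, and the three metadata sources are modelled as plain maps. `hoodie.table.type` is the key Hudi writes to `hoodie.properties`; using the same key for options and catalog properties, and treating time travel or an unresolved table type as a reason to fall back, are assumptions of this sketch.

```scala
import java.util.Locale

object HudiScanChecksSketch {
  // Hypothetical stand-in for the three metadata sources, in lookup order.
  final case class TableMetadata(
      options: Map[String, String],           // reader options
      catalogProperties: Map[String, String], // catalog table properties
      hoodieProperties: Map[String, String])  // parsed .hoodie/hoodie.properties

  // Assumption: the same key is consulted in every source.
  private val TableTypeKey = "hoodie.table.type"
  // Option names taken from the PR description.
  private val TimeTravelKeys = Seq("as.of.instant", "as.of.timestamp")

  // Options -> Catalog -> hoodie.properties; the first source that has the key wins.
  def resolveTableType(meta: TableMetadata): Option[String] =
    meta.options.get(TableTypeKey)
      .orElse(meta.catalogProperties.get(TableTypeKey))
      .orElse(meta.hoodieProperties.get(TableTypeKey))
      .map(_.trim.toUpperCase(Locale.ROOT))

  // A query counts as time travel if either option is present.
  def isTimeTravel(options: Map[String, String]): Boolean =
    TimeTravelKeys.exists(options.contains)

  // Only COW tables without time travel qualify; MOR and unknown types do not.
  def isNativeScanCandidate(meta: TableMetadata): Boolean =
    resolveTableType(meta).contains("COPY_ON_WRITE") && !isTimeTravel(meta.options)
}
```

In this sketch an unresolved table type yields `None`, so the scan stays on the regular Spark path rather than being converted on a guess.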

2. Configuration

Added spark.auron.enable.hudi.scan config option (default: true)
Respects existing Parquet/ORC timestamp scanning configurations
Runtime Spark version validation (3.0–3.5 only)
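
As a rough illustration of how the config flag and the runtime version check might combine, here is a small self-contained sketch. The object and method names (`HudiScanGateSketch`, `majorMinor`, `isSupportedSpark`, `hudiScanEnabled`) are hypothetical and the configuration is modelled as a plain `Map`; only the key `spark.auron.enable.hudi.scan`, its default of `true`, and the 3.0–3.5 range come from this PR.

```scala
object HudiScanGateSketch {
  val EnableKey = "spark.auron.enable.hudi.scan"

  private val VersionPattern = """^(\d+)\.(\d+).*""".r

  // Extracts (major, minor) from strings such as "3.4.1" or "3.5.0-SNAPSHOT".
  def majorMinor(version: String): Option[(Int, Int)] = version match {
    case VersionPattern(major, minor) => Some((major.toInt, minor.toInt))
    case _ => None
  }

  // Spark 3.0 through 3.5 inclusive; unparseable versions are treated as unsupported.
  def isSupportedSpark(version: String): Boolean =
    majorMinor(version).exists { case (major, minor) => major == 3 && minor <= 5 }

  // The flag defaults to true when unset; both conditions must hold.
  def hudiScanEnabled(conf: Map[String, String], sparkVersion: String): Boolean =
    conf.getOrElse(EnableKey, "true").trim.equalsIgnoreCase("true") &&
      isSupportedSpark(sparkVersion)
}
```

With an empty config map and version `"3.3.2"` the gate is open; setting the key to `"false"` or running on an unsupported version closes it.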

3. Build & Integration

- Maven: New profile hudi-0.15 with enforcer rules
  - Validates hudiEnabled=true property
  - Restricts Spark to 3.0–3.5
  - Pins Hudi version to 0.15.0
- Build Script: Enhanced auron-build.sh
  - Added --hudi <VERSION> parameter
  - Version compatibility validation
  - Auto-enables hudiEnabled property
- CI/CD: New workflow .github/workflows/hudi.yml
  - Matrix testing: Spark 3.0–3.5 × JDK 8/17/21 × Scala 2.12
  - Independent Hudi test pipeline

Are there any user-facing changes?

New Configuration Option

// Enable Hudi native scan (enabled by default)
spark.conf.set("spark.auron.enable.hudi.scan", "true")

How was this patch tested?

Added JUnit tests.

…rite Tables. Signed-off-by: slfan1989 <slfan1989@apache.org>

slfan1989 · 2026-02-19T11:03:49Z

Spark 3.0 and 3.1 don’t support time travel yet, so I’ll revise the unit tests accordingly.

[AURON apache#2030] Add Native Scan Support for Apache Hudi Copy-On-W…

f84a18b

…rite Tables. Signed-off-by: slfan1989 <slfan1989@apache.org>

github-actions bot added the infra, spark, build, and dev-tools labels on Feb 19, 2026

[AURON apache#2030] Add Native Scan Support for Apache Hudi Copy-On-W…

030ade0

…rite Tables. Signed-off-by: slfan1989 <slfan1989@apache.org>

slfan1989 wants to merge 2 commits into apache:master from slfan1989:auron-2030
