
[AURON #2015] Add Native Scan Support for Apache Iceberg Copy-On-Write Tables. #2016

Open
slfan1989 wants to merge 7 commits into apache:master from slfan1989:auron-2015

slfan1989 commented Feb 18, 2026

Which issue does this PR close?

Closes #2015

Rationale for this change

This PR adds native scan support for Apache Iceberg Copy-On-Write (COW) tables to improve query performance. Currently, Auron has no direct integration with Iceberg, so every Iceberg query runs through Spark's regular (JVM) scan path and cannot benefit from native engine acceleration.

Key Motivations:

  • Enable Auron's native execution engine to read Iceberg tables directly
  • Leverage native performance optimizations for Iceberg COW tables
  • Provide automatic fallback to Spark scan for unsupported scenarios
  • Lay the foundation for future Iceberg enhancements, such as Merge-On-Read (MOR) tables and pruning predicates

What changes are included in this PR?

Core Implementation:

  • IcebergConvertProvider - SPI extension point that detects Iceberg scans and decides whether to use native execution
  • IcebergScanSupport - Decision logic that validates scan plans and checks for COW table eligibility
  • NativeIcebergTableScanExec - Native execution node that converts Iceberg FileScanTask objects into native scan plans
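To make the eligibility rules concrete, here is a minimal, self-contained Scala sketch of the kind of decision IcebergScanSupport makes. It deliberately avoids Iceberg and Spark classes: ScanTask, Column, ScanDecision and IcebergEligibility are hypothetical stand-ins, and the fallback conditions (disabled toggle, metadata columns, decimal columns, delete files, residual filters, formats other than Parquet/ORC) are taken from the scenarios covered by this PR's tests, not from the actual implementation.

```scala
// Hypothetical model types standing in for Iceberg's FileScanTask and the
// projected read schema. They are NOT Iceberg or Auron APIs.
sealed trait FileFormat
case object Parquet extends FileFormat
case object Orc extends FileFormat
case object Avro extends FileFormat

sealed trait ColumnType
case object IntColumn extends ColumnType
case object StringColumn extends ColumnType
final case class DecimalColumn(precision: Int, scale: Int) extends ColumnType

final case class Column(name: String, colType: ColumnType, isMetadata: Boolean = false)

final case class ScanTask(
    format: FileFormat,
    deleteFileCount: Int,          // > 0 means row-level deletes (MOR) must be applied
    residualIsAlwaysTrue: Boolean  // false means a filter must still be evaluated per row
)

// Outcome of the check: convert to a native scan, or keep Spark's scan with a reason.
sealed trait ScanDecision
case object UseNative extends ScanDecision
final case class FallBack(reason: String) extends ScanDecision

object IcebergEligibility {
  // `projected` holds the columns the query reads; `tasks` are the planned file scan tasks.
  def decide(enabled: Boolean, projected: Seq[Column], tasks: Seq[ScanTask]): ScanDecision =
    if (!enabled) FallBack("native Iceberg scan disabled")
    else if (projected.exists(_.isMetadata)) FallBack("metadata columns requested")
    else if (projected.exists(_.colType.isInstanceOf[DecimalColumn])) FallBack("decimal columns")
    else
      tasks
        .collectFirst {
          case t if t.deleteFileCount > 0   => FallBack("delete files present (MOR)")
          case t if !t.residualIsAlwaysTrue => FallBack("residual filter remains")
          case t if t.format != Parquet && t.format != Orc =>
            FallBack(s"unsupported file format: ${t.format}")
        }
        .getOrElse(UseNative)
}
```

For example, a Parquet task with no delete files and an always-true residual yields UseNative, while the same task with one delete file yields FallBack("delete files present (MOR)"). Returning a reason instead of a bare Boolean makes it easy to log why a scan was not converted.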

Build & Configuration:

  • Updated pom.xml with Iceberg version management and Maven enforcer rules
  • Modified auron-build.sh to support Iceberg build parameters
  • Added configuration option: spark.auron.enable.iceberg.scan (default: true)
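The documented default (true) means native Iceberg scans are attempted unless a user opts out. The Scala sketch below models only that lookup; the object name AuronIcebergConf and the strict true/false parsing are assumptions for illustration, not Auron's actual configuration code. In a real Spark job the key is supplied like any other Spark setting, for example with spark-submit --conf spark.auron.enable.iceberg.scan=false.

```scala
import java.util.Locale

// Hypothetical holder for the Iceberg toggle; not Auron's real config API.
object AuronIcebergConf {
  val Key = "spark.auron.enable.iceberg.scan"
  val Default = true // default documented in this PR

  // Returns the effective setting: the default when the key is absent,
  // otherwise a case-insensitive "true"/"false"; anything else is rejected.
  def icebergScanEnabled(conf: Map[String, String]): Boolean =
    conf.get(Key).map(_.trim.toLowerCase(Locale.ROOT)) match {
      case None          => Default
      case Some("true")  => true
      case Some("false") => false
      case Some(other) =>
        throw new IllegalArgumentException(s"$Key must be true or false, got '$other'")
    }
}
```

Rejecting unrecognized values, rather than silently treating them as false, keeps a typo from quietly disabling the native path.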

Supported Features:

  • Iceberg COW tables (Parquet and ORC formats)
  • Projection pushdown (column pruning)
  • Partitioned and non-partitioned tables
  • Automatic fallback for unsupported scenarios
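Projection pushdown means the native reader decodes only the columns the query references. Because Iceberg tracks columns by field ID rather than by name, a reader has to match projected columns to each data file's schema by ID; this keeps renamed columns readable and identifies columns added after a file was written, which a reader would return as nulls. The sketch below shows that mapping in isolation. Field and ProjectionPushdown are hypothetical names, and this PR does not state that Auron resolves columns exactly this way.

```scala
// Hypothetical field descriptor: Iceberg-style field ID plus the name at write time.
final case class Field(id: Int, name: String)

object ProjectionPushdown {
  // For each projected table field, returns its position in the data file's
  // schema, matched by field ID. None means the file has no such field (for
  // example, the column was added later), so the reader would emit nulls.
  def resolveProjection(projected: Seq[Field], fileSchema: Seq[Field]): Seq[Option[Int]] = {
    val positionById: Map[Int, Int] =
      fileSchema.zipWithIndex.map { case (field, pos) => field.id -> pos }.toMap
    projected.map(field => positionById.get(field.id))
  }
}
```

For a file written with fields (1: id, 2: old_name), projecting (2: name, 1: id) resolves to positions Some(1) and Some(0) even though field 2 was renamed, and projecting a later-added field 3 resolves to None.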

Version Support:

  • Spark: 3.4, 3.5, 4.0 only
  • Iceberg: 1.10.1 only (enforced by Maven)

Are there any user-facing changes?

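No Breaking Changes: Existing functionality remains unchanged. Iceberg native scan support is additive: it is controlled by spark.auron.enable.iceberg.scan (default: true), and scans that do not meet the eligibility checks automatically fall back to Spark's regular scan.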

How was this patch tested?

Unit & Integration Tests:

  • Added 10 integration test cases in AuronIcebergIntegrationSuite:
    • Simple COW table scan
    • Projection pushdown
    • Partitioned table with partition filter
    • ORC format support
    • Empty table handling
    • Residual filters fallback
    • Metadata columns fallback
    • Decimal type fallback
    • Delete files (MOR) fallback
    • Configuration toggle functionality

Test Environment:

  • Spark versions: 3.4.4, 3.5.8, 4.0.2
  • Iceberg version: 1.10.1
  • File formats: Parquet, ORC
  • Scala versions: 2.12, 2.13

…n-Write Tables.

Signed-off-by: slfan1989 <slfan1989@apache.org>

Successfully merging this pull request may close these issues.

[Feature] Add Native Scan Support for Apache Iceberg Copy-On-Write Tables
