[Feature] Add Native Scan Support for Apache Iceberg Copy-On-Write Tables

### Overview

This PR introduces native scan support for Apache Iceberg Copy-On-Write (COW) tables in the Auron engine, enabling Auron to read Iceberg data files directly and accelerate queries through its native execution engine.

### Design

#### Architecture Overview

The implementation plugs into Auron through the SPI (Service Provider Interface) extension mechanism and is built from three core components. A scan passes through the following stages:
```
Spark Scan → Detect → Validate → Convert → Native Execute
             (SPI)    (Support)  (Exec)    (JNI)
```
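The detect, validate, and convert stages can be sketched in Java as a loop over registered providers. This is a minimal sketch under stated assumptions: `ConvertProvider`, `ScanNode`, `NativePlan`, and `ConvertPipeline` are hypothetical stand-ins, not Auron's actual `AuronConvertProvider` API. In a real SPI setup the provider list would come from `java.util.ServiceLoader`, which discovers implementations listed under `META-INF/services`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for a Spark scan node and an Auron native plan.
record ScanNode(String sourceClass, boolean eligible) {}
record NativePlan(String description) {}

// Hypothetical provider contract; Auron's real AuronConvertProvider may differ.
interface ConvertProvider {
    boolean isEnabled();               // configuration toggle and Spark version check
    boolean detect(ScanNode scan);     // Detect: does this provider own the scan?
    boolean validate(ScanNode scan);   // Validate: can the scan run natively?
    NativePlan convert(ScanNode scan); // Convert: build the native exec node
}

final class ConvertPipeline {
    /**
     * Returns the native plan from the first enabled provider that detects the scan,
     * or empty (keep Spark's original plan) if none detects it or validation fails.
     */
    static Optional<NativePlan> tryConvert(List<ConvertProvider> providers, ScanNode scan) {
        for (ConvertProvider p : providers) {
            if (p.isEnabled() && p.detect(scan)) {
                return p.validate(scan) ? Optional.of(p.convert(scan)) : Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, returning `Optional.empty()` means any scan that fails a check simply stays on Spark's original, non-native path.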

#### Core Modules

- `IcebergConvertProvider`
   - Implements the `AuronConvertProvider` SPI interface
   - Auto-registered via `META-INF/services`
   - Checks Spark version compatibility (supports Spark 3.4 through 4.0)
   - Provides a configuration toggle: `spark.auron.enable.iceberg.scan`

- `IcebergScanSupport`
   - Determines whether the scan comes from the Iceberg data source (via a class-name check)
   - Uses reflection to access Iceberg's internal `SparkInputPartition` and `FileScanTask`
   - Applies the following checks to decide whether the scan is eligible for native execution:
      - Only supports COW tables (no delete files)
      - Does not support metadata columns (`_file`, `_pos`, etc.)
      - Only supports Parquet and ORC formats
      - Does not support residual filters (row-level filtering)
      - Does not support mixed file formats
      - Only supports Auron-compatible data types

- `NativeIcebergTableScanExec`
   - Extends `LeafExecNode` and `NativeSupports`
   - Converts Iceberg `FileScanTask` to Spark `FilePartition`
   - Generates Protobuf scan plans (`ParquetScanExecNode` or `OrcScanExecNode`)
   - Registers Hadoop FileSystem resources via JniBridge
   - Implements projection pushdown
   - Handles file splitting and coalescing for partitioned tables
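
The eligibility rules listed for `IcebergScanSupport` amount to a conjunction of per-scan and per-task checks. The Java sketch below shows that logic over a hypothetical, simplified task model: `TaskInfo`, `IcebergEligibility`, and the `SUPPORTED_TYPES` allow-list are assumptions, whereas the real code reads Iceberg's `FileScanTask` via reflection and applies Auron's own type checks.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified view of one Iceberg FileScanTask (not Iceberg's API).
record TaskInfo(String format, int deleteFileCount, boolean residualAlwaysTrue) {}

final class IcebergEligibility {
    // Illustrative subset of Iceberg metadata columns.
    static final Set<String> METADATA_COLUMNS = Set.of("_file", "_pos");
    static final Set<String> NATIVE_FORMATS = Set.of("PARQUET", "ORC");
    // Hypothetical allow-list standing in for Auron's data type check.
    static final Set<String> SUPPORTED_TYPES =
        Set.of("int", "bigint", "string", "double", "boolean", "date");

    /** Returns true only if every check passes; an empty task list (empty table) is eligible. */
    static boolean isEligible(List<TaskInfo> tasks, List<String> columns, List<String> types) {
        if (columns.stream().anyMatch(METADATA_COLUMNS::contains)) return false;
        if (!SUPPORTED_TYPES.containsAll(types)) return false;
        String firstFormat = null;
        for (TaskInfo t : tasks) {
            if (t.deleteFileCount() > 0) return false;   // delete files: not a pure COW read
            if (!t.residualAlwaysTrue()) return false;   // residual needs row-level filtering
            if (!NATIVE_FORMATS.contains(t.format())) return false;
            if (firstFormat == null) firstFormat = t.format();
            else if (!firstFormat.equals(t.format())) return false; // mixed formats
        }
        return true;
    }
}
```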

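For file splitting and coalescing, one plausible approach mirrors Spark's own `FilePartition.getFilePartitions`: cut large files into ranges of at most `maxSplitBytes`, sort the ranges by size in descending order, then pack them with a next-fit rule that charges `openCostInBytes` per file. The sketch below assumes that approach; `FileSplit` and `SplitPacker` are hypothetical names, and the actual `NativeIcebergTableScanExec` logic may differ (for example, Iceberg's own scan planning may already have split the files).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical byte range of a data file (loosely mirrors Spark's PartitionedFile).
record FileSplit(String path, long start, long length) {}

final class SplitPacker {
    /** Cuts one file into consecutive ranges of at most maxSplitBytes. */
    static List<FileSplit> split(String path, long fileSize, long maxSplitBytes) {
        List<FileSplit> out = new ArrayList<>();
        for (long off = 0; off < fileSize; off += maxSplitBytes) {
            out.add(new FileSplit(path, off, Math.min(maxSplitBytes, fileSize - off)));
        }
        return out;
    }

    /** Next-fit-decreasing packing, modeled on Spark's FilePartition.getFilePartitions. */
    static List<List<FileSplit>> pack(List<FileSplit> splits, long maxSplitBytes, long openCostInBytes) {
        List<FileSplit> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong(FileSplit::length).reversed());
        List<List<FileSplit>> partitions = new ArrayList<>();
        List<FileSplit> current = new ArrayList<>();
        long currentSize = 0;
        for (FileSplit s : sorted) {
            // Close the current partition when this split would overflow it.
            if (currentSize + s.length() > maxSplitBytes && !current.isEmpty()) {
                partitions.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += s.length() + openCostInBytes; // each file also pays an open cost
        }
        if (!current.isEmpty()) partitions.add(current);
        return partitions;
    }
}
```

With `maxSplitBytes = 64` and `openCostInBytes = 4`, ten 10-byte files coalesce into three partitions of 4, 4, and 2 files, because each file effectively costs 14 bytes.
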
#### Supported Features

- Currently Supported:
   - Full table scan on Iceberg COW tables
   - Parquet and ORC file formats
   - Projection pushdown (column pruning)
   - Partitioned table queries (partition filtering handled at the Iceberg layer)
   - Empty table handling
   - Configuration toggle: `spark.auron.enable.iceberg.scan` (default: enabled)
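
The two gates that `IcebergConvertProvider` applies before any Iceberg-specific work, the configuration toggle (enabled by default) and the Spark 3.4 to 4.0 version range, could look like the following. This is a sketch: `IcebergProviderGate` is a hypothetical name, the Spark configuration is modeled as a plain `Map`, and treating every 4.0.x release (including previews) as supported is an assumption.

```java
import java.util.Map;

final class IcebergProviderGate {
    static final String ENABLE_KEY = "spark.auron.enable.iceberg.scan";

    /** True if the toggle is unset (default enabled) or set to "true", case-insensitively. */
    static boolean isEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(ENABLE_KEY, "true"));
    }

    /** True if a version string such as "3.5.1" or "4.0.0-preview2" lies in [3.4, 4.0]. */
    static boolean isSupportedSparkVersion(String version) {
        String[] parts = version.split("[.-]");
        if (parts.length < 2) return false;
        try {
            int major = Integer.parseInt(parts[0]);
            int minor = Integer.parseInt(parts[1]);
            int key = major * 100 + minor; // 3.4 -> 304, 4.0 -> 400
            return key >= 304 && key <= 400;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean shouldConvert(Map<String, String> conf, String sparkVersion) {
        return isEnabled(conf) && isSupportedSparkVersion(sparkVersion);
    }
}
```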

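Projection pushdown (column pruning) means the native scan decodes only the columns the query needs. A minimal sketch of computing the projected column ordinals follows; `ProjectionPushdown` is hypothetical, and matching columns by name is a simplification, since Iceberg itself tracks columns by field ID so that renames and other schema changes resolve correctly.

```java
import java.util.List;

final class ProjectionPushdown {
    /**
     * Maps each required column to its ordinal in the file schema, preserving
     * the order the query asks for. Throws if a column is missing.
     */
    static int[] projectedOrdinals(List<String> fileSchema, List<String> requiredColumns) {
        int[] ordinals = new int[requiredColumns.size()];
        for (int i = 0; i < requiredColumns.size(); i++) {
            String name = requiredColumns.get(i);
            int idx = fileSchema.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("column not in file schema: " + name);
            }
            ordinals[i] = idx;
        }
        return ordinals;
    }
}
```

For a file schema `(id, name, ts, amount)` and a query selecting `amount, id`, the ordinals are `[3, 0]`, so `name` and `ts` are never read.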