Open
Description
Overview
This PR introduces native scan support for Apache Iceberg Copy-On-Write (COW) tables in the Auron engine, enabling Auron to read Iceberg data files directly and accelerate queries through its native execution engine.
Design
Architecture Overview
The implementation uses the SPI (Service Provider Interface) extension mechanism and is built from three core components:
Spark Scan → Detect (SPI) → Validate (Support) → Convert (Exec) → Native Execute (JNI)
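The Detect → Validate → Convert flow can be pictured as a chain of Option-returning steps where any failed step leaves the original Spark plan untouched. This is a minimal Scala sketch under stated assumptions: SparkScan, NativeScan, detect, validate, convert and rewrite are hypothetical names, the class-name check is approximated by a package-prefix test, only the delete-file check is modeled, ineligible scans are assumed to fall back to Spark's regular scan, and the JNI execution stage is out of scope.

```scala
// Hypothetical, simplified stand-ins for the real Spark/Auron plan types.
sealed trait Plan
case class SparkScan(sourceClass: String, hasDeletes: Boolean) extends Plan
case class NativeScan(from: SparkScan) extends Plan

// Detect: does this scan come from the Iceberg data source?
// (Assumption: approximated here by a package-prefix check on the class name.)
def detect(p: Plan): Option[SparkScan] = p match {
  case s: SparkScan if s.sourceClass.startsWith("org.apache.iceberg.") => Some(s)
  case _ => None
}

// Validate: apply the eligibility checks (only "no delete files" is modeled).
def validate(s: SparkScan): Option[SparkScan] =
  if (s.hasDeletes) None else Some(s)

// Convert: wrap the eligible scan in a native exec node.
def convert(s: SparkScan): Plan = NativeScan(s)

// Any failed stage falls back to the original Spark plan (assumed behavior).
def rewrite(p: Plan): Plan =
  detect(p).flatMap(validate).map(convert).getOrElse(p)
```

Modeling each stage as Option keeps the fallback rule in one place: the final getOrElse.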
Core Modules
- IcebergConvertProvider
  - Implements the AuronConvertProviderSPI interface
  - Auto-registered via META-INF/services
  - Checks Spark version compatibility (supports Spark 3.4 to 4.0)
  - Provides a configuration toggle: spark.auron.enable.iceberg.scan
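The provider's gating logic (version range plus configuration toggle) might look roughly like the sketch below. Registration itself needs no code: under the standard java.util.ServiceLoader convention, a file under META-INF/services named after the SPI interface's fully qualified name lists the implementation class. The helper names and the plain Map standing in for Spark's configuration are assumptions for illustration; the 3.4 to 4.0 range and the default-enabled key come from this PR.

```scala
// Hypothetical helper: extract (major, minor) from a version like "3.5.1".
def majorMinor(version: String): (Int, Int) = {
  val parts = version.split("\\.")
  (parts(0).toInt, parts(1).toInt)
}

// Supported range from the PR description: Spark 3.4 through 4.0, inclusive.
val MinSupported = (3, 4)
val MaxSupported = (4, 0)

def versionSupported(sparkVersion: String): Boolean = {
  val v = majorMinor(sparkVersion)
  val ord = Ordering[(Int, Int)] // lexicographic ordering on (major, minor)
  ord.gteq(v, MinSupported) && ord.lteq(v, MaxSupported)
}

val IcebergScanKey = "spark.auron.enable.iceberg.scan"

// A plain Map stands in for Spark's conf; a missing key means enabled (the PR's default).
def icebergScanEnabled(conf: Map[String, String]): Boolean =
  conf.get(IcebergScanKey).forall(_.trim.toBoolean)

// Hypothetical combined gate: both conditions must hold for the provider to act.
def providerEnabled(sparkVersion: String, conf: Map[String, String]): Boolean =
  versionSupported(sparkVersion) && icebergScanEnabled(conf)
```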
- IcebergScanSupport
  - Determines whether the scan comes from the Iceberg data source (class name check)
  - Uses reflection to access Iceberg's internal SparkInputPartition and FileScanTask
  - Performs multiple checks to determine native scan eligibility:
    - Only supports COW tables (no delete files)
    - Does not support metadata columns (_file, _pos, etc.)
    - Only supports Parquet and ORC formats
    - Does not support residual filters (row-level filtering)
    - Does not support mixed file formats
    - Only supports Auron-compatible data types
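The eligibility rules listed above amount to a short-circuiting predicate over the scan's file tasks. In this hedged sketch, TaskInfo and ineligibilityReason are hypothetical; the real code reads the equivalent information from Iceberg's FileScanTask via reflection, the metadata-column set is cut down to the two names given in the PR, and data-type support is reduced to a single flag.

```scala
// Hypothetical, flattened view of an Iceberg FileScanTask.
case class TaskInfo(
    format: String,            // e.g. "PARQUET" or "ORC"
    deleteFileCount: Int,      // > 0 means delete files exist (not a pure COW read)
    hasResidualFilter: Boolean // true if row-level filtering would still be required
)

// Only the metadata columns named in the PR; the real list is longer ("etc.").
val MetadataColumns = Set("_file", "_pos")
val SupportedFormats = Set("PARQUET", "ORC")

// Returns None when the scan is eligible, or Some(reason) for the first failed check.
def ineligibilityReason(
    tasks: Seq[TaskInfo],
    requestedColumns: Seq[String],
    typesSupported: Boolean): Option[String] = {
  val formats = tasks.map(_.format).toSet
  if (tasks.exists(_.deleteFileCount > 0)) Some("delete files present (not a COW scan)")
  else if (requestedColumns.exists(MetadataColumns.contains)) Some("metadata columns requested")
  else if (!formats.subsetOf(SupportedFormats)) Some("unsupported file format")
  else if (formats.size > 1) Some("mixed file formats")
  else if (tasks.exists(_.hasResidualFilter)) Some("residual filter present")
  else if (!typesSupported) Some("unsupported data type")
  else None
}
```

Note that an empty task list passes every check, which lines up with the empty-table handling listed under Supported Features.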
- NativeIcebergTableScanExec
  - Extends LeafExecNode and NativeSupports
  - Converts Iceberg FileScanTask to Spark FilePartition
  - Generates Protobuf scan plans (ParquetScanExecNode or OrcScanExecNode)
  - Registers Hadoop FileSystem resources via JniBridge
  - Implements projection pushdown
  - Handles file splitting and coalescing for partitioned tables
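File splitting and coalescing can be illustrated with the greedy bin-packing idea Spark applies to its own file sources (see FilePartition.getFilePartitions): cut large files into ranges of at most maxSplitBytes, sort the splits largest first, and pack them into partitions while charging an openCostInBytes penalty per split. This description does not say whether NativeIcebergTableScanExec reuses Spark's utility or its own variant, so treat this as a sketch; Split, Partition, splitFile and coalesce are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for Spark's PartitionedFile and FilePartition.
case class Split(path: String, start: Long, length: Long)
case class Partition(index: Int, splits: Vector[Split])

// Step 1: cut one file into byte ranges of at most maxSplitBytes.
def splitFile(path: String, fileLen: Long, maxSplitBytes: Long): Seq[Split] =
  (0L until fileLen by maxSplitBytes).map { off =>
    Split(path, off, math.min(maxSplitBytes, fileLen - off))
  }

// Step 2: pack splits (largest first) into partitions, closing the current
// partition when adding the next split would exceed maxSplitBytes.
def coalesce(splits: Seq[Split], maxSplitBytes: Long, openCostInBytes: Long): Seq[Partition] = {
  val partitions = ArrayBuffer.empty[Partition]
  val current = ArrayBuffer.empty[Split]
  var currentSize = 0L
  def close(): Unit = {
    if (current.nonEmpty) partitions += Partition(partitions.size, current.toVector)
    current.clear()
    currentSize = 0L
  }
  splits.sortBy(-_.length).foreach { s =>
    if (currentSize + s.length > maxSplitBytes) close()
    currentSize += s.length + openCostInBytes // charge a fixed cost per opened split
    current += s
  }
  close()
  partitions.toSeq
}
```

The open cost is what makes many tiny files coalesce into fewer partitions without letting a single partition grow far past the target size.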
Supported Features
- Currently Supported:
- Full table scan on Iceberg COW tables
- Parquet and ORC file formats
- Projection pushdown (column pruning)
- Partitioned table queries (partition filtering handled at the Iceberg layer)
- Empty table handling
- Configuration toggle: spark.auron.enable.iceberg.scan (default: enabled)
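Projection pushdown (column pruning) comes down to mapping the columns a query needs onto ordinals in each data file's schema, so the native Parquet or ORC reader decodes only those. This sketch matches columns by name for brevity. That is a simplification, since Iceberg identifies columns by field ID so that renames stay safe. projectionIndices and columnsToRead are hypothetical helpers, and None marks a column the file does not contain (for example, one added to the table after the file was written).

```scala
// Hypothetical: map each required column to its ordinal in a data file's schema.
// None marks a column that this particular file does not contain.
def projectionIndices(fileSchema: Seq[String], required: Seq[String]): Seq[Option[Int]] = {
  val ordinal = fileSchema.zipWithIndex.toMap
  required.map(ordinal.get)
}

// The distinct file ordinals that actually need to be decoded, in file order.
def columnsToRead(fileSchema: Seq[String], required: Seq[String]): Seq[Int] =
  projectionIndices(fileSchema, required).flatten.distinct.sorted
```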