feat: Variant Support #2188

Open
c-thiel wants to merge 3 commits into apache:main from c-thiel:feat/variant-support

@c-thiel c-thiel commented Feb 28, 2026

Which issue does this PR close?

Variant Support.
Arrow value support is currently missing, as I am unsure how we want to extend Literal.

What changes are included in this PR?

Core: Variant Type

  • crates/iceberg/src/spec/datatypes.rs — new Variant type
  • crates/iceberg/src/spec/values/literal.rs — Variant literal value
  • crates/iceberg/src/spec/schema/ — visitor, index, pruning, mod, id reassigner all handle Variant
  • crates/iceberg/src/spec/table_metadata.rs — metadata support

Avro

  • crates/iceberg/src/avro/schema.rs — read/write Variant in Avro

Arrow

  • crates/iceberg/src/arrow/schema.rs — map Variant to Arrow type
  • crates/iceberg/src/arrow/reader.rs — read Variant from Arrow
  • crates/iceberg/src/arrow/value.rs — Arrow value conversion
  • Minor fixes in caching_delete_file_loader.rs and nan_val_cnt_visitor.rs

Parquet

  • crates/iceberg/src/writer/file_writer/parquet_writer.rs — write Variant columns

Tests & Dev

  • crates/integration_tests/tests/read_variant.rs — new integration test for reading Variant data
  • dev/spark/provision.py — Spark provisioning to generate Variant test data

Are these changes tested?

Sure! Even integration tested :)

let table_creation = TableCreation::builder()
.name(name.clone())
.schema(iceberg_schema)
.format_version(format_version)

@c-thiel c-thiel Mar 2, 2026

Before this change, existing tests were rightfully failing because we used to create a V2 table with a nanosecond timestamp column:
https://github.com/apache/iceberg-rust/actions/runs/22522306915/job/65248930667

The new logic determines the minimum format version the schema requires and uses that, but never less than V2. As a result, we now switch to V3 for ns timestamps.
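
A minimal sketch of that selection rule, not the code in this PR: `SimpleType`, `FormatVersion`, `min_version_for_type`, and `table_format_version` are hypothetical stand-ins for the crate's real types. It only assumes what the Iceberg spec states, namely that nanosecond timestamps and variant are V3 features.

```rust
/// Hypothetical, simplified stand-in for the crate's type enum.
#[derive(Debug, Clone)]
enum SimpleType {
    Long,
    String,
    TimestampNs,
    Variant,
    List(Box<SimpleType>),
    Struct(Vec<SimpleType>),
}

/// Hypothetical format version enum; the derived ordering follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FormatVersion {
    V1,
    V2,
    V3,
}

/// Smallest format version that can represent `ty`, looking into nested types.
fn min_version_for_type(ty: &SimpleType) -> FormatVersion {
    match ty {
        // V3-only types according to the Iceberg spec.
        SimpleType::TimestampNs | SimpleType::Variant => FormatVersion::V3,
        SimpleType::List(element) => min_version_for_type(element),
        SimpleType::Struct(fields) => fields
            .iter()
            .map(min_version_for_type)
            .max()
            .unwrap_or(FormatVersion::V1),
        SimpleType::Long | SimpleType::String => FormatVersion::V1,
    }
}

/// Version to create the table with: the schema's minimum, but at least V2.
fn table_format_version(columns: &[SimpleType]) -> FormatVersion {
    columns
        .iter()
        .map(min_version_for_type)
        .max()
        .unwrap_or(FormatVersion::V1)
        .max(FormatVersion::V2)
}
```

With this rule, a schema of only `Long` and `String` columns stays on V2, while adding a `TimestampNs` or `Variant` column anywhere in the tree raises the result to V3.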

@brgr-s brgr-s mentioned this pull request Mar 4, 2026

c-thiel commented Mar 18, 2026

@CTTY @liurenjie1024 @Xuanwo this would be ready for review!

@CTTY CTTY left a comment

Thanks for the feature! Just took a look.

Also the test seems to be failing

custom_attributes: Default::default(),
},
];
let mut schema = avro_record_schema(VARIANT_LOGICAL_TYPE, fields)?;

I don't think we can use a static string for every record. What if there are multiple variant columns? Would the record names conflict?
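
One possible way to avoid such a clash, sketched under assumptions: derive the record name from the column's field ID, which is unique within an Iceberg schema. The helper names `variant_record_name` and `all_unique` and the `variant_r` prefix are hypothetical and not taken from this PR; the sketch also assumes non-negative field IDs so the result stays a valid Avro name.

```rust
use std::collections::HashSet;

/// Hypothetical helper: build a per-column Avro record name from the field ID.
/// Assumes `field_id` is non-negative, so the name only contains
/// characters that Avro allows in names (letters, digits, underscore).
fn variant_record_name(field_id: i32) -> String {
    format!("variant_r{field_id}")
}

/// Returns true when no record name occurs twice.
fn all_unique(names: &[String]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false on a duplicate, which makes `all` stop early.
    names.iter().all(|name| seen.insert(name.as_str()))
}
```

Two variant columns with field IDs 3 and 7 would then get distinct names, whereas reusing one constant name for both fails the `all_unique` check.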

// field ID resolves to Type::Variant and record all their sub-fields so
// the second filter_leaves can include them directly.
let mut variant_sub_fields: HashMap<FieldRef, i32> = HashMap::new();
for top_field in fields.iter() {

What would happen if variant is nested within another type?
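
To make the concern concrete, here is a rough sketch of a traversal that finds variant fields at any depth instead of only at the top level. `Ty`, `Field`, and `collect_variant_ids` are hypothetical simplifications for illustration, not the types or logic used in the reader.

```rust
/// Hypothetical, simplified schema model.
#[derive(Debug)]
enum Ty {
    Primitive,
    Variant,
    Struct(Vec<Field>),
    List(Box<Field>),
    Map(Box<Field>, Box<Field>),
}

#[derive(Debug)]
struct Field {
    id: i32,
    ty: Ty,
}

/// Collect the IDs of all variant fields, descending into nested types.
fn collect_variant_ids(field: &Field, out: &mut Vec<i32>) {
    match &field.ty {
        Ty::Variant => out.push(field.id),
        Ty::Struct(children) => {
            for child in children {
                collect_variant_ids(child, out);
            }
        }
        Ty::List(element) => collect_variant_ids(element, out),
        Ty::Map(key, value) => {
            collect_variant_ids(key, out);
            collect_variant_ids(value, out);
        }
        Ty::Primitive => {}
    }
}
```

A loop over top-level fields only would miss a variant stored as a list element inside a struct; the recursive version reports it.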

default: None,
custom_attributes: Default::default(),
},
];

Should these be static?
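
If the field list never changes, one option is to build it once and hand out a shared reference. The sketch below uses `std::sync::OnceLock`; `FieldSpec` and `variant_fields` are hypothetical stand-ins for the Avro record field values built in this diff, and the two entries mirror the `metadata` and `value` fields that the Iceberg spec defines for a variant.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for an Avro record field description.
#[derive(Debug, Clone, PartialEq)]
struct FieldSpec {
    name: &'static str,
    type_name: &'static str,
}

/// Returns the variant sub-fields, constructing them only on the first call.
fn variant_fields() -> &'static [FieldSpec] {
    static FIELDS: OnceLock<Vec<FieldSpec>> = OnceLock::new();
    FIELDS.get_or_init(|| {
        vec![
            FieldSpec { name: "metadata", type_name: "bytes" },
            FieldSpec { name: "value", type_name: "bytes" },
        ]
    })
}
```

Every call returns the same slice, so callers that need an owned copy can clone it while the construction cost is paid once.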

Err(Error::new(
ErrorKind::FeatureUnsupported,
"Conversion from VariantType is not supported for Glue",
))

We can just return "variant".to_string(). On Glue it would look like this:

{
  "data": {
    "unknown": "variant"
  }
}
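
A minimal sketch of what that suggestion could look like, assuming a simplified type enum: `SimplePrimitive` and `to_glue_type_name` are hypothetical names, and the real Glue schema conversion in the crate is structured differently. Only the idea of returning the plain string "variant" instead of an error comes from the comment above.

```rust
/// Hypothetical, reduced set of types for illustration.
#[derive(Debug)]
enum SimplePrimitive {
    Long,
    String,
    Variant,
}

/// Map a type to the type name passed to Glue.
fn to_glue_type_name(ty: &SimplePrimitive) -> String {
    let name = match ty {
        SimplePrimitive::Long => "bigint",
        SimplePrimitive::String => "string",
        // Return the plain name instead of a FeatureUnsupported error.
        SimplePrimitive::Variant => "variant",
    };
    name.to_string()
}
```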
