Summary
The rewrite_position_delete_files procedure fails with a ValidationException when run on tables that have array columns containing primitive fields. This is a regression introduced in Iceberg 1.10.0.
Error
org.apache.iceberg.exceptions.ValidationException: Invalid partition field parent: list<struct<5: value: optional long, 6: count: optional int>>
at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:674)
at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:658)
at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:514)
at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:542)
at org.apache.iceberg.expressions.ExpressionUtil.lambda$identitySpec$5(ExpressionUtil.java:745)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.apache.iceberg.expressions.ExpressionUtil.identitySpec(ExpressionUtil.java:744)
at org.apache.iceberg.expressions.ExpressionUtil.extractByIdInclusive(ExpressionUtil.java:275)
at org.apache.iceberg.spark.source.PositionDeletesRowReader.open(PositionDeletesRowReader.java:95)
Root Cause
Commit 9fb80b7 added a check to PartitionSpec.checkCompatibility() requiring every parent of a partition source field to be a StructType.
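To make the rule concrete, here is a minimal sketch of an ancestor check in the same spirit. It uses a hypothetical `Node`/`Kind` model, not Iceberg's `PartitionSpec` or `Types` API. It walks the parents of a partition source field from the schema root and rejects the field at the first parent that is not a struct. Applied to `items.element.value` from the Reproduction table, the `items` list is what trips the check. The real exception prints the list's type (`list<struct<...>>`) rather than a field name; this sketch prints the name for readability.

```java
import java.util.List;

// Hypothetical, simplified model; not Iceberg's PartitionSpec or Types API.
public class ParentCheckSketch {
    enum Kind { STRUCT, LIST, MAP }

    // One ancestor on the path from the schema root to a partition source field.
    record Node(String name, Kind kind) {}

    // Returns null when every ancestor is a struct; otherwise returns an error
    // message naming the first ancestor (root first) that is not a struct.
    static String checkParents(List<Node> ancestors) {
        for (Node parent : ancestors) {
            if (parent.kind() != Kind.STRUCT) {
                return "Invalid partition field parent: " + parent.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // `id` sits directly under the root struct.
        List<Node> idPath = List.of(new Node("table", Kind.STRUCT));

        // `items.element.value` sits under the root struct, the `items` list,
        // and the list's element struct.
        List<Node> valuePath = List.of(
                new Node("table", Kind.STRUCT),
                new Node("items", Kind.LIST),
                new Node("items.element", Kind.STRUCT));

        System.out.println(checkParents(idPath));    // null
        System.out.println(checkParents(valuePath)); // Invalid partition field parent: items
    }
}
```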
When reading position deletes, ExpressionUtil.nonConstantFieldIds() collects ALL primitive field IDs from the table schema, including those nested inside arrays. ExpressionUtil.identitySpec() then tries to build an identity partition field for each of these IDs, which fails the new check because one of the field's parents is a list rather than a struct.
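The collection step can be modeled the same way. The sketch below uses hypothetical `Type`/`Field` records (not Iceberg's API) for the Reproduction table; IDs 5 and 6 match the error message, while IDs 1 to 4 are assumed. With `skipUnderLists` set to false it collects every primitive ID, as the issue describes `nonConstantFieldIds()` doing, and the result includes 5 and 6. Set to true, it drops anything with a list ancestor, which is one possible way to keep such fields out of the identity spec. Whether PR #15079 takes this approach is not assumed here, and map types are left out of the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified schema model; not Iceberg's Types API.
public class FieldIdSketch {
    interface Type {}
    record Primitive(String name) implements Type {}
    record Field(int id, String name, Type type) {}
    record Struct(List<Field> fields) implements Type {}
    record ListType(Field element) implements Type {}

    // Collects primitive field IDs. skipUnderLists=false includes every
    // primitive (the behavior described in the issue); true skips any field
    // that has a list ancestor.
    static List<Integer> primitiveIds(Type type, boolean skipUnderLists) {
        List<Integer> ids = new ArrayList<>();
        collect(type, skipUnderLists, ids);
        return ids;
    }

    private static void collect(Type type, boolean skipUnderLists, List<Integer> out) {
        if (type instanceof Struct s) {
            for (Field f : s.fields()) {
                addField(f, skipUnderLists, out);
            }
        } else if (type instanceof ListType l) {
            // Everything below this point has a list ancestor.
            if (!skipUnderLists) {
                addField(l.element(), skipUnderLists, out);
            }
        }
    }

    private static void addField(Field f, boolean skipUnderLists, List<Integer> out) {
        if (f.type() instanceof Primitive) {
            out.add(f.id());
        } else {
            collect(f.type(), skipUnderLists, out);
        }
    }

    // Models: id BIGINT, data STRING, items ARRAY<STRUCT<value:BIGINT, count:INT>>
    static Type exampleSchema() {
        Type element = new Struct(List.of(
                new Field(5, "value", new Primitive("long")),
                new Field(6, "count", new Primitive("int"))));
        return new Struct(List.of(
                new Field(1, "id", new Primitive("long")),
                new Field(2, "data", new Primitive("string")),
                new Field(3, "items", new ListType(new Field(4, "element", element)))));
    }

    public static void main(String[] args) {
        Type schema = exampleSchema();
        System.out.println(primitiveIds(schema, false)); // [1, 2, 5, 6]
        System.out.println(primitiveIds(schema, true));  // [1, 2]
    }
}
```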
Reproduction
Tables with array columns containing primitive fields trigger this bug:
CREATE TABLE test_table (
  id BIGINT,
  data STRING,
  items ARRAY<STRUCT<value:BIGINT, count:INT>>
) USING iceberg
TBLPROPERTIES('format-version'='2', 'write.delete.mode'='merge-on-read');
INSERT INTO test_table VALUES
  (1, 'a', array(named_struct('value', cast(10 as bigint), 'count', 1))),
  (2, 'b', array(named_struct('value', cast(20 as bigint), 'count', 2)));
DELETE FROM test_table WHERE id = 1;
DELETE FROM test_table WHERE id = 2;
-- This fails with ValidationException
CALL system.rewrite_position_delete_files(table => 'test_table', options => map('rewrite-all','true'));
Fix
PR #15079
Environment
- Iceberg version: 1.10.0+ (regression from 1.9.x)
- Spark version: 3.5.x
- Table format version: 2
Workaround
Use Iceberg 1.9.x or earlier until this is fixed.