Core: Fix rewrite_position_delete_files failure with array/map columns#15079
Core: Fix rewrite_position_delete_files failure with array/map columns#15079bk-mz wants to merge 6 commits intoapache:mainfrom
Conversation
… columns This test demonstrates a regression in Iceberg 1.10.0 where rewrite_position_delete_files fails with ValidationException when the table contains array columns with primitive fields. The error occurs because ExpressionUtil.identitySpec() creates a temporary partition spec from schema field IDs, and the new validation added in commit 9fb80b7 rejects fields whose parent type is a list.
ExpressionUtil.identitySpec() now skips fields that cannot be partition sources. Fields nested inside arrays or maps are filtered out by checking that all ancestor types are structs, matching the validation logic in PartitionSpec.checkCompatibility(). Fixes apache#15080
| } | ||
|
|
||
| @TestTemplate | ||
| public void testRewritePositionDeletesWithArrayColumns() throws Exception { |
There was a problem hiding this comment.
nit: Shall we also add a similar test for Map column?
|
Yea I think this code is suggested by @aokolnychyi , to be honest I am not sure what happens when we have a filter on a nested column. I am wondering, what happens when we have a filter like `where map_col = map('a', 1)', was it passed before 9fb80b7 as a residual filter to the file, or not? This will help me understand if we should do another approach like bypass the PartitionSpec check here We could potentially test this at the lower level TestPositionDeletesReader. |
|
@bk-mz Hi! |
|
@yogevyuval IDK, this PR is sound to me, but looks like maintainers want to do additional research? |
|
guys, this actually wont work here. The residualWithoutConstants now has no filters for map/array columns, which lead to wrong results for PositionDeletes metadata table. |
|
i propose a cleaner solution : #15632 |
Summary
Fixes
rewrite_position_delete_filesfailing withValidationException: Invalid partition field parentwhen tables contain array or map columns.Root Cause
ExpressionUtil.identitySpec()attempted to create identity partition fields for all primitive fields in the schema, including those nested inside arrays and maps. Since commit 9fb80b7,PartitionSpec.checkCompatibility()validates that partition field parents must be struct types, causing the failure.Fix
identitySpec()now filters out fields that cannot be partition sources by checking that all ancestor types are structs, usingTypeUtil.indexParents()- the same approach used inPartitionSpec.checkCompatibility().Testing
Added test case that reproduces the issue with a table containing
ARRAY<STRUCT<...>>column.Fixes #15080