-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-49217: [C++][Parquet] Fix map type to preserve key-value metadata #49218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
|
Could you help review this? @pitrou @adamreeve @mapleFU |
| case ::arrow::Type::MAP: | ||
| if (origin_type.id() == ::arrow::Type::MAP) { | ||
| const bool keys_sorted = | ||
| checked_cast<const ::arrow::MapType&>(origin_type).keys_sorted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to MapToSchemaField, is keys_sorted always false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's why I made it sorted in this test case.
mapleFU
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to believe it doesn't have mapping...
| restored_schema->field(0)->type()); | ||
| ASSERT_EQ(GetFieldId(*restored_schema->field(0)), 99); | ||
|
|
||
| ASSERT_TRUE(restored_map->keys_sorted()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just call AssertTypeEqual(*sorted_map, *restored_map, /*check_metadata=*/true);?
It doesn't seem useful to spelunk the nested type hierarchy manually.
Rationale for this change
Previously, when converting Parquet schemas back to Arrow schemas with serialized ARROW:schema metadata, the key-value metadata on map nested fields (key/value fields) was lost. The GetNestedFactory() function lacked a MAP case, preventing ApplyOriginalStorageMetadata from recursively restoring metadata to map children. This fix adds MAP support to the factory function, enabling proper metadata preservation for maps during schema roundtrips.
What changes are included in this PR?
Added a MAP case to GetNestedFactory() that returns a lambda to reconstruct MapType while preserving the keys_sorted property from the original type.
Are these changes tested?
Yes. Added a test case to verify the issue has been fixed.
Are there any user-facing changes?
No.