-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-49114: [C++][Parquet] Fix converting schema failure with deep nested two-level encoding list structure #49125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@pitrou @emkornfield @wgtmac Please take a look. |
b494f79 to
7176212
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Fixes Parquet-to-Arrow schema conversion for deep nested two-level encoding LIST structures (and related MAP-in-LIST cases) that previously errored as illegal.
Changes:
- Refactors list element resolution into a new
ResolveList()helper to better handle nested two-level LIST encodings. - Relaxes repetition constraints to allow LIST/MAP nodes to be
REPEATEDwhen they are elements of an enclosing LIST. - Adds/updates unit tests to cover deep nested two-level LISTs and two-level LIST-of-MAP schemas, and adjusts expected error text.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cpp/src/parquet/arrow/schema.cc |
Adjusts LIST/MAP schema conversion logic and introduces ResolveList() for nested/two-level handling. |
cpp/src/parquet/arrow/arrow_schema_test.cc |
Adds coverage for deep nested two-level LIST and allows previously-failing LIST-of-MAP case; updates an error expectation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d9eb0dc to
4f7558f
Compare
4f7558f to
145580f
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly look good to me. Thanks for fixing this!
cpp/src/parquet/arrow/schema.cc
Outdated
| } else if (list_group.field_count() == 1) { | ||
| const auto& repeated_field = list_group.field(0); | ||
| if (repeated_field->is_repeated()) { | ||
| RETURN_NOT_OK(check_two_level_list_repeated(group)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to validate that list_group has no logical type annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not in backward-compatibility rules.
| // optional binary value (STRING); | ||
| // } | ||
| // } | ||
| // } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add a case for the below which is still produced by parquet-java by default:
// optional group my_list (LIST) {
// repeated group array (MAP) {
// repeated group key_value (MAP_KEY_VALUE) {
// required binary key (STRING);
// optional binary value (STRING);
// }
// }
// }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not in backward-compatibility rules too. We may add it into document first.
efddcaa to
404a85e
Compare
|
Do you want to take a look? @pitrou @emkornfield @mapleFU |
Rationale for this change
Fix the failure when converting parquet schema with deep nested two-level encoding list structure to arrow schema.
What changes are included in this PR?
Modified the
ListToSchemaFieldandMapToSchemaFieldmethods.Are these changes tested?
Yes.
Are there any user-facing changes?
It allows some legal schemas that were once considered illegal.