Skip to content

Conversation

@HuaHuaY
Copy link
Contributor

@HuaHuaY HuaHuaY commented Feb 3, 2026

Rationale for this change

Fix the failure when converting parquet schema with deep nested two-level encoding list structure to arrow schema.

What changes are included in this PR?

Modified the ListToSchemaField and MapToSchemaField methods.

Are these changes tested?

Yes.

Are there any user-facing changes?

It allows some legal schemas that were once considered illegal.

@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Feb 3, 2026

@pitrou @emkornfield @wgtmac Please take a look.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Parquet-to-Arrow schema conversion for deep nested two-level encoding LIST structures (and related MAP-in-LIST cases) that previously errored as illegal.

Changes:

  • Refactors list element resolution into a new ResolveList() helper to better handle nested two-level LIST encodings.
  • Relaxes repetition constraints to allow LIST/MAP nodes to be REPEATED when they are elements of an enclosing LIST.
  • Adds/updates unit tests to cover deep nested two-level LISTs and two-level LIST-of-MAP schemas, and adjusts expected error text.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
cpp/src/parquet/arrow/schema.cc Adjusts LIST/MAP schema conversion logic and introduces ResolveList() for nested/two-level handling.
cpp/src/parquet/arrow/arrow_schema_test.cc Adds coverage for deep nested two-level LIST and allows previously-failing LIST-of-MAP case; updates an error expectation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 4, 2026
@HuaHuaY HuaHuaY requested a review from Copilot February 4, 2026 11:36
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@HuaHuaY HuaHuaY force-pushed the fix_issue_49114 branch 2 times, most recently from d9eb0dc to 4f7558f Compare February 9, 2026 02:45
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly look good to me. Thanks for fixing this!

} else if (list_group.field_count() == 1) {
const auto& repeated_field = list_group.field(0);
if (repeated_field->is_repeated()) {
RETURN_NOT_OK(check_two_level_list_repeated(group));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to validate that list_group has no logical type annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not in backward-compatibility rules.

// optional binary value (STRING);
// }
// }
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add a case for the below which is still produced by parquet-java by default:

    // optional group my_list (LIST) {
    //   repeated group array (MAP) {
    //     repeated group key_value (MAP_KEY_VALUE) {
    //       required binary key (STRING);
    //       optional binary value (STRING);
    //     }
    //   }
    // }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not in backward-compatibility rules too. We may add it into document first.

@wgtmac
Copy link
Member

wgtmac commented Feb 10, 2026

Do you want to take a look? @pitrou @emkornfield @mapleFU

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants