[lake/lance] Add NestedRow type support for Lance #2578

Open
XuQianJin-Stars wants to merge 1 commit into apache:main from XuQianJin-Stars:feature/issue-2404-row-lance

@XuQianJin-Stars
Contributor

Purpose

Linked issue: close #2404

This PR adds NestedRow (Struct) type support for Lance lake storage, building on the existing Array type support.

Brief change log

  • LanceArrowUtils.java:

    • Extended toArrowField() to handle RowType by recursively creating child fields for nested struct types
    • Extended toArrowType() to map Fluss RowType to Arrow's ArrowType.Struct.INSTANCE
  • ArrowDataConverter.java:

    • Added copyStructVectorData() method to recursively copy data from shaded StructVector to non-shaded StructVector
    • Updated copyVectorData() to detect and delegate to struct-specific copy logic
  • ShadedArrowBatchWriter.java:

    • Extended initFieldVector() to properly allocate and initialize StructVector and its child vectors
  • FlinkLanceTieringTestBase.java:

    • Added createLogTableWithNestedRowType() helper method for creating tables with nested Row columns
    • Added createLogTableWithArrayOfRowType() helper method for creating tables with Array-of-Row columns
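
The recursive mapping described for toArrowField() can be illustrated with a small, dependency-free sketch. It is not the code in this PR: SrcType, TargetField, toField(), and the "element" child name are hypothetical stand-ins for the Fluss type, the Arrow Field, and the converter, chosen so the example runs without Fluss or Arrow on the classpath. It only shows the shape of the recursion: a Row becomes a struct whose children are converted the same way, an Array becomes a list with one converted child, and everything else is a leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model of a recursive Row -> Struct mapping. None of these are Fluss or Arrow classes. */
public class NestedRowSketch {

    /** Hypothetical source type: a leaf ("INT", "STRING"), an "ARRAY", or a "ROW". */
    static final class SrcType {
        final String kind;
        final List<String> names;     // field names, used by ROW only
        final List<SrcType> children; // ROW fields, or the single ARRAY element

        SrcType(String kind, List<String> names, List<SrcType> children) {
            this.kind = kind;
            this.names = names;
            this.children = children;
        }

        static SrcType leaf(String kind) {
            return new SrcType(kind, Collections.<String>emptyList(), Collections.<SrcType>emptyList());
        }

        static SrcType array(SrcType element) {
            return new SrcType("ARRAY", Collections.<String>emptyList(), Collections.singletonList(element));
        }

        static SrcType row(List<String> names, List<SrcType> types) {
            return new SrcType("ROW", names, types);
        }
    }

    /** Hypothetical target field, shaped like an Arrow field: name, type tag, child fields. */
    static final class TargetField {
        final String name;
        final String type;
        final List<TargetField> children;

        TargetField(String name, String type, List<TargetField> children) {
            this.name = name;
            this.type = type;
            this.children = children;
        }

        @Override
        public String toString() {
            return name + ": " + type + (children.isEmpty() ? "" : children.toString());
        }
    }

    /** Converts one named type, recursing into ROW fields and ARRAY elements. */
    static TargetField toField(String name, SrcType type) {
        List<TargetField> children = new ArrayList<>();
        if (type.kind.equals("ROW")) {
            // Each nested field is converted with the same function, so depth is unbounded.
            for (int i = 0; i < type.children.size(); i++) {
                children.add(toField(type.names.get(i), type.children.get(i)));
            }
            return new TargetField(name, "struct", children);
        }
        if (type.kind.equals("ARRAY")) {
            // "element" is an illustrative child name, not necessarily what the PR uses.
            children.add(toField("element", type.children.get(0)));
            return new TargetField(name, "list", children);
        }
        return new TargetField(name, type.kind.toLowerCase(Locale.ROOT), children);
    }

    public static void main(String[] args) {
        SrcType address = SrcType.row(
                Arrays.asList("city", "zip"),
                Arrays.asList(SrcType.leaf("STRING"), SrcType.leaf("INT")));
        SrcType person = SrcType.row(
                Arrays.asList("id", "address", "tags"),
                Arrays.asList(SrcType.leaf("INT"), address, SrcType.array(SrcType.leaf("STRING"))));
        System.out.println(toField("person", person));
    }
}
```

Because Row and Array both delegate back to the same function, Array-of-Row and Row-containing-Array combinations fall out of the same recursion, which matches the cases listed under the unit tests.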

Tests

Unit Tests (LanceArrowUtilsTest.java):

  • testToArrowSchemaWithNestedRowType: Verifies simple nested Row type conversion to Arrow Struct
  • testToArrowSchemaWithDeeplyNestedRowType: Verifies deeply nested Row type conversion
  • testToArrowSchemaWithArrayOfRowType: Verifies Array-of-Row type conversion
  • testToArrowSchemaWithRowContainingArray: Verifies Row containing Array field

Unit Tests (LanceTieringTest.java):

  • testTieringWriteTableWithNestedRowType: Verifies writing and reading tables with nested Row type

Integration Tests (LanceTieringITCase.java):

  • testTieringWithNestedRowType: End-to-end test for tiering with nested Row type
  • testTieringWithArrayOfRowType: End-to-end test for tiering with Array-of-Row type

API and Format

No API changes. The internal Lance lake storage format is extended to support Struct types; the change is backward compatible.

Documentation

No documentation changes needed. This is an internal enhancement to support additional data types in Lance lake storage.

Development

Successfully merging this pull request may close these issues.

[lake/lance] NestedRow type support for Lance