Add STRUCT/ARRAY/MAP nested column support to RAB (CSA-371)#2392
bladata1990 wants to merge 6 commits into main
Expands complex SQL types (STRUCT, ARRAY<STRUCT>, MAP<K,STRUCT>) into child Column assets linked via parentColumn hierarchy. Sub-columns are excluded from the table's flat Columns list by clearing all tableQualifiedName/tableName/table/view refs; navigation is via the parentColumn chain only.

New fields on sub-columns:
- parentColumnQualifiedName / parentColumn / parentColumnName
- columnHierarchy: newline-delimited JSON ancestor entries (enables breadcrumb display in the UI, e.g. struct_col > city)
- columnDepthLevel: 1 for direct fields, 2+ for deeper nesting
- nestedColumnOrder: ordinal position within parent
- subType=nested

QN format (matching Databricks connector):
- STRUCT field: tableQN/parentCol/fieldName
- ARRAY<STRUCT>: tableQN/parentCol/items/fieldName
- MAP<K,STRUCT>: tableQN/parentCol/values/fieldName
- Deeply nested: tableQN/parentCol/outer/inner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
The dependencies { include(project(":samples:packages:asset-import")) }
filter was accidentally removed from the shadowJar block, causing the fat
jar to bundle all transitive SDK dependencies instead of just asset-import.
This conflicted with the base container image jars and caused
ClassNotFoundException: com.atlan.pkg.rab.Importer at runtime.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
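For context, the filter described in the commit message above would sit inside the shadowJar task configuration roughly as follows. This is a minimal sketch of a build.gradle.kts fragment, assuming the Shadow plugin is already applied to the module; the rest of the real build file is not shown in this PR excerpt and is not reproduced here.

```kotlin
// build.gradle.kts (fragment, sketch only; assumes the Shadow plugin is applied)
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

tasks.named<ShadowJar>("shadowJar") {
    // Restrict the fat jar to the asset-import project so that other
    // transitive SDK dependencies are not bundled into it.
    dependencies {
        include(project(":samples:packages:asset-import"))
    }
}
```

With the filter in place, only the listed project is packed into the shadow jar, which is the behavior the commit message says was lost when the block was removed.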
@cmgrote Gentle nudge on this: can you check and approve?
Tested the sample asset in a customer environment: https://mastercard-pov.atlan.com/assets/e2b15ecb-62be-4304-b661-9cc27f5d1b18/overview
cmgrote left a comment
Need to drop the changes to the CI (.github/workflows/merge.yml in particular) for non-main branches. Want to avoid creating a permanent footprint for this that needs to be maintained.
Also, it looks like the logic should handle new headings in the CSV (for parent column qualifiedName, etc) — but the provided test file and tests don't seem to exercise this path at all. Please extend the test file and test scenario to test these additions, too.
@cmgrote We have not added any new columns to the files, because the STRUCT itself is defined in the 'dataType' column. Attached a sample file.
… tag logic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
RowSerde.getHeaderForField(Column.PARENT_COLUMN_QUALIFIED_NAME, Column::class.java),
RowSerde.getHeaderForField(Column.PARENT_COLUMN, Column::class.java),
RowSerde.getHeaderForField(Column.PARENT_COLUMN_NAME, Column::class.java),
RowSerde.getHeaderForField(Column.NESTED_COLUMN_ORDER, Column::class.java),
RowSerde.getHeaderForField(Column.COLUMN_DEPTH_LEVEL, Column::class.java),
RowSerde.getHeaderForField(Column.COLUMN_HIERARCHY, Column::class.java),
RowSerde.getHeaderForField(Asset.SUB_TYPE),
These are all new headers in the CSV file, implying they'll be handled... (But missing from the tests.)
/** Returns the base type name, stripping any angle-bracket type parameters.
 * E.g. "STRUCT<a:INT,b:DOUBLE>" → "STRUCT", "INT" → "INT". */
private fun baseTypeName(rawType: String): String = if (rawType.contains("<")) rawType.substringBefore("<").trim().uppercase() else rawType
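To make the helper's behavior concrete, here is a standalone copy of `baseTypeName` (made non-private purely so it can be run in isolation) with a few sample inputs. The inputs are illustrative and are not taken from the PR's test fixture.

```kotlin
// Standalone copy of the helper from the diff, for experimentation only.
fun baseTypeName(rawType: String): String =
    if (rawType.contains("<")) rawType.substringBefore("<").trim().uppercase() else rawType

fun main() {
    println(baseTypeName("STRUCT<a:INT,b:DOUBLE>")) // STRUCT
    println(baseTypeName("array<struct<x:int>>"))   // ARRAY
    println(baseTypeName("int"))                    // int (returned unchanged)
}
```

Note that only parameterized types are trimmed and uppercased; a plain type name is passed through exactly as written, so `int` and `INT` yield different results.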
/** {@inheritDoc}
 *
 * Overridden to emit additional child column rows when the column's data type is a complex type
 * (STRUCT, ARRAY<STRUCT>, or MAP<K, STRUCT>). Child columns are generated recursively for
 * deeply nested types.
 */
override fun mapRow(inputRow: Map<String, String>): List<List<String>> {
    val rows = super.mapRow(inputRow).toMutableList()
    val rawType = trimWhitespace(inputRow.getOrElse(Column.DATA_TYPE.atlanFieldName) { "" })
    val parseResult = ComplexTypeParser.extractStructFields(rawType)
    if (parseResult != null) {
        val connectionQN = getConnectionQN(inputRow)
        val details = getSQLHierarchyDetails(inputRow, typeNameFilter, preprocessedDetails.entityQualifiedNameToType)
        val parentColumnQN = "$connectionQN/${details.partialQN}"
        val parentAssetMap = mapAsset(inputRow)
        rows.addAll(buildSubColumnRows(parentAssetMap, parentColumnQN, parseResult))
    }
    return rows
}
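`ComplexTypeParser.kt` itself is not shown in this excerpt, so the following is only a sketch of what a bracket-aware parser with the same call shape might look like. Everything here is hypothetical: the object name `SketchTypeParser`, the `name:TYPE` field syntax (inferred from the `STRUCT<a:INT,b:DOUBLE>` example in the KDoc above), and the decision to handle only a STRUCT directly inside ARRAY or MAP.

```kotlin
data class FieldDefinition(val name: String, val rawType: String)
data class ParseResult(val fields: List<FieldDefinition>, val syntheticNode: String? = null)

// Hypothetical stand-in for the PR's ComplexTypeParser; not the actual implementation.
object SketchTypeParser {
    /** Split [s] on [delimiter], ignoring delimiters nested inside <...> or (...). */
    fun splitTopLevel(s: String, delimiter: Char): List<String> {
        val parts = mutableListOf<String>()
        var depth = 0
        var start = 0
        for ((i, c) in s.withIndex()) {
            when (c) {
                '<', '(' -> depth++
                '>', ')' -> depth--
                delimiter -> if (depth == 0) {
                    parts.add(s.substring(start, i).trim())
                    start = i + 1
                }
                else -> {}
            }
        }
        parts.add(s.substring(start).trim())
        return parts.filter { it.isNotEmpty() }
    }

    /** Returns the STRUCT fields reachable from [rawType], or null for non-complex types. */
    fun extractStructFields(rawType: String): ParseResult? {
        val t = rawType.trim()
        val open = t.indexOf('<')
        if (open < 0 || !t.endsWith(">")) return null
        val base = t.substring(0, open).trim().uppercase()
        val inner = t.substring(open + 1, t.length - 1)
        return when (base) {
            "STRUCT" -> ParseResult(
                splitTopLevel(inner, ',').mapNotNull { f ->
                    val sep = f.indexOf(':')
                    if (sep <= 0) null else FieldDefinition(f.substring(0, sep).trim(), f.substring(sep + 1).trim())
                },
            )
            // Simplification: only a STRUCT directly inside ARRAY / MAP is expanded.
            "ARRAY" -> extractStructFields(inner)
                ?.takeIf { it.syntheticNode == null }
                ?.copy(syntheticNode = "items")
            "MAP" -> splitTopLevel(inner, ',').getOrNull(1)
                ?.let { extractStructFields(it) }
                ?.takeIf { it.syntheticNode == null }
                ?.copy(syntheticNode = "values")
            else -> null
        }
    }
}

fun main() {
    println(SketchTypeParser.extractStructFields("STRUCT<city:STRING,zip:INT>"))
    println(SketchTypeParser.extractStructFields("ARRAY<STRUCT<a:INT>>")?.syntheticNode) // items
    println(SketchTypeParser.extractStructFields("INT")) // null
}
```

Tracking bracket depth while splitting is what keeps a nested `STRUCT<inner:INT,count:INT>` from being cut at its internal comma; a plain `split(",")` would break such types apart.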
/**
 * Recursively build child column rows for all fields in the given [parseResult].
 *
 * @param baseAssetMap field map of the immediate parent column (used to inherit context fields)
 * @param parentColumnQN qualified name of the parent column asset (used for [PARENT_COLUMN_QN_HEADER])
 * @param parseResult parsed complex type fields and optional synthetic QN node (e.g. "items" for ARRAY)
 * @param depth nesting depth of the child columns (1 for direct children of a top-level column, 2 for grandchildren, etc.)
 */
private fun buildSubColumnRows(
    baseAssetMap: Map<String, String>,
    parentColumnQN: String,
    parseResult: ComplexTypeParser.ParseResult,
    depth: Int = 1,
): List<List<String>> {
    val rows = mutableListOf<List<String>>()
    // For ARRAY / MAP, insert the synthetic node into the QN path but NOT into parentColumnQN
    val qnBase = if (parseResult.syntheticNode != null) "$parentColumnQN/${parseResult.syntheticNode}" else parentColumnQN
    parseResult.fields.forEachIndexed { idx, field ->
        val childQN = "$qnBase/${field.name}"
        val childAssetMap = buildChildAssetMap(baseAssetMap, parentColumnQN, childQN, field, idx + 1, depth)
        rows.add(assetMapToValueList(childAssetMap))
        // Recurse for nested complex types (e.g. STRUCT within STRUCT, ARRAY within STRUCT)
        val nestedResult = ComplexTypeParser.extractStructFields(field.rawType)
        if (nestedResult != null) {
            rows.addAll(buildSubColumnRows(childAssetMap, childQN, nestedResult, depth + 1))
        }
    }
    return rows
}
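The qualified-name and depth bookkeeping in `buildSubColumnRows` can be illustrated on a stripped-down model. This sketch keeps only the QN, parent QN, depth, and order logic; the `Field`, `Parsed`, and `SubColumn` types and the sample table QN are hypothetical and stand in for the asset maps used by the real code.

```kotlin
data class Field(val name: String, val nested: Parsed? = null)
data class Parsed(val fields: List<Field>, val syntheticNode: String? = null)
data class SubColumn(val qualifiedName: String, val parentQualifiedName: String, val depth: Int, val order: Int)

// Mirrors the recursion in the diff: the synthetic node ("items" / "values") is part of
// the child's QN path, but the parent QN recorded on the child does not include it.
fun subColumns(parentQN: String, parsed: Parsed, depth: Int = 1): List<SubColumn> {
    val qnBase = parsed.syntheticNode?.let { "$parentQN/$it" } ?: parentQN
    return parsed.fields.flatMapIndexed { idx, field ->
        val childQN = "$qnBase/${field.name}"
        val self = SubColumn(childQN, parentQN, depth, idx + 1)
        listOf(self) + (field.nested?.let { subColumns(childQN, it, depth + 1) } ?: emptyList())
    }
}

fun main() {
    val table = "conn/db/schema/tbl" // hypothetical table qualified name
    val nested = Parsed(listOf(Field("outer", Parsed(listOf(Field("inner"), Field("count")))), Field("label")))
    subColumns("$table/nested_struct_col", nested).forEach(::println)

    val array = Parsed(listOf(Field("fieldName")), syntheticNode = "items")
    subColumns("$table/array_col", array).forEach(::println)
}
```

Rows come out in depth-first order, so each parent row precedes its own children, and grandchildren record the child's QN as their parent.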
/**
 * Build the asset map for a single child column, inheriting all context fields from
 * [parentAssetMap] and overriding the column-specific fields.
 *
 * @param parentAssetMap asset map of the immediate parent column
 * @param parentColumnQN qualified name of the parent column (for [PARENT_COLUMN_QN_HEADER])
 * @param childQN qualified name for the child column
 * @param field field definition (name and raw type) for the child column
 * @param order ordinal position of the child column within its parent
 * @param depth nesting depth of this child column (1 for direct children of a top-level column, 2 for grandchildren, etc.)
 */
private fun buildChildAssetMap(
    parentAssetMap: Map<String, String>,
    parentColumnQN: String,
    childQN: String,
    field: ComplexTypeParser.FieldDefinition,
    order: Int,
    depth: Int,
): Map<String, String> {
    val childMap = parentAssetMap.toMutableMap()
    childMap[RowSerde.getHeaderForField(Asset.QUALIFIED_NAME)] = childQN
    childMap[RowSerde.getHeaderForField(Asset.NAME)] = field.name
    childMap[RowSerde.getHeaderForField(Column.DATA_TYPE, Column::class.java)] = baseTypeName(field.rawType)
    childMap[RowSerde.getHeaderForField(Column.RAW_DATA_TYPE_DEFINITION, Column::class.java)] = field.rawType
    childMap[RowSerde.getHeaderForField(Column.ORDER, Column::class.java)] = order.toString()
    // Clear numeric type-specific fields; they're not meaningful for the child's raw type
    childMap[RowSerde.getHeaderForField(Column.PRECISION, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.NUMERIC_SCALE, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.MAX_LENGTH, Column::class.java)] = ""
    // Clear table/view references on sub-columns so they do not appear in the table's flat
    // column list (table_columns relationship). Navigation is via parentColumn chain instead.
    childMap[RowSerde.getHeaderForField(Column.TABLE_QUALIFIED_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.TABLE_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.TABLE, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.VIEW_QUALIFIED_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.VIEW_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.VIEW, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.MATERIALIZED_VIEW, Column::class.java)] = ""
    childMap[PARENT_COLUMN_QN_HEADER] = parentColumnQN
    childMap[PARENT_COLUMN_HEADER] = "${Column.TYPE_NAME}@$parentColumnQN"
    childMap[PARENT_COLUMN_NAME_HEADER] = parentColumnQN.substringAfterLast('/')
    childMap[NESTED_COLUMN_ORDER_HEADER] = order.toString()
    // columnDepthLevel tells Atlan this is a nested sub-column (not a top-level table column).
    childMap[COLUMN_DEPTH_LEVEL_HEADER] = depth.toString()
    // columnHierarchy lists all ancestor columns from depth-1 up to the immediate parent.
    // Each entry is a JSON object: {"depth":"<n>","qualifiedName":"<qn>","name":"<name>"}.
    // Multiple entries are newline-delimited (CellXformer.LIST_DELIMITER).
    // Matches the format used in AIM nested_columns.csv reference and Databricks connector.
    val parentHierarchyStr = parentAssetMap.getOrElse(COLUMN_HIERARCHY_HEADER) { "" }
    val parentName = parentColumnQN.substringAfterLast('/')
    val newEntry = """{"depth": "$depth","qualifiedName": "$parentColumnQN","name": "$parentName"}"""
    childMap[COLUMN_HIERARCHY_HEADER] = if (parentHierarchyStr.isBlank()) newEntry else "$parentHierarchyStr\n$newEntry"
    childMap[RowSerde.getHeaderForField(Asset.SUB_TYPE)] = "nested"
    return childMap
}
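The way `columnHierarchy` accumulates across nesting levels is easiest to see in isolation. The sketch below reuses the entry template from the diff in two free-standing helper functions; the function names and the sample qualified name are hypothetical, and the column names are borrowed from the PR's test plan.

```kotlin
// One ancestor entry, using the same string template as the diff.
fun hierarchyEntry(depth: Int, parentQN: String): String {
    val name = parentQN.substringAfterLast('/')
    return """{"depth": "$depth","qualifiedName": "$parentQN","name": "$name"}"""
}

// A child's hierarchy is its parent's hierarchy plus one new entry for the parent itself.
fun childHierarchy(parentHierarchy: String, depth: Int, parentQN: String): String {
    val entry = hierarchyEntry(depth, parentQN)
    return if (parentHierarchy.isBlank()) entry else "$parentHierarchy\n$entry"
}

fun main() {
    val top = "conn/db/schema/tbl/nested_struct_col" // hypothetical column qualified name
    val onOuter = childHierarchy("", 1, top)              // value stored on `outer`
    val onInner = childHierarchy(onOuter, 2, "$top/outer") // value stored on `inner`
    println(onInner)
}
```

Because the entry is built with a plain string template, names containing quotes or backslashes would not be escaped; the sketch, like the template it copies, assumes plain identifier-style names.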
All of this logic looks like it'd be handling nested columns — but looks currently unexercised by the tests (given the input file).
Adds ComplexTypeColumnsRABTest covering the full pipeline from CSV input to nested child columns in Atlan, verifying parentColumnQualifiedName, columnDepthLevel, synthetic QN nodes (/items/, /values/), and depth-2 recursion. Also updates assets-complex.csv with adminRoles/adminUsers so the connection can be created during test setup.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
@cmgrote Updated the E2E tests now.
Summary
… parentColumn hierarchy

Key design decisions

Why sub-columns don't appear in the flat Columns list:
Atlan auto-creates the table_columns relationship from tableQualifiedName server-side. Sub-columns have all table/view reference fields cleared (tableQualifiedName, tableName, table, viewQualifiedName, viewName, view, materializedView) so they are NOT added to table_columns. Navigation is via the parentColumn chain.

QN format (matching Databricks connector):
- STRUCT field: tableQN/parentCol/fieldName
- ARRAY<STRUCT>: tableQN/parentCol/items/fieldName
- MAP<K,STRUCT>: tableQN/parentCol/values/fieldName
- Deeply nested: tableQN/parentCol/outerField/innerField

Fields set on sub-columns:
- parentColumnQualifiedName / parentColumn / parentColumnName
- columnHierarchy: newline-delimited JSON ancestor entries (enables breadcrumb display)
- columnDepthLevel: 1 for direct fields, 2+ for deeper nesting
- nestedColumnOrder: ordinal within parent
- subType=nested

Files changed
- ComplexTypeParser.kt: NEW, bracket-aware recursive parser for STRUCT/ARRAY/MAP type strings
- AssetXformer.kt: added nested column fields to BASE_OUTPUT_HEADERS
- ColumnXformer.kt: overrides mapRow() to recursively emit child column rows for complex types
- ComplexTypeParserTest.kt: NEW, unit tests for the parser
- assets-complex.csv: NEW, test fixture with STRUCT/ARRAY/MAP types
- build.gradle.kts (RAB + AIM): minor build fixes

Test plan
- … ci_noqn_testtable)
- plain_col, struct_col, nested_struct_col appear in table's flat Columns list (3 total)
- city, zip appear only inside struct_col's nested view
- outer, label appear only inside nested_struct_col's nested view
- inner, count appear only inside outer's nested view (depth=2)
- columnHierarchy breadcrumbs display correctly in the UI
- ./gradlew :samples:packages:relational-assets-builder:test -PpackageTests

🤖 Generated with Claude Code