Add STRUCT/ARRAY/MAP nested column support to RAB (CSA-371) #2392

Open
bladata1990 wants to merge 6 commits into main from feature/rab-complex-type-nested-columns

@bladata1990
Collaborator

Summary

  • Adds support for complex SQL types (STRUCT, ARRAY<STRUCT>, MAP<K,STRUCT>) in the Relational Assets Builder package
  • Child fields of complex-typed columns are emitted as separate Column assets linked via parentColumn hierarchy
  • Sub-columns are correctly excluded from the table's flat Columns list — they only appear inside the parent column's nested view

Key design decisions

Why sub-columns don't appear in the flat Columns list:
Atlan auto-creates the table_columns relationship from tableQualifiedName server-side. Sub-columns have all table/view reference fields cleared (tableQualifiedName, tableName, table, viewQualifiedName, viewName, view, materializedView) so they are NOT added to table_columns. Navigation is via the parentColumn chain.
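As a rough illustration of the clearing step described above, the sketch below blanks the listed reference fields on a row represented as a plain map. The helper name `clearTableReferences` and the use of attribute names as map keys are assumptions made for this example; the PR itself resolves the real CSV headers through `RowSerde.getHeaderForField`.

```kotlin
// Field names taken from the description above; using them directly as map keys is an assumption.
val TABLE_VIEW_REFERENCE_FIELDS = listOf(
    "tableQualifiedName", "tableName", "table",
    "viewQualifiedName", "viewName", "view",
    "materializedView",
)

// Hypothetical helper: returns a copy of the row with every table/view reference set to "".
fun clearTableReferences(row: Map<String, String>): Map<String, String> =
    row + TABLE_VIEW_REFERENCE_FIELDS.associateWith { "" }
```

With the references blank, nothing on the sub-column row names the table or view directly, so the parentColumn fields are the only link back to it.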

QN format (matching Databricks connector):

  • STRUCT field: tableQN/parentCol/fieldName
  • ARRAY<STRUCT>: tableQN/parentCol/items/fieldName
  • MAP<K,STRUCT>: tableQN/parentCol/values/fieldName
  • Deeply nested: tableQN/parentCol/outerField/innerField
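The path rules above can be sketched as a small Kotlin helper. This is a minimal illustration rather than the code in this PR: `NestedKind` and `childQualifiedName` are hypothetical names, and only the `items` and `values` segments come from the format listed above.

```kotlin
// Hypothetical types for illustration only; not part of the PR.
enum class NestedKind(val syntheticNode: String?) {
    STRUCT(null),    // fields hang directly off the parent column
    ARRAY("items"),  // ARRAY<STRUCT> inserts an "items" segment
    MAP("values"),   // MAP<K,STRUCT> inserts a "values" segment
}

// Builds the child's qualified name from its parent's qualified name.
fun childQualifiedName(parentColumnQN: String, kind: NestedKind, fieldName: String): String {
    val base = kind.syntheticNode?.let { "$parentColumnQN/$it" } ?: parentColumnQN
    return "$base/$fieldName"
}
```

Deeper nesting falls out of applying the helper repeatedly, passing each child's qualified name as the next parent.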

Fields set on sub-columns:

  • parentColumnQualifiedName / parentColumn / parentColumnName
  • columnHierarchy: newline-delimited JSON ancestor entries (enables breadcrumb display)
  • columnDepthLevel: 1 for direct fields, 2+ for deeper nesting
  • nestedColumnOrder: ordinal within parent
  • subType=nested
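To make the columnHierarchy format concrete, here is a minimal sketch of how one ancestor entry could be appended per nesting level. The function name `appendHierarchyEntry` is hypothetical; the entry layout (depth, qualifiedName, name, newline-delimited) follows the description above and the ColumnXformer.kt excerpt quoted later in this conversation.

```kotlin
// Hypothetical helper for illustration; the PR performs this inline when building each child row.
fun appendHierarchyEntry(parentHierarchy: String, depth: Int, parentColumnQN: String): String {
    // The parent's display name is the last path segment of its qualified name.
    val parentName = parentColumnQN.substringAfterLast('/')
    val entry = """{"depth": "$depth","qualifiedName": "$parentColumnQN","name": "$parentName"}"""
    // A direct child of a top-level column starts the list; deeper children extend it.
    return if (parentHierarchy.isBlank()) entry else "$parentHierarchy\n$entry"
}
```

Each child inherits its parent's hierarchy string and adds one entry for that parent, so a child at depth N carries N newline-delimited entries.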

Files changed

  • ComplexTypeParser.kt — NEW: bracket-aware recursive parser for STRUCT/ARRAY/MAP type strings
  • AssetXformer.kt — added nested column fields to BASE_OUTPUT_HEADERS
  • ColumnXformer.kt — overrides mapRow() to recursively emit child column rows for complex types
  • ComplexTypeParserTest.kt — NEW: unit tests for the parser
  • assets-complex.csv — NEW: test fixture with STRUCT/ARRAY/MAP types
  • build.gradle.kts (RAB + AIM) — minor build fixes
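ComplexTypeParser.kt itself is not shown in this conversation, so the following is only a sketch of what "bracket-aware" parsing can look like, assuming the `STRUCT<name:type,...>` syntax used in the code comments. The names `Field`, `splitTopLevel` and `structFields` are hypothetical. The key point is that commas are treated as field separators only when they sit outside any nested `<...>` or `(...)`.

```kotlin
// Hypothetical sketch; not the PR's ComplexTypeParser.
data class Field(val name: String, val rawType: String)

/** Split [s] on commas that are not nested inside <...> or (...). */
fun splitTopLevel(s: String): List<String> {
    val parts = mutableListOf<String>()
    var depth = 0
    var start = 0
    for ((i, c) in s.withIndex()) {
        when (c) {
            '<', '(' -> depth++
            '>', ')' -> depth--
            ',' -> if (depth == 0) {
                parts.add(s.substring(start, i).trim())
                start = i + 1
            }
        }
    }
    parts.add(s.substring(start).trim())
    return parts.filter { it.isNotEmpty() }
}

/** Return the fields of a "STRUCT<name:type,...>" string, or null for any other type. */
fun structFields(rawType: String): List<Field>? {
    val t = rawType.trim()
    if (!t.uppercase().startsWith("STRUCT<") || !t.endsWith(">")) return null
    val inner = t.substring("STRUCT<".length, t.length - 1)
    return splitTopLevel(inner).map { part ->
        // Split on the first colon only, so nested types keep their own colons.
        Field(part.substringBefore(':').trim(), part.substringAfter(':').trim())
    }
}
```

Calling `structFields` again on a field's `rawType` yields the next level of nesting, which is the recursive shape the summary describes.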

Test plan

  • Tested on fs3.atlan.com with Redshift connector (ci_noqn_test table)
  • plain_col, struct_col, nested_struct_col appear in table's flat Columns list (3 total)
  • city, zip appear only inside struct_col's nested view
  • outer, label appear only inside nested_struct_col's nested view
  • inner, count appear only inside outer's nested view (depth=2)
  • columnHierarchy breadcrumbs display correctly in the UI
  • Run unit tests: ./gradlew :samples:packages:relational-assets-builder:test -PpackageTests

🤖 Generated with Claude Code

Expands complex SQL types (STRUCT, ARRAY<STRUCT>, MAP<K,STRUCT>) into
child Column assets linked via parentColumn hierarchy. Sub-columns are
excluded from the table's flat Columns list by clearing all
tableQualifiedName/tableName/table/view refs — navigation is via
parentColumn chain only.

New fields on sub-columns:
- parentColumnQualifiedName / parentColumn / parentColumnName
- columnHierarchy: newline-delimited JSON ancestor entries (enables
  breadcrumb display in the UI, e.g. struct_col > city)
- columnDepthLevel: 1 for direct fields, 2+ for deeper nesting
- nestedColumnOrder: ordinal position within parent
- subType=nested

QN format (matching Databricks connector):
- STRUCT field:      tableQN/parentCol/fieldName
- ARRAY<STRUCT>:     tableQN/parentCol/items/fieldName
- MAP<K,STRUCT>:     tableQN/parentCol/values/fieldName
- Deeply nested:     tableQN/parentCol/outer/inner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
@bladata1990 bladata1990 requested a review from cmgrote as a code owner April 8, 2026 13:44
bladata1990 and others added 3 commits April 8, 2026 20:58
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
The dependencies { include(project(":samples:packages:asset-import")) }
filter was accidentally removed from the shadowJar block, causing the fat
jar to bundle all transitive SDK dependencies instead of just asset-import.
This conflicted with the base container image jars and caused
ClassNotFoundException: com.atlan.pkg.rab.Importer at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
@bladata1990
Collaborator Author

@cmgrote Gentle nudge on this: could you please review and approve?

@bladata1990
Collaborator Author

Tested a sample asset in a customer environment: https://mastercard-pov.atlan.com/assets/e2b15ecb-62be-4304-b661-9cc27f5d1b18/overview

Collaborator

@cmgrote cmgrote left a comment

Need to drop the changes to the CI (.github/workflows/merge.yml in particular) for non-main branches. Want to avoid creating a permanent footprint for this that needs to be maintained.

Also, it looks like the logic should handle new headings in the CSV (for parent column qualifiedName, etc) — but the provided test file and tests don't seem to exercise this path at all. Please extend the test file and test scenario to test these additions, too.

@bladata1990
Copy link
Copy Markdown
Collaborator Author

@cmgrote We have not added any new columns to the input files, because the STRUCT itself is defined in the 'dataType' column. I have attached a sample file:
Struct_example_rab.csv

… tag logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
@bladata1990 bladata1990 requested a review from cmgrote April 16, 2026 10:27
Comment on lines +97 to +103
RowSerde.getHeaderForField(Column.PARENT_COLUMN_QUALIFIED_NAME, Column::class.java),
RowSerde.getHeaderForField(Column.PARENT_COLUMN, Column::class.java),
RowSerde.getHeaderForField(Column.PARENT_COLUMN_NAME, Column::class.java),
RowSerde.getHeaderForField(Column.NESTED_COLUMN_ORDER, Column::class.java),
RowSerde.getHeaderForField(Column.COLUMN_DEPTH_LEVEL, Column::class.java),
RowSerde.getHeaderForField(Column.COLUMN_HIERARCHY, Column::class.java),
RowSerde.getHeaderForField(Asset.SUB_TYPE),
Collaborator

These are all new headers in the CSV file, implying they'll be handled... (But missing from the tests.)

Comment on lines +117 to +225
/** Returns the base type name, stripping any angle-bracket type parameters.
 * E.g. "STRUCT<a:INT,b:DOUBLE>" → "STRUCT", "INT" → "INT". */
private fun baseTypeName(rawType: String): String = if (rawType.contains("<")) rawType.substringBefore("<").trim().uppercase() else rawType

/** {@inheritDoc}
 *
 * Overridden to emit additional child column rows when the column's data type is a complex type
 * (STRUCT, ARRAY<STRUCT>, or MAP<K, STRUCT>). Child columns are generated recursively for
 * deeply nested types.
 */
override fun mapRow(inputRow: Map<String, String>): List<List<String>> {
    val rows = super.mapRow(inputRow).toMutableList()
    val rawType = trimWhitespace(inputRow.getOrElse(Column.DATA_TYPE.atlanFieldName) { "" })
    val parseResult = ComplexTypeParser.extractStructFields(rawType)
    if (parseResult != null) {
        val connectionQN = getConnectionQN(inputRow)
        val details = getSQLHierarchyDetails(inputRow, typeNameFilter, preprocessedDetails.entityQualifiedNameToType)
        val parentColumnQN = "$connectionQN/${details.partialQN}"
        val parentAssetMap = mapAsset(inputRow)
        rows.addAll(buildSubColumnRows(parentAssetMap, parentColumnQN, parseResult))
    }
    return rows
}

/**
 * Recursively build child column rows for all fields in the given [parseResult].
 *
 * @param baseAssetMap field map of the immediate parent column (used to inherit context fields)
 * @param parentColumnQN qualified name of the parent column asset (used for [PARENT_COLUMN_QN_HEADER])
 * @param parseResult parsed complex type fields and optional synthetic QN node (e.g. "items" for ARRAY)
 * @param depth nesting depth of the child columns (1 for direct children of a top-level column, 2 for grandchildren, etc.)
 */
private fun buildSubColumnRows(
    baseAssetMap: Map<String, String>,
    parentColumnQN: String,
    parseResult: ComplexTypeParser.ParseResult,
    depth: Int = 1,
): List<List<String>> {
    val rows = mutableListOf<List<String>>()
    // For ARRAY / MAP, insert the synthetic node into the QN path but NOT into parentColumnQN
    val qnBase = if (parseResult.syntheticNode != null) "$parentColumnQN/${parseResult.syntheticNode}" else parentColumnQN
    parseResult.fields.forEachIndexed { idx, field ->
        val childQN = "$qnBase/${field.name}"
        val childAssetMap = buildChildAssetMap(baseAssetMap, parentColumnQN, childQN, field, idx + 1, depth)
        rows.add(assetMapToValueList(childAssetMap))
        // Recurse for nested complex types (e.g. STRUCT within STRUCT, ARRAY within STRUCT)
        val nestedResult = ComplexTypeParser.extractStructFields(field.rawType)
        if (nestedResult != null) {
            rows.addAll(buildSubColumnRows(childAssetMap, childQN, nestedResult, depth + 1))
        }
    }
    return rows
}

/**
 * Build the asset map for a single child column, inheriting all context fields from
 * [parentAssetMap] and overriding the column-specific fields.
 *
 * @param parentAssetMap asset map of the immediate parent column
 * @param parentColumnQN qualified name of the parent column (for [PARENT_COLUMN_QN_HEADER])
 * @param childQN qualified name for the child column
 * @param field field definition (name and raw type) for the child column
 * @param order ordinal position of the child column within its parent
 * @param depth nesting depth of this child column (1 for direct children of a top-level column, 2 for grandchildren, etc.)
 */
private fun buildChildAssetMap(
    parentAssetMap: Map<String, String>,
    parentColumnQN: String,
    childQN: String,
    field: ComplexTypeParser.FieldDefinition,
    order: Int,
    depth: Int,
): Map<String, String> {
    val childMap = parentAssetMap.toMutableMap()
    childMap[RowSerde.getHeaderForField(Asset.QUALIFIED_NAME)] = childQN
    childMap[RowSerde.getHeaderForField(Asset.NAME)] = field.name
    childMap[RowSerde.getHeaderForField(Column.DATA_TYPE, Column::class.java)] = baseTypeName(field.rawType)
    childMap[RowSerde.getHeaderForField(Column.RAW_DATA_TYPE_DEFINITION, Column::class.java)] = field.rawType
    childMap[RowSerde.getHeaderForField(Column.ORDER, Column::class.java)] = order.toString()
    // Clear numeric type-specific fields: they're not meaningful for the child's raw type
    childMap[RowSerde.getHeaderForField(Column.PRECISION, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.NUMERIC_SCALE, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.MAX_LENGTH, Column::class.java)] = ""
    // Clear table/view references on sub-columns so they do not appear in the table's flat
    // column list (table_columns relationship). Navigation is via parentColumn chain instead.
    childMap[RowSerde.getHeaderForField(Column.TABLE_QUALIFIED_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.TABLE_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.TABLE, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.VIEW_QUALIFIED_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.VIEW_NAME, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.VIEW, Column::class.java)] = ""
    childMap[RowSerde.getHeaderForField(Column.MATERIALIZED_VIEW, Column::class.java)] = ""
    childMap[PARENT_COLUMN_QN_HEADER] = parentColumnQN
    childMap[PARENT_COLUMN_HEADER] = "${Column.TYPE_NAME}@$parentColumnQN"
    childMap[PARENT_COLUMN_NAME_HEADER] = parentColumnQN.substringAfterLast('/')
    childMap[NESTED_COLUMN_ORDER_HEADER] = order.toString()
    // columnDepthLevel tells Atlan this is a nested sub-column (not a top-level table column).
    childMap[COLUMN_DEPTH_LEVEL_HEADER] = depth.toString()
    // columnHierarchy lists all ancestor columns from depth-1 up to the immediate parent.
    // Each entry is a JSON object: {"depth":"<n>","qualifiedName":"<qn>","name":"<name>"}.
    // Multiple entries are newline-delimited (CellXformer.LIST_DELIMITER).
    // Matches the format used in AIM nested_columns.csv reference and Databricks connector.
    val parentHierarchyStr = parentAssetMap.getOrElse(COLUMN_HIERARCHY_HEADER) { "" }
    val parentName = parentColumnQN.substringAfterLast('/')
    val newEntry = """{"depth": "$depth","qualifiedName": "$parentColumnQN","name": "$parentName"}"""
    childMap[COLUMN_HIERARCHY_HEADER] = if (parentHierarchyStr.isBlank()) newEntry else "$parentHierarchyStr\n$newEntry"
    childMap[RowSerde.getHeaderForField(Asset.SUB_TYPE)] = "nested"
    return childMap
}
Collaborator

All of this logic looks like it'd be handling nested columns — but looks currently unexercised by the tests (given the input file).

Adds ComplexTypeColumnsRABTest covering the full pipeline from CSV input
to nested child columns in Atlan, verifying parentColumnQualifiedName,
columnDepthLevel, synthetic QN nodes (/items/, /values/), and depth-2
recursion. Also updates assets-complex.csv with adminRoles/adminUsers so
the connection can be created during test setup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: bladata1990 <balakrishnan.r@atlan.com>
@bladata1990
Collaborator Author

@cmgrote I have updated the E2E tests now.

@bladata1990 bladata1990 requested a review from cmgrote April 16, 2026 11:59