Improve error message when precombine field value is null during payload creation #18060

@prashantwason

Problem Description

When records have null values in the precombine field, Hudi jobs fail with a cryptic error message that makes it difficult for users to diagnose the root cause:

org.apache.hudi.exception.HoodieException: Could not create payload for class: org.apache.hudi.common.model.DefaultHoodieRecordPayload
Caused by: org.apache.hudi.exception.HoodieException: Ordering value is null for record: ...

This error provides no actionable information about:

  • Which precombine field has the null value
  • Which record is problematic (record key)
  • How to remediate the issue

Root Cause

BaseAvroPayload's constructor requires a non-null orderingVal parameter. When records have null values in the precombine field, HoodieAvroUtils.getNestedFieldVal() returns null, which causes payload instantiation to fail with the confusing error message above.

The relevant code path in HoodieCreateRecordUtils.scala:

val hoodieRecord = if (shouldCombine && !orderingFields.isEmpty) {
  val orderingVal = OrderingValues.create(
    orderingFields,
    JFunction.toJavaFunction[String, Comparable[_]](
      field => HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
        consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]))
  // ... creates payload which fails if orderingVal contains null
}

Proposed Solution

Add an explicit null check with a clear, actionable error message before attempting payload creation. The new error message should:

  • Identify the specific precombine field that has a null value
  • Provide the record key to help locate the problematic record
  • Suggest remediation options (fix data or use a different payload class like OverwriteWithLatestAvroPayload)

Example improved error message:

Precombine field 'ts' has null value for record key 'abc123'. Please ensure all records have non-null values for the precombine field, or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).
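
The check described above could be sketched roughly as follows. This is a dependency-free illustration, not the actual Hudi change: PrecombineNullCheck, firstNullOrderingField, nullPrecombineMessage and validate are hypothetical names, the field accessor is passed in as a plain function standing in for HoodieAvroUtils.getNestedFieldVal, and IllegalArgumentException stands in for whatever exception type the real fix would throw (presumably HoodieException).

```scala
// Hypothetical helper, not part of Hudi. It only illustrates the proposed check.
object PrecombineNullCheck {

  /** Returns the first ordering field whose extracted value is null, if any. */
  def firstNullOrderingField(
      orderingFields: Seq[String],
      getFieldValue: String => Any): Option[String] =
    orderingFields.find(field => getFieldValue(field) == null)

  /** Builds the error message proposed in this issue. */
  def nullPrecombineMessage(field: String, recordKey: String): String =
    s"Precombine field '$field' has null value for record key '$recordKey'. " +
      "Please ensure all records have non-null values for the precombine field, " +
      "or use a payload class that doesn't require ordering " +
      "(e.g., OverwriteWithLatestAvroPayload)."

  /** Fails fast, before payload creation, if any ordering field is null. */
  def validate(
      orderingFields: Seq[String],
      recordKey: String,
      getFieldValue: String => Any): Unit =
    firstNullOrderingField(orderingFields, getFieldValue).foreach { field =>
      // Assumption: the real fix would throw a Hudi exception type instead.
      throw new IllegalArgumentException(nullPrecombineMessage(field, recordKey))
    }
}
```

In this sketch a record is modelled as a Map[String, Any], so a call such as PrecombineNullCheck.validate(Seq("ts"), "abc123", f => record.getOrElse(f, null)) would throw with the message above when "ts" is null or missing. In HoodieCreateRecordUtils.scala the equivalent check would sit just before OrderingValues.create, where the record key is already known.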

Affected Components

  • Spark: HoodieCreateRecordUtils.scala
  • Flink: Payload creation utilities

Impact

This is a usability improvement that helps users quickly diagnose and fix data quality issues in their ingestion pipelines.
