Description
Problem Description
When records have null values in the precombine field, Hudi jobs fail with a cryptic error message that makes it difficult for users to diagnose the root cause:
org.apache.hudi.exception.HoodieException: Could not create payload for class: org.apache.hudi.common.model.DefaultHoodieRecordPayload
Caused by: org.apache.hudi.exception.HoodieException: Ordering value is null for record: ...
This error provides no actionable information about:
- Which precombine field has the null value
- Which record is problematic (record key)
- How to remediate the issue
Root Cause
BaseAvroPayload's constructor requires a non-null orderingVal parameter. When records have null values in the precombine field, HoodieAvroUtils.getNestedFieldVal() returns null, which causes payload instantiation to fail with the confusing error message above.
The relevant code path in HoodieCreateRecordUtils.scala:
val hoodieRecord = if (shouldCombine && !orderingFields.isEmpty) {
  val orderingVal = OrderingValues.create(
    orderingFields,
    JFunction.toJavaFunction[String, Comparable[_]](
      field => HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
        consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]))
  // ... creates payload which fails if orderingVal contains null
}
Proposed Solution
Add an explicit null check that fails with a clear, actionable error message before payload creation is attempted. The new error message should:
- Identify the specific precombine field that has a null value
- Provide the record key to help locate the problematic record
- Suggest remediation options (fix data or use a different payload class like OverwriteWithLatestAvroPayload)
Example improved error message:
Precombine field 'ts' has null value for record key 'abc123'. Please ensure all records have non-null values for the precombine field, or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).
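A minimal sketch of what such a check could look like is shown below. It is deliberately standalone: the object name `PrecombineNullCheck`, its method names, and the `lookup` function are hypothetical and not part of Hudi, and a plain `Map` stands in for the Avro record. In the real code path the lookup would presumably wrap `HoodieAvroUtils.getNestedFieldVal`, and the exception type would likely be `HoodieException` rather than the `IllegalArgumentException` used here.

```scala
// Hypothetical helper, not an existing Hudi API: validates ordering values
// before any payload is constructed.
object PrecombineNullCheck {

  // Returns the first ordering (precombine) field whose value is null, if any.
  // `lookup` is an assumed stand-in for the per-field value extraction.
  def firstNullOrderingField(orderingFields: Seq[String],
                             lookup: String => Any): Option[String] =
    orderingFields.find(field => lookup(field) == null)

  // Builds the message proposed in this issue.
  def buildErrorMessage(field: String, recordKey: String): String =
    s"Precombine field '$field' has null value for record key '$recordKey'. " +
      "Please ensure all records have non-null values for the precombine field, " +
      "or use a payload class that doesn't require ordering " +
      "(e.g., OverwriteWithLatestAvroPayload)."

  // Throws with the actionable message if any ordering field is null;
  // otherwise returns normally so payload creation can proceed.
  def requireNonNullOrderingValues(recordKey: String,
                                   orderingFields: Seq[String],
                                   lookup: String => Any): Unit =
    firstNullOrderingField(orderingFields, lookup).foreach { field =>
      throw new IllegalArgumentException(buildErrorMessage(field, recordKey))
    }
}
```

Under these assumptions, the check would be called with the record key and the configured ordering fields just before `OrderingValues.create`, so the failure names the field and the record instead of surfacing later from the payload constructor.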
Affected Components
- Spark: HoodieCreateRecordUtils.scala
- Flink: Payload creation utilities
Impact
This is a usability improvement that helps users quickly diagnose and fix data quality issues in their ingestion pipelines.