[SPARK-55302][SQL] Fix custom metrics in case of KeyGroupedPartitioning#54081

Closed
peter-toth wants to merge 2 commits into apache:master from peter-toth:SPARK-55302-fix-kgp-custom-metrics

Contributor

@peter-toth peter-toth commented Jan 31, 2026

What changes were proposed in this pull request?

This PR adds a new initMetricsValues() method to PartitionReader that initializes the custom metrics returned by currentMetricsValues(). With KeyGroupedPartitioning, multiple input partitions are grouped, so multiple PartitionReaders belong to one output partition. Each PartitionReader needs to be initialized with the metrics calculated by the previous PartitionReader of the same partition group so that it reports the correct value.
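
The effect described above can be illustrated with a minimal, self-contained sketch. It does not use Spark's real interfaces: `CountingReader`, `Metric`, `readGroup`, the `rowsRead` metric name, and the list-based `initMetricsValues` signature are all hypothetical stand-ins, and the sketch assumes that the value reported for an output partition is whatever the most recently used reader returned.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Spark's connector types.
final class KgpMetricsSketch {

    /** Hypothetical stand-in for a custom task metric: a name plus a long value. */
    static final class Metric {
        final String name;
        final long value;

        Metric(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Hypothetical reader that reports one cumulative "rowsRead" metric. */
    static final class CountingReader {
        private final int rows;    // number of rows this reader produces
        private long rowsRead = 0; // running total reported as the metric

        CountingReader(int rows) {
            this.rows = rows;
        }

        /** Assumed signature: seed this reader with the previous reader's last metrics. */
        void initMetricsValues(List<Metric> previous) {
            for (Metric m : previous) {
                if (m.name.equals("rowsRead")) {
                    rowsRead = m.value;
                }
            }
        }

        /** Simulates consuming every row of this input partition. */
        void readAll() {
            rowsRead += rows;
        }

        List<Metric> currentMetricsValues() {
            return Collections.singletonList(new Metric("rowsRead", rowsRead));
        }
    }

    /** Runs the readers of one partition group in order and returns the last reported value. */
    static long readGroup(List<CountingReader> group, boolean carryOver) {
        List<Metric> last = Collections.emptyList();
        for (CountingReader reader : group) {
            if (carryOver) {
                reader.initMetricsValues(last); // hand over the previous reader's metrics
            }
            reader.readAll();
            last = reader.currentMetricsValues();
        }
        return last.isEmpty() ? 0L : last.get(0).value;
    }

    public static void main(String[] args) {
        // Without carry-over, only the last reader's 3 rows are visible: prints 3
        System.out.println(readGroup(Arrays.asList(new CountingReader(2), new CountingReader(3)), false));
        // With carry-over, the group total is reported: prints 5
        System.out.println(readGroup(Arrays.asList(new CountingReader(2), new CountingReader(3)), true));
    }
}
```

In this sketch the second reader starts from the first reader's total, so the value reported for the grouped partition covers both input partitions rather than only the last one.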

Why are the changes needed?

To calculate custom metrics correctly.

Does this PR introduce any user-facing change?

It fixes metrics calculation.

How was this patch tested?

A new unit test is added.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions Bot commented Jan 31, 2026

JIRA Issue Information

=== Bug SPARK-55302 ===
Summary: Fix custom metrics in case of KeyGroupedPartitioning
Assignee: None
Status: Open
Affected: ["4.0.1","3.5.8","4.1.1"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions Bot added the SQL label Jan 31, 2026
@peter-toth
Contributor Author

@viirya, @szehon-ho can you please take a look at this fix when you have some time?

}
}

test("SSPARK-55302: Custom metrics of grouped partitions") {
Member

nit. SSPARK-55302 -> SPARK-55302?

Contributor Author

Fixed the typo in fae7720.

* when multiple {@link PartitionReader}s are grouped into one partition in case of
* {@link org.apache.spark.sql.connector.read.partitioning.KeyGroupedPartitioning} and the reader
* is initialized with the metrics returned by the previous reader that belongs to the same
* partition.
Member

Should we mention that by default this is a no-op?

Contributor Author

Yes, added in fae7720.
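
To illustrate why a no-op default matters for existing connectors, here is a minimal sketch using a Java default method. The `SketchReader` interface and both implementations are hypothetical, and the `long`-based signature is a simplification rather than the signature added by this PR.

```java
// Illustrative stand-ins only; not Spark's PartitionReader interface.
final class NoOpDefaultSketch {

    interface SketchReader {
        long currentMetricsValue();

        /** Assumed shape of the hook: does nothing unless a reader overrides it. */
        default void initMetricsValues(long previous) {
            // no-op by default, so existing implementations keep their behavior
        }
    }

    /** A reader written before the hook existed; it does not override the default. */
    static final class LegacyReader implements SketchReader {
        private long total = 0;

        @Override
        public long currentMetricsValue() {
            return total;
        }
    }

    /** A reader that opts in and continues from the previous reader's value. */
    static final class CarryingReader implements SketchReader {
        private long total = 0;

        @Override
        public void initMetricsValues(long previous) {
            total = previous;
        }

        @Override
        public long currentMetricsValue() {
            return total;
        }
    }

    public static void main(String[] args) {
        SketchReader legacy = new LegacyReader();
        legacy.initMetricsValues(10L);
        System.out.println(legacy.currentMetricsValue()); // prints 0

        SketchReader carrying = new CarryingReader();
        carrying.initMetricsValues(10L);
        System.out.println(carrying.currentMetricsValue()); // prints 10
    }
}
```

Under these assumptions, a reader that ignores the hook compiles and behaves exactly as before, while a reader that overrides it picks up the carried-over value.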

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @peter-toth .

@peter-toth peter-toth closed this in c94ce2c Feb 2, 2026
@peter-toth
Contributor Author

Thank you @dongjoon-hyun and @viirya for the review!

Merged to master (4.2.0).

peter-toth added a commit to peter-toth/spark that referenced this pull request Feb 20, 2026
…ing`

This PR adds a new `initMetricsValues()` method to `PartitionReader` that initializes the custom metrics returned by `currentMetricsValues()`. With `KeyGroupedPartitioning`, multiple input partitions are grouped, so multiple `PartitionReader`s belong to one output partition. Each `PartitionReader` needs to be initialized with the metrics calculated by the previous `PartitionReader` of the same partition group so that it reports the correct value.

To calculate custom metrics correctly.

It fixes metrics calculation.

New UT is added.

No.

Closes apache#54081 from peter-toth/SPARK-55302-fix-kgp-custom-metrics.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Peter Toth <peter.toth@gmail.com>