[CELEBORN-2248] Implement lazy loading for columnar shuffle classes and skew shuffle method using static holder pattern#3581
Conversation
212f823 to
77447b0
Compare
There was a problem hiding this comment.
Pull request overview
This PR optimizes the initialization of optional columnar shuffle features by converting static initialization to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern). This ensures that columnar shuffle classes and skew shuffle detection logic are only loaded when actually needed, reducing unnecessary class loading overhead for users not utilizing these features.
Changes:
- Introduced three static inner holder classes to lazily initialize columnar shuffle constructors and skew shuffle method
- Refactored
createColumnarHashBasedShuffleWriter(),createColumnarShuffleReader(), andisCelebornSkewShuffleOrChildShuffle()to use the holder pattern - Added comprehensive JavaDoc comments explaining the lazy loading mechanism
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
77447b0 to
b53a8cf
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3581 +/- ##
==========================================
- Coverage 67.13% 66.80% -0.33%
==========================================
Files 357 357
Lines 21860 21966 +106
Branches 1943 1949 +6
==========================================
- Hits 14674 14672 -2
- Misses 6166 6275 +109
+ Partials 1020 1019 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…nd skew shuffle method using static holder pattern ### What changes were proposed in this pull request? This PR converts the static initialization of columnar shuffle class constructors and skew shuffle method to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern) in SparkUtils.java. Specifically, the following changes were made: 1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner class to lazily initialize the constructor for ColumnarHashBasedShuffleWriter 2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to lazily initialize the constructor for CelebornColumnarShuffleReader 3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily initialize the `isCelebornSkewedShuffle` method reference 4. Modified `createColumnarHashBasedShuffleWriter()`, `createColumnarShuffleReader()`, and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder pattern for lazy initialization 5. Added JavaDoc comments explaining the lazy loading mechanism ### Why are the changes needed? The current implementation statically initializes columnar shuffle class constructors and the skew shuffle method at SparkUtils class loading time, which means these classes/methods are loaded regardless of whether they are actually used. This lazy loading approach ensures that: - Columnar shuffle classes are only loaded when actually needed (when `celeborn.columnarShuffle.enabled` is true and the create methods are called) - CelebornShuffleState class is only loaded when skew shuffle detection is needed - Reduces unnecessary class loading overhead for users not using these features - Improves startup performance and memory footprint - Aligns with the conditional usage pattern already present in SparkShuffleManager The static holder pattern (initialization-on-demand holder idiom) provides several advantages: - Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism) - No synchronization overhead at runtime (no volatile reads or lock acquisition) - Simpler and more concise code compared to double-checked locking - Recommended by Effective Java (Item 83) for lazy initialization ### Does this PR resolve a correctness bug? No, this is a performance optimization. ### Does this PR introduce any user-facing change? No. This change only affects when certain classes are loaded internally. The functionality and API remain unchanged. ### How was this patch tested? - Code review to verify correct implementation of the initialization-on-demand holder pattern - Verified that JVM class loading guarantees thread safety (JLS §12.4.2) - Analyzed existing columnar shuffle and skew shuffle test coverage in the codebase - The changes are backward compatible and don't alter functionality, only initialization timing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
b53a8cf to
445f319
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (5)
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java:238
- Consider typing the cached constructor to the expected supertype (e.g., DynConstructors.Ctor> or similar) instead of Ctor<?>. This gives a bit more compile-time safety and makes it clearer what INSTANCE is expected to construct.
private static final DynConstructors.Ctor<?> INSTANCE =
DynConstructors.builder()
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java:301
- Since this is a constructor handle (DynConstructors.Ctor), prefer calling newInstance(...) rather than invoke(null, ...) here for clarity and to avoid the "target must be null" requirement on invoke(...).
return ColumnarShuffleReaderConstructorHolder.INSTANCE.invoke(
null,
handle,
startPartition,
endPartition,
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java:571
- DynMethods.UnboundMethod.asStatic() allocates a new StaticMethod wrapper each call. Since this path may be invoked frequently, consider caching the StaticMethod in the holder (e.g., expose a StaticMethod field) and invoking that directly to avoid per-call allocations.
if (!CelebornSkewShuffleMethodHolder.INSTANCE.isNoop()) {
return CelebornSkewShuffleMethodHolder.INSTANCE.asStatic().invoke(appShuffleId);
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java:272
- Consider typing this cached constructor to the expected supertype (e.g., DynConstructors.Ctor>) instead of Ctor<?> to make the expected constructed type clearer and catch mismatches earlier.
private static final DynConstructors.Ctor<?> INSTANCE =
DynConstructors.builder()
client-spark/spark-3/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java:260
- Since this is a constructor handle (DynConstructors.Ctor), prefer calling newInstance(...) rather than invoke(null, ...) here. Using newInstance avoids the special "target must be null" contract, reads more clearly, and prevents accidental misuse if this code is refactored.
return ColumnarHashBasedShuffleWriterConstructorHolder.INSTANCE.invoke(
null, shuffleId, handle, taskContext, conf, client, metrics, sendBufferPool);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…nd skew shuffle method using static holder pattern ### What changes were proposed in this pull request? This PR converts the static initialization of columnar shuffle class constructors and skew shuffle method to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern) in SparkUtils.java. Specifically, the following changes were made: 1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner class to lazily initialize the constructor for ColumnarHashBasedShuffleWriter 2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to lazily initialize the constructor for CelebornColumnarShuffleReader 3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily initialize the `isCelebornSkewedShuffle` method reference 4. Modified `createColumnarHashBasedShuffleWriter()`, `createColumnarShuffleReader()`, and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder pattern for lazy initialization 5. Added JavaDoc comments explaining the lazy loading mechanism ### Why are the changes needed? The current implementation statically initializes columnar shuffle class constructors and the skew shuffle method at SparkUtils class loading time, which means these classes/methods are loaded regardless of whether they are actually used. This lazy loading approach ensures that: - Columnar shuffle classes are only loaded when actually needed (when `celeborn.columnarShuffle.enabled` is true and the create methods are called) - CelebornShuffleState class is only loaded when skew shuffle detection is needed - Reduces unnecessary class loading overhead for users not using these features - Improves startup performance and memory footprint - Aligns with the conditional usage pattern already present in SparkShuffleManager The static holder pattern (initialization-on-demand holder idiom) provides several advantages: - Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism) - No synchronization overhead at runtime (no volatile reads or lock acquisition) - Simpler and more concise code compared to double-checked locking - Recommended by Effective Java (Item 83) for lazy initialization ### Does this PR resolve a correctness bug? No, this is a performance optimization. ### Does this PR introduce any user-facing change? No. This change only affects when certain classes are loaded internally. The functionality and API remain unchanged. ### How was this patch tested? - Code review to verify correct implementation of the initialization-on-demand holder pattern - Verified that JVM class loading guarantees thread safety - The changes are backward compatible and don't alter functionality, only initialization timing Closes #3581 from ever4Kenny/CELEBORN-2248. Authored-by: zhenghuan <zhenghuan@weidian.com> Signed-off-by: SteNicholas <programgeek@163.com> (cherry picked from commit deb7538) Signed-off-by: SteNicholas <programgeek@163.com>
|
Thanks for update. Merged to main(v0.7.0) and branch-0.6(v0.6.3). |
What changes were proposed in this pull request?
This PR converts the static initialization of columnar shuffle class constructors
and skew shuffle method to lazy initialization using the initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.
Specifically, the following changes were made:
Introduced
ColumnarHashBasedShuffleWriterConstructorHolderstatic inner classto lazily initialize the constructor for ColumnarHashBasedShuffleWriter
Introduced
ColumnarShuffleReaderConstructorHolderstatic inner class to lazilyinitialize the constructor for CelebornColumnarShuffleReader
Introduced
CelebornSkewShuffleMethodHolderstatic inner class to lazilyinitialize the
isCelebornSkewedShufflemethod referenceModified
createColumnarHashBasedShuffleWriter(),createColumnarShuffleReader(),and
isCelebornSkewShuffleOrChildShuffle()methods to use the holder pattern forlazy initialization
Added JavaDoc comments explaining the lazy loading mechanism
Why are the changes needed?
The current implementation statically initializes columnar shuffle class constructors
and the skew shuffle method at SparkUtils class loading time, which means these
classes/methods are loaded regardless of whether they are actually used.
This lazy loading approach ensures that:
celeborn.columnarShuffle.enabledis true and the create methods are called)The static holder pattern (initialization-on-demand holder idiom) provides several
advantages:
Does this PR resolve a correctness bug?
No, this is a performance optimization.
Does this PR introduce any user-facing change?
No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.
How was this patch tested?