[SPARK-56057] Fix TLS Memory Leak #54894

Open
akpatnam25 wants to merge 2 commits into apache:master from akpatnam25:SPARK-56057-tls-fix

akpatnam25 commented Mar 18, 2026

What changes were proposed in this pull request?

Celeborn currently uses the same code as Spark for TLS support. We encountered a memory leak there (see PR apache/celeborn#3630).

I am contributing this to Spark in order to keep parity, given that the TLS implementation is the same.

The fix is to release the original message body so that the net reference count is preserved. The second reference, which now lives inside the composite buffer in out, keeps the memory alive while Netty writes it to the network. When Netty finishes and releases the composite, the count reaches 0 and the memory is freed cleanly.

This is exactly what the non-SSL MessageEncoder already does via MessageWithHeader.deallocate(); the SSL path simply needed to replicate that behavior explicitly.
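
The reference-count bookkeeping described above can be illustrated with a small toy model. This is only a sketch and not the code in this PR: CountedBuffer, Composite, and encode are hypothetical stand-ins written for illustration, not Netty or Spark classes, and the sketch assumes the encode step takes one extra reference on the body when it places it in the composite.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reference counting discussed above. NOT Netty's or Spark's API. */
final class RefCountSketch {

  /** Hypothetical stand-in for a reference-counted message body. */
  static final class CountedBuffer {
    private int refCnt = 1;        // a new buffer starts with one reference
    private boolean freed = false;

    CountedBuffer retain() {
      if (freed) throw new IllegalStateException("already freed");
      refCnt++;
      return this;
    }

    void release() {
      if (freed) throw new IllegalStateException("already freed");
      if (--refCnt == 0) freed = true;  // memory is returned when the count hits 0
    }

    boolean isFreed() { return freed; }
  }

  /** Hypothetical composite that owns one reference to each component. */
  static final class Composite {
    private final List<CountedBuffer> parts = new ArrayList<>();

    void add(CountedBuffer part) { parts.add(part); }

    // Releasing the composite releases the reference it holds on each part.
    void release() { parts.forEach(CountedBuffer::release); }
  }

  /** Assumed shape of the encode step: the body gains a second reference. */
  static Composite encode(CountedBuffer body, boolean releaseOriginal) {
    Composite out = new Composite();
    out.add(body.retain());               // second reference, owned by the composite
    if (releaseOriginal) body.release();  // the fix: drop the original reference
    return out;
  }

  /** Returns true if the body is freed once the transport releases the composite. */
  static boolean bodyFreedAfterWrite(boolean releaseOriginal) {
    CountedBuffer body = new CountedBuffer();
    Composite out = encode(body, releaseOriginal);
    out.release();  // models the transport finishing the write
    return body.isFreed();
  }

  public static void main(String[] args) {
    System.out.println("without release: freed=" + bodyFreedAfterWrite(false));
    System.out.println("with release: freed=" + bodyFreedAfterWrite(true));
  }
}
```

In this model, skipping the release leaves the count at 1 after the write completes, so the body is never freed; releasing the original reference lets the composite's release bring the count to 0.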

Why are the changes needed?

To fix a memory leak in the SSL (TLS) encoding path.

Does this PR resolve a correctness bug?

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The fix is already running in production internally and has been tested.
Unit tests were also added.

Was this patch authored or co-authored using generative AI tooling?

No.

akpatnam25 marked this pull request as ready for review March 18, 2026 22:03
@akpatnam25 (Author)

+CC @mridulm

@akpatnam25 (Author)

+CC @SteNicholas
