fix(memory): eliminate memory leaks in Python bindings and inference pipeline #115
Open
vieenrose wants to merge 1 commit into foldl:master from
Conversation
…pipeline

- Add explicit destructors to BaseTokenizer, BaseModelForConditionalGeneration, and PreludeCacheDisable to properly delete heap-allocated members
- Fix CoreAttention pos_helper to use heap allocation instead of a stack reference
- Add virtual destructors to the DataReader and Processor base classes
- Expose chatllm_destroy() in the Python bindings and clean up callback dict references

Fixes memory accumulation during repeated inference iterations.
Owner
Thanks. I will check this later. You don't need to destroy and re-create the object for repeated inference.
Author
Agreed: restart() works perfectly fine for repeated inference across multiple audio clips. These fixes become useful when you need to switch model sizes (e.g., from 0.6B to 1.7B) dynamically within the same Python session.
Summary
This PR fixes memory leaks in the Python bindings and C++ inference pipeline that cause memory accumulation during repeated inference (particularly noticeable with ASR models like Qwen3-ASR).
Root Causes
- tp, transformer, and disabler are allocated with new but never deleted due to = default destructors
- pos_helper pointed to &def_pos_helper, which could be invalidated
- DataReader and Processor lack virtual destructors, preventing proper cleanup via base-class pointers
- The _obj2id/_id2obj dictionaries are never cleaned up when ChatLLM objects are destroyed

Changes
- src/chat.h / src/chat.cpp: add ~BaseTokenizer() to delete tp
- src/models_priv.h / src/models.cpp: add ~BaseModelForConditionalGeneration() to delete transformer
- src/layers.h: add the ~PreludeCacheDisable() destructor; fix pos_helper heap allocation
- src/tokenizer.h: virtual destructors for DataReader and Processor
- bindings/chatllm.py: add a destroy() method with dict cleanup

Detailed Changes
1. BaseTokenizer destructor
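The diff itself is collapsed in this page view. A minimal sketch of the pattern: the member names mirror the PR description (tp owned by BaseTokenizer), but the stand-in TextProcessor type and the counter are illustrative only. The same explicit-delete fix applies to transformer and disabler in sections 2 and 3.

```cpp
// Counter lets us observe whether the owned member was freed.
static int live_tp_count = 0;

struct TextProcessor {
    TextProcessor()  { ++live_tp_count; }
    ~TextProcessor() { --live_tp_count; }
};

struct BaseTokenizer {
    TextProcessor *tp;

    BaseTokenizer() : tp(new TextProcessor()) {}

    // Before this PR the destructor was `= default`, so the
    // heap-allocated `tp` was never deleted. The fix: delete the
    // owned pointer explicitly.
    ~BaseTokenizer() { delete tp; }
};

// Construct and destroy a tokenizer; with the fixed destructor the
// live-object count returns to zero instead of growing per iteration.
int demo_tokenizer_lifecycle() {
    { BaseTokenizer t; }
    return live_tp_count;
}
```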
2. BaseModelForConditionalGeneration destructor
3. PreludeCacheDisable destructor
4. CoreAttention pos_helper heap allocation
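A sketch of this fix, assuming the shape described above: the raw pointer previously aimed at a default helper object (`&def_pos_helper`) whose lifetime was not tied to the layer, so it could dangle. The PosHelper stand-in type and field are illustrative; the real types live in src/layers.h.

```cpp
#include <memory>

// Illustrative stand-in for the position helper used by CoreAttention.
struct PosHelper {
    int offset = 0;
};

struct CoreAttention {
    // Before: a raw pointer assigned from the address of another object
    // (e.g. `pos_helper = &def_pos_helper;`), which could be invalidated.
    // After: the layer owns its helper on the heap; unique_ptr frees it
    // automatically when the layer is destroyed.
    std::unique_ptr<PosHelper> pos_helper;

    CoreAttention() : pos_helper(std::make_unique<PosHelper>()) {}
};

int demo_pos_helper() {
    CoreAttention attn;
    attn.pos_helper->offset = 7;  // safe: lifetime is tied to the layer
    return attn.pos_helper->offset;
}
```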
Note: pos_helper is a std::unique_ptr, so the heap allocation is automatically cleaned up.

5. Virtual destructors for base classes
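Why this matters: deleting a derived object through a base-class pointer whose destructor is non-virtual is undefined behavior and, in practice, skips the derived destructor, leaking whatever it owns. A self-contained sketch (the FileReader type and its buffer are hypothetical; only the DataReader base name comes from the PR):

```cpp
static int reader_buffers_freed = 0;

// Stand-in for the DataReader base class from src/tokenizer.h.
// Without `virtual` here, `delete` through a DataReader* would not
// invoke ~FileReader(), leaking the derived class's buffer.
struct DataReader {
    virtual ~DataReader() = default;
};

struct FileReader : DataReader {
    int *buffer;
    FileReader() : buffer(new int[1024]) {}
    ~FileReader() override {
        delete[] buffer;
        ++reader_buffers_freed;
    }
};

int demo_virtual_dtor() {
    DataReader *r = new FileReader();
    delete r;  // dispatches to ~FileReader() thanks to the virtual dtor
    return reader_buffers_freed;
}
```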
6. Python binding destroy() method
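The Python-side change adds a destroy() method that calls chatllm_destroy() and drops the object's entries from the _obj2id/_id2obj callback dictionaries. The leak pattern is sketched here in C++ for consistency with the other examples; the map names mirror the Python dicts, and the function names are illustrative, not the binding's actual API.

```cpp
#include <cstddef>
#include <map>

// Illustrative handle registry mirroring _obj2id/_id2obj in
// bindings/chatllm.py. Entries that are never erased keep objects
// (and their callbacks) reachable forever - the leak being fixed.
static std::map<void *, int> obj2id;
static std::map<int, void *> id2obj;
static int next_id = 0;

void register_obj(void *obj) {
    int id = next_id++;
    obj2id[obj] = id;
    id2obj[id] = obj;
}

// The fix: on destroy(), remove both directions of the mapping so the
// registry no longer pins the object.
void destroy_obj(void *obj) {
    auto it = obj2id.find(obj);
    if (it == obj2id.end())
        return;
    id2obj.erase(it->second);
    obj2id.erase(it);
}

std::size_t demo_registry() {
    int dummy = 0;
    register_obj(&dummy);
    destroy_obj(&dummy);
    return obj2id.size() + id2obj.size();  // empty after cleanup
}
```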
Testing
destroy()

Notes
The chatllm_destroy() C API already exists upstream; this PR exposes it in the Python bindings.