
fix(memory): eliminate memory leaks in Python bindings and inference pipeline#115

Open
vieenrose wants to merge 1 commit into foldl:master from vieenrose:fix/memory-leaks-python-bindings

Conversation

@vieenrose

Summary

This PR fixes memory leaks in the Python bindings and C++ inference pipeline that cause memory accumulation during repeated inference (particularly noticeable with ASR models like Qwen3-ASR).

Root Causes

  1. Raw pointer members without deletion - tp, transformer, disabler allocated with new but never deleted due to = default destructors
  2. Stack reference to local variable - pos_helper pointed to &def_pos_helper which could be invalidated
  3. Missing virtual destructors - DataReader and Processor lack virtual destructors, preventing proper cleanup via base class pointers
  4. Python callback dict accumulation - _obj2id/_id2obj dictionaries never cleaned up when ChatLLM objects are destroyed

Changes

  • src/chat.h / src/chat.cpp: add ~BaseTokenizer() to delete tp
  • src/models_priv.h / src/models.cpp: add ~BaseModelForConditionalGeneration() to delete transformer
  • src/layers.h: add ~PreludeCacheDisable() destructor; fix pos_helper heap allocation
  • src/tokenizer.h: add virtual destructors to DataReader and Processor
  • bindings/chatllm.py: add destroy() method with dict cleanup

Detailed Changes

1. BaseTokenizer destructor

BaseTokenizer::~BaseTokenizer()
{
    if (tp)
    {
        delete tp;
        tp = nullptr;
    }
}

2. BaseModelForConditionalGeneration destructor

BaseModelForConditionalGeneration::~BaseModelForConditionalGeneration()
{
    if (transformer)
    {
        delete transformer;
        transformer = nullptr;
    }
}

3. PreludeCacheDisable destructor

virtual ~PreludeCacheDisable()
{
    if (disabler)
    {
        delete disabler;
        disabler = nullptr;
    }
}

4. CoreAttention pos_helper heap allocation

// Before: pos_helper(helper ? helper : &def_pos_helper)
// After:
pos_helper(helper ? helper : new BaseTensorPosHelper(max_length))

Note: pos_helper is a std::unique_ptr, so the heap allocation is automatically cleaned up.

5. Virtual destructors for base classes

class DataReader { virtual ~DataReader() {} ... };
class Processor { virtual ~Processor() {} ... };

6. Python binding destroy() method

def destroy(self) -> int:
    if hasattr(self, "_chat") and self._chat:
        if self.is_generating: self.abort()
        obj_id = LibChatLLM._obj2id.get(self)
        if obj_id is not None:
            LibChatLLM._obj2id.pop(self, None)
            LibChatLLM._id2obj.pop(obj_id, None)
        self._lib.destroy(self._chat)
        self._chat = None
    return 0

Testing

  • Build verified (GCC, Linux x86_64)
  • Tested with Qwen3-ASR model across multiple inference iterations
  • Memory leak significantly reduced after calling destroy()

Notes

  • The chatllm_destroy() C API already exists in upstream; this PR exposes it in Python bindings
  • All changes are pure bug fixes with no new features or API additions
  • Changes are minimal and focused on proper RAII cleanup patterns


- Add explicit destructors to BaseTokenizer, BaseModelForConditionalGeneration,
  and PreludeCacheDisable to properly delete heap-allocated members
- Fix CoreAttention pos_helper to use heap allocation instead of stack reference
- Add virtual destructors to DataReader and Processor base classes
- Expose chatllm_destroy() in Python bindings and clean up callback dict references

Fixes memory accumulation during repeated inference iterations.
@foldl
Owner

foldl commented Feb 14, 2026

Thanks. I will check this later.

You don't need to destroy and re-create the object for repeated inference. restart() is ok.

@vieenrose
Author

> Thanks. I will check this later.
>
> You don't need to destroy and re-create the object for repeated inference. restart() is ok.

Agreed—restart() works perfectly fine for repeated inference across multiple audio clips. These fixes become useful when you need to switch model sizes (e.g., from 0.6B to 1.7B) dynamically within the same Python session.
