fix vl mem leak by hhaAndroid · Pull Request #1645 · InternLM/xtuner

hhaAndroid · 2026-03-31T08:27:21Z

This PR fixes memory leaks in VL (vision-language) data loading by:

- Using context managers for Image.open() and io.BytesIO() so that file handles and in-memory buffers are released promptly
- Adding a trim_memory() helper that calls malloc_trim(0) to return freed heap pages to the OS after processing multimodal/video items
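
A minimal sketch of what a helper like the one in the second point could look like. This is an illustration under assumptions, not the PR's actual code: it assumes Linux with glibc, loads the C library under the name `libc.so.6`, and returns a boolean purely for demonstration.

```python
import ctypes
import sys


def trim_memory() -> bool:
    """Ask glibc to release free heap pages back to the OS.

    Returns True if malloc_trim was called, False if it is unavailable.
    """
    if not sys.platform.startswith("linux"):
        return False  # malloc_trim is a glibc extension, not portable
    try:
        libc = ctypes.CDLL("libc.so.6")  # assumed library name (glibc)
        malloc_trim = libc.malloc_trim  # AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    malloc_trim(0)  # pad=0: leave only the minimum free space at the top of the heap
    return True
```

On platforms without glibc the sketch silently does nothing, which is usually the safer default inside a data-loading worker.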

hhaAndroid · 2026-03-31T08:55:24Z

@claude review

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py

xtuner/v1/datasets/mllm_tokenize_fn/qwen3_vl_utils.py

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py

claude · 2026-03-31T08:57:53Z

Claude: ## Summary

This PR fixes memory leaks in VL (vision-language) data loading by:

- Using context managers for Image.open() and io.BytesIO() so that file handles and in-memory buffers are released promptly
- Adding a trim_memory() helper that calls malloc_trim(0) to return freed heap pages to the OS after processing multimodal/video items
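
To illustrate the first point, here is a hedged sketch of the context-manager pattern with Pillow. The function name `load_rgb_image` and the RGB conversion are assumptions made for this example; the PR's actual loading code is not shown in this thread.

```python
import io

from PIL import Image


def load_rgb_image(image_bytes: bytes) -> Image.Image:
    """Decode image bytes and return a copy that no longer depends on the buffer."""
    with io.BytesIO(image_bytes) as buffer:
        with Image.open(buffer) as img:
            # convert() forces the lazy decode and returns a new in-memory image,
            # so the result stays valid after both context managers exit.
            return img.convert("RGB")
```

Returning the converted copy rather than `img` itself matters here: `Image.open()` is lazy, so an image handed out before its pixel data is loaded would fail once the underlying buffer is closed.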

The resource cleanup changes (context managers) are correct and good practice. A few concerns:

Issues

Warning

- Incorrect type hint on trim_memory: The logger parameter is typed as logging.Logger but get_logger() returns a loguru.Logger. This will fail strict mypy checks.
- Inverted dependency: trim_memory is a general utility but lives in qwen3_vl_utils.py. The base class module (base_mllm_tokenize_fn.py) should not import from a model-specific utils module. Consider moving it to a shared utils location.

Nit

- Per-item malloc_trim overhead: trim_memory is called on every multimodal/video item. Worth benchmarking whether this syscall overhead matters at scale, or whether periodic trimming would be more efficient.
- PR body is empty: a brief description of the root cause and fix would help reviewers.
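
One rough way to check the per-item overhead raised in the first nit is to time the call in isolation. The sketch below is an assumption-laden illustration: the helper names `mean_call_seconds` and `load_malloc_trim` are hypothetical, and it assumes glibc is available as `libc.so.6`.

```python
import ctypes
import time
from typing import Callable, Optional


def mean_call_seconds(fn: Callable[[], object], repeats: int = 100) -> float:
    """Return the mean wall-clock time of one call to fn, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def load_malloc_trim() -> Optional[Callable[[], object]]:
    """Return a zero-argument wrapper around malloc_trim(0), or None if unavailable."""
    try:
        malloc_trim = ctypes.CDLL("libc.so.6").malloc_trim  # assumed glibc
    except (OSError, AttributeError):
        return None
    return lambda: malloc_trim(0)


if __name__ == "__main__":
    trim = load_malloc_trim()
    if trim is not None:
        print(f"{mean_call_seconds(trim) * 1e6:.1f} us per malloc_trim(0) call")
```

The cost of malloc_trim likely depends on heap size and fragmentation, so a measurement taken inside a real data-loading worker would be more representative than this idle-process number.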

Verdict

COMMENT: the core fix (context managers) is solid. The trim_memory utility needs the type hint fix and should ideally live in a shared module.

nil0x9 · 2026-04-04T03:14:06Z

xtuner/v1/utils/misc.py

+_TRIM_MEMORY_WARNED = False
+
+
+def trim_memory(logger: logging.Logger | None = None):


Passing a logger as a function argument seems odd: it reads as if we want to trim memory induced by the logger itself. Under what circumstances would we need a logger other than the global loguru logger returned by xtuner.v1.utils.get_logger?

nil0x9 · 2026-04-04T03:17:53Z

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py

        data_name: str | None = None,
        llm_pack_weight: float = 1.0,
        visual_pack_weight: float = 0.0,
+        trim_memory_step: int = 1,


It might be better to stick to the trainer naming convention: use ..._step for counters and ..._interval for how often something is done.

HAOCHENYE · 2026-04-05T14:49:44Z

@claude please fix the PR as comment

- Remove unnecessary logger parameter from trim_memory(); use global logger internally only when needed (on failure).
- Rename trim_memory_step to trim_memory_interval to follow trainer naming convention (_step = counter, _interval = frequency).

claude · 2026-04-05T15:23:31Z

Claude: Addressed the review comments from @nil0x9:

- Removed logger parameter from trim_memory(): the function now uses the global loguru logger internally (only on failure), eliminating the confusing parameter.
- Renamed trim_memory_step → trim_memory_interval: follows the trainer naming convention where _step is a counter and _interval specifies frequency.

Changes across 3 files: misc.py, base_mllm_tokenize_fn.py, qwen3_vl_tokenize_fn.py.

New commit: hhaAndroid@4851588f

nil0x9 · 2026-04-05T15:32:45Z

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py

            else:
                ret = self.multi_modal_get_item(item, media_root)
+                if self._trim_memory_count % self._trim_memory_interval == 0:
+                    self._trim_memory_count += 1


Shouldn't the counter increment be outside the if condition? Otherwise, when trim_memory_interval is greater than 1, memory is trimmed only once (at count == 0): the counter then stays at 1, so the modulo check never passes again.
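
A small sketch of the counter logic this comment asks for, with the increment performed on every item. The class `PeriodicTrimmer` and its injected `trim_fn` are hypothetical names used only to make the behaviour easy to test; in the PR the counter lives on the tokenize function object.

```python
from typing import Callable


class PeriodicTrimmer:
    """Run trim_fn on every `interval`-th processed item (items 0, interval, ...)."""

    def __init__(self, interval: int, trim_fn: Callable[[], None]) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self._interval = interval
        self._counter = 0  # number of items seen so far, not number of trims
        self._trim_fn = trim_fn

    def step(self) -> bool:
        """Record one processed item; return True if a trim was triggered."""
        should_trim = self._counter % self._interval == 0
        self._counter += 1  # advance unconditionally, outside the if
        if should_trim:
            self._trim_fn()
        return should_trim
```

With an interval of 3, `step()` triggers a trim on items 0, 3, 6, and so on, whereas incrementing only inside the if would stop after the first trim.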

nil0x9 · 2026-04-05T16:31:49Z

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py

                ret = self.calc_num_tokens_video_get_item(item)
            else:
                ret = self.video_get_item(item, media_root)
+                if self._trim_memory_count % self._trim_memory_interval == 0:


Same issue here: the increment should be moved outside the if clause.

nil0x9 · 2026-04-05T16:33:11Z

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py

        self._video_extra_info_list: list[dict] = []

+        self._trim_memory_interval = trim_memory_interval
+        self._trim_memory_count = 0


_trim_memory_counter might be a better name, because "count" would imply that memory has been trimmed X times.

nil0x9 · 2026-04-05T16:35:57Z

@claude address the comments in this PR and fix the code accordingly.

hhaAndroid added 2 commits March 31, 2026 08:26

fix vl mem leak

85508af

fix

481dd5a

claude bot reviewed Mar 31, 2026

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py (outdated, resolved)

claude bot reviewed Mar 31, 2026

xtuner/v1/datasets/mllm_tokenize_fn/qwen3_vl_utils.py (outdated, resolved)

claude bot reviewed Mar 31, 2026

xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py (resolved)

refine

29ff875

nil0x9 reviewed Apr 4, 2026

nil0x9 reviewed Apr 5, 2026

hhaAndroid wants to merge 4 commits into InternLM:main from hhaAndroid:fix_vl_mem_leak

3 participants
