<EXPERIMENTAL - DO NOT REVIEW> DYNAMIC_UNBOUND support for portable runtime: lazy KV cache allocation#18350
Enable DYNAMIC_UNBOUND tensors in the portable runtime, allowing KV cache buffers to be dynamically managed rather than statically memory-planned. This is the architectural foundation for pay-as-you-go memory allocation in ExecuTorch LLM inference.

Core changes:
- `DynamicAllocator` interface with allocate/reallocate/free
- `PalDynamicAllocator` default implementation (PAL-backed, 2x growth policy)
- `TrackingDynamicAllocator` for memory-stats observability
- `MemoryManager` gains a fourth slot for the `DynamicAllocator` (backward compatible)
- `TensorImpl` gains `dynamic_allocator_` and `capacity_bytes_` fields
- `TensorImpl::internal_resize_contiguous` handles DYNAMIC_UNBOUND resizes
- `tensor_parser_portable.cpp`: remove the DYNAMIC_UNBOUND rejection; wire up the allocator at load time for tensors with no memory-planned data
- `method.cpp`: `FreeCall` frees dynamic memory; the destructor cleans up everything remaining
- Module API auto-creates a `PalDynamicAllocator`, so DYNAMIC_UNBOUND just works

Export changes:
- `MarkDynamicUnboundPass` marks KV cache buffers as DYNAMIC_UNBOUND
- `--lazy_kv_cache` flag for the Llama export

Co-authored-by: Claude <noreply@anthropic.com>
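To make the allocator pieces concrete, here is a minimal sketch of what a `DynamicAllocator` interface with a PAL-backed 2x-growth default could look like. The names follow the summary above, but the signatures and the `std::malloc` stand-in for the PAL hook are assumptions for illustration, not the actual ExecuTorch API:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch -- not the actual ExecuTorch interface.
class DynamicAllocator {
 public:
  virtual ~DynamicAllocator() = default;
  virtual void* allocate(size_t nbytes) = 0;
  virtual void* reallocate(void* ptr, size_t old_nbytes, size_t new_nbytes) = 0;
  virtual void free(void* ptr) = 0;
};

// PAL-backed default with a 2x growth policy: when a grow is requested,
// reserve twice the new size to amortize future reallocations.
class PalDynamicAllocator : public DynamicAllocator {
 public:
  void* allocate(size_t nbytes) override {
    return std::malloc(nbytes);  // stand-in for the PAL allocation hook
  }
  void* reallocate(void* ptr, size_t old_nbytes, size_t new_nbytes) override {
    size_t capacity = new_nbytes * 2;  // 2x growth policy
    void* grown = std::malloc(capacity);
    if (grown != nullptr && ptr != nullptr) {
      std::memcpy(grown, ptr, old_nbytes);  // preserve existing cache contents
      std::free(ptr);
    }
    return grown;
  }
  void free(void* ptr) override { std::free(ptr); }
};
```

A `TrackingDynamicAllocator` would presumably wrap an inner `DynamicAllocator` with the same interface and keep counters for live bytes and peak usage; the virtual interface is what lets the Module API swap one in without touching the tensor code.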
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18350
Note: Links to docs will display an error until the docs builds have been completed.
❌ 9 New Failures, 1 Unrelated Failure as of commit f0b5b5f with merge base 02bad9d.
FLAKY: the following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
default=False,
help="Mark KV cache buffers as DYNAMIC_UNBOUND so they are allocated "
"lazily at runtime instead of at load time. Reduces initial memory "
"usage when max_context_length is large.",
```
Is this because we actually touch the full memory during attention?