fix(py_image_layer): resolve symlinks in mtree specs to preserve them in tar layers (#567) #892

Open
mishas wants to merge 2 commits into aspect-build:main from mishas:misha/fix-567-py-image-layer-symlinks

@mishas mishas commented Mar 24, 2026

mtree_spec records every file as type=file even when files are symlinks (e.g. python/python3 -> python3.13 in the Python toolchain), causing tars to contain multiple full copies instead of lightweight links.
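
To illustrate the size effect described above, here is a minimal sketch using Python's standard `tarfile` module rather than the tar.bzl tooling this PR touches. The file names, the 1 MB payload, and the helper names `tar_size` and `compare_sizes` are assumptions made for the illustration. It archives one regular file plus two symlinks to it, once with the links dereferenced into full copies and once with the links preserved.

```python
import io
import os
import tarfile
import tempfile


def tar_size(directory, dereference):
    """Return the size in bytes of an uncompressed tar of `directory`."""
    buf = io.BytesIO()
    # dereference=True stores the target's bytes for every symlink;
    # dereference=False stores a small link entry instead.
    with tarfile.open(fileobj=buf, mode="w", dereference=dereference) as tar:
        for name in sorted(os.listdir(directory)):
            tar.add(os.path.join(directory, name), arcname=name)
    return len(buf.getvalue())


def compare_sizes():
    """Build a toy interpreter directory and return (copies, links) sizes."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for the real interpreter binary.
        with open(os.path.join(d, "python3.13"), "wb") as f:
            f.write(b"\0" * 1_000_000)
        os.symlink("python3.13", os.path.join(d, "python3"))
        os.symlink("python3.13", os.path.join(d, "python"))
        return tar_size(d, True), tar_size(d, False)


if __name__ == "__main__":
    copies, links = compare_sizes()
    print(f"as copies: {copies} bytes, as links: {links} bytes")
```

The ratio in this toy case depends entirely on the made-up payload and is not meant to reproduce the layer-size figure quoted in the release notes.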

Add a _resolve_symlinks rule that post-processes mtree specs using a two-step readlink to detect real filesystem symlinks and convert them to type=link entries. Only relative symlink targets are converted; absolute targets are left as-is to avoid breaking cross-repo references.
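
The post-processing step can be sketched roughly as follows. This is not the code from the PR: it is a Python illustration of one possible reading of "two-step readlink", under the assumption that the first hop is an absolute, sandbox-style link that points at the real toolchain file, and the second hop is the toolchain's own relative symlink. The function names `relative_link_target` and `resolve_symlinks` are hypothetical, and the parsing ignores mtree path escaping.

```python
import os


def relative_link_target(path):
    """Return the relative symlink target behind `path`, or None."""
    if not os.path.islink(path):
        return None
    target = os.readlink(path)
    if os.path.isabs(target):
        # Assumed first hop: an absolute link into the real toolchain tree.
        # Look one level deeper for the toolchain's own symlink.
        if not os.path.islink(target):
            return None
        target = os.readlink(target)
    # Absolute targets are left alone, as the PR description states.
    return None if os.path.isabs(target) else target


def resolve_symlinks(mtree_lines):
    """Rewrite type=file entries that are really relative symlinks."""
    out = []
    for line in mtree_lines:
        fields = line.split()
        content = next(
            (f[len("content="):] for f in fields if f.startswith("content=")),
            None,
        )
        is_file = "type=file" in fields
        target = relative_link_target(content) if content and is_file else None
        if target is None:
            out.append(line)  # Not a convertible symlink: keep the entry as-is.
            continue
        kept = [
            f for f in fields
            if f != "type=file" and not f.startswith("content=")
        ]
        out.append(" ".join(kept + ["type=link", "link=" + target]))
    return out
```

Under these assumptions, an entry such as `bin/python3 mode=0755 type=file content=<path>` whose content path ultimately resolves to a relative symlink `python3.13` becomes `bin/python3 mode=0755 type=link link=python3.13`, while regular files and absolute links pass through unchanged.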

Fixes: #567


Changes are visible to end-users: yes

  • Searched for relevant documentation and updated as needed: yes
  • Breaking change (forces users to change their own code or config): no
  • Suggested release notes appear below: yes
    Fix py_image_layer interpreter layer bloat by preserving filesystem-level symlinks (e.g. python -> python3.13), reducing the layer size by ~60%.

Test plan

  • New test cases added


aspect-workflows bot commented Mar 24, 2026

Bazel 8 (Test)

All tests were cache hits

107 tests (100.0%) were fully cached saving 47s.


Bazel 9 (Test)

All tests were cache hits

107 tests (100.0%) were fully cached saving 1m 16s.


Bazel 8 (Test)

e2e

1 test target passed

Targets
//cases/py-image-layer-symlinks-567:test_resolve_symlinks [k8-fastbuild] 58ms

Total test execution time was 58ms. 45 tests (97.8%) were fully cached saving 52s.


Bazel 9 (Test)

e2e

1 test target passed

Targets
//cases/py-image-layer-symlinks-567:test_resolve_symlinks [k8-fastbuild] 68ms

Total test execution time was 68ms. 45 tests (97.8%) were fully cached saving 50s.


Bazel 8 (Test)

examples/uv_pip_compile

All tests were cache hits

1 test (100.0%) was fully cached saving 444ms.

arrdem (Contributor) commented Mar 24, 2026

I'll leave it to @thesayyn, who has deeper context on this than I do, to really review this change, which on the surface looks great. My one comment is that rules_py is moving towards putting as much as possible in the e2e tests, both for test organization and to isolate dependencies from the main repo. I'd ask that you shuffle (or have Claude shuffle) this test case into an e2e regression test case dir bearing the #567 issue suffix, as with the others.

mishas (Author) commented Mar 24, 2026

Thank you @arrdem for the quick review.
I've moved the tests into //e2e/cases/py-image-layer-symlinks-567/; I hope that works for you.
Awaiting @thesayyn's comments :).

Thanks again.

@mrdomino

One comment here is that I elsewhere found that I wanted to do the same thing (resolve symlinks) for a Go image rule.

I wonder if it’s worth moving the dedup by symlinks logic into tar.bzl?

mishas (Author) commented Mar 30, 2026

> One comment here is that I elsewhere found that I wanted to do the same thing (resolve symlinks) for a Go image rule.
>
> I wonder if it’s worth moving the dedup by symlinks logic into tar.bzl?

@mrdomino, in most cases Bazel is aware of symlinks in its output (e.g. they're declared with ctx.actions.declare_symlink), so there's no need to "touch" the filesystem to find them.

In this case, we download/extract the toolchain (Python) as-is, so Bazel has never been told what's a symlink and what's a regular file.

As such, I believe the solution strategies should be different in those two cases.

Additionally, please note that there's already an experimental preserve_symlinks option in tar.bzl. It doesn't fit our case, but maybe it's what you're looking for?
https://github.com/bazel-contrib/tar.bzl/blob/v0.10.0/tar/mtree.bzl#L63

Thanks for the comment :)


Successfully merging this pull request may close these issues.

[Bug]: py_image_layer results in multiple shim copies
