Fix: build-context archiving correctness and performance#1341
Fix: build-context archiving correctness and performance#1341mazdak wants to merge 6 commits intoapple:mainfrom
Conversation
275885b to
f120d30
Compare
|
@mazdak Are these fixes and optimizations generally useful? If so, why can't they be applied to the archiver implementation in the containerization library, as opposed creating a separate implementation just for container? |
Yes, they are indeed generally useful. Let me see about adding them to the Containerization lib and I'll circle back |
|
Thanks for looking into this. Another thing that would be helpful would be to provide concrete examples of what was wrong for each of the "archive correctness" points and what the fix will do instead. For anything that fails in |
It's a been a coupe of weeks, so I am going to dig and find out what the issue was. But no, this problem does not exist in docker. I have now made two PRs and can probably close this one: |
I also just added more context on the original issue in #1391 |
Type of Change
Motivation and Context
While building out a Docker Compose-style plugin and validating it against our own real development stack, we ran into two classes of problems in the build-context path:
This PR keeps the earlier archive/symlink correctness fixes and adds a follow-up performance pass in the same codepath.
Problems addressed
Archive correctness
The previous implementation had several issues around symlink handling and archive identity:
These issues could break staged/remote builds or make archive digests unstable for cache/integrity purposes.
Build-context performance
The build-context sync path was also doing unnecessary work on large repositories:
BuildFSSynccomputed the included file set, thenArchiver.compressre-enumerated the source tree againBuildFSSyncadded avoidable overhead during context preparationIn practice, this showed up as very slow
[internal] load build contexttimes while testing against a real repo.What changed
Correctness
Performance
BuildFSSynchandArchiverthe already-computed entry list instead of forcing a second full-tree archive walkTesting