Skip to content

GHA: targeted cleanup — preserve BuildKit ccache across runs#229

Open
rucoder wants to merge 2 commits intolf-edge:eve-kernel-amd64-v6.12.49-genericfrom
rucoder:rucoder/gha-cleanup
Open

GHA: targeted cleanup — preserve BuildKit ccache across runs#229
rucoder wants to merge 2 commits intolf-edge:eve-kernel-amd64-v6.12.49-genericfrom
rucoder:rucoder/gha-cleanup

Conversation

@rucoder
Copy link
Copy Markdown
Collaborator

@rucoder rucoder commented Mar 29, 2026

Summary

Finally move back to on: pull_request and remove docker login. Also rework broken clean-up target and properly control disk consumption

Replace the nuclear docker system prune -af / docker volume prune -af in CI cleanup with targeted removal, preserving BuildKit ccache across runs on self-hosted runners.

Changes

Commit 1: Pin actions to commit SHAs and bump to latest versions

  • actions/checkout: v4 → v6.0.2 (pinned to SHA)
  • docker/login-action: v3 → v4.0.0 (pinned to SHA)
  • Fixes Node.js 20 deprecation warning

Commit 2: Targeted cleanup — preserve BuildKit ccache

  • Replace docker system prune -af and docker volume prune -af with targeted removal of only the image we just built
  • Add clean-gcc/clean-clang Makefile targets that remove specific image from both linuxkit cache and docker
  • Add BuildKit cache pruning with configurable quota via BUILDKIT_KEEP_STORAGE_MB env var (defaults to 32000 MB)
  • Print BuildKit disk usage before/after pruning for diagnostics
  • Use buildctl inside linuxkit-builder container for cache management
  • Switch pr-build.yml from pull_request_target to pull_request
  • Remove unnecessary Docker Hub login from PR builds

Testing

Tested via PR #205 against gha-test-rebase branch on production runners.

- actions/checkout: v4 -> v6.0.2 (de0fac2e4500dabe0009e67214ff5f5447ce83dd)
- docker/login-action: v3 -> v4.0.0 (b45d80f862d83dbcd57f89517bcf500b2ab88fb2)

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the CI workflows to avoid destructive Docker cleanup on self-hosted runners by switching to targeted image removal and adding BuildKit cache pruning intended to preserve ccache across runs, while also pinning GitHub Actions to SHAs and tightening PR workflow security.

Changes:

  • Add clean-gcc/clean-clang targets and a clean-image-% rule to remove only the just-built image from linuxkit cache and Docker.
  • Replace docker system prune -af / docker volume prune -af with BuildKit cache diagnostics + buildctl prune using a configurable keep-storage setting.
  • Pin actions/checkout and docker/login-action to commit SHAs; change PR workflow trigger from pull_request_target to pull_request and remove Docker Hub login from PR builds.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
Makefile.eve Adds targeted cleanup targets for linuxkit cache + Docker images.
.github/workflows/publish.yml Pins actions; replaces “nuclear” prune with targeted cleanup + BuildKit pruning/diagnostics.
.github/workflows/pr-build.yml Switches to pull_request, removes Docker Hub login, and adopts targeted cleanup + BuildKit pruning/diagnostics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +53 to +58
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
echo "=== BuildKit disk usage after prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cleanup step will fail the job if the BuildKit container isn’t running (e.g., if the build fails before it starts) because docker exec linuxkit-builder ... will exit non-zero. Since this is an if: always() cleanup, it should be resilient: check for the container first (e.g., docker ps -q -f name=linuxkit-builder) and/or make the docker exec commands non-fatal (|| true) so cleanup doesn’t mask the real failure.

Suggested change
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
echo "=== BuildKit disk usage after prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
if docker ps -q -f name=linuxkit-builder >/dev/null; then
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
echo "=== BuildKit disk usage after prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
else
echo "linuxkit-builder container not running; skipping BuildKit cleanup."
fi

Copilot uses AI. Check for mistakes.
Makefile.eve Outdated
$(IMAGE_TAG_NO_FLAVOR)-$*,)
clean-image-%:
$(LK) cache rm $(IMAGE_TAG_WITH_FLAVOR)-$* 2>/dev/null || true
docker rmi docker.io/$(IMAGE_TAG_WITH_FLAVOR)-$* 2>/dev/null || true
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean-image-% hardcodes docker.io/ when removing the image. Since IMAGE_REPOSITORY is configurable (and could include a different registry like ghcr.io/...), this can generate an invalid image reference (e.g., docker.io/ghcr.io/...). Consider removing the hardcoded docker.io/ prefix and using $(IMAGE_TAG_WITH_FLAVOR)-$* directly so cleanup works with any configured registry.

Suggested change
docker rmi docker.io/$(IMAGE_TAG_WITH_FLAVOR)-$* 2>/dev/null || true
docker rmi $(IMAGE_TAG_WITH_FLAVOR)-$* 2>/dev/null || true

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +66
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildctl prune --keep-storage expects a byte size (often with units like GB). Passing ${BUILDKIT_KEEP_STORAGE_MB:-32000} as a raw number while naming it “_MB” will likely be interpreted as 32000 bytes and prune far more aggressively than intended. Consider either (a) renaming the env var to reflect bytes/units, or (b) appending an explicit unit suffix (e.g., MB/GB) when invoking buildctl so the default keeps ~32GB as described in the PR.

Suggested change
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
echo "=== Reclaiming==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}MB" | tail -n 1

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +68
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
echo "=== BuildKit disk usage after prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cleanup step will fail the job if the BuildKit container isn’t running (e.g., if the build fails before it starts) because docker exec linuxkit-builder ... will exit non-zero. Since this is an if: always() cleanup, it should be resilient: check for the container first (e.g., docker ps -q -f name=linuxkit-builder) and/or make the docker exec commands non-fatal (|| true) so cleanup doesn’t mask the real failure.

Suggested change
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
echo "=== BuildKit disk usage after prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
if docker ps -q -f name=linuxkit-builder >/dev/null 2>&1; then
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2 || true
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1 || true
echo "=== BuildKit disk usage after prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2 || true
else
echo "linuxkit-builder container not running; skipping BuildKit disk cleanup."
fi

Copilot uses AI. Check for mistakes.
echo "=== BuildKit disk usage before prune ==="
docker exec linuxkit-builder buildctl du | tail -n 2
echo "=== Reclaiming ==="
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildctl prune --keep-storage expects a byte size (often with units like GB). Passing ${BUILDKIT_KEEP_STORAGE_MB:-32000} as a raw number while naming it “_MB” will likely be interpreted as 32000 bytes and prune far more aggressively than intended. Consider either (a) renaming the env var to reflect bytes/units, or (b) appending an explicit unit suffix (e.g., MB/GB) when invoking buildctl so the default keeps ~32GB as described in the PR.

Suggested change
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}" | tail -n 1
docker exec linuxkit-builder buildctl prune --keep-storage "${BUILDKIT_KEEP_STORAGE_MB:-32000}MB" | tail -n 1

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--keep-storage value Keep data below this limit (in MB) (default: 0)

- Replace nuclear 'docker system prune -af' and 'docker volume prune -af'
  with targeted removal of only the image we just built
- Add clean-gcc/clean-clang Makefile targets that remove specific image
  from both linuxkit cache and docker
- Add BuildKit cache pruning with configurable quota via BUILDX_KEEP_STORAGE_MB
  env var (defaults to 20000 MB) to prevent unbounded disk growth
- Print BuildKit disk usage before pruning for diagnostics
- Use buildctl inside linuxkit-builder container for cache management
- Switch pr-build.yml from pull_request_target to pull_request
- Remove unnecessary Docker Hub login from PR builds

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 3 to 7
on: # yamllint disable-line rule:truthy
pull_request_target:
pull_request:
branches:
- "gha-test-rebase"
- "eve-kernel-*"
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching this workflow to pull_request means PR code is executed on runs-on: [runner-xl, X64, Linux] (a self-hosted runner label). If PRs can come from forks, this enables untrusted contributors to run arbitrary code on the self-hosted runner host, which is a high-risk security posture. Consider gating the job to same-repo PRs (e.g., only when github.event.pull_request.head.repo.full_name == github.repository), or moving PR builds to GitHub-hosted runners / a hardened ephemeral runner pool for external contributions.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants