[DX-3383] Increase Docker Build runner power #21699
✅ No conflicts with other open PRs targeting |
Pull request overview
Risk Rating: MEDIUM
This PR updates the integration test workflow to use larger self-hosted “runs-on” builder runners for Docker image builds, aiming to reduce build time at a modest additional monthly cost.
Changes:
- Increase self-hosted builder runner sizing from ~32–36 CPU / 64–72 GB to ~64–72 CPU / 128–192 GB for core and plugins image builds.
- Increase the GitHub-hosted fallback runner size in the build matrix from ubuntu22.04-8cores-32GB to ubuntu22.04-16cores-64GB.
Suggested reviewers (per .github/CODEOWNERS): @smartcontractkit/devex-cicd, @smartcontractkit/devex-tooling, @smartcontractkit/core.
  - name: ""
-   runner: ${{ needs.labels.outputs.builder-runner-label-core || 'ubuntu22.04-8cores-32GB' }}
+   runner: ${{ needs.labels.outputs.builder-runner-label-core || 'ubuntu22.04-16cores-64GB' }}
    dockerfile: core/chainlink.Dockerfile
The inline || 'ubuntu22.04-16cores-64GB' fallback is probably dead code: needs.labels.outputs.builder-runner-label-core should always be set by the labels job outputs (either to the self-hosted runs-on label or, on opt-out, to GH_BUILDER_RUNNER). Consider removing the fallback to avoid confusion, or keep it but align it with the opt-out runner so defaults don’t drift.
  - name: (plugins)
-   runner: ${{ needs.labels.outputs.builder-runner-label-plugins || 'ubuntu22.04-8cores-32GB' }}
+   runner: ${{ needs.labels.outputs.builder-runner-label-plugins || 'ubuntu22.04-16cores-64GB' }}
Same for plugins: || 'ubuntu22.04-16cores-64GB' is likely unused given the labels job always sets the runner label output. Consider removing it (or aligning defaults) to keep runner selection logic simpler.
Suggested change:
-   runner: ${{ needs.labels.outputs.builder-runner-label-plugins || 'ubuntu22.04-16cores-64GB' }}
+   runner: ${{ needs.labels.outputs.builder-runner-label-plugins }}
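To illustrate why the inline fallback would be unreachable, here is a minimal sketch of a labels job that always writes a non-empty output. In a GitHub Actions expression, `a || b` only evaluates to `b` when `a` is falsy (for example an empty string), so the fallback never applies if the output is always set. The workflow name, the `pick` step id, the `OPT_OUT_SELF_HOSTED` repository variable, and the `GH_BUILDER_RUNNER` value are assumptions for illustration only; the actual labels job is not shown in this conversation.

```yaml
# Hypothetical sketch, not the real workflow from this PR.
name: runner-label-sketch
on: pull_request

env:
  # Placeholder value; the real GH_BUILDER_RUNNER value is not shown in this PR.
  GH_BUILDER_RUNNER: ubuntu-latest
  SH_BUILDER_RUNNER_CORE: runs-on=${{ github.run_id }}-core/cpu=64+72/memory=128+192/family=c6i+c7i+c5.*/extras=s3-cache+tmpfs

jobs:
  labels:
    runs-on: ubuntu-latest
    outputs:
      builder-runner-label-core: ${{ steps.pick.outputs.core }}
    steps:
      - id: pick
        env:
          # Hypothetical opt-out switch (a repository variable).
          OPT_OUT: ${{ vars.OPT_OUT_SELF_HOSTED }}
        run: |
          # Both branches write a non-empty value, so the output is never empty.
          if [ "$OPT_OUT" = "true" ]; then
            echo "core=${GH_BUILDER_RUNNER}" >> "$GITHUB_OUTPUT"
          else
            echo "core=${SH_BUILDER_RUNNER_CORE}" >> "$GITHUB_OUTPUT"
          fi

  build:
    needs: labels
    # No "|| '...'" fallback needed: the output above is always set.
    runs-on: ${{ needs.labels.outputs.builder-runner-label-core }}
    steps:
      - run: echo "building on the selected runner"
```

If the real labels job can ever skip or fail before writing its output, the fallback would still matter, which is an argument for aligning it with the opt-out runner instead of removing it.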
SH_BUILDER_RUNNER_CORE: runs-on=${{ github.run_id }}-core/cpu=64+72/memory=128+192/family=c6i+c7i+c5.*/extras=s3-cache+tmpfs
SH_BUILDER_RUNNER_PLUGINS: runs-on=${{ github.run_id }}-plugins/cpu=64+72/memory=128+192/family=c6i+c7i+c5.*/extras=s3-cache+tmpfs
Doesn't really seem like it's saving 30 seconds consistently. First other run I found showed a 10 second difference:
- This PR (5m8s): https://github.com/smartcontractkit/chainlink/actions/runs/23554547567/job/68577643373?pr=21699
- This PR Plugins (4m14s): https://github.com/smartcontractkit/chainlink/actions/runs/23554547567/job/68577643314?pr=21699
- Other (5m18s): https://github.com/smartcontractkit/chainlink/actions/runs/23553695576/job/68574579072
- Other Plugins (4m19s): https://github.com/smartcontractkit/chainlink/actions/runs/23553695576/job/68574579076
Can look at the resource metrics now as gathered by runs-on/action:
Old setup doesn't saturate the CPU: https://github.com/smartcontractkit/chainlink/actions/runs/23553695576/job/68574579076#step:13:46
So is it bottlenecked on a few cores?
The speedup could be coming from runner queue time instead: favouring larger EC2 instances might make them more readily available?
We could start trying to cache docker layers?
chainlink-ccv has this setup: https://github.com/smartcontractkit/chainlink-ccv/blob/main/.github/actions/build-cl/action.yaml#L43-L47
Would need to wire this into ctf-build-image, and pass it to our internal action which already supports it iirc.
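For reference, a generic sketch of what Docker layer caching can look like with the public Docker actions. This is not the chainlink-ccv setup linked above, nor the internal action or ctf-build-image, whose inputs are not shown here. The `scope` name, the build context, and the choice of the `gha` cache backend are assumptions; a different backend may suit the self-hosted runners better.

```yaml
# Generic layer-caching sketch using public actions; job steps only.
steps:
  - uses: actions/checkout@v4

  # Buildx is required for the cache-from / cache-to exporters.
  - uses: docker/setup-buildx-action@v3

  - uses: docker/build-push-action@v6
    with:
      context: .
      file: core/chainlink.Dockerfile
      push: false
      # Reuse layers from earlier runs; "core" is a hypothetical scope name
      # that keeps the core and plugins caches separate.
      cache-from: type=gha,scope=core
      # mode=max also exports intermediate-stage layers, not only the final image.
      cache-to: type=gha,mode=max,scope=core
```

Layer caching only pays off when the Dockerfile orders its steps so that rarely changing layers (dependency downloads, for example) come before frequently changing ones.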
Seems like we could adjust these to better support caching?
Increases Docker Build runners from 32 core -> 64 core machines. Gives us a speedup of 29s/run, for an extra estimated cost of $50/month.