ci: DH-19408: PNAP Integration #417
base: main
Pull request overview
This PR migrates the benchmark infrastructure from Equinix to Phoenix NAP (PNAP) as the bare metal provider, with several related infrastructure improvements.
Changes:
- Integrated Phoenix NAP REST API for bare metal server provisioning and management, replacing Equinix Metal
- Upgraded GitHub Actions runners and deployed servers from Ubuntu 22.04 to 24.04
- Migrated from root user execution to non-root user with proper Docker group permissions
- Implemented automated server expiration and cleanup for ephemeral benchmark systems
- Added comprehensive wait states for server readiness and APT availability
- Disabled automatic system updates on benchmark servers to ensure consistent test environments
- Updated documentation to reflect new provider requirements and secrets
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 27 comments.
| File | Description |
|---|---|
| src/main/java/io/deephaven/benchmark/controller/DeephavenDockerController.java | Removed sudo prefix from all docker commands to support non-root user execution |
| docs/ForkSetup.md | Updated secrets documentation for Phoenix NAP provider and added reference to private vault |
| .github/workflows/remote-benchmarks.yml | Upgraded to Ubuntu 24.04 runners, added purge job for expired servers, added METAL_PROJECT_ID to environment |
| .github/workflows/adhoc-exist-remote-benchmarks.yml | Upgraded to Ubuntu 24.04 runners, fixed default test class list spacing |
| .github/workflows/adhoc-auto-remote-benchmarks.yml | Upgraded to Ubuntu 24.04 runners, updated copyright year, changed server plan, removed METAL_EXPIRE parameter, added METAL_PROJECT_ID |
| .github/scripts/setup-test-server-remote.sh | Added APT lock waiting, disabled automatic updates, configured SSH security, changed all paths from /root to /${HOME}, added usermod for docker group, added DEBIAN_FRONTEND export |
| .github/scripts/run-benchmarks-remote.sh | Updated paths from /root to /${HOME}, added userHome variable substitution in properties |
| .github/scripts/manage-deephaven-remote.sh | Updated paths from /root to /${HOME} |
| .github/scripts/fetch-results-local.sh | Updated path to use /home/${USER} format |
| .github/scripts/build-server-distribution-remote.sh | Updated paths from /root to /${HOME} |
| .github/scripts/build-docker-image-remote.sh | Updated paths from /root to /${HOME}, added copyright header |
| .github/scripts/build-benchmark-artifact-remote.sh | Updated paths from /root to /${HOME}, added copyright header |
| .github/scripts/adhoc.sh | Complete rewrite of deploy-metal, delete-metal functions for PNAP API, added purge-metal function, added getApiToken function, updated examples |
| .github/resources/terraform.tfstate | Added empty Terraform state file |
| .github/resources/*-scale-benchmark.properties | Updated docker.compose.file path to use ${userHome} variable substitution |
| .github/resources/adhoc-server-deploy.json | New JSON template for PNAP server deployment configuration |
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 20 comments.
.github/scripts/adhoc.sh
Outdated
    IP_ADDRESS=$(jq -r '.publicIpAddresses[0]' <<< "${RESPONSE}")
    DEVICE_ID=$(jq -r '.id' <<< "${RESPONSE}")
    echo "Got Address ${IP_ADDRESS} and Device Id ${DEVICE_ID}"
A test -z does seem like a reasonable safety check here.
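For illustration, a minimal sketch of such a check. One caveat: `jq -r` prints the literal string `null` when a key is missing, so a bare `test -z` would not catch that case. The helper name `require_value` and the error message are hypothetical, not part of adhoc.sh.

```shell
# Hypothetical helper (not in adhoc.sh): fail when a value extracted by jq
# is empty or the literal string "null" (what jq -r prints for a missing key).
require_value() {
  name="$1"
  value="$2"
  if [ -z "${value}" ] || [ "${value}" = "null" ]; then
    echo "Missing ${name} in deploy response" >&2
    return 1
  fi
  return 0
}

# Assumed usage right after the jq extraction in deploy-metal:
#   require_value "IP address" "${IP_ADDRESS}" || exit 1
#   require_value "device id" "${DEVICE_ID}" || exit 1
```

Failing here keeps the script from polling a URL built from an empty `DEVICE_ID`.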
    for i in {1..60}; do
      STATUS_RESPONSE=$(curl -s -H "Authorization: Bearer ${TOKEN}" "https://api.phoenixnap.com/bmc/v1/servers/${DEVICE_ID}")
      SERVER_STATUS=$(echo "${STATUS_RESPONSE}" | jq -r '.status')
      if [[ "${SERVER_STATUS}" == "powered-on" ]]; then
        if nc -z -w 2 "${IP_ADDRESS}" 22 >/dev/null 2>&1; then
          echo "Server is reachable"
          STATUS=1
          break
        fi
      fi
      sleep 10
    done
I wouldn't necessarily add any checking, but maybe just print at least the last response if we are not actually ready.
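A rough sketch of what that could look like: the loop remembers the last response and prints it only when it gives up. The function name `wait_for_status` and its argument order are hypothetical, and the status command is passed in so the sketch does not depend on the PNAP API or on `nc`. The real loop would also keep its SSH port check.

```shell
# Hypothetical helper (not in adhoc.sh): poll a command until it prints the
# expected status; on timeout, print the last response to stderr.
# Args: max attempts, seconds between attempts, expected status, then the command.
wait_for_status() {
  attempts="$1"
  delay="$2"
  expected="$3"
  shift 3
  last=""
  i=1
  while [ "${i}" -le "${attempts}" ]; do
    # Keep whatever the command printed, even if it failed.
    last=$("$@" 2>&1) || true
    if [ "${last}" = "${expected}" ]; then
      echo "Server is reachable"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "Server not ready after ${attempts} attempts; last response: ${last}" >&2
  return 1
}

# Assumed usage, where get_status is a stand-in for the curl + jq status call:
#   wait_for_status 60 10 "powered-on" get_status || exit 1
```

No extra checking is added on the success path; the only new behavior is the diagnostic line when the server never becomes ready.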
Wrote an integration with Phoenix NAP (our new bare metal provider).
always()) routines was ~27 incidents last year, the probability of the purge-expired function having an impact is 0.28%. (Gotta love Copilot)