Checks
Controller Version
0.12.1
Deployment Method
Helm
To Reproduce
- Create a Runner ScaleSet with containerMode: kubernetes and scaleset name of length between 40 and 45 characters (e.g. arc-prd-my-awesome-continuous-integration-1)
- Create a matrix job with at least 2 jobs
- One of the jobs will fail with:
  Error: failed to create job pod: pods "arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow" already exists
Describe the bug
When the scaleset name has 40 or more characters (under the current 45-character limit) in kubernetes mode, the workflow pod creation code truncates the unique EphemeralRunner suffix. This results in a naming collision between the workflow pods of any EphemeralRunners created in the same EphemeralRunnerSet.
For example, take the scaleset EphemeralRunnerSet/arc-prd-my-awesome-continuous-integration-1-q5nb5r and suppose 2 jobs are requested.
When job 1 comes in, a new runner (EphemeralRunner) pod named arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-f45dst is created for it, which then creates the workflow pod arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow.
When job 2 comes in, a new runner (EphemeralRunner) pod named arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-g8fb7 is created for it, which then tries to create the workflow pod arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow.
Job 2 fails in the GitHub UI with the error:
Error: failed to create job pod: pods "arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow" already exists
As you can see, the workflow pods for job 1 and job 2 have the same name, which causes the naming collision and the pod "already exists" error.
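The collision can be reproduced outside the cluster with a minimal sketch. It assumes the workflow pod name is built by cutting the runner pod name so that the result, with "-workflow" appended, fits in 63 characters; that cap is inferred from the length of the workflow pod name in the error above. The function and constant names are hypothetical and are not taken from the actual hook code.

```python
# Assumption: the final workflow pod name is capped at 63 characters.
MAX_POD_NAME_LENGTH = 63
WORKFLOW_SUFFIX = "-workflow"


def workflow_pod_name(runner_pod_name: str) -> str:
    """Hypothetical model of the naming: truncate the runner name, then append the suffix."""
    keep = MAX_POD_NAME_LENGTH - len(WORKFLOW_SUFFIX)  # 54 characters of the runner name
    return runner_pod_name[:keep] + WORKFLOW_SUFFIX


# Runner pod names from the report above.
runner_1 = "arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-f45dst"
runner_2 = "arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-g8fb7"

pod_1 = workflow_pod_name(runner_1)
pod_2 = workflow_pod_name(runner_2)

print(pod_1)
print(pod_2)
print("collision:", pod_1 == pod_2)
```

Under this assumption, both runners map to arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow: the first 54 characters end inside "-runner", so the unique f45dst / g8fb7 suffix never makes it into the workflow pod name.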
Describe the expected behavior
The workflow pod names for job 1 and job 2 should be different.
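One way to get distinct names, shown here only as a sketch and not as the project's fix, is to trim the shared prefix instead of the tail so the unique runner suffix survives. It assumes a 63-character cap on the workflow pod name and that the unique part of the runner name is everything after the last hyphen; the function name is hypothetical.

```python
MAX_POD_NAME_LENGTH = 63  # assumed cap on the workflow pod name
WORKFLOW_SUFFIX = "-workflow"


def unique_workflow_pod_name(runner_pod_name: str) -> str:
    """Hypothetical alternative: shorten the prefix, keep the unique runner suffix."""
    budget = MAX_POD_NAME_LENGTH - len(WORKFLOW_SUFFIX)
    if len(runner_pod_name) <= budget:
        return runner_pod_name + WORKFLOW_SUFFIX
    # Split off the trailing random part (assumed to follow the last hyphen).
    head, _, unique = runner_pod_name.rpartition("-")
    # Leave room for the hyphen and the unique part, and avoid a doubled hyphen.
    head = head[: budget - len(unique) - 1].rstrip("-")
    return f"{head}-{unique}{WORKFLOW_SUFFIX}"


name_1 = unique_workflow_pod_name(
    "arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-f45dst"
)
name_2 = unique_workflow_pod_name(
    "arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-g8fb7"
)

print(name_1)
print(name_2)
print("distinct:", name_1 != name_2)
```

With the two runner names from this report, the resulting names differ and stay within the assumed 63-character cap. A hash of the full runner name would be another option; either way the point is that the unique part must not be the piece that gets cut.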
Additional Context
Controller Logs
ERROR EphemeralRunner Failed to create pod resource for ephemeral runner. {"version": "0.12.1", "ephemeralrunner": {"name":"arc-prd-my-awesome-continuous-integration-1-q5nb5-runner-g8fb7","namespace":"arc-runner"}, "error": "pods \"arc-prd-my-awesome-continuous-integration-1-q5nb5-runn-workflow\" already exists"}
Runner Pod Logs