@priteau
Member

@priteau priteau commented Feb 6, 2026

We often see CI failures where the reboot playbook succeeds, but the growroot playbook invoked immediately afterwards fails with a `Connection timed out` error.

Add an extra delay after the reboot to ensure the host has finished booting successfully.

@priteau priteau self-assigned this Feb 6, 2026
@priteau priteau requested a review from a team as a code owner February 6, 2026 08:34
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a 30-second delay after rebooting a host to prevent connection issues with subsequent tasks. The change is reasonable and directly addresses the CI failures mentioned. I've suggested making the delay configurable by using a variable instead of a hardcoded value, which would improve maintainability.

We often see CI failures where the reboot playbook is successful, but
the growroot playbook invoked immediately after it fails with a
`Connection timed out` error.

Add a small extra delay of 5 seconds after the reboot to ensure hosts
have finished booting successfully.

The delay can be customised with the `post_reboot_delay_s` variable.
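
As a rough illustration of how such a delay could be wired up, here is a minimal sketch using the `post_reboot_delay` option of `ansible.builtin.reboot`. It is not the playbook changed in this pull request: the play layout, the `reboot_timeout_s` variable name and the timeout value are assumptions made for the example. Only `post_reboot_delay_s` and the 5-second delay come from the text above.

```yaml
---
# Hypothetical sketch, not the actual playbook from this pull request.
- name: Reboot hosts and wait for them to come back
  hosts: all
  gather_facts: false
  become: true
  vars:
    # Variable name from the commit message; 5 matches the delay it describes.
    post_reboot_delay_s: 5
    # Assumed name and value for the overall reboot timeout.
    reboot_timeout_s: 3600
  tasks:
    - name: Reboot and wait for the host to become reachable
      ansible.builtin.reboot:
        # Maximum time to wait for the host to reboot and respond.
        reboot_timeout: "{{ reboot_timeout_s | int }}"
        # Extra wait after the reboot command succeeds, before Ansible
        # starts checking that the host is back.
        post_reboot_delay: "{{ post_reboot_delay_s | int }}"
```

With a layout like this, the delay could be overridden per run, for example with `-e post_reboot_delay_s=30` on the `ansible-playbook` command line.
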
@priteau priteau requested a review from Alex-Welsh February 6, 2026 12:34
@priteau priteau changed the title Add extra 30 seconds delay to reboot playbook Add extra delay to reboot playbook Feb 6, 2026
@priteau
Member Author

priteau commented Feb 6, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to improve the reliability of the reboot playbook by adding a post-reboot delay. This is a sensible addition to address the CI failures described. However, the implementation introduces a critical issue by incorrectly applying an `| int` filter to the `reboot_timeout` parameter. This would cause the timeout to be effectively zero, likely leading to more failures. I have left a specific comment with a suggested fix for this issue. The addition of the `post_reboot_delay` and the corresponding release note are otherwise correct.
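
When weighing a concern like this, it can help to see what the `int` filter does to different inputs. The play below is a minimal sketch with hypothetical variable names, unrelated to the code under review. Jinja2's `int` filter converts numeric strings to integers and falls back to its default of 0 when the value cannot be converted, so a zero timeout would be expected only for a non-numeric input.

```yaml
---
# Hypothetical sketch: variable names and values are illustrative only.
- name: Show how the int filter converts values
  hosts: localhost
  gather_facts: false
  vars:
    numeric_timeout: "600"
    non_numeric_timeout: "ten minutes"
  tasks:
    - name: Print the converted values
      ansible.builtin.debug:
        msg:
          # A numeric string is converted to the matching integer.
          - "{{ numeric_timeout | int }}"
          # A non-numeric string falls back to the filter's default of 0.
          - "{{ non_numeric_timeout | int }}"
```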

@priteau priteau merged commit d6cab4a into stackhpc/2025.1 Feb 6, 2026
21 of 22 checks passed
@priteau priteau deleted the reboot-delay branch February 6, 2026 13:19
2 participants