docs: Add boot failure detection documentation #1981
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,54 @@ | ||||||||||||
| # Upgrade/rollback failure detection in bootc | ||||||||||||
|
|
||||||||||||
| This document describes how to detect when a reboot failed to enable the staged image in bootc. | ||||||||||||
|
|
||||||||||||
| ## Overview | ||||||||||||
|
|
||||||||||||
| bootc uses different mechanisms to detect boot failures depending on the backend (OSTree vs. composefs+UKI) and the specific point of failure. Understanding these mechanisms is crucial for system administrators and automated tooling that needs to detect failed updates. | ||||||||||||
|
|
||||||||||||
| ## OSTree Backend Boot Failure Detection | ||||||||||||
|
|
||||||||||||
| For systems using the traditional OSTree backend, bootc relies on OSTree's built-in boot failure detection mechanisms. | ||||||||||||
|
|
||||||||||||
| ### Key Services | ||||||||||||
|
|
||||||||||||
| 1. **`ostree-finalize-staged.service`** - Runs during shutdown to finalize staged deployments | ||||||||||||
| 2. **`ostree-boot-complete.service`** - Runs early in boot to detect finalization failures | ||||||||||||
|
|
||||||||||||
| When `ostree-finalize-staged.service` fails during shutdown/reboot, this will create | ||||||||||||
| a stamp file in `/boot`, and then on a subsequent reboot the `ostree-boot-complete.service` | ||||||||||||
| service will detect it, and then itself exit with a failure mode. | ||||||||||||
|
|
||||||||||||
| You can monitor the success of both services, though for `ostree-finalize-staged.service` | ||||||||||||
| note that the failure occurred during the previous boot's shutdown. | ||||||||||||
|
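As a rough illustration of the monitoring described above, the following sketch asks systemd whether `ostree-boot-complete.service` is in the failed state. The function name and the `SYSTEMCTL` override variable are hypothetical and not part of bootc or OSTree. The sketch also assumes the unit ends up in systemd's `failed` state when it detects the stamp file.

```bash
# Sketch only: report whether ostree-boot-complete.service is marked failed.
# SYSTEMCTL is a hypothetical override hook so the check can be exercised
# on a machine without systemd; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

ostree_finalization_status() {
    # `systemctl is-failed --quiet UNIT` exits 0 when the unit is failed.
    if "$SYSTEMCTL" is-failed --quiet ostree-boot-complete.service; then
        echo "failed"
    else
        echo "ok"
    fi
}
```

Calling `ostree_finalization_status` after a reboot prints `failed` or `ok`. To see why finalization failed, inspect the previous boot's journal for `ostree-finalize-staged.service`, since that failure happened during the earlier shutdown.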
Comment on lines +22 to +23
Contributor
This sentence is a bit long and could be clearer. Consider splitting it or rephrasing the latter part for improved readability.
Suggested change
|
||||||||||||
|
|
||||||||||||
|
|
||||||||||||
| ## Composefs Backend Boot Failure Detection | ||||||||||||
|
|
||||||||||||
| ### Key Services | ||||||||||||
|
|
||||||||||||
| There is a `bootc-finalize-staged.service` which is similar to `ostree-finalize-staged.service`, | ||||||||||||
| but there is not currently a similar `-boot-complete.service`. There is also a `bootc-root-setup.service` | ||||||||||||
| that runs during initramfs to mount the composefs image and set up `/etc` and `/var` - but if this | ||||||||||||
| service fails, the system will not boot at all (emergency mode or hang). | ||||||||||||
|
Comment on lines +30 to +33
Contributor
This sentence is quite long and contains multiple clauses. Splitting it into two or more sentences would significantly improve readability.
Suggested change
|
||||||||||||
|
|
||||||||||||
| At the current time then, it is recommended to check the journal for failures from the previous boot: | ||||||||||||
|
Contributor
The word "then" in "At the current time then" is redundant and can be removed for conciseness.
Suggested change
|
||||||||||||
|
|
||||||||||||
| ```bash | ||||||||||||
| # Check for finalization failures from previous boot | ||||||||||||
| journalctl -u bootc-finalize-staged.service -b -1 | ||||||||||||
| ``` | ||||||||||||
|
|
||||||||||||
| ### Systemd Boot Assessment Integration | ||||||||||||
|
|
||||||||||||
| As of a recent OSTree with [this commit](https://github.com/ostreedev/ostree/commit/08487091256b93493f8d692e37ab3d892c758da1) | ||||||||||||
| it is possible to configure the boot loader entry counting. | ||||||||||||
|
|
||||||||||||
| At the current time, the composefs backend does not configure boot entry counting, this is likely to be added in the future. | ||||||||||||
|
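For context on boot entry counting: under systemd's Automatic Boot Assessment scheme, the counters are stored in the entry file name as `+LEFT-DONE` before the extension, and a bare `+LEFT` means no tries have been used yet. The sketch below reads those counters from a file name. The function name and the example entry names are hypothetical, and the function assumes the name ends in an extension such as `.conf` or `.efi`.

```bash
# Sketch only: print "LEFT DONE" for a boot entry file name that carries a
# boot counter (e.g. "example+3-0.conf"). Prints nothing and returns 1 when
# no valid counter is present.
boot_entry_counter() {
    name="${1##*/}"        # strip any leading directory
    name="${name%.*}"      # strip the extension (.conf, .efi, ...)
    case "$name" in
        *+*) ;;            # a '+' marks the start of the counter
        *) return 1 ;;
    esac
    counter="${name##*+}"  # text after the last '+'
    case "$counter" in
        *-*) tries_left="${counter%%-*}"; tries_done="${counter#*-}" ;;
        *)   tries_left="$counter"; tries_done=0 ;;
    esac
    # Reject anything that is not a plain decimal number.
    case "$tries_left" in ''|*[!0-9]*) return 1 ;; esac
    case "$tries_done" in ''|*[!0-9]*) return 1 ;; esac
    echo "$tries_left $tries_done"
}
```

For example, `boot_entry_counter /boot/loader/entries/example+3-0.conf` prints `3 0`. An entry with zero tries left is treated as bad by systemd-boot.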
Contributor
Statements about future plans like "this is likely to be added in the future" can quickly become outdated in documentation. It's generally better to state the current facts or phrase it more cautiously, e.g., "This feature is planned for future releases" or simply omit it if it's not critical for understanding the current state.
Suggested change
|
||||||||||||
|
|
||||||||||||
| ## See Also | ||||||||||||
|
|
||||||||||||
| - [systemd Automatic Boot Assessment](https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/) | ||||||||||||
| - [OSTree Manual](https://ostreedev.github.io/ostree/) | ||||||||||||
| - [bootc-rollback(8)](man/bootc-rollback.8.md) | ||||||||||||
| - [bootc-status(8)](man/bootc-status.8.md) | ||||||||||||
The phrasing "and then itself exit with a failure mode" is a bit awkward. Consider rephrasing for better clarity and flow.