Backend: Add BackendWithFallback to retry IO#215
Backend: Add BackendWithFallback to retry IO#215riley-dixon wants to merge 9 commits intodevelopfrom
Conversation
0234797 to
8b8ccfa
Compare
8b8ccfa to
4025ee3
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the Backend IO pathway to centralize common logic (stats updates + fallback retry orchestration) and introduces BackendWithFallback so a backend (e.g., Fastpath) can automatically retry failed IO through an alternate backend (e.g., Fallback).
Changes:
- Added
BackendWithFallbackand refactoredBackendto route IO through a new_io_impl()hook, withio()acting as the common entry point. - Wired Fastpath to optionally register and use a fallback backend when
_io_impl()throws and the fallback backend accepts the request. - Added/updated unit + integration tests and build plumbing for the new backend behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/amd_detail/backend.h |
Introduces _io_impl(), stats update hooks, and new BackendWithFallback interface. |
src/amd_detail/backend.cpp |
Implements the shared Backend::io() flow and fallback retry logic for BackendWithFallback::io(). |
src/amd_detail/backend/fastpath.h / src/amd_detail/backend/fastpath.cpp |
Migrates Fastpath to BackendWithFallback, moves IO submission into _io_impl(), adds stats hook implementations. |
src/amd_detail/backend/fallback.h / src/amd_detail/backend/fallback.cpp |
Adds stats hook implementations and _io_impl(); adjusts IO helper structure for chunking. |
src/amd_detail/state.cpp |
Creates/registers a shared Fallback backend instance and registers it as Fastpath’s fallback when enabled. |
src/amd_detail/CMakeLists.txt |
Adds new backend.cpp to the library build. |
test/amd_detail/backend.cpp |
New unit tests for fallback retry behavior at the Backend layer. |
test/amd_detail/fastpath.cpp |
Adds Fastpath integration tests covering retry + rejection behaviors via fallback. |
test/amd_detail/mbackend.h |
Extends backend mocks to support the new hooks (_io_impl, stats updates) and adds a BackendWithFallback mock. |
test/amd_detail/CMakeLists.txt |
Adds new backend test file to the test build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| ssize_t nbytes = _io_impl(type, file, buffer, size, file_offset, buffer_offset); | ||
| switch (type) { | ||
| case (IoType::Read): | ||
| update_read_stats(nbytes); | ||
| break; | ||
| case (IoType::Write): | ||
| update_write_stats(nbytes); | ||
| break; | ||
| default: | ||
| break; | ||
| } | ||
| return nbytes; |
There was a problem hiding this comment.
Will add the guard to check for nbytes >= 0 , however I agree we may want to look at what we are returning. To my knowledge we only throw exceptions on errors at this moment in time.
4025ee3 to
756a18f
Compare
|
@riley-dixon I've opened a new pull request, #227, to work on those changes. Once the pull request is ready, I'll request review from you. |
One challenge here was ensuring that the right stats counter gets incremented when an IO is retried, or trying to prevent an exception being raised by the stats module from causing the IO to be retried. Parts of RetryableBackend may want to be brought up to Backend, like the update_stats functions.
Adds tests for the behaviour of the retry mechanism, as well as any default behaviour from the RetryableBackend.
There was a problem hiding this comment.
Pull request overview
This PR introduces an automatic IO retry mechanism by adding a BackendWithFallback interface and refactoring backend IO submission so failures from one backend (e.g., Fastpath) can be retried via a registered fallback backend (e.g., Fallback).
Changes:
- Refactor
BackendIO flow by introducing_io_impl()and centralizing stats updates inBackend::io(). - Add
BackendWithFallbackand wire Fastpath to optionally retry failed IO using a registered fallback backend. - Add unit and integration tests covering fallback retry behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/amd_detail/backend.h |
Defines new Backend::io() entry point, _io_impl(), stats hooks, and BackendWithFallback. |
src/amd_detail/backend.cpp |
Implements shared IO/stats behavior and fallback retry logic. |
src/amd_detail/backend/fastpath.h |
Converts Fastpath to BackendWithFallback and moves IO into _io_impl(). |
src/amd_detail/backend/fastpath.cpp |
Implements _io_impl() plus Fastpath stats update hooks. |
src/amd_detail/backend/fallback.h |
Adds stats hooks and _io_impl() plumbing for the fallback backend. |
src/amd_detail/backend/fallback.cpp |
Routes IO through _io_impl() and ensures stats update behavior remains correct. |
src/amd_detail/state.cpp |
Creates a shared fallback backend instance and registers it with Fastpath. |
src/amd_detail/CMakeLists.txt |
Adds new production source backend.cpp to the build. |
test/amd_detail/mbackend.h |
Updates mocks for new backend interface and adds MBackendWithFallback. |
test/amd_detail/backend.cpp |
Adds unit tests validating fallback eligibility and retry behavior. |
test/amd_detail/fastpath.cpp |
Adds integration-level tests for Fastpath retrying via Fallback. |
test/amd_detail/CMakeLists.txt |
Adds new test source backend.cpp to the test build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
This change passes in the original parameters of the IO request for `is_retryable()` to process. By default, we now check that the fallback IO engine will accept the IO request prior to submitting it. This is useful in cases where the Fallback backend is available, but for some reason the request is still invalid. It also removes a hack that was used for testing this scenario with "optionally" mocking `is_retryable()`.
Note: retryable_io() -> _io_impl() Formatter moved the order of some functions around. Previously, a "RetryableBackend" would use io() to wrap the fallback mechanism before calling retryable_io() to actually perform the request. Now, every Backend will use _io_impl() which will be responsible for handling the IO request, leaving io() as the public front-end.
"Retryable" was considered to be a misnomer in this context. Instead of retrying the IO with the same backend, we were resubmitting the IO to a different Backend. While "BackendWithFallback" is not exactly a great name, its something we can change later.
This was originally written to BackendWithFallback. However, this could be further generalized to all Backends. _io_impl() is then only concerned with issuing the IO, letting the io() method do any additional processing.
517341b to
4f131e6
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces an automatic IO retry mechanism via a new BackendWithFallback interface, refactoring backend IO submission so a failing backend (e.g., Fastpath) can transparently retry with a registered fallback backend (e.g., Fallback).
Changes:
- Adds
BackendWithFallbackwith built-in retry logic and refactorsBackend::io()to centralize shared concerns (stats, retry coordination) while pushing submission into_io_impl(). - Updates Fastpath and Fallback backends to the new
_io_impl()+update_*_stats()model and wires Fastpath→Fallback registration inDriverState. - Adds unit/integration tests covering successful fallback retry and exception propagation when fallback is ineligible.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/amd_detail/backend.h |
Refactors Backend API and introduces BackendWithFallback interface. |
src/amd_detail/backend.cpp |
Implements new shared IO entry points and fallback retry logic. |
src/amd_detail/backend/fastpath.h |
Converts Fastpath to BackendWithFallback and moves IO to _io_impl(). |
src/amd_detail/backend/fastpath.cpp |
Implements _io_impl() and stat update hooks for Fastpath. |
src/amd_detail/backend/fallback.h |
Adds stat update hooks and _io_impl() plumbing for Fallback. |
src/amd_detail/backend/fallback.cpp |
Routes IO through _io_impl() and updates stats on success. |
src/amd_detail/state.cpp |
Registers a fallback backend and attaches it to Fastpath when enabled. |
src/amd_detail/CMakeLists.txt |
Adds new backend.cpp compilation unit to the library build. |
test/amd_detail/mbackend.h |
Extends mocks to support _io_impl() and stats update hooks; adds MBackendWithFallback. |
test/amd_detail/backend.cpp |
Adds unit tests for fallback eligibility and retry behavior. |
test/amd_detail/fastpath.cpp |
Adds integration-level tests validating Fastpath retry via fallback. |
test/amd_detail/CMakeLists.txt |
Adds the new backend unit test file to the internal test target. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| throw; | ||
| } | ||
| return nbytes; | ||
| } |
There was a problem hiding this comment.
Similar in nature to one of the previous CoPilot suggestions.
4f131e6 to
34d8412
Compare
Motivation
Modifies Backend & adds a new BackendWithFallback interface to re-attempt an IO request that fails with a different Backend. For example, if a Fastpath IO request fails, the Fallback path can automatically retry the IO.
Technical Details
BackendWithFallbackbackends can optionally register another Backend as eligible to retry failed IO requests._io_impl()has been added which handles only the logic required to submit the IO request to the kernel/driver.io()now acts as the entry point and can handle coordinating other tasks such as updating stats or the auto fallback mechanism. (Part of the rationale for this change is so the auto fallback mechanism only retries the IO if the IO failed and not something else.)Test Plan
Added unit tests and Fastpath integration level tests. System level tests will be added separately as it is a non-trivial addition.
Reviewer Note
When reviewing the first couple of commits, you may wish to be aware of certain changes made later on.
AIHIPFILE-100