Skip to content

[RFC] Uniform service spawning with external 'resources'#740

Open
williampMSFT wants to merge 17 commits intoOpenDevicePartnership:v0.2.0from
williampMSFT:user/williamp/es-uniform-spawn-with-innersvc
Open

[RFC] Uniform service spawning with external 'resources'#740
williampMSFT wants to merge 17 commits intoOpenDevicePartnership:v0.2.0from
williampMSFT:user/williamp/es-uniform-spawn-with-innersvc

Conversation

@williampMSFT
Copy link
Contributor

This is a proof-of-concept attempt to create a common pattern for initializing and starting the services in embedded-services. It demonstrates this pattern on the time-alarm and espi service.

Currently, starting the various services in embedded-services is pretty complicated. Each service has a different set of 'worker tasks' that users have to declare and start, each of which has varying numbers of parameters of varying types. It's very easy to miss one, and whenever a change comes in that changes what state a given task needs to function, that's a breaking change that, in the best case, causes a build failure, and in the worst case causes the service to silently misbehave.

This attempts to solve this disparity with a few tricks:

  1. Rather than having N worker tasks, we can select!() over a list of worker tasks. This works because worker 'green threads' were already intended to never return, so the select never returns either. This lets us execute them all against a single embassy task.
  2. Rather than having each of those worker tasks require a different set of parameters, the onus for figuring out which parameters need to go to what is placed on the service author. At service construction time, the service author must return a (Service, ServiceRunner) tuple. The ServiceRunner implementation which has a single method that takes no arguments and never returns. This has a few benefits:
  • It makes changes to the number and shape of 'green thread' functions transparent to the end user.
  • It makes it impossible for users to only launch some of the tasks
  • It makes it impossible for users to launch tasks in an order that might cause problems
  • It makes it difficult (but not impossible) to forget to launch the task at all - they get a ServiceRunner object back when the instantiate the service, and if they don't use it the compiler will complain about unused variables. They can bypass the warning with a leading _, but that's an explicit action the programmer has to take. I haven't found a way to outright prevent this class of error, unfortunately.
  1. Require a &'hw mut Resources as an argument at service creation time. This Resources object is the 'real' service, and by taking it as a mutable borrow rather than a reference-to-oncelock or similar, we avoid prescribing any particular way of storing the memory required to run the service.
  2. Create a spawn_service!() macro that does all the boilerplate of declaring storage/tasks and spawning the tasks on Embassy for a given service. In product use cases, this makes it outright impossible to forget to start the runner, and reduces the ~10 lines of somewhat verbose boilerplate required for each service down to zero.

This approach aligns with a what a few other embassy services are doing - see embassy_net::new for a good example.

Some notes on testing -
The spawn_service!() macro isn't useful in test scenarios where you need to run on tokio, but in those situations you can manually call new() and run() with stack variables thanks to (3). In order to have the long-running task future terminate at the end of the test, you can simply select() over runner.run() and some test code; when the test code returns, the worker 'green thread' stops and the test exits. This technique is demonstrated in tad_test.rs.

Copilot AI review requested due to automatic review settings February 28, 2026 00:23
@williampMSFT williampMSFT requested review from a team as code owners February 28, 2026 00:23
@williampMSFT
Copy link
Contributor Author

williampMSFT commented Feb 28, 2026

A couple slightly different approaches to this problem are here: #727 #726

I think this is probably the best of them, though.

@williampMSFT williampMSFT requested a review from asasine February 28, 2026 00:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces an RFC service initialization/spawning pattern in embedded-services based on explicit Resources storage and a Runner handle, and applies it to the time-alarm and eSPI services (plus updating an example and tests) to reduce per-service task boilerplate and make service startup uniform.

Changes:

  • Add embedded_services::service::{Service, ServiceResources, ServiceRunner} traits and a spawn_service!() macro to standardize service construction and task spawning.
  • Refactor time-alarm-service and espi-service to use (Service, Runner) returned from Service::new(...) with external Resources, removing the old task modules.
  • Update the RT685 example and time-alarm tokio tests to use the new APIs.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
embedded-service/src/service.rs Adds the new service traits and spawn_service!() macro for uniform initialization/spawning.
embedded-service/src/lib.rs Exposes the new service module publicly.
time-alarm-service/src/lib.rs Refactors time-alarm into Resources + Runner + control-handle Service implementing the new trait.
time-alarm-service/src/task.rs Removes the old dedicated task wrapper API.
time-alarm-service/tests/tad_test.rs Updates tokio tests to construct resources and drive the new Runner.
examples/rt685s-evk/src/bin/time_alarm.rs Switches the example to spawn_service!() and updates relay handler wiring accordingly.
espi-service/src/espi_service.rs Refactors eSPI service into Resources + Runner + control-handle Service implementing the new trait.
espi-service/src/task.rs Removes the old task wrapper API.
espi-service/src/lib.rs Stops exporting the removed task module.
Comments suppressed due to low confidence (4)

embedded-service/src/service.rs:27

  • Docs here still refer to an init() API, but the trait method is named new(...). Please update the wording so it matches the current public API and avoids sending users to a non-existent init() method.
/// A trait for a service that can be run on the EC.
/// Implementations of Service should have an init() function to construct the service that
/// returns a Runner, which the user is expected to spawn a task for.
pub trait Service<'hw>: Sized {
    /// A type that can be used to run the service. This is returned by the init() function and the user is
    /// expected to call its run() method in an embassy task (or similar parallel execution context on other
    /// async runtimes).
    type Runner: ServiceRunner<'hw>;

    /// Any memory resources that your service needs.  This is typically an opaque types that is only used by the service
    /// and is not interacted with by users of the service. Must be default-constructible for spawn_service!() to work.
    type Resources: ServiceResources;

    /// The error type that your `init` function can return on failure.
    type ErrorType;

    /// Any initialization parameters that your service needs to run.
    type InitParams;

    /// Initializes an instance of the service using the provided storage and returns a control handle for the service and
    /// a runner that can be used to run the service.
    fn new(
        storage: &'hw mut Self::Resources,
        params: Self::InitParams,
    ) -> impl core::future::Future<Output = Result<(Self, Self::Runner), Self::ErrorType>>;

embedded-service/src/service.rs:51

  • spawn_service! docs mention creating a OnceLock and calling init(), but the macro body uses StaticCell + Service::new(...). Please update this doc block to reflect the actual storage mechanism and call flow.
/// Initializes a service, creates an embassy task to run it, and spawns that task.
///
/// This macro handles the boilerplate of:
/// 1. Creating a `static` [`OnceLock`](embassy_sync::once_lock::OnceLock) to hold the service
/// 2. Calling the service's `init()` method
/// 3. Defining an embassy_executor::task to run the service
/// 4. Spawning the task on the provided executor

time-alarm-service/src/lib.rs:300

  • Resources::new() is only provided via the ServiceResources trait impl, so callers must import that trait just to construct resources. Consider adding an inherent pub fn new() (or Default) and have the trait impl forward to it, to make manual instantiation ergonomic.

impl<'hw> embedded_services::service::ServiceResources for Resources<'hw> {
    /// Allocate storage for the service's resources.
    fn new() -> Self {
        Resources { inner: None }

espi-service/src/espi_service.rs:45

  • Resources::new() is only available via the ServiceResources trait method, so callers need that trait in scope (or UFCS) to construct resources. If you expect manual instantiation, consider adding an inherent pub fn new()/Default and forwarding the trait method to it.
{
    fn new() -> Self {
        Self { inner: None }
    }
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@williampMSFT
Copy link
Contributor Author

@asasine Robert mentioned you might have an interest in this - would love to get your feedback :)

@asasine
Copy link
Contributor

asasine commented Feb 28, 2026

Hey @williampMSFT, he's right, this is interesting. I think it has a lot of merit and we've been experimenting with a similar idea internally. We ultimately walked back on something as predefined like Service but did find some related benefits.

We found that the perceived rigidity of a Service-like trait wasn't appealing to devs without a strong background or inclination to Rust's more opinionated design guidelines as compared to C. One goal of the experiment was to simplify the process of breaking existing systems down into smaller components to improve extensibility and testability. However, the prototypes actually showed the opposite: the highly generic Service trait made boundaries between systems too fuzzy to identify clean cut points, and refactoring existing systems into the Service trait led to massive objects for the associated types (an accurate reflection of our existing monoliths but I digress) and that didn't garner much buy-in.

The main benefit we did discover was adding new traits for the behavior, which effectively makes the async fn run(self) -> ! generic over that trait. These can be spliced into existing systems with as small of a surface area as desired, making it low risk and immediately testable.

It's possible that a consolidated Service trait may become workable down the road, once we've broken down more of our monoliths. I'm hopeful.

@williampMSFT
Copy link
Contributor Author

We found that the perceived rigidity of a Service-like trait wasn't appealing to devs without a strong background or inclination to Rust's more opinionated design guidelines as compared to C.

This makes sense, but I think the tradeoffs might be different in this repo than in Surface / an OEM's code? Embedded-services is sort of a 'library' repo, intended for consumption by many different applications. I'm definitely not suggesting that we require Surface / other OEMs to adopt this pattern in their code, but I think there's value in this sort of consistency in the public API surface for embedded-services, even if it's a bit more annoying for ODP devs to implement - thoughts?

One goal of the experiment was to simplify the process of breaking existing systems down into smaller components to improve extensibility and testability. However, the prototypes actually showed the opposite: the highly generic Service trait made boundaries between systems too fuzzy to identify clean cut points, and refactoring existing systems into the Service trait led to massive objects for the associated types (an accurate reflection of our existing monoliths but I digress) and that didn't garner much buy-in.

Interesting - I think I may be going for something slightly different here? I'm trying to simplify and standardize how we initialize ODP services so it's easier for users of those services to reason about them, mostly by moving all init parameters for a service to a single place, hiding the implementation details of worker tasks from end users, and coming up with a standard pattern for starting those. It's not strictly necessary to use traits at all to achieve this, though - it could just be a convention we adopt for ODP services that they do all the following:

  • take a &'hw mut Resources (where Resources is a service-defined type) as their first argument to new()
  • return a Result<(ControlHandle, Runner)> (where ControlHandle and Runner are service-defined types) from new()
  • have Runner always have a single method fn run(self) -> !

It seems to me like in this case, the main benefit of using a trait rather than a convention is for the authors of the ODP services - it's an easy way for the compiler to check that they've upheld these conventions - but the client of the API is unlikely to try to be generic over Service since you're right that it's extremely generic and the only thing you can really do with it is start it.

The main benefit we did discover was adding new traits for the behavior, which effectively makes the async fn run(self) -> ! generic over that trait. These can be spliced into existing systems with as small of a surface area as desired, making it low risk and immediately testable.

Not 100% sure I'm parsing this correctly, but I think you're saying that you found value in implementing traits for the behavior of individual services rather than in the common behaviors between all services (e.g. a TimeAlarmService trait that implements get_real_time()/set_real_time() / etc)? If that's what you meant, then I can definitely see the value in that as well and we should probably look into doing that too, but it might be orthogonal to what I'm trying to achieve here.

It's possible that a consolidated Service trait may become workable down the road, once we've broken down more of our monoliths. I'm hopeful.

What are your thoughts on trying to move ODP services to function this way now? I'm not trying to impose this pattern on Surface code, I'm just trying to push for a set of API conventions that make ODP code more easily usable by Surface (and other OEMs') code :)

@asasine
Copy link
Contributor

asasine commented Mar 3, 2026

I see two facets to this RFC that I want to address:

  1. embedded-services is a library that Surface/OEMs consume for many different applications.
  2. ODP is an organization that wants to standardize patterns.

These overlap when APIs are public but I think they have many orthogonalities.


Gonna start with the second one because it's simpler from my perspective. I'm totally on board with Service as a useful trait to ensure ODP developers follow standard practices when designing long-running tasks that are intended to be wrapped up in an #[embassy_executor::task]. As an API convention, I think it accomplishes your goals. It doesn't seem to have much effect on users of the embedded-services crate.


Now, the first one: embedded-services as a library. This particular RFC doesn't bring much for me as a consumer of this library. Regarding this question:

Not 100% sure I'm parsing this correctly, but I think you're saying that you found value in implementing traits for the behavior of individual services rather than in the common behaviors between all services (e.g. a TimeAlarmService trait that implements get_real_time()/set_real_time() / etc)? If that's what you meant, then I can definitely see the value in that as well and we should probably look into doing that too, but it might be orthogonal to what I'm trying to achieve here.

Yeah, that's what I mean. It is mostly orthogonal but there's overlap as mentioned because it influences the scope of each Service and the way that each of its generics/associated types are designed. I've been fortunate to be part of various conversations where targetting embedded-services as a library has come up and I want to keep chasing it as an invaluable project goal to improve the adoption experience.

Writing tests over a generic async fn power_led_task(led: impl Led, power_state: impl Stream<Item = bool>) -> ! has been useful. Even though a trait Led and embedded_hal::digital::OutputPin have very similar APIs (turn on/off vs. set high/low), the level of abstraction for trait Led is more semantically appropriate for the "power LED" and its associated tasks. Once that trait boundary is in place, we can make other tasks that want to control the LED generic over an impl Led, whether that's backed by the real HAL peripheral, by a Channel that implements the trait and feeds into power_led_task, or a Mutex<Peripheral> that is locked and driven directly where it's used.

If embedded-services is vying to be a library that OEMs adopt, these types of traits are the key for us. There's so much customization that happens internally that rigid tasks become difficult to extend, which leads to extension-through-modification (e.g., #734), a violation of the open-closed principle in SOLID. When the only way to consume embedded-services is through async fn task(_) -> ! (whatever form that takes) and other static items (e.g., static functions or private static globals), extension becomes very difficult. Our internal experiments have been centered on how to apply SOLID principles to embedded designs using Rust's specific set of features that help us maximize flexibility, testability, and maintainability.

If we can get the right boundaries into ODP, we can start building up a comprehensive suite of crates on both sides of those traits. We can write testabe tasks (less contrived than power_led_task but hopefully the gist is there) that are generic over these traits, and we can write testable drivers and utilities that implement these traits. Straightforward use cases create an impl with hardware, provide it to a task, and spawn that in a dedicated #[task] (or super-loop it in main). The real world isn't always straightforward, so when product-specific needs demand it, the dream here is multifold:

  1. extend the behaviors of the embedded-services tasks by substituting generic objects with new types that wrap or replace existing impls provided by embedded-services, without needing to send PRs or publically expose proprietary details (O, D)
  2. what needs to be wrapped/replaced has a finite blast radius because the right boundaries are already in place: nothing more and nothing less than exactly what needs to be customized (S, I)
    • if it doesn't need to be customized, the existing embedded-services implementation/task should be usable outright.
  3. if it compiles (and tests pass), it works (L)

What are your thoughts on trying to move ODP's embedded-services crate (and friends) to function this way now?

@williampMSFT
Copy link
Contributor Author

williampMSFT commented Mar 3, 2026

Gonna start with the second one because it's simpler from my perspective. I'm totally on board with Service as a useful trait to ensure ODP developers follow standard practices when designing long-running tasks that are intended to be wrapped up in an #[embassy_executor::task]. As an API convention, I think it accomplishes your goals. It doesn't seem to have much effect on users of the embedded-services crate.

Great, will proceed with that, thanks! I think the intended benefits to users of the crates are

  • making it harder to incorrectly use the APIs
  • abstracting the details of tasks away from end users
  • making it easier to uptake breaking changes by putting all the information required for service initialization in the same place.

Now, the first one: embedded-services as a library. This particular RFC doesn't bring much for me as a consumer of this library. Regarding this question:

Not 100% sure I'm parsing this correctly, but I think you're saying that you found value in implementing traits for the behavior of individual services rather than in the common behaviors between all services (e.g. a TimeAlarmService trait that implements get_real_time()/set_real_time() / etc)? If that's what you meant, then I can definitely see the value in that as well and we should probably look into doing that too, but it might be orthogonal to what I'm trying to achieve here.

Yeah, that's what I mean. It is mostly orthogonal but there's overlap as mentioned because it influences the scope of each Service and the way that each of its generics/associated types are designed. I've been fortunate to be part of various conversations where targetting embedded-services as a library has come up and I want to keep chasing it as an invaluable project goal to improve the adoption experience.

Totally on board with adding traits to wrap interfaces for services/logical devices and linking between services via traits rather than direct references.

Writing tests over a generic async fn power_led_task(led: impl Led, power_state: impl Stream<Item = bool>) -> ! has been useful. Even though a trait Led and embedded_hal::digital::OutputPin have very similar APIs (turn on/off vs. set high/low), the level of abstraction for trait Led is more semantically appropriate for the "power LED" and its associated tasks. Once that trait boundary is in place, we can make other tasks that want to control the LED generic over an impl Led, whether that's backed by the real HAL peripheral, by a Channel that implements the trait and feeds into power_led_task, or a Mutex<Peripheral> that is locked and driven directly where it's used.

I bugchecked a little on this one - it definitely makes sense to me to have our interfaces take e.g. impl Led instead of a concrete type and write all our interfaces in terms of traits like that, but it seems to me like tasks are implementation details of a service and shouldn't be exposed to the end user of embedded-services at all. The fact that we need to expose the 'runner' object at all rather than just starting it in our service's new function transparent to the user is a concession to the limitations of embassy tasks / memory allocation in embedded contexts, and reducing user exposure to the implementation details of individual tasks as much as possible is one of the explicit goals of this change. Not sure if I'm overindexing on the example here being specifically a signature for what looks like a task; if that was just an easy signature to demonstrate the principle of taking trait impls then I'm on board, but if you're specifically advocating for making the details of tasks part of the public API then I'd like to chat more about ways to avoid that.

If embedded-services is vying to be a library that OEMs adopt, these types of traits are the key for us. There's so much customization that happens internally that rigid tasks become difficult to extend, which leads to extension-through-modification (e.g., #734), a violation of the open-closed principle in SOLID. When the only way to consume embedded-services is through async fn task(_) -> ! (whatever form that takes) and other static items (e.g., static functions or private static globals), extension becomes very difficult. Our internal experiments have been centered on how to apply SOLID principles to embedded designs using Rust's specific set of features that help us maximize flexibility, testability, and maintainability.

Definitely on board with using traits for interfaces and getting rid of private statics and the like. I think a lot of the private-statics stuff came out of some of the lifetime requirements from the comms service. It required 'static for everything that talked over it, which lead to this pattern of declaring a module-private static instance and having an init() function that returned a handle to it. This also made testing a bit of a nightmare, since you couldn't really blow up an instance of your service and recreate it for the next test.
We've started moving away from that pattern to having application-provided storage (#716 demonstrates this end-to-end for time-alarm and #732 and #735 have started some of the other services along this path), which enables a bunch of test scenarios and product scenarios involving multiple instances of services. One of the other things I'm hoping to get out of this change is demonstrating a way to avoid prescribing a type of long-lived storage (e.g. moving from taking a oncelock out-parameter to the '&'hw mut resources' thing we're doing here), which I'm hoping will also help with the flexibility you're looking for.

If we can get the right boundaries into ODP, we can start building up a comprehensive suite of crates on both sides of those traits. We can write testable tasks (less contrived than power_led_task but hopefully the gist is there) that are generic over these traits, and we can write testable drivers and utilities that implement these traits. Straightforward use cases create an impl with hardware, provide it to a task, and spawn that in a dedicated #[task] (or super-loop it in main).

Bugchecking here as well - when you say "we can write testable tasks", where do you envision those tasks existing? I'm totally fine with and expect that OEMs will write their own tasks to drive their services, but I'd ideally like to not have any tasks as part of the public interface for services in embedded-services, because I think those are implementation details of individual services that we're currently failing to encapsulate, and one of the goals of this change is to improve encapsulation of those details.

The real world isn't always straightforward, so when product-specific needs demand it, the dream here is multifold:

  1. extend the behaviors of the embedded-services tasks by substituting generic objects with new types that wrap or replace existing impls provided by embedded-services, without needing to send PRs or publically expose proprietary details (O, D)

  2. what needs to be wrapped/replaced has a finite blast radius because the right boundaries are already in place: nothing more and nothing less than exactly what needs to be customized (S, I)

    • if it doesn't need to be customized, the existing embedded-services implementation/task should be usable outright.
  3. if it compiles (and tests pass), it works (L)

I'm definitely on board with ensuring that we have the extension points we need for services such that OEMs don't need to modify any code in embedded-services. I think if an OEM ever does have to do that to achieve their use case, that's indicative of a defect in our API design that we need to address. I think allowing user-provided task replacements is probably not the way, though? It seems like if we need to allow OEMs to control some behavior on a task, the right way to do that would be to take some sort of OEM-provided trait impl at service construction time that our internal, implementation-detail task(s) call into to achieve whatever customization is needed rather than having the OEM control the outer loop? i.e. rather than saying "you must write a task that calls these functions in this order or the service will misbehave:"

loop {
  service.foo();
  // ???
  service.bar();
  // ???
  service.baz();
}

We'd write a trait that enables extension points as appropriate

pub trait MyServiceExtension {
  fn post_foo();
  fn post_bar();
  fn post_baz();
}
/* (private - called by runner.run()) */
fn service_task(service: &Service) {
  loop {
   service.foo()
   service.extension.post_foo();
   service.bar();
   service.extension.post_bar();
   service.baz();
   service.extension.post_baz();
  }
}

or something like that (probably with domain-specific names rather than just post_bar, but you get the idea) - that way, the user can't call foo()/bar()/baz() out of order, and if we need to add a quux() in there or something later we can do it without the user needing to update their task.

What are your thoughts on trying to move ODP's embedded-services crate (and friends) to function this way now?

Definitely on board with using traits for interfaces, getting rid of statics / moving all allocations outside of embedded-services, and increasing flexibility as much as possible. I do want to make sure we're aligned on the task thing, though - I think we do need extensibility points, and some of those extensibility points will need to be invoked on worker tasks, but we need to be careful about making those extensibility points difficult to accidentally use incorrectly, and I think making the user responsible for providing the full task implementation rather than a trait that the service's code calls into is very easy to accidentally use incorrectly.

kurtjd
kurtjd previously approved these changes Mar 3, 2026
@asasine
Copy link
Contributor

asasine commented Mar 3, 2026

Great, will proceed with that

Totally on board with adding traits to wrap interfaces for services/logical devices and linking between services via traits rather than direct references.

👍

Not sure if I'm overindexing on the example here being specifically a signature for what looks like a task; if that was just an easy signature to demonstrate the principle of taking trait impls then I'm on board, but if you're specifically advocating for making the details of tasks part of the public API then I'd like to chat more about ways to avoid that.

Just an easy signature to demonstrate, I don't think tasks should generally be part of the public signature of a library. The primary thing I was advocating for was this:

we can make other tasks that want to control the LED generic over an impl Led, whether that's backed by the real HAL peripheral, by a Channel that implements the trait and feeds into power_led_task, or a Mutex<Peripheral> that is locked and driven directly where it's used.

Whether those other tasks (ODP or OEM) are using "real hardware" or arbitrary extensions is unimportant to the task's implementation.

it seems to me like tasks are implementation details of a service and shouldn't be exposed to the end user of embedded-services at all.

I'm definitely on board with ensuring that we have the extension points we need for services such that OEMs don't need to modify any code in embedded-services. I think if an OEM ever does have to do that to achieve their use case, that's indicative of a defect in our API design that we need to address. I think allowing user-provided task replacements is probably not the way, though? It seems like if we need to allow OEMs to control some behavior on a task, the right way to do that would be to take some sort of OEM-provided trait impl at service construction time that our internal, implementation-detail task(s) call into to achieve whatever customization is needed rather than having the OEM control the outer loop?

Agreed on this that tasks are implementation details and using trait objects as API boundaries and extension points feels right and jives wel with Rust's language, type safety, and the above impl Led example. Unfortunately, we run into a limitation in #[embassy_executor::task] that it can't be generic therefore extension becomes limited when using trait objects as the API boundary if embedded_services is spawning its own tasks. AFIT aren't dyn-compatible unless you rely on heap allocations. Otherwise, we'd be able to have nice extension like this

// embedded_services
pub trait MyService {
    async fn foo();
    async fn bar();
    async fn baz();
}

#[derive(Default)]
pub struct MyCanonicalService;
impl MyService for MyCanonicalService {
    async fn foo() {
        // do the canonical foo things for this service
    }

    async fn bar() {
        // --snip--
    }
    async fn baz() {
        // --snip--
    }
}

pub fn init(spawner: Spawner, service: impl Service) {
   spawner.spawn(task(service)).unwrap()
}

#[embassy_executor::task]
async fn service_task(service: impl Service) {
    loop {
        service.foo().await;
        service.bar().await;
        service.baz().await;
    }
}

// OEM
/// Does special OEM things before and after doing standard MyService things.
struct DoSpecialOemThings<S>(S);
impl<S: MyService> MyService for DoSpecialOemThings<S> {
    async fn foo() {
        // do OEM things before foo
        self.0.foo().await; // do other foo things
        // do OEM thigns after foo
    }

    async fn bar() {
        // this OEM doesn't want to do bar at all so just no-op
    }

    async fn baz() {
        // --snip--
    }
}

embedded_services::init(DoSpecialOemThings(embedded_services::MyCanonicalService::default()));

The goal would be to write tests over DoSpecialOemThings that asserts that "do OEM things before foo" happens before the inner foo is called (as measured by the side effect of the inner <S as MyService>::foo getting called). It's not that I want to test service_task; I want to test DoSpecialOemThings, but I can't really write code this way because of the AFIT-isn't-dyn-compatible limitation.


Hit send too early, will send another comment shortly.

@asasine
Copy link
Contributor

asasine commented Mar 4, 2026

Since we can't write #[task] generic over the traits we want, we end up writing something like this instead

// embedded_services: mostly the same minus init() and now with this new service_task()
pub async fn service_task(impl MyService) -> ! {
    loop {
        service.foo().await;
        service.bar().await;
        service.baz().await;
    }
}

// OEM: mostly the same, plus the new init() and service_task()
pub async fn init(spawner: Spawner) {
    let service = DoSpecialOemThings(embedded_services::MyCanonicalService::default());
    spawner.spawn(service_task(service)).unwrap();
}

#[embasy_executor::task]
async fn service_task(service: DoSpecialOemThings<embedded_services::MyCanonicalService>) -> ! {
    // e x t e n s i o n woo
    embedded_services::service_task(service).await
}

And this brings me full-circle to the Service trait: we need #[task] functions to perform work concurrently (e.g., IO), (they're unavoidable and necessary in embassy-based projects), we want trait-based extension for OEMs to benefit from SOLID principles to customize that IO behavior, and we can't utilize heap allocations to make AFIT dyn-compatible, so the only solution I see is for #[task] functions to have full type information. The only valid location this is possible becomes the final binary crate. A Service trait makes sense in this context to make it easier for OEMs to correctly initialize and spawn those tasks, but it is too coarse for OEM extension on its own: you have to replace the entire run with a new one and can't reuse what's there (async fn run(self) -> ! is both moving and blocking).

@williampMSFT
Copy link
Contributor Author

williampMSFT commented Mar 4, 2026

Agreed on this that tasks are implementation details and using trait objects as API boundaries and extension points feels right and jives wel with Rust's language, type safety, and the above impl Led example. Unfortunately, we run into a limitation in #[embassy_executor::task] that it can't be generic therefore extension becomes limited when using trait objects as the API boundary if embedded_services is spawning its own tasks. AFIT aren't dyn-compatible unless you rely on heap allocations.

And this brings me full-circle to the Service trait: we need #[task] functions to perform work concurrently (e.g., IO), and we want trait-based extension for OEMs to benefit from SOLID principles to customize that IO behavior, so #[task] functions need to have full type information. The only valid location becomes the final binary crate. A Service trait makes sense in this context to make it easier for OEMs to correctly initialize and spawn those tasks, but it is too coarse for OEM extension on its own: you have to replace the entire run with a new one and can't reuse what's there (async fn run(self) -> ! is both moving and blocking).

Exactly! This is one of the things I'm trying to solve for with this change.

In a world where memory is allocated in a static OnceLock or something inside the embedded-services module, you're right, we have to use dyn for storage and we also probably require the OEMs to control the tasks so they can pass stuff in. However, if we make the 'application layer' (i.e. Surface/OEM code) responsible for memory allocation for services, then the service itself can be generic over some OEM extension type, i.e. rather than usage looking like

// this initializes some static OnceLock in the embedded_services module, which means we have to use dyn
example_service::init(OemPluginType(embedded_services::MyCanonicalService::default()));

it's something like

// Note here - memory is allocated by OEM, which means they can specify concrete type parameters
// for their extensions
static MEMORY_RESOURCES: StaticCell<example_service::Resources<MyOemPluginType>> = StaticCell::new();
let memory_resources = MEMORY_RESOURCES.init(example_service::Resources::new());
let oem_plugin = MyOemPluginType::new();

// Note here - we return an instance of a control handle and a runner rather than storing stuff in 
// some statically-allocated memory in the example_service module
let (control_handle, runner) = example_service::Service::new(&mut memory_resources, oem_plugin);

// Note here - the task is declared by the 'application layer', which means they know the concrete type
// and therefore how large the future needs to be.
// However, they can't really screw it up unintentionally because there's only one thing you can do
// with a runner, and that's call run() on it, which returns Never.
// It's hard to get this wrong because if you forget to use runner, the compiler will yell
// at you about unused variables.  If the caller does the '_' thing I guess you sunk my battleship,
// but I haven't found a way to do better than this, and the embassy-net crate does something similar,
// so I suspect there's not a much better way to do it.
// This is a concession to the limitations of a heapless environment, but I think it's as small a
// concession as we can make.
#[embassy_task]
async fn service_task_fn(runner: example_service::Runner<MyOemPluginType>) {
    runner.run();
}

spawner.must_spawn(service_task_fn(runner));

The spawn_service!() macro writes all this for you, and in this PR I have this demonstrated working on the eSPI service, which is generic over an OEM-provided type (RelayHandler). Doing it this way where all memory allocations are done outside of embedded-services means we sidestep the AFIT/dyn-compat thing, (and dyn usage in general), and the embassy-tasks-can't-be-generic thing. We reduce the number of embassy tasks we need to get arbitrary parallelism by select()ing over all the things that need to happen in parallel (which works because those things return Never), and we
make it easy for OEMs to spawn the single task we need per service by encapsulating the knowledge of how to do that in a service-specific Runner type. This also lets us plumb new state to our worker tasks in a way that's transparent to the user by just modifying the Runner type.

It requires a few lines of boilerplate when you instantiate the service, but you were going to need that boilerplate if OEMs are writing their own more-than-one-line tasks anyway, and in the product case, the spawn_service!() macro spits out all of that boilerplate for you (tests still need to do it manually, but they also don't need to live forever so you can ditch the StaticCell stuff and just stack-allocate everything)

The goal would be to write tests over DoSpecialOemThings that asserts that "do OEM things before foo" happens before the inner foo is called (as measured by the side effect of the inner <S as MyService>::foo getting called). It's not that I want to test service_task; I want to test DoSpecialOemThings, but I can't really write code this way because of the AFIT-isn't-dyn-compatible limitation.

This approach means that the client doesn't even see the details of service_task - they just call runner.run(). This is kind of what I meant by 'we don't really need a trait to get 95% of what I'm interested in' - most of the value comes from hoisting memory allocation to the 'application layer' and then making it easy for a user of the service to launch that service without reasoning about its implementation details, and a trait is just the easiest way to enforce that we actually do that in ODP services.

@asasine
Copy link
Contributor

asasine commented Mar 4, 2026

Totally onboard with the goals here: sidestepping AFIT/dyn-compat with OEM-allocated objects is goodness and I think consistency is good.

My concern is that Service on its own is insufficient as a useful extension point for OEMs to customize behaviors of services. What does extension look like in this design? Let's say you had some "foo" service that had a single extension point in its canonical implementation.

pub trait FooService: Service<Runner: FooRunner> {}
pub trait FooRunner {
    fn foo(&self);
}

(removed async and storage for clarity)

This is what I came up with, and it's got some rough edges. The canoncial impl resembles this:

pub struct FooRunnerLoop<R>(R);
impl<R: FooRunner> FooRunnerLoop<R> {
    pub fn new(runner: R) -> Self {
        FooRunnerLoop(runner)
    }
}

impl<R: FooRunner> Runner for FooRunnerLoop<R> {
    fn run(self) -> ! {
        loop {
            self.0.foo();
            std::thread::sleep(std::time::Duration::from_secs(1));
        }
    }
}

impl<R: FooRunner> FooRunner for FooRunnerLoop<R> {
    fn foo(&self) {
        self.0.foo();
    }
}

pub struct Canonical;
pub struct CanonicalRunner;
impl Service for Canonical {
    type InitParams = ();
    type Runner = FooRunnerLoop<CanonicalRunner>;
    fn new(_params: ()) -> (Self, Self::Runner) {
        (Canonical, FooRunnerLoop::new(CanonicalRunner))
    }
}

impl FooService for Canonical {}
impl FooRunner for CanonicalRunner {
    fn foo(&self) {
        eprintln!("foo: canonical");
    }
}

and extension looks like this:

pub struct Customized<S>(S);
pub struct CustomizedRunner<R>(R, &'static str);
impl<S: FooService> Service for Customized<S> {
    type InitParams = (S::InitParams, &'static str);
    type Runner = FooRunnerLoop<CustomizedRunner<S::Runner>>;
    fn new(params: (S::InitParams, &'static str)) -> (Self, Self::Runner) {
        let (service, runner) = S::new(params.0);
        (Customized(service), FooRunnerLoop::new(CustomizedRunner(runner, params.1)))
    }
}

impl<S: FooService> FooService for Customized<S> {}
impl<R: FooRunner> FooRunner for CustomizedRunner<R> {
    fn foo(&self) {
        eprintln!("before foo: {}", self.1);
        self.0.foo();
    }
}

All to insert some logic that runs before the default foo. Is this the intent?

@williampMSFT
Copy link
Contributor Author

williampMSFT commented Mar 4, 2026

Totally onboard with the goals here: sidestepping AFIT/dyn-compat with OEM-allocated objects is goodness and I think consistency is good.

My concern is that Service on its own is insufficient as a useful extension point for OEMs to customize behaviors of services. What does extension look like in this design? Let's say you had some "foo" service that had a single extension point in its canonical implementation.

pub trait FooService: Service<Runner: FooRunner> {}
pub trait FooRunner {
    fn foo(&self);
}

(removed async and storage for clarity)

This is what I came up with, and it's got some rough edges. The canoncial impl resembles this:

pub struct FooRunnerLoop<R>(R);
impl<R: FooRunner> FooRunnerLoop<R> {
    pub fn new(runner: R) -> Self {
        FooRunnerLoop(runner)
    }
}

impl<R: FooRunner> Runner for FooRunnerLoop<R> {
    fn run(self) -> ! {
        loop {
            self.0.foo();
            std::thread::sleep(std::time::Duration::from_secs(1));
        }
    }
}

impl<R: FooRunner> FooRunner for FooRunnerLoop<R> {
    fn foo(&self) {
        self.0.foo();
    }
}

pub struct Canonical;
pub struct CanonicalRunner;
impl Service for Canonical {
    type InitParams = ();
    type Runner = FooRunnerLoop<CanonicalRunner>;
    fn new(_params: ()) -> (Self, Self::Runner) {
        (Canonical, FooRunnerLoop::new(CanonicalRunner))
    }
}

impl FooService for Canonical {}
impl FooRunner for CanonicalRunner {
    fn foo(&self) {
        eprintln!("foo: canonical");
    }
}

and extension looks like this:

pub struct Customized<S>(S);
pub struct CustomizedRunner<R>(R, &'static str);
impl<S: FooService> Service for Customized<S> {
    type InitParams = (S::InitParams, &'static str);
    type Runner = FooRunnerLoop<CustomizedRunner<S::Runner>>;
    fn new(params: (S::InitParams, &'static str)) -> (Self, Self::Runner) {
        let (service, runner) = S::new(params.0);
        (Customized(service), FooRunnerLoop::new(CustomizedRunner(runner, params.1)))
    }
}

impl<S: FooService> FooService for Customized<S> {}
impl<R: FooRunner> FooRunner for CustomizedRunner<R> {
    fn foo(&self) {
        eprintln!("before foo: {}", self.1);
        self.0.foo();
    }
}

All to insert some logic that runs before the default foo. Is this the intent?

I think the idea is that ideally, an OEM wouldn't have to reach for implementing their own FooService/FooRunner at all - the constructor for CanonicalFooService would take as an argument a (perhaps optional) FooServiceOemPlugin implementation that CanonicalFooService can be generic over. The shape of FooServiceOemPlugin defines the extension points available in the canonical implementation - in this case, we'd have one of the methods on FooServiceOemPlugin be fn pre_foo() and the canonical runner (which is the only runner that can be used) calls into it.

We have an example of this in the form of the eSPI service in this PR. The OEM can define the 'addresses' and types used to refer to and handle serialization/deserialization for each service, including any custom OEM-only services that they want to use - this is the 'RelayHandler' trait - and they pass a type that implements that trait in at service declaration time and an instance of that type at service construction time. The eSPI service calls into that instance to handle that service.

A simplified non-Service-trait mostly-non-async version looks something like this -

pub trait OemPlugin {
   fn foo(&self) -> ()
}

struct ServiceInner<P: OemPlugin> {
  oem_plugin: P,
 // TODO - additional state as necessary
}
impl<P: OemPlugin> ServiceInner<P> {
  fn new(oem_plugin: P) -> Self {
    Self { oem_plugin }
  }

  fn run_first_worker_operation_returns_never(&self) -> ! {
    loop {
      self.oem_plugin.foo();
      self.do_the_actual_thing();
    }
  }
  // TODO implement service logic
}

pub struct Resources<P: OemPlugin> {
   inner: Option<ServiceInner<P>>
}
impl<P: OemPlugin> Resources<P> {
  pub fn new() -> Self {
    Self { inner: None }
  }
}

pub struct Runner<'hw, P: OemPlugin> {
  inner: &'hw Resources<P>
}
impl<'hw, P: OemPlugin> {
   pub async fn run(self) -> ! {
       select2(
         inner.run_first_worker_operation_returns_never(),
         inner.run_second_worker_operation_returns_never());
   }
}

pub struct Service<'hw, P: OemPlugin> {
  inner: &'hw Resources<P>
}

impl<'hw, P: OemPlugin> Service<'hw, P> {
  fn new(resources: &'hw mut Resources, oem_plugin: P) {
     let inner = resources.inner.insert(ServiceInner::new(oem_plugin));
     (Self {inner}, Runner {inner} )
  }

  // TODO impl the public-facing API for the service
}

Extension is just

// in OEM code
struct MyOemPlugin {}

impl OemPlugin for MyOemPlugin {
  fn foo(&self) {
    eprintln!("before foo: {}")
  }
}

and then instantiating the service with P = MyOemPlugin.

The Service trait just codifies the need for a service to define all of the following types:

  • Runner
  • Service
  • Resources
    to enforce this pattern and force the public interface for the runner to be dead simple. This has the incidental benefit of enabling the spawn_service macro, which handles a lot of the memory/task allocation boilerplate for you.

If you want to be able to make direct calls against the service once it's started, you can additionally define a trait for those and impl those against Service - let's call this ServiceTypeATrait for this example. I have not demonstrated that in this PR, but it's probably worth a follow-up PR.

If an OEM has some customization that can't be achieved via the extension points in OemPlugin, then their path forward would be either
A) Report to ODP that our OemPlugin trait doesn't provide sufficient extensibility, and we can fix our API (or they can contribute a fix if they're feeling generous) (if this ever actually happens, that's a flaw in the API design - we should strive to build our interfaces such that it never does), or
B) Implement ServiceTypeATrait themselves

In this case, I have a strong preference for (A), but (B) provides an escape valve for anything really crazy / tight scheduling.

@williampMSFT williampMSFT force-pushed the user/williamp/es-uniform-spawn-with-innersvc branch from e111f4a to c79b241 Compare March 5, 2026 18:47
Copilot AI review requested due to automatic review settings March 9, 2026 21:30
Copilot AI review requested due to automatic review settings March 10, 2026 17:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 18 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 10, 2026 19:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 18 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 10, 2026 20:11
@williampMSFT williampMSFT force-pushed the user/williamp/es-uniform-spawn-with-innersvc branch from 29fd571 to 5d3d0c9 Compare March 10, 2026 20:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 18 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants