Skip to content

fix: use async db in sim_round#206

Merged
Fraser999 merged 4 commits intomainfrom
fraser/eng-1944/fix-hang
Mar 6, 2026
Merged

fix: use async db in sim_round#206
Fraser999 merged 4 commits intomainfrom
fraser/eng-1944/fix-hang

Conversation

@Fraser999
Copy link
Contributor

@Fraser999 Fraser999 commented Mar 5, 2026

Description

This updates sim_round to use async DB functions for the preflight checks.

Without this, the tokio runtime would become deadlocked due to many calls to block_in_place(|| handle.block_on(f)). See the related ticket for further info.

Related Issue

Main fix for ENG-1944.

Testing

  • ad hoc testing via running a builder shows that this no longer locks up under load.
  • Tests pass locally

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Fraser999 Fraser999 marked this pull request as ready for review March 5, 2026 10:59
@Fraser999 Fraser999 requested a review from a team as a code owner March 5, 2026 10:59
Comment on lines +125 to +127
/// using the provided [`StateSource`]s. This avoids the tokio I/O
/// driver starvation deadlock that occurs when sync `DatabaseRef` calls
/// go through `block_in_place` + `Handle::block_on`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh I see what the problem was. We were using sync db ops on an async backed db, leading to parked OS threads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's it in a nutshell.

Copy link
Member

@prestwich prestwich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our code is being colored in real time :/

pub trait StateSource {
/// The error type for state lookups, usually a database error.
type Error: core::error::Error + 'static;
pub trait StateSource: Send + Sync {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this new bound is significant, as it prevents implementation for the signet-storage MDBX revm accessors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - that's unfortunate indeed. I don't see a way round adding these bounds for now, since the crux of the fix here is to use async DBs for the preflight work.

I guess it's something we might need to revisit when we come to using signet-storage here?

Copy link
Contributor

@dylanlott dylanlott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock testing and debugging in the builder.

Let's make follow-up tickets to investigate possible solutions for the Async / Sync DB duplication and the function-coloring issues presented by new trait bounds.

@Fraser999 Fraser999 merged commit da458c1 into main Mar 6, 2026
7 checks passed
@Fraser999 Fraser999 deleted the fraser/eng-1944/fix-hang branch March 6, 2026 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants