Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a fallback mechanism for the native price estimator to improve reliability during primary estimator outages. While the implementation is well-structured, includes comprehensive tests, and uses a state machine for failover, two critical areas require attention. A high-severity issue was found regarding the use of .unwrap() on a Mutex, which could lead to a service panic if the lock is poisoned; while panicking is acceptable for unrecoverable errors, using .expect() improves debuggability. Additionally, a potential 'thundering herd' issue was identified in the probe logic that could lead to a burst of requests when recovering from a failure. Addressing these concerns will significantly improve the robustness and stability of the system.
| let mut state = self.state.lock().unwrap(); | ||
| if let Err(PriceEstimationError::ProtocolInternal(err)) = result { | ||
| let State::Primary { | ||
| consecutive_errors, .. | ||
| } = &mut *state | ||
| else { | ||
| return false; | ||
| }; | ||
| *consecutive_errors += 1; | ||
| if *consecutive_errors >= CONSECUTIVE_ERRORS_THRESHOLD { | ||
| tracing::info!( | ||
| ?err, | ||
| "primary native price estimator down after {} consecutive errors, switching \ | ||
| to fallback", | ||
| *consecutive_errors | ||
| ); | ||
| *state = State::Fallback { | ||
| last_probe: Instant::now(), | ||
| }; | ||
| return true; | ||
| } | ||
| tracing::debug!( | ||
| ?err, | ||
| consecutive_errors = *consecutive_errors, | ||
| "primary native price estimator error, not yet switching to fallback" | ||
| ); | ||
| false | ||
| } else { | ||
| if let State::Primary { | ||
| consecutive_errors, .. | ||
| } = &mut *state | ||
| { | ||
| *consecutive_errors = 0; | ||
| } | ||
| false | ||
| } | ||
| } |
There was a problem hiding this comment.
I find that using early returns will improve readability here
| let mut state = self.state.lock().unwrap(); | |
| if let Err(PriceEstimationError::ProtocolInternal(err)) = result { | |
| let State::Primary { | |
| consecutive_errors, .. | |
| } = &mut *state | |
| else { | |
| return false; | |
| }; | |
| *consecutive_errors += 1; | |
| if *consecutive_errors >= CONSECUTIVE_ERRORS_THRESHOLD { | |
| tracing::info!( | |
| ?err, | |
| "primary native price estimator down after {} consecutive errors, switching \ | |
| to fallback", | |
| *consecutive_errors | |
| ); | |
| *state = State::Fallback { | |
| last_probe: Instant::now(), | |
| }; | |
| return true; | |
| } | |
| tracing::debug!( | |
| ?err, | |
| consecutive_errors = *consecutive_errors, | |
| "primary native price estimator error, not yet switching to fallback" | |
| ); | |
| false | |
| } else { | |
| if let State::Primary { | |
| consecutive_errors, .. | |
| } = &mut *state | |
| { | |
| *consecutive_errors = 0; | |
| } | |
| false | |
| } | |
| } | |
| /// Returns `true` if the fallback should be used. | |
| fn should_use_fallback(&self, result: &NativePriceEstimateResult) -> bool { | |
| let mut state = self.state.lock().unwrap(); | |
| let State::Primary { | |
| ref mut consecutive_errors, | |
| } = *state | |
| else { | |
| return false; | |
| }; | |
| let Err(err) = result else { | |
| *consecutive_errors = 0; | |
| return false; | |
| }; | |
| *consecutive_errors += 1; | |
| if *consecutive_errors >= CONSECUTIVE_ERRORS_THRESHOLD { | |
| tracing::info!( | |
| ?err, | |
| "primary native price estimator down after {} consecutive errors, switching \ | |
| to fallback", | |
| *consecutive_errors | |
| ); | |
| *state = State::Fallback { | |
| last_probe: Instant::now(), | |
| }; | |
| return true; | |
| } | |
| tracing::debug!( | |
| ?err, | |
| consecutive_errors = *consecutive_errors, | |
| "primary native price estimator error, not yet switching to fallback" | |
| ); | |
| false | |
| } |
| crate::price_estimation::PriceEstimationError, | ||
| alloy::primitives::Address, | ||
| anyhow::Context, | ||
| anyhow::Context as _, |
There was a problem hiding this comment.
Why change the import style?
| fn token() -> Address { | ||
| Address::with_last_byte(1) | ||
| } | ||
|
|
||
| fn timeout() -> Duration { | ||
| Duration::from_secs(5) | ||
| } |
There was a problem hiding this comment.
both these functions are const fns
| fn token() -> Address { | |
| Address::with_last_byte(1) | |
| } | |
| fn timeout() -> Duration { | |
| Duration::from_secs(5) | |
| } | |
| const TOKEN: Address = Address::with_last_byte(1); | |
| const TIMEOUT: Duration = Duration::from_secs(5); |
Background
The orderbook's native price estimator is currently configured to use a Forwarder estimator, which is basically the Autopilot's API. In case Autopilot is down, quote competition can't proceed without native prices, and no new orders can be placed during that time.
Description
Adds an optional fallback native price estimator that kicks in when the primary estimator experiences sustained failures. This protects native price availability during primary estimator outages.
The fallback estimator tracks consecutive
ProtocolInternalerrors from the primary. After a configurable threshold (3 errors), it switches to the fallback estimator and periodically probes the primary to detect recovery.Changes
FallbackNativePriceEstimator, which wraps a primary and fallback estimator with automatic failover logic:ProtocolInternalerrors on the primaryForwardererror mapping: HTTP send failures now returnProtocolInternalinstead of a generic error, so the fallback estimator can detect them--fallback-native-price-estimatorson the orderbook to optionally configure fallback estimatorscaching_native_price_estimator_from_innerto allow injecting a pre-built inner estimator (with fallback wrapping) into the caching layerHow to test
New unit and e2e tests.