Skip to content

Orderbook native price estimators fallback#4161

Open
squadgazzz wants to merge 8 commits intomainfrom
orderbook/native-prices-estimator-fallback
Open

Orderbook native price estimators fallback#4161
squadgazzz wants to merge 8 commits intomainfrom
orderbook/native-prices-estimator-fallback

Conversation

@squadgazzz
Copy link
Contributor

Background

The orderbook's native price estimator is currently configured to use a Forwarder estimator, which is basically the Autopilot's API. In case Autopilot is down, quote competition can't proceed without native prices, and no new orders can be placed during that time.

Description

Adds an optional fallback native price estimator that kicks in when the primary estimator experiences sustained failures. This protects native price availability during primary estimator outages.

The fallback estimator tracks consecutive ProtocolInternal errors from the primary. After a configurable threshold (3 errors), it switches to the fallback estimator and periodically probes the primary to detect recovery.

Changes

  • New FallbackNativePriceEstimator, which wraps a primary and fallback estimator with automatic failover logic:
    • Tracks consecutive ProtocolInternal errors on the primary
    • Switches to fallback after 3 consecutive errors
    • Probes primary every 60s while in fallback mode
    • Recovers to primary when a probe succeeds, otherwise, continue using the fallback
  • Forwarder error mapping: HTTP send failures now return ProtocolInternal instead of a generic error, so the fallback estimator can detect them
  • New CLI argument --fallback-native-price-estimators on the orderbook to optionally configure fallback estimators
  • New factory method caching_native_price_estimator_from_inner to allow injecting a pre-built inner estimator (with fallback wrapping) into the caching layer

How to test

New unit and e2e tests.

@squadgazzz squadgazzz marked this pull request as ready for review February 16, 2026 13:46
@squadgazzz squadgazzz requested a review from a team as a code owner February 16, 2026 13:46
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fallback mechanism for the native price estimator to improve reliability during primary estimator outages. While the implementation is well-structured, includes comprehensive tests, and uses a state machine for failover, two critical areas require attention. A high-severity issue was found regarding the use of .unwrap() on a Mutex, which could lead to a service panic if the lock is poisoned; while panicking is acceptable for unrecoverable errors, using .expect() improves debuggability. Additionally, a potential 'thundering herd' issue was identified in the probe logic that could lead to a burst of requests when recovering from a failure. Addressing these concerns will significantly improve the robustness and stability of the system.

Copy link
Contributor

@jmg-duarte jmg-duarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits only, nice job

Comment on lines +100 to +136
let mut state = self.state.lock().unwrap();
if let Err(PriceEstimationError::ProtocolInternal(err)) = result {
let State::Primary {
consecutive_errors, ..
} = &mut *state
else {
return false;
};
*consecutive_errors += 1;
if *consecutive_errors >= CONSECUTIVE_ERRORS_THRESHOLD {
tracing::info!(
?err,
"primary native price estimator down after {} consecutive errors, switching \
to fallback",
*consecutive_errors
);
*state = State::Fallback {
last_probe: Instant::now(),
};
return true;
}
tracing::debug!(
?err,
consecutive_errors = *consecutive_errors,
"primary native price estimator error, not yet switching to fallback"
);
false
} else {
if let State::Primary {
consecutive_errors, ..
} = &mut *state
{
*consecutive_errors = 0;
}
false
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that using early returns will improve readability here

Suggested change
let mut state = self.state.lock().unwrap();
if let Err(PriceEstimationError::ProtocolInternal(err)) = result {
let State::Primary {
consecutive_errors, ..
} = &mut *state
else {
return false;
};
*consecutive_errors += 1;
if *consecutive_errors >= CONSECUTIVE_ERRORS_THRESHOLD {
tracing::info!(
?err,
"primary native price estimator down after {} consecutive errors, switching \
to fallback",
*consecutive_errors
);
*state = State::Fallback {
last_probe: Instant::now(),
};
return true;
}
tracing::debug!(
?err,
consecutive_errors = *consecutive_errors,
"primary native price estimator error, not yet switching to fallback"
);
false
} else {
if let State::Primary {
consecutive_errors, ..
} = &mut *state
{
*consecutive_errors = 0;
}
false
}
}
/// Returns `true` if the fallback should be used.
fn should_use_fallback(&self, result: &NativePriceEstimateResult) -> bool {
let mut state = self.state.lock().unwrap();
let State::Primary {
ref mut consecutive_errors,
} = *state
else {
return false;
};
let Err(err) = result else {
*consecutive_errors = 0;
return false;
};
*consecutive_errors += 1;
if *consecutive_errors >= CONSECUTIVE_ERRORS_THRESHOLD {
tracing::info!(
?err,
"primary native price estimator down after {} consecutive errors, switching \
to fallback",
*consecutive_errors
);
*state = State::Fallback {
last_probe: Instant::now(),
};
return true;
}
tracing::debug!(
?err,
consecutive_errors = *consecutive_errors,
"primary native price estimator error, not yet switching to fallback"
);
false
}

crate::price_estimation::PriceEstimationError,
alloy::primitives::Address,
anyhow::Context,
anyhow::Context as _,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the import style?

Comment on lines +209 to +215
fn token() -> Address {
Address::with_last_byte(1)
}

fn timeout() -> Duration {
Duration::from_secs(5)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both these functions are const fns

Suggested change
fn token() -> Address {
Address::with_last_byte(1)
}
fn timeout() -> Duration {
Duration::from_secs(5)
}
const TOKEN: Address = Address::with_last_byte(1);
const TIMEOUT: Duration = Duration::from_secs(5);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants