Skip to content

OEV-1150: add latest rpc nonce metric#402

Open
augustbleeds wants to merge 3 commits intodevelopfrom
augustus.OEV-1150.add-local-rpc-nonce
Open

OEV-1150: add latest rpc nonce metric#402
augustbleeds wants to merge 3 commits intodevelopfrom
augustus.OEV-1150.add-local-rpc-nonce

Conversation

@augustbleeds
Copy link
Contributor

@augustbleeds augustbleeds commented Mar 26, 2026

Adds a local txm nonce metric and rpc nonce metric (pending vs latest)

Copilot AI review requested due to automatic review settings March 26, 2026 21:20
@augustbleeds augustbleeds requested review from a team and dimriou as code owners March 26, 2026 21:20
@github-actions
Copy link
Contributor

👋 augustbleeds, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

@github-actions
Copy link
Contributor

github-actions bot commented Mar 26, 2026

✅ API Diff Results - github.com/smartcontractkit/chainlink-evm

✅ Compatible Changes (1)

pkg/txm.(*txmMetrics) (1)
  • SetRPCNonce — ➕ Added

📄 View full apidiff report

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds nonce-related instrumentation to the EVM TXM so operators can observe the TXM’s local nonce and the RPC-reported nonce (pending vs latest).

Changes:

  • Add Prometheus gauges (and Beholder/OTel gauges) for txm_local_nonce and txm_rpc_nonce (with source=pending|latest).
  • Record RPC nonce at several RPC fetch points and record local nonce whenever the TXM nonce map is updated.
  • Add unit tests validating the new Prometheus gauges’ behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
pkg/txm/txm.go Emits local nonce and RPC nonce metrics during TXM nonce initialization, broadcast flow, and backfill.
pkg/txm/metrics.go Defines/registers new nonce gauges and exposes SetLocalNonce/SetRPCNonce helpers.
pkg/txm/metrics_test.go Adds tests for the new nonce gauge setters (Prometheus side).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}, []string{"chainID", "address"})
promRPCNonce = promauto.NewGaugeVec(prometheus.GaugeOpts{
Name: "txm_rpc_nonce",
Help: "The latest nonce reported by the RPC node for a given address.",
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help text for txm_rpc_nonce says it is "The latest nonce reported by the RPC node", but the metric is also used for the pending nonce (distinguished by the source label). Update the help text to reflect that it can represent either pending or latest based on source, so dashboards/alerts interpret it correctly.

Suggested change
Help: "The latest nonce reported by the RPC node for a given address.",
Help: "The nonce reported by the RPC node for a given address (either 'pending' or 'latest', as indicated by the 'source' label).",

Copilot uses AI. Check for mistakes.
pkg/txm/txm.go Outdated
if e != nil {
return false, e
}
t.Metrics.SetRPCNonce(ctx, address, pendingNonce, "pending")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BroadcastTransaction now calls t.Metrics.SetRPCNonce(...) unconditionally. Txm.Metrics is only initialized in Start(), and several unit tests (and any direct use of NewTxm without starting) call BroadcastTransaction without setting Metrics, which will panic with a nil-pointer dereference. Consider guarding this call with if t.Metrics != nil { ... } (consistent with initializeNonce / SetNonce), or ensure Metrics is always non-nil after NewTxm (e.g., a noop metrics implementation).

Suggested change
t.Metrics.SetRPCNonce(ctx, address, pendingNonce, "pending")
if t.Metrics != nil {
t.Metrics.SetRPCNonce(ctx, address, pendingNonce, "pending")
}

Copilot uses AI. Check for mistakes.
Comment on lines 349 to 354
pendingNonce, pErr := t.client.PendingNonceAt(ctx, fromAddress)
if pErr != nil {
return pErr
}
t.Metrics.SetRPCNonce(ctx, fromAddress, pendingNonce, "pending")
if pendingNonce <= *tx.Nonce {
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sendTransactionWithError now calls t.Metrics.SetRPCNonce(...) without a nil check. Since Metrics is only set in Start() (and can be nil in tests/when Txm is used without starting), this introduces a potential nil-pointer panic on the error path. Add a nil guard here or ensure Metrics is always initialized before this method can run.

Copilot uses AI. Check for mistakes.
pkg/txm/txm.go Outdated
Comment on lines +368 to +372
@@ -361,6 +369,7 @@
if err != nil {
return err
}
t.Metrics.SetRPCNonce(ctx, address, latestNonce, "latest")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackfillTransactions now records the RPC nonce via t.Metrics.SetRPCNonce(...) unconditionally. Because Metrics is initialized in Start() and is nil on a freshly constructed Txm, calling BackfillTransactions directly (as some tests do) will panic. Guard this call with if t.Metrics != nil { ... } or make Metrics non-nil by construction (e.g., noop metrics).

Copilot uses AI. Check for mistakes.
Comment on lines 145 to 153
m.localNonce.Record(ctx, int64(nonce))
}

func (m *txmMetrics) SetRPCNonce(ctx context.Context, address common.Address, nonce uint64, source string) {
promRPCNonce.WithLabelValues(m.chainID.String(), address.String(), source).Set(float64(nonce))
m.rpcNonce.Record(ctx, int64(nonce))
}

func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error {
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetLocalNonce/SetRPCNonce accept address (and source) and the Prometheus gauges are labeled accordingly, but the Beholder/OTel gauges are recorded without any attributes. In a multi-address TXM this produces a single time series that will effectively flap/overwrite between different addresses (and between pending/latest), making the Beholder metric ambiguous or incorrect. Consider either (a) recording with attributes for chainID, address and source (using the Labeler), or (b) dropping the Beholder gauges entirely if high-cardinality dimensions are intentionally avoided.

Suggested change
m.localNonce.Record(ctx, int64(nonce))
}
func (m *txmMetrics) SetRPCNonce(ctx context.Context, address common.Address, nonce uint64, source string) {
promRPCNonce.WithLabelValues(m.chainID.String(), address.String(), source).Set(float64(nonce))
m.rpcNonce.Record(ctx, int64(nonce))
}
func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error {
}
func (m *txmMetrics) SetRPCNonce(ctx context.Context, address common.Address, nonce uint64, source string) {
promRPCNonce.WithLabelValues(m.chainID.String(), address.String(), source).Set(float64(nonce))
}
func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error {
func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error {

Copilot uses AI. Check for mistakes.
if t.Metrics != nil {
t.Metrics.SetRPCNonce(ctxWithTimeout, address, pendingNonce, "pending")
}
t.lggr.Debugf("Set initial nonce for address: %v to %d", address, pendingNonce)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you upgrade this log message to info? This should log once and it's quite valuable.

Name: "txm_time_until_tx_confirmed",
Help: "The amount of time elapsed from a transaction being broadcast to being included in a block.",
}, []string{"chainID"})
promLocalNonce = promauto.NewGaugeVec(prometheus.GaugeOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need prom metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keeping the pattern, but I think it's good practice if NOPs want to see the metrics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, I don't find much value in tracking the local nonce. It is initialized once and gets incremented sequentially. You can also infer that information by looking at the transactions of the node. Similar case for the pending nonce, which is used once for initialization and then there are clear log errors if you're getting throttled. I would keep the latest nonce value to understand which transactions are being confirmed and which are not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, in theory the latest nonce is going to be much more valuable. Yeah, I'll remove it then.

@augustbleeds augustbleeds changed the title OEV-1150: add local/rpc nonce metric OEV-1150: add latest rpc nonce metric Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants