Conversation
|
👋 augustbleeds, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
✅ API Diff Results -
|
There was a problem hiding this comment.
Pull request overview
Adds nonce-related instrumentation to the EVM TXM so operators can observe the TXM’s local nonce and the RPC-reported nonce (pending vs latest).
Changes:
- Add Prometheus gauges (and Beholder/OTel gauges) for
txm_local_nonceandtxm_rpc_nonce(withsource=pending|latest). - Record RPC nonce at several RPC fetch points and record local nonce whenever the TXM nonce map is updated.
- Add unit tests validating the new Prometheus gauges’ behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/txm/txm.go | Emits local nonce and RPC nonce metrics during TXM nonce initialization, broadcast flow, and backfill. |
| pkg/txm/metrics.go | Defines/registers new nonce gauges and exposes SetLocalNonce/SetRPCNonce helpers. |
| pkg/txm/metrics_test.go | Adds tests for the new nonce gauge setters (Prometheus side). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/txm/metrics.go
Outdated
| }, []string{"chainID", "address"}) | ||
| promRPCNonce = promauto.NewGaugeVec(prometheus.GaugeOpts{ | ||
| Name: "txm_rpc_nonce", | ||
| Help: "The latest nonce reported by the RPC node for a given address.", |
There was a problem hiding this comment.
The help text for txm_rpc_nonce says it is "The latest nonce reported by the RPC node", but the metric is also used for the pending nonce (distinguished by the source label). Update the help text to reflect that it can represent either pending or latest based on source, so dashboards/alerts interpret it correctly.
| Help: "The latest nonce reported by the RPC node for a given address.", | |
| Help: "The nonce reported by the RPC node for a given address (either 'pending' or 'latest', as indicated by the 'source' label).", |
pkg/txm/txm.go
Outdated
| if e != nil { | ||
| return false, e | ||
| } | ||
| t.Metrics.SetRPCNonce(ctx, address, pendingNonce, "pending") |
There was a problem hiding this comment.
BroadcastTransaction now calls t.Metrics.SetRPCNonce(...) unconditionally. Txm.Metrics is only initialized in Start(), and several unit tests (and any direct use of NewTxm without starting) call BroadcastTransaction without setting Metrics, which will panic with a nil-pointer dereference. Consider guarding this call with if t.Metrics != nil { ... } (consistent with initializeNonce / SetNonce), or ensure Metrics is always non-nil after NewTxm (e.g., a noop metrics implementation).
| t.Metrics.SetRPCNonce(ctx, address, pendingNonce, "pending") | |
| if t.Metrics != nil { | |
| t.Metrics.SetRPCNonce(ctx, address, pendingNonce, "pending") | |
| } |
| pendingNonce, pErr := t.client.PendingNonceAt(ctx, fromAddress) | ||
| if pErr != nil { | ||
| return pErr | ||
| } | ||
| t.Metrics.SetRPCNonce(ctx, fromAddress, pendingNonce, "pending") | ||
| if pendingNonce <= *tx.Nonce { |
There was a problem hiding this comment.
sendTransactionWithError now calls t.Metrics.SetRPCNonce(...) without a nil check. Since Metrics is only set in Start() (and can be nil in tests/when Txm is used without starting), this introduces a potential nil-pointer panic on the error path. Add a nil guard here or ensure Metrics is always initialized before this method can run.
pkg/txm/txm.go
Outdated
| @@ -361,6 +369,7 @@ | |||
| if err != nil { | |||
| return err | |||
| } | |||
| t.Metrics.SetRPCNonce(ctx, address, latestNonce, "latest") | |||
There was a problem hiding this comment.
BackfillTransactions now records the RPC nonce via t.Metrics.SetRPCNonce(...) unconditionally. Because Metrics is initialized in Start() and is nil on a freshly constructed Txm, calling BackfillTransactions directly (as some tests do) will panic. Guard this call with if t.Metrics != nil { ... } or make Metrics non-nil by construction (e.g., noop metrics).
pkg/txm/metrics.go
Outdated
| m.localNonce.Record(ctx, int64(nonce)) | ||
| } | ||
|
|
||
| func (m *txmMetrics) SetRPCNonce(ctx context.Context, address common.Address, nonce uint64, source string) { | ||
| promRPCNonce.WithLabelValues(m.chainID.String(), address.String(), source).Set(float64(nonce)) | ||
| m.rpcNonce.Record(ctx, int64(nonce)) | ||
| } | ||
|
|
||
| func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error { |
There was a problem hiding this comment.
SetLocalNonce/SetRPCNonce accept address (and source) and the Prometheus gauges are labeled accordingly, but the Beholder/OTel gauges are recorded without any attributes. In a multi-address TXM this produces a single time series that will effectively flap/overwrite between different addresses (and between pending/latest), making the Beholder metric ambiguous or incorrect. Consider either (a) recording with attributes for chainID, address and source (using the Labeler), or (b) dropping the Beholder gauges entirely if high-cardinality dimensions are intentionally avoided.
| m.localNonce.Record(ctx, int64(nonce)) | |
| } | |
| func (m *txmMetrics) SetRPCNonce(ctx context.Context, address common.Address, nonce uint64, source string) { | |
| promRPCNonce.WithLabelValues(m.chainID.String(), address.String(), source).Set(float64(nonce)) | |
| m.rpcNonce.Record(ctx, int64(nonce)) | |
| } | |
| func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error { | |
| } | |
| func (m *txmMetrics) SetRPCNonce(ctx context.Context, address common.Address, nonce uint64, source string) { | |
| promRPCNonce.WithLabelValues(m.chainID.String(), address.String(), source).Set(float64(nonce)) | |
| } | |
| func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error { | |
| func (m *txmMetrics) EmitTxMessage(ctx context.Context, txHash common.Hash, fromAddress common.Address, tx *types.Transaction) error { |
| if t.Metrics != nil { | ||
| t.Metrics.SetRPCNonce(ctxWithTimeout, address, pendingNonce, "pending") | ||
| } | ||
| t.lggr.Debugf("Set initial nonce for address: %v to %d", address, pendingNonce) |
There was a problem hiding this comment.
Can you upgrade this log message to info? This should log once and it's quite valuable.
pkg/txm/metrics.go
Outdated
| Name: "txm_time_until_tx_confirmed", | ||
| Help: "The amount of time elapsed from a transaction being broadcast to being included in a block.", | ||
| }, []string{"chainID"}) | ||
| promLocalNonce = promauto.NewGaugeVec(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Do we actually need prom metrics?
There was a problem hiding this comment.
Just keeping the pattern, but I think it's good practice if NOPs want to see the metrics
There was a problem hiding this comment.
IMHO, I don't find much value in tracking the local nonce. It is initialized once and gets incremented sequentially. You can also infer that information by looking at the transactions of the node. Similar case for the pending nonce, which is used once for initialization and then there are clear log errors if you're getting throttled. I would keep the latest nonce value to understand which transactions are being confirmed and which are not.
There was a problem hiding this comment.
I see what you're saying, in theory the latest nonce is going to be much more valuable. Yeah, I'll remove it then.
Adds a
local txm nonce metric andrpc nonce metric (pending vslatest)