feat: structured benchmark result output via BENCH_RESULT_OUTPUT #3195
Conversation
emit full benchmark run metadata (config, tags, metrics, block range, spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by external matrix runner for table generation.
Claude finished @chatton's task in 1m 57s — Code Review
Overall, this is a clean and well-structured addition. The defer-based emission pattern is consistent and safe, and the schema is self-describing. A few issues are worth fixing before merge, ranging from a functional gap to minor nits.
| # | Issue | Severity |
|---|---|---|
| 1 | New config fields missing from JSON output | Medium — functional gap |
| 2 | evRethAttrs collected but not emitted | Medium — silent data loss |
| 3 | BENCH_OBJECTIVE read from env inside builder | Low — inconsistency |
| 4 | 7× JSON unmarshal per span for resource attrs | Low — clean-up |
| 5 | cfg.log called inconsistently, missing new fields | Low — observability |
| 6 | Negative env int values not validated | Low — fail-fast |
| 7 | float64 count type unexplained | Nit |
| 8 | Schema version policy undocumented | Nit |
The fetchResourceAttrs timeout issue (item from CodeRabbit / @chatton) is tracked for follow-up — no action needed here.
📝 Walkthrough

Adds structured JSON "run result" emission for benchmarks, expands benchConfig with spammer parameters, captures final spamoor metrics and resource attributes, refactors trace/stat helpers, and integrates result emission into multiple spamoor/gasburner benchmark tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Benchmark Test
    participant SpammerAPI as Spamoor API
    participant Traces as VictoriaTraces
    participant FS as Filesystem
    Test->>SpammerAPI: run workload / waitForSpamoorDone
    SpammerAPI-->>Test: final sent/failed counters
    Test->>Traces: collect spans & resource attrs
    Traces-->>Test: spans + resource attributes
    Test->>Test: buildRunResult(result, wallClock, spans, attrs, spamoorStats)
    Test->>FS: emitRunResult -> write JSON file (BENCH_RESULT_OUTPUT)
    FS-->>Test: write success/failure
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (inconclusive)
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Codecov Report — ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #3195    +/-  ##
=========================================
+ Coverage   61.43%   61.45%   +0.01%
=========================================
  Files         120      120
  Lines       12470    12470
=========================================
+ Hits         7661     7663       +2
+ Misses       3949     3948       -1
+ Partials      860      859       -1
```
- Deduplicate overhead/reth-rate computation: move stats-based helpers to helpers.go, make span-based wrappers delegate to them - Fix sub-millisecond precision loss in engine span timings by using microsecond-based float division instead of integer truncation - Add spamoor stats to TestGasBurner for consistency with other tests
- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig - Fix ERC20 test hardcoding max_wallets=200 instead of using cfg - Replace all hardcoded spamoor params with cfg fields across tests
- Add resourceAttrs struct with host, OS, and service fields - Extract attributes from VictoriaTraces LogsQL span data via resourceAttrCollector interface - Include host metadata in structured benchmark result JSON
Move emitRunResult into a deferred closure in all three test functions. If the test fails after metrics are collected, the structured JSON is still written. If it fails before result data exists, the defer is a no-op.
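The deferred-closure pattern described here can be sketched with hypothetical stand-in types (`benchResult` and `emit` are illustrative; `emit` counts emissions instead of writing a file):

```go
package main

import "fmt"

// benchResult is a hypothetical stand-in for the collected metrics.
type benchResult struct{ populated bool }

var emitted int

func emit(r *benchResult) { emitted++ }

// runBenchmark mirrors the deferred-closure pattern: the defer runs even
// if the test bails out mid-way, and is a no-op when no result data
// exists yet.
func runBenchmark(failEarly bool) {
	var res *benchResult
	defer func() {
		if res == nil || !res.populated {
			return // failed before metrics existed: no-op
		}
		emit(res)
	}()

	if failEarly {
		return // simulates a failure before collection
	}
	res = &benchResult{populated: true}
}

func main() {
	runBenchmark(true)  // fails before data exists: defer is a no-op
	runBenchmark(false) // full run: result is emitted once
	fmt.Println(emitted)
}
```

Because the closure reads `res` at defer-execution time (not at defer-registration time), metrics collected any time before the function returns are still emitted.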
Remove the 3-second sleep before requireSpammersRunning that caused all transactions to be mined before the measurement window started, leaving SteadyState at 0s. Also add deferred emitRunResult, configurable spamoor params, and spamoorStats collection to match the other benchmark tests.
… mode In external mode the sequencer reports spans as "ev-node" (not the test-specific name like "ev-node-erc20"), so trace queries returned zero spans. Store service names on env: local mode uses the test-specific name, external mode defaults to "ev-node"/"ev-reth" with BENCH_EVNODE_SERVICE_NAME/BENCH_EVRETH_SERVICE_NAME overrides.
fetchResourceAttrs only needs one span but was streaming the full result set from VictoriaTraces. Add limit=1 to the LogsQL query to avoid wasting bandwidth on long-lived instances with many spans.
Actionable comments posted: 3
🧹 Nitpick comments (1)
test/e2e/benchmark/helpers.go (1)
390-395: Avoid silent metric disappearance on “zero-count” paths. At Line 390 / Line 394 / Line 413, returning `ok=false` drops the metric downstream with no diagnostic signal (field omitted in JSON/log paths). For matrix comparisons, this can make runs harder to interpret. Consider surfacing an explicit reason (or a stable placeholder field) so missing values are distinguishable from truly unavailable traces.

Also applies to: 413-414
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/helpers.go` around lines 390 - 395, The current metric-extraction code returns ok=false whenever a metric exists but has Count==0 (e.g., the checks around produce, execute and stats[spanExecuteTxs]), which causes the metric to be dropped downstream and indistinguishable from a truly missing metric; change the logic in the function that inspects stats (the blocks referencing produce, execute, stats and spanExecuteTxs) so that instead of returning ok=false for zero-count cases you return a stable placeholder value (e.g., NaN or a sentinel like -1) or set ok=true plus an explicit reason/flag (e.g., zeroCount=true or reason="zero_count") so downstream JSON/loggers always include the field and can differentiate “zero count” from “no data.” Ensure the change updates all similar branches (including the checks at lines handling produce/execute and the block noted at 413–414) and preserves the original non-zero behavior.
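One way the suggested placeholder could look — `spanMetric` and `fromCount` are hypothetical names, not from the codebase:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spanMetric keeps zero-count metrics present in the JSON output with an
// explicit reason, instead of dropping them via ok=false, so "zero
// count" and "no data" stay distinguishable downstream.
type spanMetric struct {
	Value  float64 `json:"value"`
	Reason string  `json:"reason,omitempty"` // e.g. "zero_count"
}

// fromCount converts a raw count/sum pair into a metric that is always
// serialized, even when count == 0.
func fromCount(count, sum float64) spanMetric {
	if count == 0 {
		return spanMetric{Value: 0, Reason: "zero_count"}
	}
	return spanMetric{Value: sum / count}
}

func main() {
	out, _ := json.Marshal(map[string]spanMetric{
		"produce": fromCount(0, 0),
		"execute": fromCount(4, 10),
	})
	fmt.Println(string(out))
}
```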
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/benchmark/config.go`:
- Around line 50-53: The BENCH_* knobs can be negative because envInt allows -1;
update the benchConfig construction (where MaxPending, Rebroadcast, BaseFee,
TipFee are set) to validate or clamp values returned by envInt so negatives are
rejected or set to sane defaults—e.g., ensure MaxPending >= 0, Rebroadcast >= 0,
BaseFee >= 0, TipFee >= 0—and return an error (or panic) from the bench config
constructor (the function that builds benchConfig) on invalid inputs so
misconfiguration fails fast before hitting Spamoor.
In `@test/e2e/benchmark/suite_test.go`:
- Around line 246-248: collectTraces currently populates tr.evRethAttrs but
buildRunResult only includes traces.evNodeAttrs in the serialized
BENCH_RESULT_OUTPUT, so evRethAttrs is never exported; to fix, update
buildRunResult (or the serialization path that produces BENCH_RESULT_OUTPUT) to
include tr.evRethAttrs alongside tr.evNodeAttrs (e.g., add a field for
evRethAttrs in the RunResult/traces struct and serialize it), and ensure any
consumers expecting the output are updated to read the new field; touch the
functions buildRunResult and the traces/RunResult type so evRethAttrs is carried
through to output.
In `@test/e2e/benchmark/traces.go`:
- Around line 266-304: fetchResourceAttrs can block forever because it uses the
caller context (t.Context()) for the HTTP request; wrap the incoming ctx in a
short child timeout and use that for the request to ensure the optional
resource-attrs query cannot hang the whole test. Modify fetchResourceAttrs to
create a child context with a small timeout (e.g. 5s) via
context.WithTimeout(ctx, 5*time.Second) and defer cancel(), then pass that child
context to http.NewRequestWithContext / http.DefaultClient.Do so the request and
response read will be cancelled if it stalls; ensure the cancel is deferred
immediately after creating the timed context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0ceebf01-d369-4248-87a4-dbfcd7f438df
📒 Files selected for processing (11)
- test/e2e/benchmark/config.go
- test/e2e/benchmark/gasburner_test.go
- test/e2e/benchmark/helpers.go
- test/e2e/benchmark/result.go
- test/e2e/benchmark/run_result.go
- test/e2e/benchmark/spamoor_defi_test.go
- test/e2e/benchmark/spamoor_erc20_test.go
- test/e2e/benchmark/spamoor_smoke_test.go
- test/e2e/benchmark/spamoor_state_pressure_test.go
- test/e2e/benchmark/suite_test.go
- test/e2e/benchmark/traces.go
🚧 Files skipped from review as they are similar to previous changes (4)
- test/e2e/benchmark/spamoor_erc20_test.go
- test/e2e/benchmark/spamoor_defi_test.go
- test/e2e/benchmark/gasburner_test.go
- test/e2e/benchmark/run_result.go
```go
MaxPending:  envInt("BENCH_MAX_PENDING", 50_000),
Rebroadcast: envInt("BENCH_REBROADCAST", 0),
BaseFee:     envInt("BENCH_BASE_FEE", 20),
TipFee:      envInt("BENCH_TIP_FEE", 2),
```
Reject negative BENCH_* load knobs up front.
envInt accepts -1 here, so max_pending, rebroadcast, and the fee fields can go negative and only fail much later inside Spamoor. Validate or clamp these values when building benchConfig so misconfigured runs fail fast.
Possible fix

```diff
+func envNonNegativeInt(key string, fallback int) int {
+	n := envInt(key, fallback)
+	if n < 0 {
+		return fallback
+	}
+	return n
+}
+
-	MaxPending:  envInt("BENCH_MAX_PENDING", 50_000),
-	Rebroadcast: envInt("BENCH_REBROADCAST", 0),
-	BaseFee:     envInt("BENCH_BASE_FEE", 20),
-	TipFee:      envInt("BENCH_TIP_FEE", 2),
+	MaxPending:  envNonNegativeInt("BENCH_MAX_PENDING", 50_000),
+	Rebroadcast: envNonNegativeInt("BENCH_REBROADCAST", 0),
+	BaseFee:     envNonNegativeInt("BENCH_BASE_FEE", 20),
+	TipFee:      envNonNegativeInt("BENCH_TIP_FEE", 2),
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func envNonNegativeInt(key string, fallback int) int {
	n := envInt(key, fallback)
	if n < 0 {
		return fallback
	}
	return n
}

MaxPending:  envNonNegativeInt("BENCH_MAX_PENDING", 50_000),
Rebroadcast: envNonNegativeInt("BENCH_REBROADCAST", 0),
BaseFee:     envNonNegativeInt("BENCH_BASE_FEE", 20),
TipFee:      envNonNegativeInt("BENCH_TIP_FEE", 2),
```
```go
if rac, ok := e.traces.(resourceAttrCollector); ok {
	tr.evNodeAttrs = rac.fetchResourceAttrs(ctx, e.evNodeServiceName)
	tr.evRethAttrs = rac.fetchResourceAttrs(ctx, e.evRethServiceName)
}
```
evRethAttrs is fetched but never reaches the artifact.
collectTraces fills tr.evRethAttrs here, but buildRunResult only serializes traces.evNodeAttrs (test/e2e/benchmark/run_result.go:166-172). Right now the second lookup is dead work and the reth-side host metadata is silently dropped from BENCH_RESULT_OUTPUT.
```go
func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
	end := time.Now()
	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
	baseURL := strings.TrimRight(v.queryURL, "/")
	url := fmt.Sprintf("%s/select/logsql/query?query=%s&start=%s&end=%s&limit=1",
		baseURL,
		neturl.QueryEscape(query),
		v.startTime.Format(time.RFC3339Nano),
		end.Format(time.RFC3339Nano))

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		v.t.Logf("warning: failed to create resource attrs request: %v", err)
		return nil
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		v.t.Logf("warning: failed to fetch resource attrs: %v", err)
		return nil
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		v.t.Logf("warning: unexpected status %d fetching resource attrs", resp.StatusCode)
		return nil
	}

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Bytes()
		if len(line) == 0 {
			continue
		}
		attrs := extractResourceAttrs(line)
		return &attrs
	}
	return nil
}
```
Give the optional resource-attrs query its own timeout.
collectTraces passes t.Context() into this method, so a stalled VictoriaTraces endpoint can keep http.DefaultClient.Do blocked until the whole test times out. That makes optional host metadata capable of hanging the benchmark and delaying artifact emission after span collection already succeeded.
Possible fix

```diff
 func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
+	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+	defer cancel()
+
 	end := time.Now()
 	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
```
will address in follow up
@chatton, understood — thanks for the heads-up! Would you like me to open a GitHub issue to track adding the context.WithTimeout guard to fetchResourceAttrs so it doesn't get lost?
🧠 Learnings used
Learnt from: CR
Repo: evstack/ev-node PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-20T13:39:38.827Z
Learning: Applies to **/*.go : Use context.Context for cancellation in Go
Learnt from: alpe
Repo: evstack/ev-node PR: 3131
File: block/internal/syncing/syncer_backoff_test.go:241-245
Timestamp: 2026-03-06T09:40:36.029Z
Learning: In evstack/ev-node, the module declares go 1.25.6. Since Go 1.22, loop variables are per-iteration by default, so loop variable capture is not a concern. Do not flag or fix loop variable capture in this codebase for any Go files; this rule applies to all Go files in the repository, not just the specific test file.
Summary

- `run_result.go` with structured JSON output for benchmark runs, written to `BENCH_RESULT_OUTPUT` when set
- `emitRunResult` in TestGasBurner, TestERC20Throughput, and TestDeFiSimulation

Context

Supports the external-mode matrix runner in the benchmarks repo. Each matrix entry sets `BENCH_RESULT_OUTPUT` to collect structured results that can be aggregated into comparison tables. This output will be used across many runs with different permutations of configurations to generate markdown tables.