
feat: structured benchmark result output via BENCH_RESULT_OUTPUT #3195

Merged

chatton merged 15 commits into main from cian/benchmark-results-artifact on Mar 30, 2026
Conversation

@chatton
Contributor

@chatton chatton commented Mar 24, 2026

Summary

  • Add run_result.go with structured JSON output for benchmark runs, written to BENCH_RESULT_OUTPUT when set
  • Each test emits full config, metrics (throughput, block fill, intervals, trace-derived timings), tags, and field descriptions
  • Wire up emitRunResult in TestGasBurner, TestERC20Throughput, and TestDeFiSimulation

Context

Supports the external-mode matrix runner in the benchmarks repo. Each matrix entry sets BENCH_RESULT_OUTPUT to collect structured results that can be aggregated into comparison tables.

This output will be used across many runs with different permutations of configurations to generate markdown tables.

Local test output:

  {
    "schema_version": 1,
    "test": "TestResult",
    "timestamp": "2026-03-24T11:56:17Z",
    "platform": "darwin/arm64",
    "config": {
      "block_time": "100ms",
      "slot_duration": "50ms",
      "gas_limit": "0x5F5E100",
      "scrape_interval": "25ms",
      "num_spammers": 4,
      "count_per_spammer": 10000,
      "throughput": 60,
      "warmup_txs": 100,
      "gas_units_to_burn": 0,
      "max_wallets": 200,
      "wait_timeout": "5m0s"
    },
    "tags": {
      "ev_reth": "latest",
      "ev_node": "unknown"
    },
    "metrics": {
      "mgas_per_sec": 150,
      "tps": 2250,
      "blocks_per_sec": 7.5,
      "non_empty_ratio_pct": 75,
      "avg_gas_per_block": 20000000,
      "avg_tx_per_block": 300,
      "gas_block_p50": 20000000,
      "gas_block_p99": 20000000,
      "tx_block_p50": 300,
      "tx_block_p99": 300,
      "avg_block_interval_ms": 100,
      "block_interval_p50_ms": 100,
      "block_interval_p99_ms": 100,
      "steady_state_sec": 2,
      "wall_clock_sec": 10,
      "overhead_pct": 23.52941176470588,
      "ev_reth_ggas_per_sec": 6,
      "secs_per_gigagas": 6.666666666666667,
      "produce_block_avg_ms": 85,
      "produce_block_min_ms": 80,
      "produce_block_max_ms": 90,
      "get_payload_avg_ms": 40,
      "get_payload_min_ms": 40,
      "get_payload_max_ms": 40,
      "new_payload_avg_ms": 50,
      "new_payload_min_ms": 50,
      "new_payload_max_ms": 50
    },
    "block_range": {
      "start": 100,
      "end": 120,
      "total": 20,
      "non_empty": 15
    },
    "spamoor": {
      "sent": 40000,
      "failed": 5
    },
    "field_descriptions": { ... }
  }

Summary by CodeRabbit

  • Tests

    • Enhanced benchmark test infrastructure with structured result emission and configurable parameters.
    • Improved metrics collection and resource attribute extraction from execution traces.
    • Added support for transaction spamming configuration (pending limits, rebroadcast settings, base and tip fees).
  • Refactor

    • Refactored trace processing helpers for improved composability and maintainability.

emit full benchmark run metadata (config, tags, metrics, block range,
spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by
external matrix runner for table generation.
@claude
Contributor

claude bot commented Mar 24, 2026

Claude finished @chatton's task in 1m 57s


Code Review


Overall, this is a clean and well-structured addition. The defer-based emission pattern is consistent and safe, the schema is self-describing with field_descriptions, and the opt-in via BENCH_RESULT_OUTPUT is the right design for a benchmark artifact.

A few issues worth fixing before merge, ranging from a functional gap to minor nits.


Functional Issues

1. New benchConfig fields missing from JSON output (run_result.go, config.go)

MaxPending, Rebroadcast, BaseFee, and TipFee were added to benchConfig and wired into all spammer configs, but they are absent from runConfig (the struct serialized to BENCH_RESULT_OUTPUT). For a matrix runner that's comparing runs with different permutations, this makes the output incomplete — you won't be able to tell from the artifact alone which fee or pending limits were in effect.

// run_result.go - runConfig is missing these fields:
type runConfig struct {
    // ... existing fields ...
    // MaxPending, Rebroadcast, BaseFee, TipFee are absent
}

They also have no entries in fieldDescriptions().

2. evRethAttrs collected but never emitted (suite_test.go:248, run_result.go:166–172)

collectTraces populates tr.evRethAttrs but buildRunResult only uses br.traces.evNodeAttrs for the host field. The reth-side resource attributes (host, CPU, memory) are silently dropped from the artifact. This matters if ev-reth runs on a different machine.

CodeRabbit already flagged this. Either include it in the output as a separate reth_host field, or document that it's intentionally excluded.


Code Quality

3. BENCH_OBJECTIVE read directly from env in buildRunResult (run_result.go:169)

All other env-driven config goes through benchConfig, but BENCH_OBJECTIVE is read inside buildRunResult. This is inconsistent and makes the function impure (hidden I/O). It also means objective is absent from fieldDescriptions().

// run_result.go:169
Objective: os.Getenv("BENCH_OBJECTIVE"),

Consider adding it to benchConfig alongside the other BENCH_* vars, or at minimum document it in fieldDescriptions().

4. extractResourceAttr uses substring matching (traces.go:337)

for k, v := range raw {
    if strings.Contains(k, attr) {
        return v
    }
}

A search for "host.name" would also match a hypothetical "host.name2" or "host.namespace". More critically, each attribute extraction call in extractResourceAttrs re-unmarshals the entire JSON line: 7 separate json.Unmarshal calls for one span record. For test code this is correct but wasteful; a single unmarshal into a map[string]string shared across all attribute lookups would be cleaner.

5. cfg.log(t) called inconsistently across tests

TestGasBurner calls cfg.log(t) at the top, but TestERC20Throughput and TestStatePressure do not. Minor, but consistent logging is useful for debugging CI runs.

Also, cfg.log at config.go:63 doesn't log the four new fields (MaxPending, Rebroadcast, BaseFee, TipFee), so even where it's called, the new parameters are invisible in test output.

6. Negative env var validation (config.go:50–53)

CodeRabbit flagged this correctly. envInt accepts any integer, including negatives. Passing BENCH_MAX_PENDING=-1 to Spamoor would likely produce confusing failures. A max(0, n) clamp or an explicit check on construction would make misconfiguration fail fast. (Chatton acknowledged the timeout issue for follow-up; this one is independent.)


Minor / Nits

7. runSpamoorStats.Sent/Failed typed as float64 but used as counts

The float64 type is carried over from Prometheus counter semantics (which waitForSpamoorDone and GetMetrics return). This is understandable, but the JSON output field name "sent" suggests an integer count. A comment explaining why it's float64 (Prometheus counters are always float64) would prevent future confusion.

8. Schema version increment policy undocumented

SchemaVersion: 1 is hardcoded. Consider adding a comment in run_result.go explaining what constitutes a breaking schema change (e.g., removing or renaming fields) vs. a backwards-compatible addition (e.g., new omitempty fields), so consumers of the artifact know when to expect breakage.


Summary

#  Issue                                              Severity
1  New config fields missing from JSON output         Medium (functional gap)
2  evRethAttrs collected but not emitted              Medium (silent data loss)
3  BENCH_OBJECTIVE read from env inside builder       Low (inconsistency)
4  7x JSON unmarshal per span for resource attrs      Low (clean-up)
5  cfg.log called inconsistently, missing new fields  Low (observability)
6  Negative env int values not validated              Low (fail-fast)
7  float64 count type unexplained                     Nit
8  Schema version policy undocumented                 Nit

The fetchResourceAttrs timeout issue (item from CodeRabbit / @chatton) is tracked for follow-up — no action needed here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Adds structured JSON "run result" emission for benchmarks, expands benchConfig with spammer parameters, captures final spamoor metrics and resource attributes, refactors trace/stat helpers, and integrates result emission into multiple spamoor/gasburner benchmark tests.

Changes

Cohort / File(s) Summary
Result emission infrastructure
test/e2e/benchmark/run_result.go
New file: defines runResult schema, builders, and emitRunResult() to write JSON when BENCH_RESULT_OUTPUT is set. Populates metrics, block-range, tags, host attributes, and optional spamoor stats and trace-derived fields.
Benchmark config
test/e2e/benchmark/config.go
Added benchConfig fields MaxPending, Rebroadcast, BaseFee, TipFee and env-var parsing for BENCH_MAX_PENDING, BENCH_REBROADCAST, BENCH_BASE_FEE, BENCH_TIP_FEE.
Test integrations (emit + spamoor metrics)
test/e2e/benchmark/gasburner_test.go, test/e2e/benchmark/spamoor_defi_test.go, test/e2e/benchmark/spamoor_erc20_test.go, test/e2e/benchmark/spamoor_state_pressure_test.go
Declare run-level result, wallClock, spamoorStats; defer conditional emitRunResult(...); replace hardcoded spamoor params with cfg.*; collect final spamoor metrics via api.GetMetrics() or waitForSpamoorDone() and populate runSpamoorStats.
Trace/resource attr handling
test/e2e/benchmark/traces.go, test/e2e/benchmark/suite_test.go, test/e2e/benchmark/spamoor_smoke_test.go
Introduce optional resourceAttrCollector and resourceAttrs; implement fetchResourceAttrs for victoriaTraceProvider; env now stores evNodeServiceName/evRethServiceName; collectTraces no longer takes serviceName argument and fetches resource attrs. Update smoke test to call s.collectTraces(e) (no explicit service name).
Span/stat helpers & result shape
test/e2e/benchmark/helpers.go, test/e2e/benchmark/result.go
Add overheadFromStats and rethRateFromStats, refactor evNodeOverhead/rethExecutionRate to use them; adjust engine span timing units (use microseconds->milliseconds float); extend traceResult with evNodeAttrs and evRethAttrs.
Minor test updates
test/e2e/benchmark/gasburner_test.go (also)
Adjustments to variable scoping, trace collection calls, and benchmark CreateSpammer parameters to use cfg fields.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Benchmark Test
  participant SpammerAPI as Spamoor API
  participant Traces as VictoriaTraces
  participant FS as Filesystem

  Test->>SpammerAPI: run workload / waitForSpamoorDone
  SpammerAPI-->>Test: final sent/failed counters
  Test->>Traces: collect spans & resource attrs
  Traces-->>Test: spans + resource attributes
  Test->>Test: buildRunResult(result, wallClock, spans, attrs, spamoorStats)
  Test->>FS: emitRunResult -> write JSON file (BENCH_RESULT_OUTPUT)
  FS-->>Test: write success/failure

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • alpe
  • tac0turtle

Poem

🐰
I hopped through tests with a pen of light,
Counting gas and traces through the night,
JSON carrots left for curious eyes,
Metrics snug beneath moonlit skies,
Hop—benchmark done, and all data bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Description check: ❓ Inconclusive
Explanation: The PR description provides a comprehensive summary with context, but lacks the explicit structure and sections required by the template (Overview section with issue/background/goal).
Resolution: Restructure the description to follow the template: add an Overview section with the goal, background, and relevant context/issue references before the Summary section.
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title 'feat: structured benchmark result output via BENCH_RESULT_OUTPUT' clearly and concisely summarizes the main change: adding structured JSON benchmark output written to a path taken from an environment variable.
Docstring Coverage: ✅ Passed. Docstring coverage is 92.86%, which is sufficient. The required threshold is 80.00%.



@github-actions
Contributor

github-actions bot commented Mar 24, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Mar 30, 2026, 2:27 PM

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.45%. Comparing base (146e6e1) to head (9012f2f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3195      +/-   ##
==========================================
+ Coverage   61.43%   61.45%   +0.01%     
==========================================
  Files         120      120              
  Lines       12470    12470              
==========================================
+ Hits         7661     7663       +2     
+ Misses       3949     3948       -1     
+ Partials      860      859       -1     
Flag Coverage Δ
combined 61.45% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@chatton chatton marked this pull request as draft March 24, 2026 11:49
chatton added 13 commits March 24, 2026 12:17
- Deduplicate overhead/reth-rate computation: move stats-based helpers
  to helpers.go, make span-based wrappers delegate to them
- Fix sub-millisecond precision loss in engine span timings by using
  microsecond-based float division instead of integer truncation
- Add spamoor stats to TestGasBurner for consistency with other tests
- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig
- Fix ERC20 test hardcoding max_wallets=200 instead of using cfg
- Replace all hardcoded spamoor params with cfg fields across tests
- Add resourceAttrs struct with host, OS, and service fields
- Extract attributes from VictoriaTraces LogsQL span data via
  resourceAttrCollector interface
- Include host metadata in structured benchmark result JSON
Move emitRunResult into a deferred closure in all three test functions.
If the test fails after metrics are collected, the structured JSON is
still written. If it fails before result data exists, the defer is a
no-op.
Remove the 3-second sleep before requireSpammersRunning that caused all
transactions to be mined before the measurement window started, leaving
SteadyState at 0s. Also add deferred emitRunResult, configurable spamoor
params, and spamoorStats collection to match the other benchmark tests.
… mode

In external mode the sequencer reports spans as "ev-node" (not the
test-specific name like "ev-node-erc20"), so trace queries returned
zero spans. Store service names on env: local mode uses the
test-specific name, external mode defaults to "ev-node"/"ev-reth"
with BENCH_EVNODE_SERVICE_NAME/BENCH_EVRETH_SERVICE_NAME overrides.
fetchResourceAttrs only needs one span but was streaming the full
result set from VictoriaTraces. Add limit=1 to the LogsQL query to
avoid wasting bandwidth on long-lived instances with many spans.
@chatton chatton marked this pull request as ready for review March 30, 2026 10:41
Contributor

@alpe alpe left a comment


👍 uAck

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
test/e2e/benchmark/helpers.go (1)

390-395: Avoid silent metric disappearance on “zero-count” paths.

At Line 390 / Line 394 / Line 413, returning ok=false drops the metric downstream with no diagnostic signal (field omitted in JSON/log paths). For matrix comparisons, this can make runs harder to interpret. Consider surfacing an explicit reason (or a stable placeholder field) so missing values are distinguishable from truly unavailable traces.

Also applies to: 413-414

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/helpers.go` around lines 390 - 395, The current
metric-extraction code returns ok=false whenever a metric exists but has
Count==0 (e.g., the checks around produce, execute and stats[spanExecuteTxs]),
which causes the metric to be dropped downstream and indistinguishable from a
truly missing metric; change the logic in the function that inspects stats (the
blocks referencing produce, execute, stats and spanExecuteTxs) so that instead
of returning ok=false for zero-count cases you return a stable placeholder value
(e.g., NaN or a sentinel like -1) or set ok=true plus an explicit reason/flag
(e.g., zeroCount=true or reason="zero_count") so downstream JSON/loggers always
include the field and can differentiate “zero count” from “no data.” Ensure the
change updates all similar branches (including the checks at lines handling
produce/execute and the block noted at 413–414) and preserves the original
non-zero behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0ceebf01-d369-4248-87a4-dbfcd7f438df

📥 Commits

Reviewing files that changed from the base of the PR and between 23d0718 and f0f71bd.

📒 Files selected for processing (11)
  • test/e2e/benchmark/config.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/result.go
  • test/e2e/benchmark/run_result.go
  • test/e2e/benchmark/spamoor_defi_test.go
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_smoke_test.go
  • test/e2e/benchmark/spamoor_state_pressure_test.go
  • test/e2e/benchmark/suite_test.go
  • test/e2e/benchmark/traces.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_defi_test.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/run_result.go

Comment on lines +50 to +53
		MaxPending:      envInt("BENCH_MAX_PENDING", 50_000),
		Rebroadcast:     envInt("BENCH_REBROADCAST", 0),
		BaseFee:         envInt("BENCH_BASE_FEE", 20),
		TipFee:          envInt("BENCH_TIP_FEE", 2),
Contributor


⚠️ Potential issue | 🟡 Minor

Reject negative BENCH_* load knobs up front.

envInt accepts -1 here, so max_pending, rebroadcast, and the fee fields can go negative and only fail much later inside Spamoor. Validate or clamp these values when building benchConfig so misconfigured runs fail fast.

Possible fix
+func envNonNegativeInt(key string, fallback int) int {
+	n := envInt(key, fallback)
+	if n < 0 {
+		return fallback
+	}
+	return n
+}
+
-		MaxPending:      envInt("BENCH_MAX_PENDING", 50_000),
-		Rebroadcast:     envInt("BENCH_REBROADCAST", 0),
-		BaseFee:         envInt("BENCH_BASE_FEE", 20),
-		TipFee:          envInt("BENCH_TIP_FEE", 2),
+		MaxPending:      envNonNegativeInt("BENCH_MAX_PENDING", 50_000),
+		Rebroadcast:     envNonNegativeInt("BENCH_REBROADCAST", 0),
+		BaseFee:         envNonNegativeInt("BENCH_BASE_FEE", 20),
+		TipFee:          envNonNegativeInt("BENCH_TIP_FEE", 2),
As per coding guidelines, "Validate all inputs from external sources in Go code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/config.go` around lines 50 - 53, The BENCH_* knobs can be
negative because envInt allows -1; update the benchConfig construction (where
MaxPending, Rebroadcast, BaseFee, TipFee are set) to validate or clamp values
returned by envInt so negatives are rejected or set to sane defaults—e.g.,
ensure MaxPending >= 0, Rebroadcast >= 0, BaseFee >= 0, TipFee >= 0—and return
an error (or panic) from the bench config constructor (the function that builds
benchConfig) on invalid inputs so misconfiguration fails fast before hitting
Spamoor.

Comment on lines +246 to +248
	if rac, ok := e.traces.(resourceAttrCollector); ok {
		tr.evNodeAttrs = rac.fetchResourceAttrs(ctx, e.evNodeServiceName)
		tr.evRethAttrs = rac.fetchResourceAttrs(ctx, e.evRethServiceName)
Contributor


⚠️ Potential issue | 🟡 Minor

evRethAttrs is fetched but never reaches the artifact.

collectTraces fills tr.evRethAttrs here, but buildRunResult only serializes traces.evNodeAttrs (test/e2e/benchmark/run_result.go:166-172). Right now the second lookup is dead work and the reth-side host metadata is silently dropped from BENCH_RESULT_OUTPUT.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/suite_test.go` around lines 246 - 248, collectTraces
currently populates tr.evRethAttrs but buildRunResult only includes
traces.evNodeAttrs in the serialized BENCH_RESULT_OUTPUT, so evRethAttrs is
never exported; to fix, update buildRunResult (or the serialization path that
produces BENCH_RESULT_OUTPUT) to include tr.evRethAttrs alongside tr.evNodeAttrs
(e.g., add a field for evRethAttrs in the RunResult/traces struct and serialize
it), and ensure any consumers expecting the output are updated to read the new
field; touch the functions buildRunResult and the traces/RunResult type so
evRethAttrs is carried through to output.

Comment on lines +266 to +304
func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
	end := time.Now()
	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
	baseURL := strings.TrimRight(v.queryURL, "/")
	url := fmt.Sprintf("%s/select/logsql/query?query=%s&start=%s&end=%s&limit=1",
		baseURL,
		neturl.QueryEscape(query),
		v.startTime.Format(time.RFC3339Nano),
		end.Format(time.RFC3339Nano))

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		v.t.Logf("warning: failed to create resource attrs request: %v", err)
		return nil
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		v.t.Logf("warning: failed to fetch resource attrs: %v", err)
		return nil
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		v.t.Logf("warning: unexpected status %d fetching resource attrs", resp.StatusCode)
		return nil
	}

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Bytes()
		if len(line) == 0 {
			continue
		}
		attrs := extractResourceAttrs(line)
		return &attrs
	}
	return nil
}
Contributor

@coderabbitai coderabbitai bot Mar 30, 2026


⚠️ Potential issue | 🟠 Major

Give the optional resource-attrs query its own timeout.

collectTraces passes t.Context() into this method, so a stalled VictoriaTraces endpoint can keep http.DefaultClient.Do blocked until the whole test times out. That makes optional host metadata capable of hanging the benchmark and delaying artifact emission after span collection already succeeded.

Possible fix
 func (v *victoriaTraceProvider) fetchResourceAttrs(ctx context.Context, serviceName string) *resourceAttrs {
+	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+	defer cancel()
+
 	end := time.Now()
 	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
As per coding guidelines, "Use context.Context for cancellation in Go."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/traces.go` around lines 266 - 304, fetchResourceAttrs can
block forever because it uses the caller context (t.Context()) for the HTTP
request; wrap the incoming ctx in a short child timeout and use that for the
request to ensure the optional resource-attrs query cannot hang the whole test.
Modify fetchResourceAttrs to create a child context with a small timeout (e.g.
5s) via context.WithTimeout(ctx, 5*time.Second) and defer cancel(), then pass
that child context to http.NewRequestWithContext / http.DefaultClient.Do so the
request and response read will be cancelled if it stalls; ensure the cancel is
deferred immediately after creating the timed context.

Contributor Author


will address in follow up

Contributor


@chatton, understood — thanks for the heads-up! Would you like me to open a GitHub issue to track adding the context.WithTimeout guard to fetchResourceAttrs so it doesn't get lost?


🧠 Learnings used
Learnt from: CR
Repo: evstack/ev-node PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-20T13:39:38.827Z
Learning: Applies to **/*.go : Use context.Context for cancellation in Go

Learnt from: alpe
Repo: evstack/ev-node PR: 3131
File: block/internal/syncing/syncer_backoff_test.go:241-245
Timestamp: 2026-03-06T09:40:36.029Z
Learning: In evstack/ev-node, the module declares go 1.25.6. Since Go 1.22, loop variables are per-iteration by default, so loop variable capture is not a concern. Do not flag or fix loop variable capture in this codebase for any Go files; this rule applies to all Go files in the repository, not just the specific test file.

@chatton chatton added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
@chatton chatton enabled auto-merge March 30, 2026 14:21
@chatton chatton added this pull request to the merge queue Mar 30, 2026
Merged via the queue into main with commit 5e3db23 Mar 30, 2026
36 checks passed
@chatton chatton deleted the cian/benchmark-results-artifact branch March 30, 2026 15:10