
WIP: reaper goroutine consolidation via pgx.Batch queries #1316

Draft
pashagolub wants to merge 11 commits into master from batch_metric_fetching

Conversation

@pashagolub (Collaborator)

The improvements stem from consolidating N per-metric goroutines into a single per-source SourceReaper that uses a GCD-based (Greatest Common Divisor) tick loop and pgx.Batch for batched SQL execution, dramatically reducing goroutine overhead, connection contention, scheduling pressure, and memory allocation churn.

Performance Comparison: Batch Metric Fetching vs Master

Test Environment

| Parameter | Value |
|---|---|
| Date | 2026-03-19 |
| Host OS | Windows, Docker Desktop |
| PostgreSQL | 18.1 (Debian) |
| Sources | 6 (debug preset — 55+ metrics at 30s intervals) |
| Sink | PostgreSQL |
| Log Level | debug |
| Build Tags | pprof (enables /debug/pprof/ on port 6060) |
| Test Duration | ~8 minutes per implementation |

Branches Tested

| Branch | Commit | Description |
|---|---|---|
| master | 393b9529a | Baseline — one goroutine per metric per source |
| batch_metric_fetching | 3d70c78a4 | New — one SourceReaper goroutine per source, GCD-based tick loop with pgx.Batch |

Architecture Comparison

| Aspect | Master | Batch |
|---|---|---|
| Metric collection model | 1 goroutine per metric per source | 1 SourceReaper goroutine per source |
| SQL execution model | Individual queries, one per goroutine | Batched queries via pgx.Batch |
| Scheduling | Per-metric time.After in each goroutine | Single GCD-based tick loop per source |
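The batched execution side can be illustrated with pgx/v5's Batch API. The sketch below is not the PR's implementation; `fetchBatch` and the `sqls` map are invented for illustration. It queues every due metric's SQL, sends the whole batch in one round trip, and reads results back in queue order. It needs a live PostgreSQL connection to run, so treat it as a pattern sketch only:

```go
package main

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
)

// fetchBatch sends all due metric queries to the source in a single
// network round trip via pgx.Batch, instead of one query per goroutine.
func fetchBatch(ctx context.Context, conn *pgx.Conn, sqls map[string]string) error {
	batch := &pgx.Batch{}
	names := make([]string, 0, len(sqls))
	for name, sql := range sqls {
		batch.Queue(sql)
		names = append(names, name)
	}
	br := conn.SendBatch(ctx, batch) // one round trip for the whole batch
	defer br.Close()
	for _, name := range names {
		rows, err := br.Query() // results come back in queue order
		if err != nil {
			return fmt.Errorf("metric %s: %w", name, err)
		}
		rows.Close() // real code would scan rows into measurements here
	}
	return nil
}
```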

Goroutine Analysis (via pprof /debug/pprof/goroutine)

Total Goroutine Count

| Measurement | Master | Batch | Reduction |
|---|---|---|---|
| Total goroutines | 435 | 28 | 93.6% (15.5× fewer) |

Goroutine Breakdown by Component

| Component | Master | Batch | Notes |
|---|---|---|---|
| reapMetricMeasurements | 414 | 0 | Replaced by SourceReaper |
| SourceReaper.Run | 0 | 6 | 1 per source (new batch model) |
| pgxpool.backgroundHealthCheck | 9 | 9 | Connection pool health (unchanged) |
| log.BrokerHook.poll | 2 | 2 | Log dispatch (unchanged) |
| sinks.PostgresWriter | 2 | 2 | Sink writer goroutines (unchanged) |
| reaper.Reap (main loop) | 1 | 1 | Unchanged |
| reaper.WriteMeasurements | 1 | 1 | Unchanged |
| webserver.Init | 1 | 1 | Web UI (unchanged) |
| net/http | 3 | 3 | HTTP servers (pprof + web) |
| signal | 1 | 1 | OS signal handler |
| Other/runtime | 1 | 2 | Runtime overhead |

The batch implementation eliminates 414 per-metric goroutines, replacing them with just 6 per-source goroutines. All infrastructure goroutines (pool health checks, sinks, logging, web server) remain identical between both implementations.

Stack Memory

| Metric | Master | Batch | Reduction |
|---|---|---|---|
| Stack in use | 6.0 – 6.3 MB | 1.4 – 1.6 MB | ~74% |

The 4× reduction in stack memory directly reflects the goroutine count difference — each goroutine requires its own stack.


Memory Statistics (Go runtime MemStats via logs)

Heap Allocation (Alloc — live objects on the heap)

| Time (relative) | Master (KB) | Batch (KB) | Reduction |
|---|---|---|---|
| t=0 (startup) | 2,147 | 2,327 | n/a |
| t=2min | 29,698 | 3,983 | 87% |
| t=4min | 27,257 | 3,837 | 86% |
| t=6min | 29,209 | 3,903 | 87% |
| t=8min | 17,651 | 3,865 | 78% |

Cumulative Allocations (TotalAlloc — total bytes allocated over time)

| Time (relative) | Master (KB) | Batch (KB) | Reduction |
|---|---|---|---|
| t=0 (startup) | 3,326 | 3,328 | n/a |
| t=2min | 865,826 | 207,017 | 76% |
| t=4min | 1,620,935 | 294,202 | 82% |
| t=6min | 2,402,248 | 382,324 | 84% |
| t=8min | 3,178,864 | 474,467 | 85% |

System Memory (Sys — total memory obtained from OS)

| Time (relative) | Master (KB) | Batch (KB) | Reduction |
|---|---|---|---|
| t=2min | 96,890 | 28,914 | 70% |
| t=8min | 96,890 | 29,170 | 70% |

Heap Profile Analysis (via pprof /debug/pprof/heap)

| Metric | Master | Batch | Reduction |
|---|---|---|---|
| HeapAlloc (live) | 35.9 – 39.7 MB | 5.4 – 5.7 MB | ~85% |
| HeapSys (reserved from OS) | 78.1 – 80.0 MB | 18.6 – 18.9 MB | ~76% |
| HeapObjects (live objects) | 422K – 510K | 45K – 47K | ~91% |
| MaxRSS (peak RSS) | 99.6 MB | 35.0 MB | 65% |
| GCCPUFraction | 0.019 – 0.022% | 0.006 – 0.008% | ~66% |

Docker Container Stats (at steady state)

| Metric | Master | Batch | Improvement |
|---|---|---|---|
| Container Memory | 56.7 – 63.2 MiB | 16.4 MiB | ~73% (3.5–3.9× less) |
| CPU Usage | 1.84% | 0.03% | ~98% (61× less) |
| PIDs | 22 | 17 | 23% |

Allocation Rate Analysis

Calculated from TotalAlloc deltas between measurements (each 2-minute window):

| Window | Master (KB/min) | Batch (KB/min) | Ratio |
|---|---|---|---|
| 0→2min | 431,250 | 101,845 | 4.2× less |
| 2→4min | 377,555 | 43,593 | 8.7× less |
| 4→6min | 390,657 | 44,061 | 8.9× less |
| 6→8min | 388,308 | 46,072 | 8.4× less |

At steady state (after initial warmup), the batch implementation allocates approximately 8–9× less memory per minute.
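The per-window rates follow directly from the cumulative TotalAlloc rows above. A throwaway check (not project code) reproducing the master 2→4min figure:

```go
package main

import "fmt"

// ratePerMin derives a KB/min allocation rate from two cumulative
// TotalAlloc samples taken `minutes` apart, as in the tables above.
func ratePerMin(prevKB, curKB, minutes float64) float64 {
	return (curKB - prevKB) / minutes
}

func main() {
	// Master, 2→4min window: (1,620,935 − 865,826) / 2
	fmt.Printf("%.1f\n", ratePerMin(865826, 1620935, 2)) // prints 377554.5
}
```

This matches the table's 377,555 KB/min (rounded).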


GC Pressure Analysis

| Metric | Master | Batch | Improvement |
|---|---|---|---|
| GC cycles in 8 min | 176 | 170 | Similar |
| GCCPUFraction | 0.019 – 0.022% | 0.006 – 0.008% | ~66% less |
| Avg GC pause (from pprof) | Higher variance | Lower, more consistent | |

While GC cycle counts are similar, the CPU fraction spent on GC is ~3× lower for batch because each cycle processes far fewer live objects (45K vs 510K).


Summary

The batch_metric_fetching branch delivers significant resource improvements across all dimensions:

| Category | Improvement | Detail |
|---|---|---|
| Goroutines | 93.6% fewer | 28 vs 435 (414 per-metric goroutines eliminated) |
| Live heap memory | ~85% less | 5.5 MB vs 37 MB (HeapAlloc via pprof) |
| Heap objects | ~91% less | 46K vs 466K live objects |
| Stack memory | ~74% less | 1.5 MB vs 6.2 MB |
| Peak RSS | 65% less | 35 MB vs 100 MB |
| System memory | ~70% less | 29 MB vs 97 MB (Sys) |
| Container memory | ~73% less | 16 MiB vs 60 MiB |
| Allocation rate | ~8–9× lower | 45 MB/min vs 389 MB/min (steady state) |
| CPU usage | ~98% less | 0.03% vs 1.84% |
| GC CPU fraction | ~66% less | 0.007% vs 0.020% |

@pashagolub pashagolub requested a review from 0xgouda March 19, 2026 18:59
@pashagolub pashagolub self-assigned this Mar 19, 2026
@pashagolub pashagolub added enhancement New feature or request epic Large body of work broken down into a number of smaller issues labels Mar 19, 2026
@coveralls

coveralls commented Mar 19, 2026

Coverage Report for CI Build 24205226414

Coverage increased (+1.1%) to 84.471%

Details

  • Coverage increased (+1.1%) from the base build.
  • Patch coverage: 96 uncovered changes across 3 files (549 of 645 lines covered, 85.12%).
  • 5 coverage regressions across 1 file.

Uncovered Changes

| File | Changed | Covered | % |
|---|---|---|---|
| internal/reaper/source_reaper.go | 210 | 163 | 77.62% |
| internal/reaper/database.go | 394 | 366 | 92.89% |
| internal/reaper/reaper.go | 35 | 14 | 40.0% |

Coverage Regressions

5 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---|---|
| internal/reaper/reaper.go | 5 | 37.23% |

Coverage Stats

Coverage Status
Relevant Lines: 5390
Covered Lines: 4553
Line Coverage: 84.47%
Coverage Strength: 0.96 hits per line

💛 - Coveralls

Comment on lines +42 to +48

```go
func (sr *SourceReaper) activeMetrics() map[string]time.Duration {
	sr.md.RLock()
	defer sr.md.RUnlock()
	src := sr.md.Metrics
	if sr.md.IsInRecovery && len(sr.md.MetricsStandby) > 0 {
		src = sr.md.MetricsStandby
	}
```
Collaborator:

The usage of RLock() is confusing me. Not that we shouldn't be using it, but are we using Lock() and RLock() in the other places that update/read md.Metrics? From what I can see, we don't.

Collaborator:

  1. Fixed this specific case for md.Metrics and md.MetricsStandby in d26c03d, so now the LoadMetrics() will always grab md.Lock() before updating these structs, and given that the source reaper always uses md.RLock() before accessing them, we are safe.

  2. An important note: there are other places in the code that read from md.Metrics and md.MetricsStandby, but all of them execute in the main reaper routine, which is the only routine that issues a write in LoadMetrics() as stated above, so there is no need to grab additional md.RLock() calls for them; the sequential order guarantees safety.

  3. Also, I noticed that the source RWMutex is rarely used to protect fields when writing to them; I suspect we may have race conditions because of that. This needs careful investigation in a separate issue.

@pashagolub I need your review on this. Specifically, what do you think of point 2 in terms of maintainability? Should we explicitly use md.RLock() everywhere so the intent is clear and we avoid introducing buggy changes in future updates/PRs? Or should we document in a comment that callers must add RLock() if the method is accessed from another routine, or once another writer is introduced?
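For reference, the discipline described in points 1 and 2 (LoadMetrics as the single writer under Lock(), the SourceReaper reading under RLock()) looks roughly like this. The types are simplified and hypothetical, not the actual md struct:

```go
package main

import (
	"fmt"
	"sync"
)

// sourceMeta is a simplified stand-in for per-source metadata
// guarded by an embedded RWMutex.
type sourceMeta struct {
	sync.RWMutex
	Metrics map[string]int // metric name -> interval seconds (simplified)
}

// LoadMetrics is the only writer: it must hold the write lock.
func (m *sourceMeta) LoadMetrics(newMetrics map[string]int) {
	m.Lock()
	defer m.Unlock()
	m.Metrics = newMetrics
}

// activeMetrics runs in the SourceReaper goroutine, so it must hold
// the read lock to be safe against a concurrent LoadMetrics.
func (m *sourceMeta) activeMetrics() map[string]int {
	m.RLock()
	defer m.RUnlock()
	return m.Metrics
}

func main() {
	m := &sourceMeta{}
	m.LoadMetrics(map[string]int{"db_stats": 30})
	fmt.Println(len(m.activeMetrics())) // prints 1
}
```

Readers in the main reaper routine can skip RLock() only as long as LoadMetrics is never called concurrently with them, which is exactly the fragility being debated above.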

Comment on lines +147 to +184

```go
	switch {
	case name == specialMetricServerLogEventCounts:
		if sr.lastFetch[name].IsZero() {
			go func() {
				if e := sr.runLogParser(ctx); e != nil {
					l.WithError(e).Error("log parser error")
				}
			}()
		}
	case IsDirectlyFetchableMetric(sr.md, name):
		err = sr.fetchOSMetric(ctx, name)
	case name == specialMetricChangeEvents || name == specialMetricInstanceUp:
		err = sr.fetchSpecialMetric(ctx, name)
	default:
		metric, ok := metricDefs.GetMetricDef(name)
		if !ok {
			l.WithField("metric", name).Warning("metric definition not found")
			continue
		}
		if sr.isRoleExcluded(metric) {
			continue
		}
		if cached := sr.reaper.GetMeasurementCache(sr.cacheKey(metric, name)); len(cached) > 0 {
			sr.sendEnvelope(name, metric.StorageName, cached)
			break
		}
		sql := metric.GetSQL(sr.md.Version)
		if sql == "" {
			l.WithField("source", sr.md.Name).WithField("version", sr.md.Version).Warning("no SQL found for metric version")
			break
		}
		batch = append(batch, batchEntry{name: name, metric: metric, sql: sql})
	}
	if err != nil {
		l.WithError(err).WithField("metric", name).Error("failed to fetch metric")
	}
	sr.lastFetch[name] = now
}
```
Collaborator:

There is an inconsistency here: for the special metrics, the lastFetch map is updated after the metric fetch, but for regular metrics it's updated directly after batching the query.

The old approach was that, for all metrics, the sleep interval started (corresponding to updating lastFetch here) only after the metric was fetched, so that the interval included the metric query's processing time.

pashagolub and others added 11 commits April 9, 2026 19:55
…l metrics

Move the `isRoleExcluded` check earlier in the metric fetch loop to apply
consistently to all metric types, including special metrics like
`instance_up` and `change_events`.

Previously, special metrics weren't excluded if they had a required
`node_status` different from the server's.
Co-authored-by: Mazen Kamal <71020170+Mazen050@users.noreply.github.com>

4 participants