
optimize LoadSources to avoid full restart on metric or tag changes #1350

Closed
nadashaban11 wants to merge 1 commit into cybertec-postgresql:master from nadashaban11:optimize-loadsources

Conversation

@nadashaban11
Contributor

Description

Currently, pgwatch performs a full restart of all gatherers after any update to a source. I worked on the TODO mentioned in LoadSources to optimize that behavior.

Fixes

At first I was unsure how to split up the full restart, but I settled on the following approach:
1- The change requires a full restart because the connection config changed. This includes any change in connStr, Kind, Group, Name, IsEnabled, IncludePattern, ExcludePattern, or OnlyIfMaster.
In this case there are no code changes; the current code is kept exactly as is.

2- The change does not require a full restart; the new values are picked up without forcing the source to reconnect. This includes any change in metrics or custom tags.
To achieve that I made the following changes:

  • Added the method IsSameConnection in internal/sources/types.go to check whether a full restart can be skipped, and tested it in internal/sources/types_test.go.
  • In internal/reaper/reaper.go, I left the current code in place to do the full restart when IsSameConnection returns false. When it returns true, the existing metrics and tags are updated with the new values instead.
  • The real challenge was testing the new behavior. I kept the existing test cases as they are, since they already covered all cases, and split them into two tests to reflect the code changes. The main change concerns expectCancel: in case 1 I removed it, since it should always be true.
  • In case 2 I replaced it with a slice of strings, expectedCancelled, listing the changed metrics whose gatherers should be stopped.
Behavior before (modifying anything produces logs like the following)

2026-03-30 17:43:21.600 [INFO] Source configs changed, restarting all gatherers...
2026-03-30 17:43:21.600 [INFO] [metric:instance_up] stopping gatherer...
2026-03-30 17:43:21.600 [INFO] [metric:kpi] stopping gatherer...
2026-03-30 17:43:21.628 [DEBUG] [database:sollam] [host:localhost] [pid:38613] Connect
...

Behavior after

1- when modifying connection config

2026-03-31 16:54:49.259 [INFO] [source:sollam_target] Source configs changed, restarting all gatherers... (reaper/reaper.go:441 reaper.(*Reaper).LoadSources)
2026-03-31 16:54:49.278 [DEBUG] [source:sollam_target] [database:sollam] [host:localhost] [pid:87131] [port:5432] [time:10.757444ms] Connect

2- when modifying metrics or tags
Here I changed the metrics preset:

2026-03-31 17:02:01.474 [INFO] [source:sollam_target] Metric OR Tag configs changed, updating without full restart (reaper/reaper.go:426 reaper.(*Reaper).LoadSources)
2026-03-31 17:02:01.474 [INFO] [sources:1] sources refreshed (reaper/reaper.go:445 reaper.(*Reaper).LoadSources)
2026-03-31 17:02:01.477 [INFO] [metrics:74] [presets:14] metrics and presets refreshed (reaper/metric.go:93 reaper.(*Reaper).LoadMetrics)
2026-03-31 17:02:01.478 [DEBUG] [source:sollam_target] [pid:87400] [time:230.653µs] Acquire
...

Here I changed a metric interval:

2026-03-31 17:08:41.618 [INFO] [source:sollam_target] Metric OR Tag configs changed, updating without full restart (reaper/reaper.go:426 reaper.(*Reaper).LoadSources)
2026-03-31 17:08:41.618 [INFO] [sources:1] sources refreshed (reaper/reaper.go:445 reaper.(*Reaper).LoadSources)
2026-03-31 17:08:41.623 [INFO] [metrics:74] [presets:14] metrics and presets refreshed (reaper/metric.go:93 reaper.(*Reaper).LoadMetrics)
2026-03-31 17:08:41.623 [DEBUG] [source:sollam_target] [pid:87400] [time:156.719µs] Acquire
...
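For the metric-only path, one way to decide which gatherers to stop (the role of expectedCancelled in the tests) is to diff the old and new metric definitions. The sketch below is an assumption about how that could be done, not the code from this PR: the helper name `changedGatherers`, the map shape, and the interval values are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// changedGatherers is a hypothetical helper. It returns the sorted names
// of metrics present in oldMetrics that were either removed or given a
// different interval in newMetrics. Gatherers for these metrics would be
// the ones to cancel; unchanged metrics keep running.
func changedGatherers(oldMetrics, newMetrics map[string]float64) []string {
	var stopped []string
	for name, oldInterval := range oldMetrics {
		newInterval, ok := newMetrics[name]
		if !ok || newInterval != oldInterval {
			stopped = append(stopped, name)
		}
	}
	// Map iteration order is random in Go, so sort for a stable result.
	sort.Strings(stopped)
	return stopped
}

func main() {
	// Intervals in seconds; all values are illustrative.
	oldMetrics := map[string]float64{"instance_up": 60, "kpi": 120, "wal": 60}
	newMetrics := map[string]float64{"instance_up": 60, "kpi": 30}

	// kpi changed its interval and wal was removed; instance_up is untouched.
	fmt.Println(changedGatherers(oldMetrics, newMetrics)) // [kpi wal]
}
```

Metrics that appear only in the new definition are not returned here, since they have no running gatherer to stop; starting them would be a separate step.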

Note
I tried to describe the code changes in comments as much as possible, both to help my own understanding while working and to make review easier.

  • I am the human author and take full personal responsibility for every change in this PR.
  • No AI or automated generative tool was used in any part of this PR OR I have disclosed all tool(s) below.

AI/automation tools used (leave blank if none):

Discussed with Gemini which changes can cause a full restart.

Checklist

  • Code compiles and existing tests pass locally.
  • New or updated tests are included where applicable.
  • Documentation is updated where applicable.

@pashagolub
Collaborator

Thanks for your work. Have you checked #1316?

@nadashaban11
Contributor Author

Thanks for pointing that out. I had not checked it before, but I now see it contains great enhancements. I will close this PR since it conflicts with the newly planned architecture.

@nadashaban11 nadashaban11 deleted the optimize-loadsources branch March 31, 2026 17:41