Feat/detailed coordination window diagnostics #3861
Merged
lionakhnazarov merged 9 commits into threshold-network:main on Feb 5, 2026
- Updated to include a new peer for the sepolia network.
- Added timeout handling in to prevent indefinite hangs.
- Introduced new system metrics: CPU load, RAM utilization, and swap utilization, with corresponding updates to the performance metrics registration.

- Introduced a new structure to track detailed metrics for individual coordination windows, including timing, success rates, and fault statistics.
- Enhanced the coordination layer to record the start and end of coordination windows, as well as wallet-specific coordination details.
- Added new metrics for coordination windows, including total wallets coordinated, successful, and failed, along with fault tracking.

- Introduced new metrics for redemption actions, including total executions, success, and failure counts, as well as duration tracking.
- Updated the performance metrics registration to include these new redemption metrics.
- Refactored existing code to utilize defined constants for metric names, enhancing consistency and readability.
- Improved error handling in redemption proof submissions to accurately record failure metrics.

- Updated the and structures to include JSON tags for improved serialization.
- Introduced a new structure to capture detailed fault information during coordination.
- Enhanced the method to include error messages for failed wallet actions.
- Added a new method to retrieve a summary of coordination window metrics.
- Registered coordination windows as a diagnostic source in the client info for better monitoring.

- Added a mutex and a map to track peers that have already been pinged to avoid duplicate ping tests.
- Updated the connected and disconnected callback functions to manage the pinged peers set, ensuring each unique peer is only pinged once.
- Enhanced disconnection handling to allow re-pinging if a peer reconnects later.
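The pinged-peers bookkeeping described in the last commit note could look roughly like the sketch below. This is only an illustration under assumptions: the type `pingedPeers` and the methods `markIfNew` and `forget` are hypothetical names, peer IDs are plain strings here, and the actual PR code may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// pingedPeers is a hypothetical set of peer IDs that have already been
// ping-tested. The name and shape are illustrative, not taken from the PR.
type pingedPeers struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newPingedPeers() *pingedPeers {
	return &pingedPeers{seen: make(map[string]struct{})}
}

// markIfNew records the peer and reports true only if the peer was not
// already in the set, so the caller knows whether to run a ping test.
func (p *pingedPeers) markIfNew(peerID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.seen[peerID]; ok {
		return false
	}
	p.seen[peerID] = struct{}{}
	return true
}

// forget removes the peer from the set so that a later reconnect
// triggers a fresh ping test.
func (p *pingedPeers) forget(peerID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.seen, peerID)
}

func main() {
	peers := newPingedPeers()
	fmt.Println(peers.markIfNew("peer-a")) // first connection: ping
	fmt.Println(peers.markIfNew("peer-a")) // duplicate connected event: skip
	peers.forget("peer-a")                 // disconnected callback
	fmt.Println(peers.markIfNew("peer-a")) // reconnect: ping again
}
```

In this sketch the connected callback would call `markIfNew` and ping only when it returns true, while the disconnected callback would call `forget`.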
lrsaturnino (Member) reviewed on Feb 2, 2026 and left a comment:
Good work! Some comments pending clarification.
- Refactored metric increment calls in libp2p to utilize constants for peer connections, disconnections, and ping tests.
- Enhanced coordination window metrics by adding a mutex for safe access to previous window data across goroutines.
- Introduced a cleanup goroutine to ensure the end time of the last coordination window is recorded on shutdown.

- Updated the coordinationExecutor to return a partial result containing leader and faults information when a follower's routine fails, allowing for better metric recording.

- …seconds for improved performance tracking.
- Added detailed comments to clarify the behavior of CPU utilization sampling and the prevention of double-recording in coordination window metrics.
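The partial-result behavior mentioned for the coordinationExecutor can be sketched as follows. This is a minimal illustration, not the PR's code: `coordinationResult`, `coordinate`, and `errFollowerFailed` are hypothetical names, and the real result type carries richer leader and fault data than the strings used here.

```go
package main

import (
	"errors"
	"fmt"
)

// coordinationResult is a hypothetical stand-in for the executor's result.
type coordinationResult struct {
	Leader   string
	Faults   []string
	Proposal string // left empty when coordination did not complete
}

var errFollowerFailed = errors.New("follower routine failed")

// coordinate returns a non-nil partial result together with the error when
// the follower routine fails, so the caller can still record the leader and
// the observed faults in its metrics.
func coordinate(leader string, followerOK bool, faults []string) (*coordinationResult, error) {
	result := &coordinationResult{Leader: leader, Faults: faults}
	if !followerOK {
		return result, errFollowerFailed
	}
	result.Proposal = "heartbeat"
	return result, nil
}

func main() {
	result, err := coordinate("operator-1", false, []string{"leader-idleness"})
	if err != nil && result != nil {
		// Metrics can be recorded from the partial result even on failure.
		fmt.Printf("failed window: leader=%s faults=%v err=%v\n",
			result.Leader, result.Faults, err)
	}
}
```

The design point is that the error and the result are not mutually exclusive: callers check the error for success but read leader and fault fields either way.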
Detailed Coordination Window Diagnostics
Summary
This PR introduces comprehensive diagnostics and metrics tracking for tBTC coordination windows, significantly enhancing observability into the coordination process. The changes add detailed per-window and per-wallet metrics, improve network diagnostics, and expand performance monitoring capabilities.
The PR adds a new metrics tracking system for coordination windows that provides:
Per-Window Tracking: Each coordination window is tracked with:
Per-Wallet Details: For each wallet coordinated in a window:
Memory Management: Tracks up to 100 recent windows (~25 hours) with automatic cleanup of older windows to prevent unbounded memory growth
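One way to keep only the most recent 100 windows is a bounded history that drops the oldest entries on insert. The sketch below assumes hypothetical names (`windowHistory`, `windowMetrics`, `add`) and a single-goroutine caller; the PR's actual structure, and the mutex it uses for concurrent access, are not reproduced here.

```go
package main

import "fmt"

// maxWindows mirrors the cap of 100 recent windows stated in the description.
const maxWindows = 100

// windowMetrics is a hypothetical, reduced stand-in for per-window metrics.
type windowMetrics struct {
	Index uint64
}

// windowHistory keeps at most limit windows, oldest first.
// It is not safe for concurrent use; a real implementation would add locking.
type windowHistory struct {
	windows []windowMetrics
	limit   int
}

// add appends a window and trims the oldest entries once the cap is exceeded.
func (h *windowHistory) add(w windowMetrics) {
	h.windows = append(h.windows, w)
	if extra := len(h.windows) - h.limit; extra > 0 {
		// Copy into a fresh slice so dropped entries can be garbage collected.
		h.windows = append([]windowMetrics(nil), h.windows[extra:]...)
	}
}

func main() {
	h := &windowHistory{limit: maxWindows}
	for i := uint64(0); i < 105; i++ {
		h.add(windowMetrics{Index: i})
	}
	fmt.Println(len(h.windows), h.windows[0].Index) // prints "100 5"
}
```

Trimming on every insert keeps memory bounded without a separate cleanup timer, at the cost of one slice copy per insert once the cap is reached.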