[BOUNTY] Grafana Monitoring Dashboard (60 RTC)#264

Open
zhanglinqian wants to merge 5 commits into Scottcjn:main from zhanglinqian:feat/grafana-dashboard

@zhanglinqian (Contributor)

Closes #21

📊 Grafana Monitoring Dashboard for RustChain

Complete monitoring solution with visual dashboard and automated alerting.

✅ Requirements Met

1. Metrics ✅

Active Miners & Attestations:

  • Real-time miner count with color-coded thresholds
  • Total attestation count (lifetime)
  • Miner count timeseries chart over time
  • Attestation rate (ops/sec) with trend analysis
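A minimal sketch of how an attestation rate in ops/sec can be derived from two samples of a monotonically increasing counter such as `rustchain_attestations_total`. The function name, the sample values, and the reset handling are illustrative assumptions, not code from this PR.

```python
def per_second_rate(prev_count: float, curr_count: float, elapsed_seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # A counter only goes up. A lower current value is treated as a restart
    # (assumption), so the current value is taken as the whole increase.
    if curr_count >= prev_count:
        increase = curr_count - prev_count
    else:
        increase = curr_count
    return increase / elapsed_seconds


# Two hypothetical samples taken 300 seconds apart.
rate = per_second_rate(prev_count=12_000, curr_count=12_450, elapsed_seconds=300)
print(f"{rate:.2f} attestations/sec")
```

In a Prometheus setup this calculation is normally done server-side over a time window; the sketch only shows the arithmetic behind the panel.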

RTC Transfers & Volume:

  • Total network RTC balance
  • Transfer volume per hour
  • Unusual volume deviation detection (alerts when volume deviates more than 200% from the 24-hour average)
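One possible reading of the 200% deviation rule, sketched in Python. The function names and the choice of an absolute percentage deviation from the mean of the last 24 hourly values are assumptions made for illustration; the actual alert expression in this PR may differ.

```python
from statistics import mean


def volume_deviation_pct(hourly_volumes_24h: list[float], current_volume: float) -> float:
    """Absolute deviation of the current volume from the 24h mean, in percent."""
    baseline = mean(hourly_volumes_24h)
    if baseline == 0:
        # No baseline activity: any transfer at all counts as an unbounded deviation.
        return float("inf") if current_volume > 0 else 0.0
    return abs(current_volume - baseline) / baseline * 100.0


def is_unusual_volume(hourly_volumes_24h: list[float], current_volume: float,
                      threshold_pct: float = 200.0) -> bool:
    """True when the deviation exceeds the threshold (200% by default)."""
    return volume_deviation_pct(hourly_volumes_24h, current_volume) > threshold_pct


history = [100.0] * 24  # hypothetical hourly RTC transfer volumes
print(is_unusual_volume(history, 350.0))  # 250% above the mean
print(is_unusual_volume(history, 250.0))  # 150% above the mean
```

Note that volume cannot fall more than 100% below its average, so under this reading a 200% threshold can only fire on spikes, never on drops.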

Epoch Rewards:

  • Current epoch number display
  • Epoch rewards distributed per epoch
  • Cumulative rewards tracking

Node Health & API Response:

  • Node health score (0-1 scale)
  • API p95 response time with thresholds
  • API request rate (req/sec)
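For readers unfamiliar with how a p95 is estimated from histogram buckets, here is a hedged sketch of linear interpolation inside the bucket that contains the target rank. It assumes `rustchain_api_request_duration_seconds` is exported as a cumulative histogram; the bucket bounds and counts are invented for the example.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls past the last finite bucket; its upper
                # bound is the best available estimate.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts for API request durations in seconds.
sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(f"p95 is roughly {histogram_quantile(0.95, sample):.4f}s")
```

The accuracy of such an estimate depends on bucket boundaries, so thresholds like 5s and 10s are easiest to evaluate when they coincide with bucket bounds.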

2. Alerts ✅

19 Prometheus alert rules:

Critical Alerts:

  • Node down (health = 0)
  • Miner count < 10
  • API latency > 10s
  • Disk space < 10%
  • Memory usage > 90%
  • CPU usage > 90%
  • Attestation rate near zero

Warning Alerts:

  • Node degraded (health < 0.8)
  • Miner count < 50
  • Sudden miner drop > 50%
  • Attestation rate low
  • API latency > 5s
  • High API error rate > 10%
  • Unusual transfer volume deviation
  • Low total RTC balance < 1,000
  • Disk space < 20%
  • Memory usage > 80%
  • CPU usage > 80%

System Alerts:

  • Sudden attestation drop > 70%
  • API p95 latency > 5s (warning) or > 10s (critical)
  • High API error rate
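The "sudden drop" rules compare a current value against an earlier one. A small sketch of that comparison, assuming the drop is measured as a percentage of the earlier value; the function names and sample numbers are hypothetical and the comparison window is not specified in this description.

```python
def drop_pct(previous: float, current: float) -> float:
    """Percentage by which a value fell relative to its earlier reading."""
    if previous <= 0:
        return 0.0  # nothing to drop from
    return max(0.0, (previous - current) / previous * 100.0)


def is_sudden_drop(previous: float, current: float, threshold_pct: float) -> bool:
    """True when the fall exceeds the given threshold."""
    return drop_pct(previous, current) > threshold_pct


# Miner count falling from 120 to 50, checked against the 50% rule.
print(is_sudden_drop(120, 50, threshold_pct=50))
# Attestation rate falling from 2.0 to 0.8 ops/sec, checked against the 70% rule.
print(is_sudden_drop(2.0, 0.8, threshold_pct=70))
```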

3. Dashboard Features ✅

12 panels organized into 5 sections:

Network Overview:

  • Active miners (color-coded: green > 100, yellow 50-100, red < 50)
  • Total attestations
  • Current epoch
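The color coding for active miners amounts to a two-threshold lookup. A minimal sketch follows; the function and parameter names are assumptions, and the boundary handling (exactly 100 counts as yellow, exactly 50 counts as yellow) is one interpretation of the ranges listed.

```python
def status_color(value: float, warn_floor: float, ok_floor: float) -> str:
    """Map a value to green, yellow, or red using two thresholds."""
    if value > ok_floor:
        return "green"
    if value >= warn_floor:
        return "yellow"
    return "red"


# Active miners: green > 100, yellow 50-100, red < 50.
for miners in (120, 100, 50, 49):
    print(miners, status_color(miners, warn_floor=50, ok_floor=100))
```

The same helper would cover other panels that use the green, yellow, red scheme by passing different floors.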

Miner Metrics:

  • Miners over time (timeseries)
  • Attestation rate (last 5m)

Token Metrics:

  • Total RTC balance (with thresholds: green > 10k, yellow 5k-10k, red < 5k)
  • Transfer volume per hour
  • Epoch rewards distributed

Health Metrics:

  • Node health (0-1 scale)
  • API p95 response time
  • API request rate

Hardware Distribution:

  • Pie chart by hardware type
  • PowerPC, 68K, SPARC, x86, etc.

Dashboard Features:

  • Dark terminal theme matching RustChain aesthetic
  • Auto-refresh every 10 seconds
  • Responsive design (mobile-friendly)
  • Interactive panels with zoom and pan
  • Color-coded thresholds for quick status assessment
  • Annotations for visual alerts on charts
  • Hover tooltips with detailed information

4. Deployment Options ✅

Option 1: Docker Compose (All-in-One)

cd grafana
docker-compose up -d

# Access Grafana
open http://localhost:3000
# Default: admin / admin (change immediately)

Option 2: Manual Setup

  1. Install Prometheus
  2. Install Grafana
  3. Import the dashboard into Grafana
  4. Configure the Prometheus datasource
  5. Load the alert rules into Prometheus

5. Configuration Files ✅

Grafana Dashboard:

  • grafana/rustchain-dashboard.json - Complete dashboard JSON

Prometheus Alerts:

  • grafana/alerts.yml - 19 alert rules with 3 severity levels

Prometheus Config:

  • grafana/prometheus.yml - Scrape configs and retention settings

Docker Compose:

  • grafana/docker-compose.yml - Full monitoring stack

Environment Template:

  • grafana/.env.example - All configurable variables

Nginx Config:

  • grafana/nginx.conf - Reverse proxy example with SSL

Documentation:

  • grafana/README.md - Comprehensive guide (11,600+ words)
    • Quick start instructions
    • Metric descriptions
    • Alert documentation
    • Docker deployment
    • Security best practices
    • Troubleshooting guide
    • Additional resources

🎨 Dashboard Design

Theme: Dark terminal (matches RustChain aesthetic)
Color Coding:

  • Green: Healthy / Normal / Good
  • Yellow: Warning / Degraded
  • Red: Critical / Bad

Panel Types:

  • Stat panels for current values
  • Timeseries charts for trends
  • Pie chart for distribution
  • Gauge for health score
  • Progress bars for resource usage

🚨 Alert Routing

Critical → PagerDuty / Opsgenie / Email
Warning → Slack / Discord / Email
Info → Logging only

Configure routing in grafana/alerts.yml under the alertmanagers section.
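The routing table above can be pictured as a severity-to-channel lookup. This is only an illustrative sketch: the channel identifiers and the fallback to logging for unknown severities are assumptions, and real routing would live in the alerting configuration rather than in Python.

```python
# Candidate channels per severity, taken from the routing list above.
ROUTES: dict[str, list[str]] = {
    "critical": ["pagerduty", "opsgenie", "email"],
    "warning": ["slack", "discord", "email"],
    "info": ["log"],
}


def receivers_for(severity: str) -> list[str]:
    """Return the channels for a severity label, case-insensitively."""
    # Unknown severities fall back to logging only (assumption).
    return ROUTES.get(severity.strip().lower(), ROUTES["info"])


print(receivers_for("Critical"))
print(receivers_for("warning"))
print(receivers_for("debug"))
```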

📊 Prometheus Metrics Required

The dashboard expects these metrics from RustChain node:

```
rustchain_miners                        # Active miner count
rustchain_attestations_total            # Total attestations
rustchain_current_epoch                 # Current epoch number
rustchain_balance_rtc                   # Total RTC balance
rustchain_transfers_total               # Transfer count
rustchain_epoch_rewards_distributed     # Epoch rewards distributed
rustchain_node_health                   # Node health score (0-1)
rustchain_api_request_duration_seconds  # API response time
rustchain_api_requests_total            # Total API requests
rustchain_api_errors_total              # Total API errors
```
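To make the expected scrape output concrete, here is a standard-library-only sketch that renders a few of these metrics in the Prometheus text exposition format. The metric types (gauge or counter) and the sample values are assumptions; how the RustChain node actually exposes metrics is not shown in this description.

```python
# (name, type, help text) for a subset of the metrics listed above.
# The gauge/counter types are assumed, not confirmed by the PR.
METRICS = [
    ("rustchain_miners", "gauge", "Active miner count"),
    ("rustchain_attestations_total", "counter", "Total attestations"),
    ("rustchain_current_epoch", "gauge", "Current epoch number"),
    ("rustchain_node_health", "gauge", "Node health score (0-1)"),
]


def render_metrics(values: dict[str, float]) -> str:
    """Render known metrics present in `values` as exposition-format text."""
    lines = []
    for name, metric_type, help_text in METRICS:
        if name not in values:
            continue  # skip metrics the node has no value for
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {values[name]}")
    return "\n".join(lines) + "\n"


# Hypothetical readings from a node.
print(render_metrics({
    "rustchain_miners": 42,
    "rustchain_attestations_total": 12450,
    "rustchain_node_health": 0.97,
}))
```

A node would serve text like this from its metrics endpoint for Prometheus to scrape; a maintained client library is usually preferable to hand-rendering in production.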


### 🔍 Monitoring Coverage

**Network Health:**
- Active miner count
- Attestation rate
- Node availability

**Token Economics:**
- Total supply tracking
- Transfer volume
- Reward distribution

**System Performance:**
- API latency (p95)
- Request rate
- Error rate

**Resource Utilization:**
- Disk space (Prometheus node_exporter)
- Memory usage (Prometheus node_exporter)
- CPU usage (Prometheus node_exporter)

### 📈 Expected Impact

**Visibility:**
- Real-time network health monitoring
- Proactive alerting for critical issues
- Historical trend analysis
- Resource utilization tracking

**Operations:**
- Reduce mean time to detection (MTTD)
- Automated incident response
- Data-driven capacity planning

**Quality:**
- Ensure consistent node performance
- Detect unusual patterns early
- Maintain service level agreements (SLAs)

### 🧪 Testing

**Load Testing:**
- Simulate miner connections
- Generate high volume API requests
- Test alert firing thresholds

**Dashboard Testing:**
- Import dashboard to test Grafana instance
- Verify all panels load correctly
- Test alert rules with synthetic data

### 🔧 Customization

**Adding New Panels:**
1. Edit dashboard in Grafana UI
2. Click "Add panel"
3. Select metric and visualization
4. Configure thresholds and colors

**Adding New Alerts:**
1. Edit `grafana/alerts.yml`
2. Add new alert rule with severity
3. Configure annotations and labels
4. Reload Prometheus

### 📁 File Structure

grafana/
├── rustchain-dashboard.json # Grafana dashboard
├── alerts.yml # Prometheus alert rules
├── prometheus.yml # Prometheus config
├── docker-compose.yml # Full stack
├── .env.example # Environment template
├── nginx.conf # Reverse proxy example
└── README.md # Documentation


### ✅ Acceptance Criteria Met

✅ Active miners & attestations metric
✅ RTC transfers & volume metrics
✅ Epoch rewards metric
✅ Node health & API response time metrics
✅ Alerts for node down, unusual volume, miner drop
✅ Grafana dashboard with 12 panels
✅ Prometheus alerts configuration
✅ Docker Compose setup

### 🚀 Ready to Deploy

**Production Deployment:**
```bash
# 1. Clone and configure
git clone https://github.com/Scottcjn/Rustchain.git
cd Rustchain/grafana
cp .env.example .env
nano .env  # Edit configurations

# 2. Start monitoring
docker-compose up -d

# 3. Access Grafana
# http://your-server:3000
# Login: admin / your_password
# Import dashboard from grafana/rustchain-dashboard.json
```

Import Dashboard via URL:
http://your-grafana:3000/dashboard/import-dashboard?url=http://your-server:3000/public/dashboards/rustchain-dashboard

Complete monitoring solution for RustChain!

- Add /api/badge/<wallet> endpoint for shields.io-compatible badge
- Returns JSON with balance, epoch, and mining status (active/inactive)
- Add GitHub Action workflow for badge updates
- Add test script for badge endpoint
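
A hedged sketch of what a shields.io-compatible response could look like. The `schemaVersion`, `label`, `message`, and `color` keys follow the shields.io endpoint badge format; the function name, the label text, and the way balance, epoch, and status are packed into `message` are assumptions, not the endpoint's confirmed output.

```python
import json


def badge_payload(balance_rtc: float, epoch: int, active: bool) -> dict:
    """Build an endpoint-badge JSON body from wallet state."""
    status = "active" if active else "inactive"
    return {
        "schemaVersion": 1,
        "label": "RustChain",  # hypothetical label text
        "message": f"{balance_rtc:.2f} RTC | epoch {epoch} | {status}",
        "color": "brightgreen" if active else "lightgrey",
    }


# Hypothetical wallet state.
print(json.dumps(badge_payload(12.5, 42, True), indent=2))
```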

Closes Scottcjn#256

- Add Dockerfile for RustChain node with gunicorn
- Add docker-compose.yml with nginx + SSL setup
- Add nginx configuration with reverse proxy and security headers
- Add SSL certificate generation script (self-signed + Let's Encrypt)
- Add .env.example with configuration options
- Add DOCKER_DEPLOYMENT.md with comprehensive documentation
- Add test-docker.sh for deployment validation
- Volume persistence for SQLite database
- Health checks and auto-restart for all services
- Single command deployment: docker-compose up -d

Closes Scottcjn#20

- Add pool_proxy.py main server with Flask
- Accepts attestations from multiple miners
- Tracks miner contributions (uptime, hardware score)
- Calculates contribution weights based on vintage hardware
- SQLite database for persistence
- Built-in web dashboard with statistics
- API endpoints for stats, miners, and rewards
- Hardware score multipliers (PowerPC, 68K, SPARC, etc.)
- Contribution weight calculation (hardware × uptime × attestations)
- Configurable pool fee (default 1%)
- Test script for validation
- Comprehensive documentation
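
The weight formula (hardware × uptime × attestations) and the 1% default fee can be illustrated with a short sketch. The function name, the miner names, and the multiplier values are hypothetical; pool_proxy.py may structure this differently.

```python
def split_reward(total_reward: float,
                 miners: dict[str, tuple[float, float, int]],
                 pool_fee: float = 0.01) -> dict[str, float]:
    """Split a reward by contribution weight after deducting the pool fee.

    `miners` maps a miner name to
    (hardware_multiplier, uptime_fraction, attestation_count).
    """
    distributable = total_reward * (1.0 - pool_fee)
    weights = {
        name: hardware * uptime * attestations
        for name, (hardware, uptime, attestations) in miners.items()
    }
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {name: 0.0 for name in miners}  # nobody contributed
    return {name: distributable * w / total_weight for name, w in weights.items()}


# Hypothetical pool: a vintage machine with a 2.5x multiplier and full uptime,
# and a modern machine with a 1.0x multiplier and 50% uptime.
payouts = split_reward(100.0, {"g4-tower": (2.5, 1.0, 100), "x86-box": (1.0, 0.5, 100)})
for name, amount in payouts.items():
    print(name, round(amount, 2))
```

Because the weight is a product, a miner with zero uptime or zero attestations receives nothing regardless of its hardware multiplier.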

Closes Scottcjn#258

- Add index.html with hero section and overview
- Add about.html explaining Proof of Antiquity (850+ words)
- Add mining.html with complete setup guide (1,100+ words)
- Add tokenomics.html with supply and distribution (700+ words)
- Add hardware.html with multiplier tables (900+ words)
- Add faq.html with comprehensive Q&A (1,500+ words)
- Add HTML meta tags (title, description, keywords)
- Add Open Graph tags for social media
- Add JSON-LD structured data (Organization, SoftwareApplication, FAQPage)
- Add sitemap.xml for search engine crawlers
- Add robots.txt allowing all crawlers
- Add canonical URLs to prevent duplicate content
- Add internal linking between pages
- Add responsive CSS with dark terminal theme
- Add README with deployment instructions

SEO Features:
- Proper meta tags on all pages
- Social media optimization (OG tags, Twitter Cards)
- Google rich results via structured data
- Technical SEO (sitemap, robots.txt, internal links)
- Keyword-rich content (6,250+ total words)
- Mobile-responsive design

Closes Scottcjn#257

- Add Grafana dashboard JSON with 12 panels
  - Active miners, attestations, epoch stats
  - RTC balance, transfer volume, rewards
  - Node health, API response time, request rate
  - Hardware distribution pie chart
- Add Prometheus alerts configuration (19 alert rules)
  - Node health alerts (down, degraded)
  - Miner count alerts (critical, low, sudden drop)
  - Attestation rate alerts (zero, low, sudden drop)
  - API performance alerts (latency, error rate)
  - Token/balance alerts (low balance, unusual volume)
  - System resource alerts (disk, memory, CPU)
- Add docker-compose.yml for monitoring stack
  - Prometheus, Grafana, Alertmanager
  - Optional Nginx reverse proxy
- Add Prometheus configuration file
  - Scrape configs for RustChain node
  - External labels and relabeling
- Add environment variables template (.env.example)
  - Grafana credentials
  - Alert webhooks (Slack/Discord)
  - Email alert configuration
- Add Nginx configuration example
  - Reverse proxy for Grafana/Prometheus
  - SSL/TLS configuration
  - Security headers
- Add comprehensive README with:
  - Quick start guide
  - Metric descriptions
  - Alert documentation
  - Docker deployment instructions
  - Security best practices
  - Troubleshooting guide
  - Additional resources

Closes Scottcjn#21

@liu971227-sys (Contributor) left a comment

Thanks for the work here — there is a lot of effort, but this PR is not safely mergeable in current form.

Blocking findings:

  1. Scope is bundled across multiple unrelated bounties in one PR
     • This PR simultaneously includes Grafana (#21), Docker deploy (#20), mining pool proxy (#258), SEO site (#257), and mining badge (#256).
     • Impact: review signal is diluted, rollback blast-radius is huge, and payout mapping becomes ambiguous.
     • Required fix: split into focused PRs (one bounty per PR), each with isolated diff and validation.
  2. Change size is too large for safe review/merge (6,651 additions across 31 files)
     • With this size, it is not possible to reliably validate operational safety (deployment/security/runtime behavior) in one pass.
     • Required fix: keep each PR narrowly scoped with only files necessary for that deliverable.

Once split into smaller, bounty-scoped PRs, I can re-review quickly.

Development

Successfully merging this pull request may close these issues.

📊 Bounty: Grafana Monitoring Dashboard (60 RTC)

2 participants