Feature/nemoclaw env injection by welshDog · Pull Request #43 · welshDog/HyperCode-V2.0

welshDog · 2026-03-19T23:50:18Z

No description provided.

- Add docker-compose.nim.yml for optional NVIDIA NIM service
- Add NemoClaw validation scripts for bash and PowerShell
- Add integration documentation detailing NemoClaw setup and usage
- Add health status report documenting current system state and recent corrections

- Pin FastAPI and Uvicorn versions in Dockerfile for reproducibility
- Run container as non-root user for improved security
- Add structured logging, request middleware, and health check with uptime
- Expose agent on port 8013 and add Docker healthcheck configuration
- Set resource limits, restart policy, and drop Linux capabilities
- Add ANTHROPIC_API_KEY environment variable for LLM integration
- Include comprehensive analysis document for future reference

- Move time import to consistent location with other imports
- Fix indentation of FastAPI endpoint decorators to be properly aligned
- Ensure health check endpoint correctly returns uptime in all cases
- Reorder graceful shutdown to log before exiting
- Remove duplicate uvicorn.run call and redundant comment

Add four documentation files with advanced system recommendations:
- RECOMMENDATIONS_SUMMARY.txt: Executive summary and tiered implementation roadmap
- QUICK_START.sh: Quick reference guide with verification commands
- ADVANCED_RECOMMENDATIONS.md: Detailed implementation guides for 15 features
- FINAL_STATUS_AND_ROADMAP.md: Achievement summary and learning outcomes

These documents provide production-ready implementation patterns for observability, resilience, and scalability features to elevate the system from functional to enterprise-grade.

Add prometheus-client dependency and expose /metrics endpoint with request count and duration metrics. Update Prometheus configuration to scrape the new endpoint. This enables monitoring of test-agent performance and request patterns.

…for test-agent
- Add OpenTelemetry dependencies and instrumentation to test-agent Dockerfile and main.py
- Configure environment variables for OTLP exporter and endpoint in docker-compose.yml
- Implement telemetry setup with service name, environment, and conditional disabling
- Add comprehensive Grafana dashboard for monitoring request rates, latency, and error rates

- Add Grafana Cloud Docker Compose and Prometheus configuration template
- Update .gitignore to exclude local Grafana Cloud credentials
- Refactor PowerShell scripts to use relative paths and improved error handling
- Enhance NemoClaw validation to check sandbox registration status

- Reorder imports to follow standard library → third-party convention
- Add type hints for middleware and request handlers
- Use lazy formatting in logging calls for better performance
- Fix environment variable parsing to ensure correct type handling
- Add docstrings to functions for better documentation
- Update signal handler to ignore unused arguments
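The lazy logging and typed environment parsing mentioned above might look roughly like the following. This is a sketch under assumptions: the helper names `env_int` and `env_bool` and the logger name are hypothetical, and the PR's actual parsing rules are not shown on this page.

```python
import logging
import os

logger = logging.getLogger("test-agent")  # logger name is an assumption


def env_int(name, default):
    """Read an integer environment variable, falling back on bad input."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        # Lazy formatting: arguments are interpolated only if the record
        # is actually emitted, unlike an f-string built up front.
        logger.warning("Invalid integer for %s: %r; using %d", name, raw, default)
        return default


def env_bool(name, default=False):
    """Interpret common truthy spellings; anything else counts as False."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

For example, `env_int("AGENT_PORT", 8013)` returns an `int` whether or not the variable is set, so later comparisons never mix strings and numbers.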

Add a new document outlining the concept for a future "Throttle Agent". The document describes the agent's purpose as a smart resource manager for the Docker stack, including its proposed functionality, a detailed priority order for service startup and shutdown, and health check gates. This serves as foundational planning documentation for a potential future feature.

Introduce a new agent service that monitors system RAM usage and provides automated throttling capabilities for Docker containers based on configured priority tiers. The agent exposes REST endpoints for health checks, metrics, tier status monitoring, and manual throttling operations.
- Add FastAPI service with Docker client integration
- Implement tier-based container prioritization system
- Add Prometheus metrics for monitoring and observability
- Configure Docker Compose service with health checks and resource limits
- Provide automated decision-making based on RAM usage thresholds
- Support manual pause/resume operations per tier

- Implement automatic tier pausing/resuming based on RAM thresholds
- Add HTTP integration with healer-agent to prevent healing of paused containers
- Introduce configurable protected tiers and containers
- Replace simple threshold variables with tier-specific pause/resume logic
- Add background autopilot loop with configurable polling interval
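The tier-specific pause/resume logic described above can be sketched as a pure decision function. Everything named here (`Tier`, `decide`, the threshold values, the tier names) is hypothetical and chosen for illustration; the sketch only assumes that each tier has separate pause and resume thresholds and that protected tiers are never paused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    pause_at: float   # pause the tier when RAM usage (%) reaches this value
    resume_at: float  # resume only once RAM usage (%) has fallen to this value
    protected: bool = False


def decide(tiers, ram_percent, paused):
    """Return the set of tier names that should be paused after this poll."""
    result = set(paused)
    for tier in tiers:
        if tier.protected:
            result.discard(tier.name)  # protected tiers are never paused
            continue
        if ram_percent >= tier.pause_at:
            result.add(tier.name)
        elif ram_percent <= tier.resume_at:
            result.discard(tier.name)
        # Between resume_at and pause_at the previous state is kept, so a
        # tier does not flap when usage hovers around a single threshold.
    return result
```

An autopilot loop would call `decide` once per polling interval and apply the difference between the old and new paused sets to the containers.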

- Set AUTO_THROTTLE_ENABLED to true in docker-compose
- Add endpoint to query circuit breaker status for a specific agent
- Check healer circuit breaker state before resuming containers to prevent resuming unhealthy agents
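A rough sketch of the circuit-breaker gate on resume, assuming the breaker states have already been fetched from the healer. The function name `resumable`, the state strings, and the rule that an agent with no recorded breaker counts as closed are all assumptions, not details confirmed by the PR.

```python
def resumable(containers, breaker_state):
    """Split containers into (safe_to_resume, held_back).

    breaker_state maps agent name to "closed", "open", or "half_open"
    (assumed state names). Agents without an entry are treated as closed.
    """
    ok, held = [], []
    for name in containers:
        state = breaker_state.get(name, "closed")
        if state == "closed":
            ok.append(name)
        else:
            held.append(name)  # breaker open or probing: leave paused
    return ok, held
```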

- Add Prometheus scrape config for throttle-agent
- Implement new Prometheus metrics for throttle state, paused containers, circuit breakers, and RAM thresholds
- Create comprehensive Grafana dashboard to visualize throttle and healer autopilot state
- Integrate metrics updates into existing throttle-agent polling cycles

…ctured logging and predictive metrics
- Add JSON logging for better Loki integration and structured audit trails
- Implement 8 new Prometheus metrics for decision tracking and container stats
- Introduce predictive RAM analysis with 5-minute trend forecasting
- Extend health check timeouts and retries to improve reliability
- Add comprehensive error logging throughout all key functions
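One way to do a 5-minute RAM trend forecast is an ordinary least-squares line through recent samples, extrapolated 300 seconds ahead. The PR does not say which method it uses, so this is an illustrative stand-in; `forecast_ram` and its parameters are hypothetical names.

```python
def forecast_ram(samples, horizon_s=300.0):
    """Project RAM usage horizon_s seconds after the last sample.

    samples is a list of (timestamp_seconds, ram_percent) pairs.
    The result is clamped to the 0..100 range.
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        slope = 0.0  # a single point (or identical timestamps) has no trend
    else:
        cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
        slope = cov / var_t
    intercept = mean_v - slope * mean_t
    projected = intercept + slope * (samples[-1][0] + horizon_s)
    return max(0.0, min(100.0, projected))
```

With samples rising 2 points per minute from 50%, the projection five minutes after the last sample lands 10 points above it, which a throttle loop could compare against a pause threshold before the limit is actually reached.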

Align API paths and compose commands; add docs inventory, changelog, and review checklist.

Checks canonical Markdown links, enforces 'docker compose' terminology, and requires docs changelog updates when canonical docs change.
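As an illustration of the terminology check, the sketch below flags lines that use the legacy hyphenated command while leaving file names such as docker-compose.nim.yml alone. The regular expression and the function name `find_legacy_commands` are assumptions; the workflow's real implementation is not shown in this PR page.

```python
import re

# Match the legacy "docker-compose" command form, but not file names
# where the word is followed by a dot and a word character.
LEGACY = re.compile(r"\bdocker-compose\b(?!\.\w)")


def find_legacy_commands(markdown_text):
    """Return (line_number, stripped_line) pairs using the legacy command."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if LEGACY.search(line):
            hits.append((number, line.strip()))
    return hits
```

A CI step could fail the build whenever this returns a non-empty list for any canonical Markdown file.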

Run github-mcp-server in HTTP mode so the container stays alive under Docker Compose.

Use github-mcp-server http on port 8082 to prevent stdio EOF restarts under Compose.

Wire GitHub MCP services to GITHUB_TOKEN_PRIMARY to match the repo .env naming.

Add guidance to avoid sharing .env secrets; keep docs changelog in sync; standardize compose commands in the 2026-03-19 status note.

- Add .env1 to gitignore for consistency
- Enhance .env.example with new environment variables and better organization
- Update Makefile to use unified docker-compose.yml with profiles
- Add network-init target to ensure Docker network exists
- Improve setup instructions and add .env creation from example

- Add prometheus-fastapi-instrumentator dependency for automatic metrics exposure
- Create comprehensive metrics module with 12 core metrics for Hyper Agents, LLM, Healer, and Agent X
- Instrument all agent FastAPI applications to expose /metrics endpoint
- Update Prometheus configuration to scrape backend metrics
- Provide helper functions for tracking agent requests, LLM calls, and other operations

- Pins all dependencies to exact versions for reproducibility
- Adds numerous new packages for expanded functionality (ML, monitoring, etc.)
- Updates existing packages to newer versions for security and features

…lity

Introduce Grafana Agent as a unified shipper for metrics, logs, and traces to Grafana Cloud. This replaces manual remote write configurations and consolidates observability data pipelines into a single service. The agent scrapes Prometheus metrics, collects Docker container logs, and is configured to forward to Grafana Cloud Prometheus, Loki, and Tempo endpoints. Also updates the Docker Compose file to include both Prometheus and the Grafana Agent service with necessary environment variables and volume mounts.

- Update environment variable names from ANTHROPIC_API_KEY to PERPLEXITY_API_KEY
- Modify configuration files, scripts, and documentation throughout codebase
- Update imports and client initialization in agent code
- Fix broken logger statement in healer agent
- Add shared metrics module for observability

- Update environment variable from ANTHROPIC_API_KEY to PERPLEXITY_API_KEY in .env.example and documentation
- Fix missing newlines at end of multiple files
- Remove UTF-8 BOM from test markdown file
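For reference, the two file hygiene fixes (stripping a UTF-8 BOM and ensuring a trailing newline) can be expressed as a small byte-level helper. The function name `normalize` is hypothetical and this is only a sketch of the idea, not the tooling used in the commit.

```python
import codecs


def normalize(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and ensure non-empty content ends in a newline."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    if data and not data.endswith(b"\n"):
        data += b"\n"
    return data
```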

Add four Grafana AI investigation reports documenting observability gaps in the HyperCode infrastructure and Grafana Assistant App. The reports highlight missing metrics, logs, and traces, providing detailed findings and remediation recommendations to establish a functional monitoring pipeline.

- Remove explicit version from docker-compose.monitoring.yml as it's optional in newer Docker Compose versions
- Remove trailing whitespace and clean up .gitignore formatting

- Remove duplicate PERPLEXITY_API_KEY environment variable for hypercode-worker service
- Update AGENT_PORT from 8008 to 8010 for hypercode-agent-healer service

- Add integration test suite for agent crew with live orchestrator
- Mock enabled_agent_keys setting in unit tests to match agent configuration
- Simplify nested context managers in health monitor tests

Add two comprehensive documentation files detailing the Hyper Agent ecosystem:
- "🦅 THE HYPER AGENT LIFE PLAN": Describes the purpose, roles, and communication principles for all agents in the system.
- "this is the Hyper Agent.md": Provides a technical guide for inter-agent communication, including shared me

Introduces modular YAML life plans for all agents, detailing identity, communication, collaboration, work pipeline, play framework, and future-proofing. This foundational document provides each agent with a structured identity, ethical charter, and evolution path, enabling coordinated multi-agent operations and autonomous ecosystem development.

…mpose
- Replace multi-stage build with simpler single-stage Dockerfile using Python 3.12
- Add non-root user for security hardening and optimize layer caching
- Create comprehensive docker-compose.agents.yml with full agent stack
- Define service dependencies, health checks, and networking for all 22 agents
- Update health check to use curl with configurable PORT environment variable

Add three detailed architectural review documents to the repository:
- HyperCode V2.0 — Complete Architectural Audit
- HyperCode V2.0 DEEPSEEK Review
- HyperCode V2.0 — Comprehensive Architectural GEMI Review

These reports provide critical analysis of the current codebase, identifying strengths, weaknesses, and prioritized action items across neurodivergent-first design, multi-paradigm implementation, AI agent architecture, and technical debt. They serve as a foundation for future architectural improvements and project planning.

Add Tempo monitoring configuration file with OTLP receiver endpoints and local storage settings. Fix TypeScript type annotation for drag event in AgentLibrary component to resolve implicit 'any' warning.

Fix incorrect TypeScript syntax in drag event handler by replacing generic type annotation with proper parameter typing. Add comprehensive environment configuration file with secrets redacted, including AI providers, database, Redis, frontend, observability, and service integrations for production deployment.

Fix incorrect type annotation for React drag event in AgentLibrary component. The event handler was incorrectly typed with a malformed generic syntax, causing TypeScript errors. Change to proper type casting to ensure type safety while maintaining drag functionality.

Fix TypeScript error by properly casting event to React.DragEvent<HTMLDivElement> instead of React.DragEvent

…x in AgentLibrary
- Update onDragStart parameter type from React.DragEvent to React.DragEvent<HTMLDivElement>
- Fix broken backgroundColor value in motion.whileHover by replacing template literal with static rgba value
- Restructure component nesting to ensure proper hover tooltip positioning
- Remove unnecessary type casting in onDragStart handler

The hover background color in the AgentLibrary component had a malformed escape sequence (\15) instead of a valid RGBA value. This was causing a syntax error that prevented the component from rendering correctly. Also remove the obsolete fix_agentlib.py script that was previously used to patch this issue.

- frontend-specialist.yaml: UI/UX architect (React, components)
- backend-specialist.yaml: API engineer (REST, async)
- database-architect.yaml: Data designer (schema, optimization)
- qa-engineer.yaml: Test guardian (test generation, validation)
- devops-engineer.yaml: Infrastructure expert (CI/CD, deployment)
- security-engineer.yaml: Security guardian (audits, scanning)
- system-architect.yaml: System designer (architecture, scalability)
- healer-agent.yaml: Recovery expert (health monitoring, auto-restart)
- test-agent.yaml: Validator (integration tests, smoke tests)
- throttle-agent.yaml: Resource guardian (CPU/memory monitoring)
- + existing: hypercode-core.yaml, crew-orchestrator.yaml

Each life-plan includes:
- Service identity & codename
- Responsibilities & dependencies
- SLOs (latency, error rate, availability)
- 2-4 failure modes with MTTR & recovery steps
- Key metrics to monitor & alert thresholds
- Deployment config (image, resources, healthcheck)
- On-call playbooks for common incidents
- Contact & escalation info

Total: 112KB, 12 comprehensive service runbooks. All agents now have living documentation for ops/on-call.

Update the fetchAgents function to use the correct API endpoint and include the required authorization header. Also add a .dockerignore file to the dashboard agent directory to exclude unnecessary files from Docker builds.

welshDog added 30 commits March 17, 2026 22:55

feat(dashboard): add /api/health and standalone safety net

6cedeac

chore(dashboard): self-heal static assets on container start

186a014

chore(nemoclaw): verify NVIDIA_API_KEY loading safely

07dea3b

chore(nemoclaw): add safe key connectivity check

978b755

ci: block committed NVIDIA_API_KEY

00df0b4

chore(nemoclaw): add installer wrappers with audit logs

b89da7c

chore(nemoclaw): make install/validate/keycheck auditable

6b5b38a

chore(nemoclaw): add onboarding wrapper

5e0007b

chore(nemoclaw): make scripts find nvm bin automatically

76d6f26

feat(test-agent): add Prometheus metrics endpoint

de027fa

Add prometheus-client dependency and expose /metrics endpoint with request count and duration metrics. Update Prometheus configuration to scrape the new endpoint. This enables monitoring of test-agent performance and request patterns.

feat(throttle-agent): harden telemetry and protect control endpoint

2aafd2b

docs: synchronize docs and add maintenance process

1ff732d

Align API paths and compose commands; add docs inventory, changelog, and review checklist.

ci(docs): add docs-check workflow

29123ec

Checks canonical Markdown links, enforces 'docker compose' terminology, and requires docs changelog updates when canonical docs change.

fix(mcp): stop mcp-github stdio restart loop

90fb37f

Run github-mcp-server in HTTP mode so the container stays alive under Docker Compose.

fix(mcp): run mcp-github in HTTP mode

e1d11de

Use github-mcp-server http on port 8082 to prevent stdio EOF restarts under Compose.

fix(mcp): use primary GitHub token variable

7f50fab

Wire GitHub MCP services to GITHUB_TOKEN_PRIMARY to match the repo .env naming.

docs: tighten secret handling and update status note

2f4145e

Add guidance to avoid sharing .env secrets; keep docs changelog in sync; standardize compose commands in the 2026-03-19 status note.

welshDog added 30 commits March 22, 2026 11:38

build: update backend dependencies with exact versions

d7ab30a

- Pins all dependencies to exact versions for reproducibility
- Adds numerous new packages for expanded functionality (ML, monitoring, etc.)
- Updates existing packages to newer versions for security and features

🔑 swap ANTHROPIC_API_KEY → PERPLEXITY_API_KEY across all files

b77e941

chore: rename anthropic api key to perplexity in configs and docs

61b9ea6

- Update environment variable from ANTHROPIC_API_KEY to PERPLEXITY_API_KEY in .env.example and documentation
- Fix missing newlines at end of multiple files
- Remove UTF-8 BOM from test markdown file

chore: remove version and clean up ignored files

89979c2

- Remove explicit version from docker-compose.monitoring.yml as it's optional in newer Docker Compose versions
- Remove trailing whitespace and clean up .gitignore formatting

fix: confirm .venv_broken excluded from tracking

5aca961

security: initialise secrets baseline scan

52fde8f

chore(docker-compose): remove duplicate env var and update agent port

39d2d19

- Remove duplicate PERPLEXITY_API_KEY environment variable for hypercode-worker service
- Update AGENT_PORT from 8008 to 8010 for hypercode-agent-healer service

test: add integration tests and fix test mocks

f452de1

- Add integration test suite for agent crew with live orchestrator
- Mock enabled_agent_keys setting in unit tests to match agent configuration
- Simplify nested context managers in health monitor tests

docs: add observability delivery summary (week 1 complete)

869c99c

docs: add executive summary (what you asked for vs. what you got)

eecd77a

docs: add comprehensive README for observability stack

468ca64

feat: NemoClaw clean sweep - 100/100 S-LEGENDARY 🦅

24b785b

feat: add Tempo configuration and fix TypeScript event type

197ee36

Add Tempo monitoring configuration file with OTLP receiver endpoints and local storage settings. Fix TypeScript type annotation for drag event in AgentLibrary component to resolve implicit 'any' warning.

fix(dashboard): correct drag event type in AgentLibrary component

1cc7e0a

Fix TypeScript error by properly casting event to React.DragEvent<HTMLDivElement> instead of React.DragEvent

docs: add life-plan completion summary (all 12 agents documented)

9a4cf9b

fix(dashboard): update agent fetch endpoint and add auth header

3e9b0f6

Update the fetchAgents function to use the correct API endpoint and include the required authorization header. Also add a .dockerignore file to the dashboard agent directory to exclude unnecessary files from Docker builds.

welshDog wants to merge 84 commits into main from feature/nemoclaw-env-injection
