
Feature/nemoclaw env injection #43

Open
welshDog wants to merge 84 commits into main from feature/nemoclaw-env-injection

Conversation

@welshDog
Owner

No description provided.

welshDog added 30 commits March 17, 2026 22:55
- Add docker-compose.nim.yml for optional NVIDIA NIM service
- Add NemoClaw validation scripts for bash and PowerShell
- Add integration documentation detailing NemoClaw setup and usage
- Add health status report documenting current system state and recent corrections
- Pin FastAPI and Uvicorn versions in Dockerfile for reproducibility
- Run container as non-root user for improved security
- Add structured logging, request middleware, and health check with uptime
- Expose agent on port 8013 and add Docker healthcheck configuration
- Set resource limits, restart policy, and drop Linux capabilities
- Add ANTHROPIC_API_KEY environment variable for LLM integration
- Include comprehensive analysis document for future reference
- Move time import to consistent location with other imports
- Fix indentation of FastAPI endpoint decorators to be properly aligned
- Ensure health check endpoint correctly returns uptime in all cases
- Reorder graceful shutdown to log before exiting
- Remove duplicate uvicorn.run call and redundant comment
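A minimal, standard-library sketch of the health-check and shutdown behavior these commits describe. The `"healthy"` status string and the `uptime_seconds` field name are assumptions, not taken from the test-agent code; in the FastAPI app, a `/health` route would simply return `health_payload()`.

```python
import logging
import signal
import sys
import time

logger = logging.getLogger("test-agent")

# Captured once at import time; uptime is measured from this point.
START_TIME = time.monotonic()


def health_payload() -> dict:
    """Build a health-check body that always includes uptime.

    Field names are assumed for illustration.
    """
    uptime = time.monotonic() - START_TIME
    return {"status": "healthy", "uptime_seconds": round(uptime, 3)}


def handle_shutdown(signum, _frame):
    """Log first, then exit, so the shutdown message is not lost."""
    logger.info("Received signal %s, shutting down", signum)
    sys.exit(0)


def install_signal_handlers() -> None:
    signal.signal(signal.SIGTERM, handle_shutdown)
    signal.signal(signal.SIGINT, handle_shutdown)
```

Logging inside the handler before `sys.exit` mirrors the "log before exiting" reordering above. When uvicorn runs the server it installs its own SIGINT/SIGTERM handlers, so where and when these are registered may matter.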
Add four documentation files with advanced system recommendations:
- RECOMMENDATIONS_SUMMARY.txt: Executive summary and tiered implementation roadmap
- QUICK_START.sh: Quick reference guide with verification commands
- ADVANCED_RECOMMENDATIONS.md: Detailed implementation guides for 15 features
- FINAL_STATUS_AND_ROADMAP.md: Achievement summary and learning outcomes

These documents provide production-ready implementation patterns for observability, resilience, and scalability features to elevate the system from functional to enterprise-grade.
Add prometheus-client dependency and expose /metrics endpoint with request
count and duration metrics. Update Prometheus configuration to scrape the
new endpoint. This enables monitoring of test-agent performance and request
patterns.
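The commit relies on prometheus-client (typically a `Counter` for request counts and a `Histogram` for durations). To show what ends up on `/metrics` without that dependency, here is a hypothetical standard-library stand-in; the metric names `http_requests_total` and `http_request_duration_seconds` are assumed, not confirmed from the test-agent code.

```python
import time
from collections import defaultdict


class RequestMetrics:
    """Hypothetical stand-in for a request Counter plus a duration sum/count."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)        # (method, path, status) -> count
        self.duration_sum = defaultdict(float)  # (method, path) -> total seconds
        self.duration_count = defaultdict(int)  # (method, path) -> observations

    def observe(self, method: str, path: str, status: int, seconds: float) -> None:
        self.requests[(method, path, status)] += 1
        self.duration_sum[(method, path)] += seconds
        self.duration_count[(method, path)] += 1

    def render(self) -> str:
        """Render Prometheus text-format lines, as a /metrics handler would return."""
        lines = []
        for (method, path, status), n in sorted(self.requests.items()):
            lines.append(
                f'http_requests_total{{method="{method}",path="{path}",status="{status}"}} {n}'
            )
        for (method, path), total in sorted(self.duration_sum.items()):
            labels = f'method="{method}",path="{path}"'
            lines.append(f"http_request_duration_seconds_sum{{{labels}}} {total}")
            lines.append(
                f"http_request_duration_seconds_count{{{labels}}} "
                f"{self.duration_count[(method, path)]}"
            )
        return "\n".join(lines) + "\n"


def timed(metrics: RequestMetrics, method: str, path: str, handler):
    """Call handler() -> (status, body) and record status and wall-clock duration."""
    start = time.perf_counter()
    status = 500  # recorded if the handler raises
    try:
        status, body = handler()
        return status, body
    finally:
        metrics.observe(method, path, status, time.perf_counter() - start)
```

With prometheus-client itself, `generate_latest()` produces this text, and a histogram additionally emits `_bucket` lines.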
…for test-agent

- Add OpenTelemetry dependencies and instrumentation to test-agent Dockerfile and main.py
- Configure environment variables for OTLP exporter and endpoint in docker-compose.yml
- Implement telemetry setup with service name, environment, and conditional disabling
- Add comprehensive Grafana dashboard for monitoring request rates, latency, and error rates
- Add Grafana Cloud Docker Compose and Prometheus configuration template
- Update .gitignore to exclude local Grafana Cloud credentials
- Refactor PowerShell scripts to use relative paths and improved error handling
- Enhance NemoClaw validation to check sandbox registration status
- Reorder imports to follow standard library → third-party convention
- Add type hints for middleware and request handlers
- Use lazy formatting in logging calls for better performance
- Fix environment variable parsing to ensure correct type handling
- Add docstrings to functions for better documentation
- Update signal handler to ignore unused arguments
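A sketch of typed environment-variable parsing with lazy `%s`-style logging (arguments are passed to the logger rather than pre-formatted), plus a conditional telemetry switch. The OpenTelemetry specification defines `OTEL_SDK_DISABLED`; whether test-agent uses that name or its own flag is not stated in the PR, so treat it as an assumption.

```python
import logging
import os

logger = logging.getLogger("test-agent")

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean env var; unset, blank, or unknown values use the default."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Lazy formatting: the message is only built if the record is emitted.
    logger.warning("Unrecognised boolean %s=%r, using %s", name, raw, default)
    return default


def env_int(name: str, default: int) -> int:
    """Parse an integer env var, falling back to the default on bad input."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        logger.warning("Invalid integer %s=%r, using %d", name, raw, default)
        return default


def telemetry_enabled() -> bool:
    # Assumed switch name: OTEL_SDK_DISABLED=true turns tracing setup off.
    return not env_bool("OTEL_SDK_DISABLED", default=False)
```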
Add a new document outlining the concept for a future "Throttle Agent".
The document describes the agent's purpose as a smart resource manager
for the Docker stack, including its proposed functionality, a detailed
priority order for service startup and shutdown, and health check gates.
This serves as foundational planning documentation for a potential
future feature.
Introduce a new agent service that monitors system RAM usage and provides automated throttling capabilities for Docker containers based on configured priority tiers. The agent exposes REST endpoints for health checks, metrics, tier status monitoring, and manual throttling operations.

- Add FastAPI service with Docker client integration
- Implement tier-based container prioritization system
- Add Prometheus metrics for monitoring and observability
- Configure Docker Compose service with health checks and resource limits
- Provide automated decision-making based on RAM usage thresholds
- Support manual pause/resume operations per tier
- Implement automatic tier pausing/resuming based on RAM thresholds
- Add HTTP integration with healer-agent to prevent healing of paused containers
- Introduce configurable protected tiers and containers
- Replace simple threshold variables with tier-specific pause/resume logic
- Add background autopilot loop with configurable polling interval
- Set AUTO_THROTTLE_ENABLED to true in docker-compose
- Add endpoint to query circuit breaker status for a specific agent
- Check healer circuit breaker state before resuming containers to prevent resuming unhealthy agents
- Add Prometheus scrape config for throttle-agent
- Implement new Prometheus metrics for throttle state, paused containers, circuit breakers, and RAM thresholds
- Create comprehensive Grafana dashboard to visualize throttle and healer autopilot state
- Integrate metrics updates into existing throttle-agent polling cycles
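A sketch of one autopilot decision cycle as described above: pause the least important unprotected tiers when RAM crosses their threshold, and resume the most important paused tiers once RAM falls, skipping any the healer reports as circuit-broken. The `Tier` fields, the threshold values, and keying circuit breakers by tier name are assumptions for illustration; per the commits, the real agent queries the healer per agent over HTTP.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    priority: int        # 0 = most important; higher numbers are paused first
    pause_above: float   # pause this tier when RAM % reaches this value
    resume_below: float  # resume once RAM % drops below this value


def plan_actions(
    ram_percent: float,
    tiers: Iterable[Tier],
    paused: Set[str],
    protected: Set[str],
    open_breakers: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (tiers_to_pause, tiers_to_resume) for one autopilot cycle."""
    tiers = list(tiers)
    to_pause: List[str] = []
    to_resume: List[str] = []

    # Least important tiers are considered for pausing first.
    for tier in sorted(tiers, key=lambda t: t.priority, reverse=True):
        if tier.name in protected or tier.name in paused:
            continue
        if ram_percent >= tier.pause_above:
            to_pause.append(tier.name)

    # Most important tiers are considered for resuming first.
    for tier in sorted(tiers, key=lambda t: t.priority):
        if tier.name not in paused:
            continue
        if tier.name in open_breakers:
            continue  # healer circuit is open: leave this tier paused
        if ram_percent < tier.resume_below:
            to_resume.append(tier.name)

    return to_pause, to_resume
```

Keeping `resume_below` under `pause_above` gives each tier a hysteresis band, so a RAM reading hovering near one threshold does not pause and resume the same containers on alternate polls.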
…ctured logging and predictive metrics

- Add JSON logging for better Loki integration and structured audit trails
- Implement 8 new Prometheus metrics for decision tracking and container stats
- Introduce predictive RAM analysis with 5-minute trend forecasting
- Extend health check timeouts and retries to improve reliability
- Add comprehensive error logging throughout all key functions
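The PR mentions 5-minute trend forecasting but not the method. One plausible approach, shown here purely as an assumption, is a least-squares line over a sliding window of RAM samples, extrapolated 300 seconds past the newest sample and clamped to the 0 to 100% range.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class RamTrend:
    """Sliding window of (timestamp, ram_percent) samples with a linear forecast."""

    def __init__(self, max_samples: int = 60) -> None:
        self.samples: Deque[Tuple[float, float]] = deque(maxlen=max_samples)

    def add(self, ram_percent: float, ts: Optional[float] = None) -> None:
        self.samples.append((time.time() if ts is None else ts, ram_percent))

    def forecast(self, horizon_s: float = 300.0) -> Optional[float]:
        """Predict RAM % horizon_s seconds after the newest sample (default 5 min)."""
        n = len(self.samples)
        if n < 2:
            return None  # not enough data to fit a trend
        xs = [t for t, _ in self.samples]
        ys = [v for _, v in self.samples]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            return ys[-1]  # all samples share one timestamp; no slope to fit
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx  # percentage points per second
        intercept = mean_y - slope * mean_x
        predicted = intercept + slope * (xs[-1] + horizon_s)
        return max(0.0, min(100.0, predicted))
```

The window length (`max_samples` times the polling interval) controls how quickly the forecast reacts to a change in trend.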
Align API paths and compose commands; add docs inventory, changelog, and review checklist.
Checks canonical Markdown links, enforces 'docker compose' terminology, and requires docs changelog updates when canonical docs change.
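The three rules in this check could look roughly like the following. The changelog path `docs/CHANGELOG.md`, the link regex, and the rule that only a whitespace-followed `docker-compose` counts as the legacy command (so filenames such as `docker-compose.nim.yml` pass) are all assumptions; link targets are compared as written rather than resolved relative to each file.

```python
import re
from typing import Iterable, List, Set

# Assumption: "docker-compose" followed by whitespace or end of line is the legacy
# CLI form; filenames such as docker-compose.nim.yml are left alone.
LEGACY_COMPOSE = re.compile(r"\bdocker-compose(?=\s|$)")
# Inline Markdown links: captures the target, ignoring any #anchor.
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")
CHANGELOG = "docs/CHANGELOG.md"  # hypothetical path


def legacy_compose_lines(text: str) -> List[int]:
    """1-based line numbers that use `docker-compose` instead of `docker compose`."""
    return [i for i, line in enumerate(text.splitlines(), 1) if LEGACY_COMPOSE.search(line)]


def broken_links(text: str, existing: Set[str]) -> List[str]:
    """Local link targets that are not in the set of known doc paths."""
    external = ("http://", "https://", "mailto:")
    return [t for t in MD_LINK.findall(text) if not t.startswith(external) and t not in existing]


def changelog_problems(changed: Iterable[str], canonical: Set[str]) -> List[str]:
    """Require the docs changelog to change whenever a canonical doc changes."""
    changed = set(changed)
    touched = sorted(changed & canonical)
    if touched and CHANGELOG not in changed:
        return [f"{path} changed without an update to {CHANGELOG}" for path in touched]
    return []
```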
Run github-mcp-server in HTTP mode so the container stays alive under Docker Compose.
Use github-mcp-server http on port 8082 to prevent stdio EOF restarts under Compose.
Wire GitHub MCP services to GITHUB_TOKEN_PRIMARY to match the repo .env naming.
Add guidance to avoid sharing .env secrets; keep docs changelog in sync; standardize compose commands in the 2026-03-19 status note.
welshDog added 30 commits March 22, 2026 11:38
- Add .env1 to gitignore for consistency
- Enhance .env.example with new environment variables and better organization
- Update Makefile to use unified docker-compose.yml with profiles
- Add network-init target to ensure Docker network exists
- Improve setup instructions and add .env creation from example
- Add prometheus-fastapi-instrumentator dependency for automatic metrics exposure
- Create comprehensive metrics module with 12 core metrics for Hyper Agents, LLM, Healer, and Agent X
- Instrument all agent FastAPI applications to expose /metrics endpoint
- Update Prometheus configuration to scrape backend metrics
- Provide helper functions for tracking agent requests, LLM calls, and other operations
- Pin all dependencies to exact versions for reproducibility
- Add numerous new packages for expanded functionality (ML, monitoring, etc.)
- Update existing packages to newer versions for security and features
…lity

Introduce Grafana Agent as a unified shipper for metrics, logs, and traces to Grafana Cloud. This replaces manual remote write configurations and consolidates observability data pipelines into a single service. The agent scrapes Prometheus metrics, collects Docker container logs, and is configured to forward to Grafana Cloud Prometheus, Loki, and Tempo endpoints.

Also updates the Docker Compose file to include both Prometheus and the Grafana Agent service with necessary environment variables and volume mounts.
- Update environment variable names from ANTHROPIC_API_KEY to PERPLEXITY_API_KEY
- Modify configuration files, scripts, and documentation throughout codebase
- Update imports and client initialization in agent code
- Fix broken logger statement in healer agent
- Add shared metrics module for observability
- Update environment variable from ANTHROPIC_API_KEY to PERPLEXITY_API_KEY in .env.example and documentation
- Fix missing newlines at end of multiple files
- Remove UTF-8 BOM from test markdown file
Add four Grafana AI investigation reports documenting observability gaps in the HyperCode infrastructure and Grafana Assistant App. The reports highlight missing metrics, logs, and traces, providing detailed findings and remediation recommendations to establish a functional monitoring pipeline.
- Remove explicit version from docker-compose.monitoring.yml as it's optional in newer Docker Compose versions
- Remove trailing whitespace and clean up .gitignore formatting
- Remove duplicate PERPLEXITY_API_KEY environment variable for hypercode-worker service
- Update AGENT_PORT from 8008 to 8010 for hypercode-agent-healer service
- Add integration test suite for agent crew with live orchestrator
- Mock enabled_agent_keys setting in unit tests to match agent configuration
- Simplify nested context managers in health monitor tests
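Simplifying nested context managers usually means collapsing stacked `with` blocks into one statement, which enters the managers left to right and exits them in reverse, exactly like the nested form. The patched targets below are hypothetical and are not the actual health monitor test fixtures.

```python
import os
from unittest import mock

FLAG = "HEALTH_MONITOR_TEST_FLAG"  # hypothetical env var used only for this example


def nested():
    # Before: one context manager per indentation level.
    with mock.patch.dict(os.environ, {FLAG: "1"}):
        with mock.patch("os.getcwd", return_value="/srv/agent"):
            return os.environ[FLAG], os.getcwd()


def flattened():
    # After: a single with-statement; managers enter left to right, exit in reverse.
    with mock.patch.dict(os.environ, {FLAG: "1"}), \
            mock.patch("os.getcwd", return_value="/srv/agent"):
        return os.environ[FLAG], os.getcwd()
```

On Python 3.10 and later the managers can also be wrapped in parentheses instead of using a backslash continuation.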
Add two comprehensive documentation files detailing the Hyper Agent ecosystem:
- "🦅 THE HYPER AGENT LIFE PLAN": Describes the purpose, roles, and communication principles for all agents in the system.
- "this is the Hyper Agent.md": Provides a technical guide for inter-agent communication, including shared me
Introduces modular YAML life plans for all agents, detailing identity, communication, collaboration, work pipeline, play framework, and future-proofing. This foundational document provides each agent with a structured identity, ethical charter, and evolution path, enabling coordinated multi-agent operations and autonomous ecosystem development.
…mpose

- Replace multi-stage build with simpler single-stage Dockerfile using Python 3.12
- Add non-root user for security hardening and optimize layer caching
- Create comprehensive docker-compose.agents.yml with full agent stack
- Define service dependencies, health checks, and networking for all 22 agents
- Update health check to use curl with configurable PORT environment variable
Add three detailed architectural review documents to the repository:
- HyperCode V2.0 — Complete Architectural Audit
- HyperCode V2.0 DEEPSEEK Review
- HyperCode V2.0 — Comprehensive Architectural GEMI Review

These reports provide critical analysis of the current codebase, identifying strengths, weaknesses, and prioritized action items across neurodivergent-first design, multi-paradigm implementation, AI agent architecture, and technical debt. They serve as a foundation for future architectural improvements and project planning.
Add Tempo monitoring configuration file with OTLP receiver endpoints and local storage settings.
Fix TypeScript type annotation for drag event in AgentLibrary component to resolve implicit 'any' warning.
Fix incorrect TypeScript syntax in drag event handler by replacing generic type annotation with proper parameter typing.

Add comprehensive environment configuration file with secrets redacted, including AI providers, database, Redis, frontend, observability, and service integrations for production deployment.
Fix incorrect type annotation for React drag event in AgentLibrary component. The event handler was incorrectly typed with a malformed generic syntax, causing TypeScript errors. Change to proper type casting to ensure type safety while maintaining drag functionality.
Fix TypeScript error by properly casting event to React.DragEvent<HTMLDivElement> instead of React.DragEvent
…x in AgentLibrary

- Update onDragStart parameter type from React.DragEvent to React.DragEvent<HTMLDivElement>
- Fix broken backgroundColor value in motion.whileHover by replacing template literal with static rgba value
- Restructure component nesting to ensure proper hover tooltip positioning
- Remove unnecessary type casting in onDragStart handler
The hover background color in the AgentLibrary component had a malformed escape sequence (\15) instead of a valid RGBA value. This was causing a syntax error that prevented the component from rendering correctly.

Also remove the obsolete fix_agentlib.py script that was previously used to patch this issue.
- frontend-specialist.yaml: UI/UX architect (React, components)
- backend-specialist.yaml: API engineer (REST, async)
- database-architect.yaml: Data designer (schema, optimization)
- qa-engineer.yaml: Test guardian (test generation, validation)
- devops-engineer.yaml: Infrastructure expert (CI/CD, deployment)
- security-engineer.yaml: Security guardian (audits, scanning)
- system-architect.yaml: System designer (architecture, scalability)
- healer-agent.yaml: Recovery expert (health monitoring, auto-restart)
- test-agent.yaml: Validator (integration tests, smoke tests)
- throttle-agent.yaml: Resource guardian (CPU/memory monitoring)
- Plus existing: hypercode-core.yaml, crew-orchestrator.yaml

Each life-plan includes:
- Service identity & codename
- Responsibilities & dependencies
- SLOs (latency, error rate, availability)
- 2-4 failure modes with MTTR & recovery steps
- Key metrics to monitor & alert thresholds
- Deployment config (image, resources, healthcheck)
- On-call playbooks for common incidents
- Contact & escalation info

Total: 112 KB across 12 service runbooks
All agents now have living documentation for ops/on-call
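A hypothetical check that a parsed life-plan covers the sections listed above. The section keys are assumed names, since the YAML schema is not shown in the PR; in practice the dict would come from loading each file with a YAML parser such as PyYAML's `yaml.safe_load`.

```python
from typing import Dict, List

# Assumed top-level keys; the actual life-plan YAML schema is not shown in the PR.
REQUIRED_SECTIONS = (
    "identity", "responsibilities", "slos", "failure_modes",
    "metrics", "deployment", "playbooks", "contacts",
)


def validate_life_plan(plan: Dict) -> List[str]:
    """Return a list of problems; an empty list means the plan passes."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in plan]
    modes = plan.get("failure_modes") or []
    # The runbooks describe 2-4 failure modes, each with MTTR and recovery steps.
    if not 2 <= len(modes) <= 4:
        problems.append(f"expected 2-4 failure modes, found {len(modes)}")
    for i, mode in enumerate(modes):
        for key in ("mttr", "recovery_steps"):
            if key not in mode:
                problems.append(f"failure_modes[{i}] is missing '{key}'")
    return problems
```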
Update the fetchAgents function to use the correct API endpoint and include the required authorization header. Also add a .dockerignore file to the dashboard agent directory to exclude unnecessary files from Docker builds.
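The dashboard's `fetchAgents` is front-end code, and neither the corrected endpoint nor the header value appears in this PR. To illustrate the same idea in Python, here is a sketch of an authenticated agent-list request; the `/api/agents` path and the `Bearer` scheme are assumptions.

```python
import json
import urllib.request

AGENTS_PATH = "/api/agents"  # hypothetical; the PR does not name the endpoint


def build_agents_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET request, attaching the Authorization header the API requires."""
    req = urllib.request.Request(base_url.rstrip("/") + AGENTS_PATH, method="GET")
    req.add_header("Authorization", f"Bearer {token}")  # Bearer scheme assumed
    req.add_header("Accept", "application/json")
    return req


def fetch_agents(base_url: str, token: str, timeout: float = 5.0):
    """Fetch and decode the agent list; urlopen raises HTTPError on 401/403."""
    with urllib.request.urlopen(build_agents_request(base_url, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```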

Labels

None yet

Projects

None yet

1 participant