Uroboros is an autonomous software engineering system capable of recursive self-improvement. It implements the "Adversarial Co-Evolution" paradigm, where a Builder Agent (Actor) and a Tester Agent (Adversary) compete in an infinite loop to generate robust, verified code.
The name comes from the ouroboros, an ancient symbol of a serpent devouring its own tail, which mirrors the core behavior of this system: code that writes and critiques itself in a recursive loop.
The system operates on the Uroboros Loop:
- Actor (The Builder): Generates code solutions and tools. It uses Voyager-style Memory (Vector DB) to retrieve past skills and avoid repeating mistakes.
- Adversary (The Critic): Generates "Killer Tests" designed to break the Actor's code. It targets edge cases, boundary conditions, and logic flaws.
- Arbiter (The Judge): Runs the code and tests in a secure, isolated Firecracker MicroVM (via E2B). It provides the ground truth signal (Pass/Fail/Crash).
- Evolver (The Optimizer): Analyzes failure patterns and rewrites the system prompts to improve future performance.
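The four roles above can be sketched in miniature. All four functions here are hypothetical stand-ins for illustration, not the actual Uroboros implementation:

```python
# Minimal sketch of the Uroboros Loop with stubbed-out agents.
# Real implementations call LLMs and a sandbox; these stubs just
# illustrate the control flow: build -> attack -> judge -> evolve.

def actor(task, memory):
    # Builder: retrieve past skills, then generate a candidate solution.
    return f"# solution for: {task} (using {len(memory)} memories)"

def adversary(code):
    # Critic: generate a "killer test" targeting the candidate code.
    return f"assert solution_works()  # attack on: {code[:20]}"

def arbiter(code, test):
    # Judge: execute code + test in a sandbox; here, always "pass".
    return "pass"

def evolver(history):
    # Optimizer: inspect failures and adjust prompts (no-op in this sketch).
    return {"failures": sum(1 for v in history if v != "pass")}

def uroboros_loop(task, iterations=3):
    memory, history = [], []
    for _ in range(iterations):
        code = actor(task, memory)
        test = adversary(code)
        verdict = arbiter(code, test)
        history.append(verdict)
        if verdict == "pass":
            memory.append(code)  # store the verified skill
    return evolver(history), memory

stats, memory = uroboros_loop("reverse a string")
print(stats)        # {'failures': 0}
print(len(memory))  # 3
```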
- Python 3.11+
- Poetry (Dependency Manager)
- Docker (Optional, for containerized runs)
- API Keys: OpenAI (gpt-5-mini recommended) and E2B.
# Clone the repository
git clone git@github.com:renbytes/uroboros.git
cd uroboros
# Install dependencies via Poetry
# Note: This installs numpy<2.0.0 to ensure ChromaDB compatibility
poetry install
Copy the template and add your secrets.
cp .env.example .env
Required .env variables:
OPENAI_API_KEY=sk-...
E2B_API_KEY=e2b_...
ACTOR_MODEL=gpt-5-mini
ADVERSARY_MODEL=gpt-5-mini
DEBUG=true # Set to 'true' to save full debugging artifacts
Run this script to ensure your E2B sandbox connection is working:
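For illustration, a minimal parser for this KEY=VALUE format might look like the following. This is a sketch only; the real project presumably loads .env via a library such as python-dotenv:

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines, full-line
    comments, and anything after an inline ' #' comment."""
    env = {}
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
OPENAI_API_KEY=sk-...
E2B_API_KEY=e2b_...
ACTOR_MODEL=gpt-5-mini
DEBUG=true # Set to 'true' to save full debugging artifacts
"""
config = parse_env(sample)
print(config["ACTOR_MODEL"])  # gpt-5-mini
print(config["DEBUG"])        # true
```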
poetry run python scripts/smoke_test.py
Expected Output: 🎉 Infrastructure is HEALTHY.
To assign a specific coding challenge to the agent:
poetry run python -m uroboros.main --task "Write a Flask API with a /users endpoint backed by SQLite."
To let the agent generate its own curriculum and evolve indefinitely:
poetry run python -m uroboros.main --loopIf DEBUG=true is set in your .env, the system saves detailed artifacts for every step of the loop in:
data/intermediate_debugging/<task_id>/
Files generated:
- _task_definition.txt: What the agent was asked to do.
- _actor_reasoning.md: The Builder's internal monologue.
- _actor_generated_code_X.py: The raw code patches.
- _adversary_attack_plan.md: The logic behind the attack.
- _adversary_test_code_X.py: The generated "Killer Tests".
- _attempt_X_failure_log.log: Combined stdout/stderr from the sandbox failure.
Cause: The test file tries to import the solution file, but Python's path isn't set correctly in the VM.
Fix: The Arbiter now runs tests using python -m pytest ., which adds the current directory to sys.path.
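The effect of `python -m pytest .` can also be reproduced manually. This illustrative snippet shows the equivalent sys.path adjustment a test file could apply itself:

```python
import os
import sys

# Running `python -m pytest .` prepends the current directory to
# sys.path, so `import solution` resolves to ./solution.py.
# A test file can apply the same fix explicitly:
sys.path.insert(0, os.getcwd())

# The solution module's directory is now importable.
print(os.getcwd() in sys.path)  # True
```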
Cause: The LLM included Markdown fences (```python) or conversational text inside the code file.
Fix: The clean_code_block utility now strips markdown, and prompts explicitly forbid conversational filler in the content field.
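A utility like `clean_code_block` could be sketched as follows (a hypothetical reimplementation for illustration, not the project's actual code):

```python
import re

def clean_code_block(text: str) -> str:
    """Strip Markdown code fences (``` or ```python) from LLM output,
    returning only the code inside."""
    # Prefer the contents of the first fenced block, if one exists.
    match = re.search(r"```[a-zA-Z0-9_+-]*\n(.*?)```", text, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Otherwise, drop any stray fence lines.
    lines = [ln for ln in text.splitlines() if not ln.strip().startswith("```")]
    return "\n".join(lines).strip()

raw = "Here is the code:\n```python\nprint('hello')\n```\nHope this helps!"
print(clean_code_block(raw))  # print('hello')
```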
Cause: Incompatibility between ChromaDB (v0.4.x) and NumPy 2.0.
Fix: pyproject.toml pins numpy = "<2.0.0". Run poetry lock && poetry install if you see this.
Cause: Using an older model (like gpt-4-turbo or gpt-3.5) that doesn't support "Structured Outputs" (json_schema).
Fix: Set ACTOR_MODEL to a model that supports Structured Outputs (e.g., the recommended gpt-5-mini) in your .env.
Cause: Using the synchronous Sandbox class instead of AsyncSandbox or using incorrect SDK v1 syntax.
Fix: The codebase now strictly uses AsyncSandbox.create() and sandbox.commands.run().
To run the internal unit and integration tests for the agent framework itself:
# Run all tests
poetry run pytest
# Run model connectivity check
poetry run pytest tests/integration/test_llm_connectivity.py
MIT License.
