Based on the implementation drift observed in the Flight & Hotel Tracker build, this document identifies improvement areas in the Ralph Wiggum Loop templates.
| Gap | Impact | Location(s) |
|---|---|---|
| No field-level traceability | Spec fields get skipped/renamed | TASKS.json, CONTEXT.json |
| No spec validation before verification | Agent doesn't check if all fields implemented | wiggum_driver.py |
| Bidirectional sync missing | Spec never updated after implementation | Manual process |
| Glossary underutilized | Naming drift (TripRequest → Route) | system_architect.md |
| Workflow lacks spec-checking step | Agent skips straight to marking [x] | ralph_mode.md |
**Current State:**

```json
{
  "action": "Define Database Models",
  "outcome": "SQLModel classes for Route, PricePoint, and Settings."
}
```

**Problem:** The task says "create models" but doesn't list which fields from CONTEXT.json must be present. The agent can complete this task by creating models with any fields.
**Proposed Fix:**

```json
{
  "action": "Define Database Models",
  "outcome": "SQLModel classes for Route, PricePoint, and Settings.",
  "field_requirements": {
    "TripRequest": ["origin", "destination", "date_range_start", "date_range_end", "max_price_threshold", "cooldown_hours", "flexibility"]
  },
  "verification": {
    "type": "schema_check",
    "command": "python -c \"from backend.models import Route; print([f.name for f in Route.__fields__.values()])\"",
    "expected_fields": ["origin", "destination", "max_price_threshold", "cooldown_hours"]
  }
}
```
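As a minimal sketch of how the driver could consume such a `verification` block, the helper below runs the `schema_check` command and diffs its output against `expected_fields`. The function name `run_schema_check` and its return convention are assumptions, not part of the existing driver:

```python
import ast
import subprocess

def run_schema_check(task: dict) -> bool:
    """Hypothetical helper: execute a task's schema_check verification.

    Assumes the verification command prints a Python list of field
    names, as in the TASKS.json example above.
    """
    verification = task.get("verification", {})
    if verification.get("type") != "schema_check":
        return True  # no schema check declared for this task

    result = subprocess.run(
        verification["command"], shell=True, capture_output=True, text=True
    )
    actual = set(ast.literal_eval(result.stdout.strip() or "[]"))
    missing = set(verification["expected_fields"]) - actual
    if missing:
        print(f"Schema check failed; missing fields: {sorted(missing)}")
    return not missing
```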
prompt = f"""Execute the following task from the Ralph Wiggum workflow:
**Task**: {task['action']}
...Problem: The driver injects the task but does NOT inject the required field list from CONTEXT.json. The agent has no way to know what fields are expected.
**Proposed Fix:** Add a new injection step:

```python
# NEW: Inject required model fields from CONTEXT.json
if "models" in context_data and task.get("field_requirements"):
    model_spec = context_data["models"]
    prompt += f"""
**Required Model Fields (from CONTEXT.json):**
{json.dumps(model_spec, indent=2)}
You MUST implement ALL fields listed above. Missing fields = task failure.
"""
```

**Current State (lines 177-192):**
```json
{
  "primary": "invoice",
  "synonyms": ["bill", "statement"],
  "aliases_in_code": ["INV", "inv_id"]
}
```

**Problem:** The glossary defines aliases_in_code, but this is NOT enforced. The agent named the model Route when the spec said TripRequest.
**Proposed Fix:** Add to CONTEXT.json:

```jsonc
"naming_enforcement": {
  "model_names": {
    "TripRequest": "Route",   // Explicit mapping OR enforcement
    "PricePoint": "PricePoint"
  },
  "enforcement": "warn"       // warn | block
}
```

And add to the validation phase:
### Naming Validation
- [ ] All model names match CONTEXT.json `naming_enforcement.model_names`
- [ ] Divergence is logged in the changelog with justification
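A minimal sketch of how a validator (e.g. the later spec_validator.py) could enforce this mapping; the function name `check_naming`, the file paths, and the AST-based parsing approach are assumptions:

```python
import ast
import json
from pathlib import Path

def check_naming(models_path: str = "backend/models.py",
                 context_path: str = ".agent/CONTEXT.json") -> list[str]:
    """Hypothetical check: compare class names defined in models.py
    against CONTEXT.json naming_enforcement.model_names."""
    context = json.loads(Path(context_path).read_text())
    enforcement = context.get("naming_enforcement", {})
    allowed = set(enforcement.get("model_names", {}).values())

    # Collect every class defined in the models module
    tree = ast.parse(Path(models_path).read_text())
    defined = {node.name for node in ast.walk(tree)
               if isinstance(node, ast.ClassDef)}

    violations = sorted(defined - allowed)
    if violations and enforcement.get("enforcement") == "block":
        raise ValueError(f"Model names not in spec: {violations}")
    return violations  # "warn" mode: caller logs these to the changelog
```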
**Current State:**

1. Read the Task List
2. Execute Task
3. Update Task List
4. Report Completion

**Problem:** No step to compare the implementation against the spec before marking complete.
**Proposed Fix:**

```markdown
---
description: Ralph Wiggum Loop Workflow
---
# Ralph Wiggum Workflow

1. **Read the Task List**
   - Open `.agent/task.md` and find the first unchecked item.
2. **Check Spec Requirements** ← NEW
   - Open `.agent/CONTEXT.json` and identify the relevant models/fields
   - If `field_requirements` exist in TASKS.json, list them explicitly
3. **Execute Task**
   - Perform the necessary code changes
   - **Crucial**: Implement ALL fields from the spec, not just "enough to work"
4. **Validate Against Spec** ← NEW
   - Compare implemented fields to CONTEXT.json requirements
   - If mismatch: DO NOT mark as complete; report discrepancy
5. **Update Task List**
   - Mark the item as `[x]` in `.agent/task.md`
6. **Report Completion**
   - State clearly "I have completed the task: [Task Name]"
```

**Problem:** After implementation, the spec (CONTEXT.json) is never updated to reflect reality. This creates permanent drift.
**Proposed Fix:** Add to system_architect.md Phase 5:

**Phase 5B: Post-Implementation Sync (NEW)**
- Triggered after ALL tasks complete
- Compare CONTEXT.json models to the actual implementations
- Generate a changelog entry for any divergence
- Options:
  - Update CONTEXT.json to match the code (preferred for organic evolution)
  - Create an issue/task to fix the code to match the spec (preferred for strict compliance)

And add a new artifact under `specs/`:
```text
specs/
├── ...existing files...
└── IMPLEMENTATION_DELTA.json   # Tracks spec vs. reality differences
```
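A sketch of what producing that artifact might look like; `compute_delta`, its input shapes (model name → field list), and the output format are assumptions for illustration:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def compute_delta(spec_models: dict, implemented_models: dict) -> dict:
    """Hypothetical Phase 5B step: diff spec fields against implemented
    fields and write specs/IMPLEMENTATION_DELTA.json."""
    delta = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "models": {},
    }
    for name, spec_fields in spec_models.items():
        impl_fields = implemented_models.get(name, [])
        delta["models"][name] = {
            "missing_in_code": sorted(set(spec_fields) - set(impl_fields)),
            "extra_in_code": sorted(set(impl_fields) - set(spec_fields)),
        }
    Path("specs/IMPLEMENTATION_DELTA.json").write_text(json.dumps(delta, indent=2))
    return delta
```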
| # | Improvement | Effort | Impact | Status |
|---|---|---|---|---|
| 1 | Add `field_requirements` to TASKS.json schema | Low | High | ✅ Solved (task_selector.py) |
| 2 | Update ralph_mode.md with validation step | Low | High | ✅ Solved |
| 3 | Add spec field injection to wiggum_driver.py | Medium | High | ✅ Solved (wiggum_driver.py) |
| 4 | Add `naming_enforcement` to CONTEXT.json schema | Low | Medium | ✅ Solved (spec_validator.py) |
| 5 | Add Phase 5B post-implementation sync | Medium | Medium | 🚧 Deferred |
| File | Changes Made |
|---|---|
| `ralph_mode.md` | Added "SPEC = REQUIREMENT" caution, validation steps 2 & 4, divergence reporting |
| `wiggum_driver.py` | Automated spec validation with retry loop (see below) |
| `system_architect.md` | Added `field_requirements`, `naming_enforcement`, Phase 5B sync |
| QA Integration | Added mandatory `testing_strategy` and "Run Tests" steps to all templates |
| `README.md` | Documented spec enforcement features |
The driver now includes real enforcement, not just advisory prompts:

```text
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ Agent marks [x] │ ───▶ │ validate_spec_  │ ───▶ │ If FAIL: Unmark │
│                 │      │ compliance()    │      │ + retry prompt  │
└─────────────────┘      └─────────────────┘      └─────────────────┘
```
| Function | Purpose |
|---|---|
| `validate_spec_compliance()` | Reads CONTEXT.json, parses models.py, compares fields |
| `unmark_task()` | Removes `[x]` from task.md so the agent must retry |
- `MAX_VALIDATION_RETRIES = 2`: the agent gets two chances to fix missing fields
- On failure: the driver injects an error prompt listing exactly which fields are missing
- After max retries: the task is marked as failed and the driver moves on (see the sketch below)
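Putting those pieces together, the enforcement loop behaves roughly as sketched below. `validate_spec_compliance()` and `unmark_task()` are the real function names from the table above; the loop body, the `(ok, missing)` return shape, and the retry-prompt wording are assumptions:

```python
MAX_VALIDATION_RETRIES = 2

def enforce_spec(task: dict, run_agent, validate_spec_compliance, unmark_task) -> bool:
    """Approximate shape of the driver's post-completion enforcement loop."""
    for attempt in range(MAX_VALIDATION_RETRIES + 1):
        ok, missing = validate_spec_compliance(task)
        if ok:
            return True
        if attempt == MAX_VALIDATION_RETRIES:
            break  # give up; mark the task failed and move on
        unmark_task(task)  # remove [x] so the agent must retry
        run_agent(
            f"Spec validation failed for '{task['action']}'. "
            f"The following required fields are missing: {missing}. "
            "Implement them before marking the task complete."
        )
    return False
```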
To ensure all future apps have testing built-in, the following changes were made:
| Template | Change |
|---|---|
| `system_architect.md` | MANDATES defining a "QA Strategy" (frameworks, golden path) in Phase 2 |
| `ralph_mode.md` | Added "Regression Check" (step 1b) and "Run Tests" (step 3b) |
| `wiggum_driver.py` | Injects a "RUN TESTS" instruction into every prompt (see sketch below) |
- Update template schemas: add `field_requirements` and `naming_enforcement` to the spec format
- Patch `ralph_mode.md`: add steps 2 and 4 (spec check, validation)
- Enhance `wiggum_driver.py`: inject model field requirements into prompts
The following improvements were implemented to finalize V1:
- Validation: `TaskSelector` now validates the `TASKS.json` schema on load.
- Error Handling: `wiggum_driver.py` includes robust retries for CDP connections.
- Aggregation: `qa_verification.py` aggregates all checks into `.agent/qa_report.json`.
- Visual Checks: `qa/visual_checker.py` now includes strict rule-based checks (PNG header, size) before LLM calls.
- E2E: Expanded `tests/test_e2e_flow.py` with dependency handling and context rotation scenarios.
- CI: Configured GitHub Actions to archive QA reports.
- Performance: `ProgressMonitor` now logs task duration to `.agent/metrics.json` (sketched after this list).
- Workflow: Added a Mermaid sequence diagram for triggers to `README.md`.
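As a rough illustration of that duration logging, here is a minimal sketch; the `ProgressMonitor` internals, method names, and metrics file format are all assumptions:

```python
import json
import time
from pathlib import Path

class ProgressMonitor:
    """Hypothetical sketch of the task-duration logging described above."""

    def __init__(self, metrics_path: str = ".agent/metrics.json"):
        self.metrics_path = Path(metrics_path)
        self._start: dict[str, float] = {}

    def task_started(self, task_name: str) -> None:
        self._start[task_name] = time.monotonic()

    def task_finished(self, task_name: str) -> None:
        duration = time.monotonic() - self._start.pop(task_name)
        # Append to the existing metrics log, if any
        metrics = []
        if self.metrics_path.exists():
            metrics = json.loads(self.metrics_path.read_text())
        metrics.append({"task": task_name, "duration_seconds": round(duration, 2)})
        self.metrics_path.write_text(json.dumps(metrics, indent=2))
```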