azalio
diff --git a/.claude/agents/actor.md b/.claude/agents/actor.md
Lines changed: 31 additions & 5 deletions
diff --git a/.claude/agents/monitor.md b/.claude/agents/monitor.md
Lines changed: 44 additions & 6 deletions
diff --git a/.claude/agents/task-decomposer.md b/.claude/agents/task-decomposer.md
Lines changed: 31 additions & 5 deletions
diff --git a/.claude/commands/map-debate.md b/.claude/commands/map-debate.md
Lines changed: 5 additions & 1 deletion
diff --git a/.claude/commands/map-efficient.md b/.claude/commands/map-efficient.md
Lines changed: 4 additions & 0 deletions
@@ -375,8 +375,12 @@ Document key decisions using this structure:
 - [ ] Error cases (invalid input, failures)
 - [ ] Security cases (injection, auth bypass) — if applicable
 
+**Validation criteria → tests (MANDATORY when test_strategy is not N/A)**:
+- For each `VCn:` item in `validation_criteria`, implement or update at least one automated test that would fail without your change and pass with it.
+- Prefer naming tests with `vc<n>` (e.g., `test_vc1_*`, `TestVC1*`) so Monitor can deterministically confirm coverage.
+
 **Format**:
-```
+```text
 1. test_[function]_[scenario]_[expected]
    Input: [specific input]
    Expected: [specific output/behavior]
@@ -392,7 +396,18 @@ Document key decisions using this structure:
    Expected: 409, {"error": "Email already registered"}
 </example>
 
-## 6. Used Patterns (ACE Learning)
+## 6. Validation Criteria Coverage (Evidence)
+
+If the subtask packet includes `validation_criteria`, list each `VCn:` and where it is enforced.
+
+**Format**:
+```text
+VC1: <criterion text>
+- Code: path/to/file.ext#SymbolOrLocation
+- Tests: path/to/test_file.ext::test_name (or N/A with reason)
+```
+
+## 7. Used Patterns (ACE Learning)
 
 **Format**: `["impl-0012", "sec-0034"]` or `[]` if none
 
@@ -403,7 +418,7 @@ Document key decisions using this structure:
 
 **If no patterns match**: `[]` with note "No relevant patterns in current mem0"
 
-## 7. Integration Notes (If Applicable)
+## 8. Integration Notes (If Applicable)
 
 Only include if changes affect:
 - Database schema (migrations needed?)
@@ -449,6 +464,7 @@ Only include if changes affect:
 - [ ] AAG contract stated BEFORE code (Section 1)
 - [ ] Trade-offs documented with alternatives
 - [ ] Test cases cover happy + edge + error paths
+- [ ] Each `validation_criteria` item has at least one automated test (or explicit N/A with reason)
 - [ ] Used patterns tracked (or `[]` if none)
 - [ ] Template variables `{{...}}` preserved in generated code
 
@@ -518,6 +534,14 @@ with the following JSON content:
   "summary": "<one-line description of what was implemented>",
   "aag_contract": "<the AAG contract line>",
   "files_changed": ["<list of modified file paths>"],
+  "tests_changed": ["<list of modified/added test file paths>"],
+  "validation_criteria_coverage": [
+    {
+      "criterion": "VC1: ...",
+      "tests": ["path/to/test_file.ext::test_name"],
+      "notes": "Short justification if tests are N/A or partial"
+    }
+  ],
   "status": "applied"
 }
 ```
@@ -715,7 +739,10 @@ output:
 
 {{feedback}}
 
-**Action Required**: Address ALL issues above. Focus on:
+**Action Required**: Address ALL issues above. Do NOT dismiss feedback as "out of scope" or "separate task".
+If you believe an item should be deferred, STOP and ask the user for explicit approval to defer.
+
+Focus on:
 1. Specific line items mentioned
 2. Quality checklist items that failed
 3. Security or constraint violations
@@ -1083,4 +1110,3 @@ export class ReconnectingWebSocket {
 **Used Bullets**: `[]` (No similar patterns in mem0. Novel implementation.)
 
 </Actor_Reference_Examples>
-
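A minimal sketch of how the `vc<n>` test-naming convention added to actor.md might look in practice, assuming a Python codebase with pytest-style test functions. `Token` and `validate_token` are hypothetical stand-ins for the `auth/middleware.py:validate_token` anchor used in the examples; they are not part of this PR.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token type, used only for illustration."""
    user_id: str
    expired: bool


def validate_token(token: Token) -> tuple[int, dict]:
    """Hypothetical handler for 'VC1: Returns 401 for expired token'."""
    if token.expired:
        return 401, {"error": "Token expired"}
    return 200, {"user_id": token.user_id}


# The test names carry `vc1`, so coverage of VC1 can be confirmed with a plain grep.
def test_vc1_expired_token_returns_401():
    status, body = validate_token(Token(user_id="u1", expired=True))
    assert status == 401
    assert "error" in body


def test_vc1_valid_token_returns_200_with_user_id():
    status, body = validate_token(Token(user_id="u1", expired=False))
    assert status == 200
    assert body == {"user_id": "u1"}
```

Both tests would fail if the `expired` check were removed or inverted, which is the "fail without your change, pass with it" property the new rule asks for.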
 
@@ -821,15 +821,27 @@ When `{{requirements}}` or `{{subtask_description}}` includes `validation_criter
 ```
 FOR each criterion in validation_criteria:
   1. PARSE criterion into testable assertion
-  2. VERIFY assertion against {{solution}}
-  3. RECORD result: PASS | FAIL | PARTIAL | UNTESTABLE
+  2. VERIFY assertion against {{solution}} (code-path evidence)
+  3. VERIFY test coverage using test_strategy (if not N/A)
+  4. RECORD result: PASS | FAIL | PARTIAL | UNTESTABLE
 
 CONTRACT_STATUS:
   - ALL PASS → contract_compliant: true
   - ANY FAIL → contract_compliant: false, list violations
   - ANY UNTESTABLE → flag for clarification
 ```
 
+### Test Coverage Rule (Executable Contracts)
+
+Design constraints only become reliable when they are enforced by executable checks.
+
+For each `VCn:` criterion:
+- If `test_strategy` is provided and not `N/A`, require at least one concrete test case that covers it.
+- Prefer deterministic mapping: test names include `vc<n>` (e.g., `test_vc1_*`, `TestVC1*`).
+- Evidence MUST include both:
+  - **Code evidence** (where in code the behavior is implemented), and
+  - **Test evidence** (where in tests it is asserted).
+
 ### Contract Assertion Patterns
 
 | Criterion Type | How to Verify | Example |
@@ -852,15 +864,31 @@ Include in JSON output when validation_criteria provided:
     "failed": 1,
     "untestable": 0,
     "details": [
-      {"criterion": "Returns 401 for expired token", "status": "PASS", "evidence": "Line 45: if token.expired: return 401"},
-      {"criterion": "Creates audit log entry", "status": "FAIL", "evidence": "No audit.log() call found in create_user()"}
+      {
+        "criterion": "VC1: Returns 401 for expired token (auth/middleware.py:validate_token)",
+        "status": "PASS",
+        "code_evidence": "auth/middleware.py:45: if token.expired: return 401",
+        "test_coverage": "PASS",
+        "test_evidence": "tests/test_auth.py::test_vc1_expired_token_returns_401"
+      },
+      {
+        "criterion": "VC2: Creates audit log entry with user_id (audit/logger.py:log_event)",
+        "status": "FAIL",
+        "code_evidence": "No audit.log_event() call found in create_user()",
+        "test_coverage": "MISSING",
+        "test_evidence": "No test found matching vc2 or described in test_strategy"
+      }
     ]
   },
   "contract_compliant": false
 }
 ```
 
-**Decision Rule**: If `contract_compliant: false`, set `valid: false` unless ALL failed contracts are LOW severity (documentation, naming).
+**Decision Rule**:
+- If `contract_compliant: false`, set `valid: false` unless ALL failed contracts are LOW severity (documentation, naming).
+- If any Behavioral/Integration/Edge-case criterion has `test_coverage != PASS` and test_strategy is not `N/A`:
+  - If `security_critical == true`: set `valid: false` (missing executable enforcement is a release blocker).
+  - Otherwise: add a **testability** issue and require Actor to add tests.
 
 </Monitor_Contract_Validation>
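The decision rule above could be sketched as a small function. This is an illustration under stated assumptions, not the Monitor's actual logic: the `severity` and `type` fields on each detail are hypothetical (the JSON example does not define them), and a detail with no `severity` is conservatively treated as not LOW.

```python
from typing import Any

# Criterion types that the rule singles out for mandatory test coverage.
TEST_REQUIRED_TYPES = {"behavioral", "integration", "edge-case"}


def apply_decision_rule(
    details: list[dict[str, Any]],
    test_strategy_is_na: bool,
    security_critical: bool,
) -> dict[str, Any]:
    """Return {'valid': bool, 'issues': [...]} following the two-part rule.

    ASSUMPTION: each detail may carry 'severity' ("LOW" | "MEDIUM" | "HIGH")
    and 'type' fields; a missing 'type' is treated as "behavioral".
    """
    issues: list[str] = []
    valid = True

    # Part 1: failed contracts block approval unless every failure is LOW severity.
    failed = [d for d in details if d["status"] == "FAIL"]
    issues += [f"contract violation: {d['criterion']}" for d in failed]
    if failed and not all(d.get("severity") == "LOW" for d in failed):
        valid = False

    # Part 2: missing test coverage only matters when test_strategy is not N/A.
    if not test_strategy_is_na:
        uncovered = [
            d for d in details
            if d.get("type", "behavioral") in TEST_REQUIRED_TYPES
            and d.get("test_coverage") != "PASS"
        ]
        issues += [f"testability: no test evidence for {d['criterion']}" for d in uncovered]
        if uncovered and security_critical:
            valid = False  # missing executable enforcement is a release blocker

    return {"valid": valid, "issues": issues}
```

In this sketch a non-security-critical subtask with an uncovered criterion stays `valid` but carries a testability issue, matching the "require Actor to add tests" branch.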
 
@@ -2495,6 +2523,10 @@ def check_rate_limit(user_id, action, limit=100, window=3600):
 - Requirements unmet → valid=false
 - Only MEDIUM/LOW issues → valid=true (with feedback)
 
+**Hard-stop semantics**:
+- If you set `valid=false`, the workflow MUST resolve the issues before proceeding.
+- Do not accept "we'll do it later" reasoning as a resolution unless the user explicitly approves deferral.
+
 </Monitor_Critical_Reminders>
 
 ### Evidence File (Artifact-Gated Validation)
@@ -2512,7 +2544,13 @@ with the following JSON content:
   "timestamp": "<ISO 8601 UTC>",
   "valid": true,
   "issues_found": 0,
-  "recommendation": "approve|reject|revise"
+  "recommendation": "approve|reject|revise",
+  "validation_criteria_test_coverage": {
+    "total": 0,
+    "covered": 0,
+    "missing": 0,
+    "notes": "Optional: summarize VC→test coverage findings"
+  }
 }
 ```
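One way the new `validation_criteria_test_coverage` object might be derived from the per-criterion `details` list is sketched below. It assumes that any `test_coverage` value other than `"PASS"` counts as missing, which the diff does not state explicitly; the function name is hypothetical.

```python
def summarize_vc_test_coverage(details: list[dict]) -> dict:
    """Build the coverage summary from Monitor's per-criterion details.

    ASSUMPTION: test_coverage values other than "PASS" are counted as missing.
    """
    total = len(details)
    uncovered_ids = [
        # "VC2: Creates audit log entry ..." -> "VC2"
        d["criterion"].split(":", 1)[0]
        for d in details
        if d.get("test_coverage") != "PASS"
    ]
    notes = f"Missing test coverage: {', '.join(uncovered_ids)}" if uncovered_ids else ""
    return {
        "total": total,
        "covered": total - len(uncovered_ids),
        "missing": len(uncovered_ids),
        "notes": notes,
    }
```

Applied to the two-entry `details` example in the monitor.md contract-validation hunk (VC1 with `test_coverage: "PASS"`, VC2 with `"MISSING"`), this sketch gives `total: 2`, `covered: 1`, `missing: 1`.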
 
 
@@ -248,8 +248,21 @@ Return **ONLY** valid JSON in this exact structure:
 **subtasks[].complexity_rationale**: MUST reference factors: "Score N: factor (+X), factor (+Y)..."
 **subtasks[].validation_criteria**: Array of **testable conditions** that prove completion
   - REQUIRED: 2-4 specific, verifiable outcomes
-  - Good: "Returns 401 for expired token", "Creates audit log entry with user_id"
-  - Bad: "Works correctly", "Handles errors"
+  - Format (recommended): Prefix each item with `VC1:`, `VC2:`, ... for stable cross-agent reference.
+  - Each criterion MUST be both:
+    - **Behavior-/artifact-verifiable** (can be checked by reading code), and
+    - **Test-verifiable** (has at least one concrete test case planned in `test_strategy`).
+  - Each criterion SHOULD include a concrete anchor:
+    - endpoint/handler + route, OR
+    - function/class name + file path
+  - Good:
+    - "VC1: POST /users returns 201 and persists normalized email (users/routes.py:create_user)"
+    - "VC2: Returns 401 for expired token (auth/middleware.py:validate_token)"
+    - "VC3: Creates audit log entry with user_id (audit/logger.py:log_event)"
+  - Bad:
+    - "Works correctly"
+    - "Handles errors"
+    - "Tests pass"
 **subtasks[].contracts**: Array of **executable assertion patterns** (optional but recommended for complexity_score ≥ 5)
   - `type`: "precondition" | "postcondition" | "invariant"
   - `assertion`: Executable pattern (e.g., "response.status == 401 WHEN token.expired")
@@ -260,15 +273,26 @@ Return **ONLY** valid JSON in this exact structure:
   - This is the primary handoff artifact to the Actor agent
   - Actor "compiles" this contract into code; Monitor verifies against it
   - Format: `"<Actor> -> <Action>(params) -> <Goal with success criteria>"`
+  - **Integration is part of the contract**:
+    - Prefer describing the *entrypoint + call chain* that makes the behavior real (especially for validation, policy checks, auth, migrations).
+    - Avoid leaf-only contracts that are easy to satisfy in isolation but not wired into production code paths.
   - Examples:
     - `"AuthService -> validate(token) -> returns 401|200 with user_id"`
     - `"ProjectModel -> add_field(archived_at: DateTime?) -> migration passes"`
     - `"RateLimiter -> decorate(endpoint, 100/min) -> returns 429 when exceeded"`
+    - `"ConfigLoader -> load_policy(path) -> calls validate_risk_policy(); raises ConfigValidationError on contradictions"`
 **subtasks[].implementation_hint**: Optional guidance for non-obvious implementations
   - RECOMMENDED when: complexity_score >= 5 OR security_critical OR dependencies.length >= 2
   - OMIT when: standard pattern with obvious implementation
   - Example: "Use existing RateLimiter middleware, configure for /api/* routes"
 **subtasks[].test_strategy**: Required object with unit/integration/e2e keys. Use "N/A" for levels not applicable.
+  - MUST map `validation_criteria` → tests:
+    - For each `VCn:` criterion, include at least one planned test name that covers it.
+    - Recommended naming: include `vc<n>` in the test name (e.g., `test_vc1_*`, `TestVC1*`) for deterministic grep-ability.
+    - Recommended format: `path/to/test_file.ext::test_name_or_symbol`
+  - "N/A" is acceptable ONLY when:
+    - The repository has no automated test harness, and adding one is out-of-scope for this subtask.
+    - In that case: either add a FOUNDATION subtask to introduce a minimal test harness, or document the gap explicitly in risks/assumptions.
 **subtasks[].affected_files**: Precise file paths (NOT "backend", "frontend"); use [] if paths unknown
 
 ### Subtask Ordering
@@ -484,16 +508,18 @@ When invoked with `mode: "re_decomposition"` from the orchestrator, you receive
 - [ ] Each subtask is atomic (independently implementable + testable)
 - [ ] Each subtask has an aag_contract in `Actor -> Action(params) -> Goal` format
 - [ ] AAG contracts are specific (not "does stuff" — name classes, methods, return types)
+- [ ] AAG contracts include wiring/integration when relevant (entrypoint + validator/policy checks, not leaf-only helpers)
 - [ ] All dependencies are explicit and accurate
 - [ ] Subtasks ordered by dependency (foundations first)
 - [ ] 5-8 subtasks (not too granular or too coarse)
 - [ ] Titles are action-oriented (start with verb)
 - [ ] Descriptions explain HOW, not just WHAT
 
 **Acceptance Criteria**:
-- [ ] Each subtask has 3-5 specific criteria
+- [ ] Each subtask has 2-4 specific criteria
 - [ ] Criteria are testable and measurable
-- [ ] Criteria cover: functionality + edge cases + testing
+- [ ] Criteria cover: functionality + edge cases (as applicable)
+- [ ] Each VC has a concrete verification hook in test_strategy (at least one planned test per VC)
 - [ ] No vague criteria ("works", "is good", "done")
 
 **File Paths**:
@@ -510,7 +536,7 @@ When invoked with `mode: "re_decomposition"` from the orchestrator, you receive
 
 **Test Strategy**:
 - [ ] test_strategy object included for each subtask
-- [ ] Unit tests specified (REQUIRED for all subtasks)
+- [ ] Unit tests specified (default). If repo has no test harness: add a FOUNDATION subtask to introduce minimal tests or explicitly justify "N/A".
 - [ ] Integration tests specified when subtask integrates multiple components
 - [ ] E2e tests specified when subtask impacts user-facing functionality
 - [ ] "N/A" used appropriately when test layer not applicable
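The checklist items on VC prefixes and VC-to-test mapping could be checked mechanically. The sketch below is a hypothetical lint, not part of this PR. It treats the recommended `VCn:` prefix and `vc<n>` test naming as required, and it assumes each `test_strategy` level is either the string `"N/A"` or a list (or single string) of planned test identifiers such as `path/to/test_file.ext::test_name`.

```python
import re

# A criterion must start with "VC<number>: " followed by some text.
VC_PREFIX = re.compile(r"^VC(\d+):\s+\S")


def lint_subtask(subtask: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    criteria = subtask.get("validation_criteria", [])
    if not 2 <= len(criteria) <= 4:
        problems.append(f"expected 2-4 validation_criteria, found {len(criteria)}")

    # Collect every planned test identifier across unit/integration/e2e.
    planned: list[str] = []
    for level in ("unit", "integration", "e2e"):
        value = subtask.get("test_strategy", {}).get(level, "N/A")
        if value == "N/A":
            continue
        planned.extend([value] if isinstance(value, str) else value)

    for criterion in criteria:
        match = VC_PREFIX.match(criterion)
        if match is None:
            problems.append(f"missing 'VCn:' prefix: {criterion!r}")
            continue
        n = match.group(1)
        # The negative lookahead stops 'vc1' from matching inside 'vc10'.
        tag = re.compile(rf"vc{n}(?!\d)", re.IGNORECASE)
        # If every level is "N/A" there is nothing to map against, so skip.
        if planned and not any(tag.search(name) for name in planned):
            problems.append(f"VC{n} has no planned test containing 'vc{n}'")
    return problems
```

The match is case-insensitive so that both `test_vc1_*` and `TestVC1*` satisfy the convention.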
 
@@ -44,10 +44,14 @@ Task: $ARGUMENTS
 
 Hard requirements:
 - Use `blueprint.subtasks[].validation_criteria` (2-4 testable, verifiable outcomes)
+  - Prefix each criterion with `VC1:`, `VC2:`, ... (stable references for Actor/Monitor)
+  - Include a concrete anchor per VC (endpoint/function + file path)
 - Use `blueprint.subtasks[].dependencies` (array of subtask IDs) and order subtasks by dependency
 - Include `blueprint.subtasks[].complexity_score` (1-10) and `risk_level` (low|medium|high)
 - Include `blueprint.subtasks[].security_critical` (true for auth/crypto/validation/data access)
-- Include `blueprint.subtasks[].test_strategy` with unit/integration/e2e keys"
+- Include `blueprint.subtasks[].test_strategy` with unit/integration/e2e keys
+  - Map every `VCn:` to ≥1 planned test case (prefer test name contains `vc<n>`)
+  - Recommended format: `path/to/test_file.ext::test_name_or_symbol`"
 )
 ```
 
 
@@ -109,10 +109,14 @@ Task: $ARGUMENTS
 
 Hard requirements:
 - Use `blueprint.subtasks[].validation_criteria` (2-4 testable outcomes)
+  - Prefix each criterion with `VC1:`, `VC2:`, ... (stable references for Actor/Monitor)
+  - Include a concrete anchor per VC (endpoint/function + file path)
 - Use `blueprint.subtasks[].dependencies` (array of subtask IDs)
 - Include `complexity_score` (1-10) and `risk_level` (low|medium|high)
 - Include `security_critical` (true for auth/crypto/validation)
 - Include `test_strategy` with unit/integration/e2e keys
+  - Map every `VCn:` to ≥1 planned test case (prefer test name contains `vc<n>`)
+  - Recommended format: `path/to/test_file.ext::test_name_or_symbol`
 - Include `aag_contract` (one-line pseudocode: Actor -> Action -> Goal)
 
 AAG Contract format (REQUIRED per subtask):