Commit 656dca5

Merge pull request #78 from azalio/research/preconditions-postconditions
MAP: executable validation criteria + fix Claude Code hooks
2 parents 0f3fb26 + a39792d commit 656dca5

48 files changed

Lines changed: 1871 additions & 682 deletions

.claude/agents/actor.md

Lines changed: 31 additions & 5 deletions
@@ -375,8 +375,12 @@ Document key decisions using this structure:
 - [ ] Error cases (invalid input, failures)
 - [ ] Security cases (injection, auth bypass) — if applicable

+**Validation criteria → tests (MANDATORY when test_strategy is not N/A)**:
+- For each `VCn:` item in `validation_criteria`, implement or update at least one automated test that would fail without your change and pass with it.
+- Prefer naming tests with `vc<n>` (e.g., `test_vc1_*`, `TestVC1*`) so Monitor can deterministically confirm coverage.
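Review note: the `vc<n>` naming convention added here is what makes coverage checkable by name alone. A minimal sketch of such a check follows, assuming criteria arrive as `VCn:`-prefixed strings and test names as plain strings; `map_tests_to_criteria` and `uncovered` are hypothetical helpers for illustration, not part of this commit.

```python
import re

# Loose match for "vc<n>" anywhere in a test name (test_vc1_*, TestVC1*).
# Assumption: an unrelated name such as "test_svc1_x" would also match,
# so a real checker may want a stricter pattern.
VC_IN_TEST_NAME = re.compile(r"vc(\d+)", re.IGNORECASE)
VC_PREFIX = re.compile(r"^VC(\d+):")


def map_tests_to_criteria(criteria, test_names):
    """Return {"VC<n>": [matching test names]} for every prefixed criterion."""
    coverage = {}
    for criterion in criteria:
        match = VC_PREFIX.match(criterion)
        if match:
            coverage[f"VC{match.group(1)}"] = []
    for name in test_names:
        # dict.fromkeys de-duplicates while keeping first-seen order.
        for number in dict.fromkeys(VC_IN_TEST_NAME.findall(name)):
            key = f"VC{number}"
            if key in coverage:
                coverage[key].append(name)
    return coverage


def uncovered(coverage):
    """List criterion ids that no test name refers to."""
    return [vc for vc, tests in coverage.items() if not tests]
```

A criterion whose list stays empty is one for which no automated test can be confirmed from names alone.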
+
 **Format**:
-```
+```text
 1. test_[function]_[scenario]_[expected]
    Input: [specific input]
    Expected: [specific output/behavior]
@@ -392,7 +396,18 @@ Document key decisions using this structure:
    Expected: 409, {"error": "Email already registered"}
 </example>

-## 6. Used Patterns (ACE Learning)
+## 6. Validation Criteria Coverage (Evidence)
+
+If the subtask packet includes `validation_criteria`, list each `VCn:` and where it is enforced.
+
+**Format**:
+```text
+VC1: <criterion text>
+- Code: path/to/file.ext#SymbolOrLocation
+- Tests: path/to/test_file.ext::test_name (or N/A with reason)
+```
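Review note: the evidence format above is regular enough to be read back mechanically. The sketch below parses it into dictionaries, assuming one `VCn:` header line followed by `- Code:` and `- Tests:` lines; `parse_coverage_evidence` is a hypothetical name and the output shape is an assumption, not something this commit defines.

```python
import re

VC_HEADER = re.compile(r"^VC\d+:")


def parse_coverage_evidence(text):
    """Parse 'VCn: ...' headers with their '- Code:' / '- Tests:' lines."""
    entries = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if VC_HEADER.match(line):
            # A new criterion starts; later evidence lines attach to it.
            current = {"criterion": line, "code": [], "tests": []}
            entries.append(current)
        elif current is not None and line.startswith("- Code:"):
            current["code"].append(line[len("- Code:"):].strip())
        elif current is not None and line.startswith("- Tests:"):
            current["tests"].append(line[len("- Tests:"):].strip())
    return entries
```

Lines that do not fit the format are ignored here; a stricter reader could reject them instead.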
+
+## 7. Used Patterns (ACE Learning)

 **Format**: `["impl-0012", "sec-0034"]` or `[]` if none

@@ -403,7 +418,7 @@ Document key decisions using this structure:

 **If no patterns match**: `[]` with note "No relevant patterns in current mem0"

-## 7. Integration Notes (If Applicable)
+## 8. Integration Notes (If Applicable)

 Only include if changes affect:
 - Database schema (migrations needed?)
@@ -449,6 +464,7 @@ Only include if changes affect:
 - [ ] AAG contract stated BEFORE code (Section 1)
 - [ ] Trade-offs documented with alternatives
 - [ ] Test cases cover happy + edge + error paths
+- [ ] Each `validation_criteria` item has at least one automated test (or explicit N/A with reason)
 - [ ] Used patterns tracked (or `[]` if none)
 - [ ] Template variables `{{...}}` preserved in generated code

@@ -518,6 +534,14 @@ with the following JSON content:
   "summary": "<one-line description of what was implemented>",
   "aag_contract": "<the AAG contract line>",
   "files_changed": ["<list of modified file paths>"],
+  "tests_changed": ["<list of modified/added test file paths>"],
+  "validation_criteria_coverage": [
+    {
+      "criterion": "VC1: ...",
+      "tests": ["path/to/test_file.ext::test_name"],
+      "notes": "Short justification if tests are N/A or partial"
+    }
+  ],
   "status": "applied"
 }
 ```
@@ -715,7 +739,10 @@ output:

 {{feedback}}

-**Action Required**: Address ALL issues above. Focus on:
+**Action Required**: Address ALL issues above. Do NOT dismiss feedback as "out of scope" or "separate task".
+If you believe an item should be deferred, STOP and ask the user for explicit approval to defer.
+
+Focus on:
 1. Specific line items mentioned
 2. Quality checklist items that failed
 3. Security or constraint violations
@@ -1083,4 +1110,3 @@ export class ReconnectingWebSocket {
 **Used Bullets**: `[]` (No similar patterns in mem0. Novel implementation.)

 </Actor_Reference_Examples>
-

.claude/agents/monitor.md

Lines changed: 44 additions & 6 deletions
@@ -821,15 +821,27 @@ When `{{requirements}}` or `{{subtask_description}}` includes `validation_criter
 ```
 FOR each criterion in validation_criteria:
   1. PARSE criterion into testable assertion
-  2. VERIFY assertion against {{solution}}
-  3. RECORD result: PASS | FAIL | PARTIAL | UNTESTABLE
+  2. VERIFY assertion against {{solution}} (code-path evidence)
+  3. VERIFY test coverage using test_strategy (if not N/A)
+  4. RECORD result: PASS | FAIL | PARTIAL | UNTESTABLE

 CONTRACT_STATUS:
   - ALL PASS → contract_compliant: true
   - ANY FAIL → contract_compliant: false, list violations
   - ANY UNTESTABLE → flag for clarification
 ```

+### Test Coverage Rule (Executable Contracts)
+
+Design constraints only become reliable when they are enforced by executable checks.
+
+For each `VCn:` criterion:
+- If `test_strategy` is provided and not `N/A`, require at least one concrete test case that covers it.
+- Prefer deterministic mapping: test names include `vc<n>` (e.g., `test_vc1_*`, `TestVC1*`).
+- Evidence MUST include both:
+  - **Code evidence** (where in code the behavior is implemented), and
+  - **Test evidence** (where in tests it is asserted).
+
 ### Contract Assertion Patterns

 | Criterion Type | How to Verify | Example |
@@ -852,15 +864,31 @@ Include in JSON output when validation_criteria provided:
     "failed": 1,
     "untestable": 0,
     "details": [
-      {"criterion": "Returns 401 for expired token", "status": "PASS", "evidence": "Line 45: if token.expired: return 401"},
-      {"criterion": "Creates audit log entry", "status": "FAIL", "evidence": "No audit.log() call found in create_user()"}
+      {
+        "criterion": "VC1: Returns 401 for expired token (auth/middleware.py:validate_token)",
+        "status": "PASS",
+        "code_evidence": "auth/middleware.py:45: if token.expired: return 401",
+        "test_coverage": "PASS",
+        "test_evidence": "tests/test_auth.py::test_vc1_expired_token_returns_401"
+      },
+      {
+        "criterion": "VC2: Creates audit log entry with user_id (audit/logger.py:log_event)",
+        "status": "FAIL",
+        "code_evidence": "No audit.log_event() call found in create_user()",
+        "test_coverage": "MISSING",
+        "test_evidence": "No test found matching vc2 or described in test_strategy"
+      }
     ]
   },
   "contract_compliant": false
 }
 ```

-**Decision Rule**: If `contract_compliant: false`, set `valid: false` unless ALL failed contracts are LOW severity (documentation, naming).
+**Decision Rule**:
+- If `contract_compliant: false`, set `valid: false` unless ALL failed contracts are LOW severity (documentation, naming).
+- If any Behavioral/Integration/Edge-case criterion has `test_coverage != PASS` and test_strategy is not `N/A`:
+  - If `security_critical == true`: set `valid: false` (missing executable enforcement is a release blocker).
+  - Otherwise: add a **testability** issue and require Actor to add tests.
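Review note: the two-part Decision Rule can be read as a small function. The sketch below is one possible reading, not Monitor's actual logic: `apply_decision_rule`, the `criterion_type` field, and the `failures_all_low_severity` flag are assumptions introduced for illustration, while `test_coverage` and `criterion` follow the JSON shown in this hunk.

```python
# Criterion types that the rule requires to be backed by tests.
# The "criterion_type" field is hypothetical; the JSON in this hunk omits it.
TESTED_TYPES = {"behavioral", "integration", "edge_case"}


def apply_decision_rule(details, contract_compliant, test_strategy_is_na,
                        security_critical, failures_all_low_severity=False):
    """Return (valid, issues) following the Decision Rule bullets."""
    valid = True
    issues = []

    # Bullet 1: non-compliance blocks unless every failure is LOW severity.
    if not contract_compliant and not failures_all_low_severity:
        valid = False

    # Bullet 2: missing test coverage, only when tests are expected at all.
    if not test_strategy_is_na:
        for item in details:
            needs_test = item.get("criterion_type") in TESTED_TYPES
            if needs_test and item.get("test_coverage") != "PASS":
                if security_critical:
                    valid = False  # release blocker
                else:
                    issues.append({
                        "category": "testability",
                        "criterion": item["criterion"],
                        "action": "Actor must add a test",
                    })
    return valid, issues
```

Under this reading, a non-security subtask with a missing test stays `valid` but carries a testability issue, while the same gap on a security-critical subtask flips `valid` to false.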

 </Monitor_Contract_Validation>

@@ -2495,6 +2523,10 @@ def check_rate_limit(user_id, action, limit=100, window=3600):
 - Requirements unmet → valid=false
 - Only MEDIUM/LOW issues → valid=true (with feedback)

+**Hard-stop semantics**:
+- If you set `valid=false`, the workflow MUST resolve the issues before proceeding.
+- Do not accept "we'll do it later" reasoning as a resolution unless the user explicitly approves deferral.
+
 </Monitor_Critical_Reminders>

 ### Evidence File (Artifact-Gated Validation)
@@ -2512,7 +2544,13 @@ with the following JSON content:
   "timestamp": "<ISO 8601 UTC>",
   "valid": true,
   "issues_found": 0,
-  "recommendation": "approve|reject|revise"
+  "recommendation": "approve|reject|revise",
+  "validation_criteria_test_coverage": {
+    "total": 0,
+    "covered": 0,
+    "missing": 0,
+    "notes": "Optional: summarize VC→test coverage findings"
+  }
 }
 ```

.claude/agents/task-decomposer.md

Lines changed: 31 additions & 5 deletions
@@ -248,8 +248,21 @@ Return **ONLY** valid JSON in this exact structure:
 **subtasks[].complexity_rationale**: MUST reference factors: "Score N: factor (+X), factor (+Y)..."
 **subtasks[].validation_criteria**: Array of **testable conditions** that prove completion
 - REQUIRED: 2-4 specific, verifiable outcomes
-- Good: "Returns 401 for expired token", "Creates audit log entry with user_id"
-- Bad: "Works correctly", "Handles errors"
+- Format (recommended): Prefix each item with `VC1:`, `VC2:`, ... for stable cross-agent reference.
+- Each criterion MUST be both:
+  - **Behavior-/artifact-verifiable** (can be checked by reading code), and
+  - **Test-verifiable** (has at least one concrete test case planned in `test_strategy`).
+- Each criterion SHOULD include a concrete anchor:
+  - endpoint/handler + route, OR
+  - function/class name + file path
+- Good:
+  - "VC1: POST /users returns 201 and persists normalized email (users/routes.py:create_user)"
+  - "VC2: Returns 401 for expired token (auth/middleware.py:validate_token)"
+  - "VC3: Creates audit log entry with user_id (audit/logger.py:log_event)"
+- Bad:
+  - "Works correctly"
+  - "Handles errors"
+  - "Tests pass"
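Review note: the Good/Bad examples suggest a simple lint for `validation_criteria`. The sketch below is a rough illustration only: `lint_validation_criteria`, the two anchor regexes, and the vague-phrase set are assumptions, and the findings should be treated as warnings because the prefix is "recommended" and the anchor is a SHOULD.

```python
import re

VC_PREFIX = re.compile(r"^VC(\d+):\s*\S")
# Assumed anchor shapes: "(path/to/file.py:symbol)" or an HTTP method + route.
FILE_ANCHOR = re.compile(r"\([\w./-]+\.\w+:\w+\)")
ROUTE_ANCHOR = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+/\S*")
# Phrases taken from the "Bad" examples in this hunk.
VAGUE = {"works correctly", "handles errors", "tests pass"}


def lint_validation_criteria(criteria):
    """Return a list of human-readable findings (empty when all checks pass)."""
    findings = []
    if not 2 <= len(criteria) <= 4:
        findings.append(f"expected 2-4 criteria, got {len(criteria)}")
    for index, text in enumerate(criteria, start=1):
        if VC_PREFIX.match(text):
            body = text.split(":", 1)[1]
        else:
            findings.append(f"item {index}: missing 'VC{index}:' prefix")
            body = text
        if body.strip().rstrip(".").lower() in VAGUE:
            findings.append(f"item {index}: vague criterion")
        if not (FILE_ANCHOR.search(text) or ROUTE_ANCHOR.search(text)):
            findings.append(f"item {index}: no concrete anchor")
    return findings
```

Run against the three "Good" strings above, this returns an empty list; each "Bad" string trips the prefix, vagueness, and anchor checks.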
 **subtasks[].contracts**: Array of **executable assertion patterns** (optional but recommended for complexity_score ≥ 5)
 - `type`: "precondition" | "postcondition" | "invariant"
 - `assertion`: Executable pattern (e.g., "response.status == 401 WHEN token.expired")
@@ -260,15 +273,26 @@ Return **ONLY** valid JSON in this exact structure:
 - This is the primary handoff artifact to the Actor agent
 - Actor "compiles" this contract into code; Monitor verifies against it
 - Format: `"<Actor> -> <Action>(params) -> <Goal with success criteria>"`
+- **Integration is part of the contract**:
+  - Prefer describing the *entrypoint + call chain* that makes the behavior real (especially for validation, policy checks, auth, migrations).
+  - Avoid leaf-only contracts that are easy to satisfy in isolation but not wired into production code paths.
 - Examples:
   - `"AuthService -> validate(token) -> returns 401|200 with user_id"`
   - `"ProjectModel -> add_field(archived_at: DateTime?) -> migration passes"`
   - `"RateLimiter -> decorate(endpoint, 100/min) -> returns 429 when exceeded"`
+  - `"ConfigLoader -> load_policy(path) -> calls validate_risk_policy(); raises ConfigValidationError on contradictions"`
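Review note: the `<Actor> -> <Action>(params) -> <Goal>` format is regular enough to split mechanically, which is handy when checking that a contract names an actor, a call, and a goal. A minimal sketch follows; `AagContract` and `parse_aag_contract` are hypothetical names, and the parser assumes `->` never appears inside the parameter list.

```python
import re
from typing import NamedTuple, Optional


class AagContract(NamedTuple):
    actor: str
    action: str
    params: str
    goal: str


CALL = re.compile(r"(\w+)\((.*)\)")


def parse_aag_contract(text: str) -> Optional[AagContract]:
    """Split 'Actor -> Action(params) -> Goal'; return None if malformed."""
    # maxsplit=2 keeps any later "->" inside the goal text.
    parts = [part.strip() for part in text.split("->", 2)]
    if len(parts) != 3 or not all(parts):
        return None
    actor, call, goal = parts
    match = CALL.fullmatch(call)
    if match is None:
        return None
    return AagContract(actor, match.group(1), match.group(2), goal)
```

For the `ConfigLoader` example above, the parsed goal still carries the full call chain ("calls validate_risk_policy(); raises ..."), which is the wiring the new integration bullets ask for.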
 **subtasks[].implementation_hint**: Optional guidance for non-obvious implementations
 - RECOMMENDED when: complexity_score >= 5 OR security_critical OR dependencies.length >= 2
 - OMIT when: standard pattern with obvious implementation
 - Example: "Use existing RateLimiter middleware, configure for /api/* routes"
 **subtasks[].test_strategy**: Required object with unit/integration/e2e keys. Use "N/A" for levels not applicable.
+- MUST map `validation_criteria` → tests:
+  - For each `VCn:` criterion, include at least one planned test name that covers it.
+  - Recommended naming: include `vc<n>` in the test name (e.g., `test_vc1_*`, `TestVC1*`) for deterministic grep-ability.
+  - Recommended format: `path/to/test_file.ext::test_name_or_symbol`
+- "N/A" is acceptable ONLY when:
+  - The repository has no automated test harness, and adding one is out-of-scope for this subtask.
+  - In that case: either add a FOUNDATION subtask to introduce a minimal test harness, or document the gap explicitly in risks/assumptions.
 **subtasks[].affected_files**: Precise file paths (NOT "backend", "frontend"); use [] if paths unknown

 ### Subtask Ordering
@@ -484,16 +508,18 @@ When invoked with `mode: "re_decomposition"` from the orchestrator, you receive
 - [ ] Each subtask is atomic (independently implementable + testable)
 - [ ] Each subtask has an aag_contract in `Actor -> Action(params) -> Goal` format
 - [ ] AAG contracts are specific (not "does stuff" — name classes, methods, return types)
+- [ ] AAG contracts include wiring/integration when relevant (entrypoint + validator/policy checks, not leaf-only helpers)
 - [ ] All dependencies are explicit and accurate
 - [ ] Subtasks ordered by dependency (foundations first)
 - [ ] 5-8 subtasks (not too granular or too coarse)
 - [ ] Titles are action-oriented (start with verb)
 - [ ] Descriptions explain HOW, not just WHAT

 **Acceptance Criteria**:
-- [ ] Each subtask has 3-5 specific criteria
+- [ ] Each subtask has 2-4 specific criteria
 - [ ] Criteria are testable and measurable
-- [ ] Criteria cover: functionality + edge cases + testing
+- [ ] Criteria cover: functionality + edge cases (as applicable)
+- [ ] Each VC has a concrete verification hook in test_strategy (at least one planned test per VC)
 - [ ] No vague criteria ("works", "is good", "done")

 **File Paths**:
@@ -510,7 +536,7 @@ When invoked with `mode: "re_decomposition"` from the orchestrator, you receive

 **Test Strategy**:
 - [ ] test_strategy object included for each subtask
-- [ ] Unit tests specified (REQUIRED for all subtasks)
+- [ ] Unit tests specified (default). If repo has no test harness: add a FOUNDATION subtask to introduce minimal tests or explicitly justify "N/A".
 - [ ] Integration tests specified when subtask integrates multiple components
 - [ ] E2e tests specified when subtask impacts user-facing functionality
 - [ ] "N/A" used appropriately when test layer not applicable

.claude/commands/map-debate.md

Lines changed: 5 additions & 1 deletion
@@ -44,10 +44,14 @@ Task: $ARGUMENTS

 Hard requirements:
 - Use `blueprint.subtasks[].validation_criteria` (2-4 testable, verifiable outcomes)
+- Prefix each criterion with `VC1:`, `VC2:`, ... (stable references for Actor/Monitor)
+- Include a concrete anchor per VC (endpoint/function + file path)
 - Use `blueprint.subtasks[].dependencies` (array of subtask IDs) and order subtasks by dependency
 - Include `blueprint.subtasks[].complexity_score` (1-10) and `risk_level` (low|medium|high)
 - Include `blueprint.subtasks[].security_critical` (true for auth/crypto/validation/data access)
-- Include `blueprint.subtasks[].test_strategy` with unit/integration/e2e keys"
+- Include `blueprint.subtasks[].test_strategy` with unit/integration/e2e keys
+- Map every `VCn:` to ≥1 planned test case (prefer test name contains `vc<n>`)
+- Recommended format: `path/to/test_file.ext::test_name_or_symbol`"
 )
 ```

.claude/commands/map-efficient.md

Lines changed: 4 additions & 0 deletions
@@ -109,10 +109,14 @@ Task: $ARGUMENTS

 Hard requirements:
 - Use `blueprint.subtasks[].validation_criteria` (2-4 testable outcomes)
+- Prefix each criterion with `VC1:`, `VC2:`, ... (stable references for Actor/Monitor)
+- Include a concrete anchor per VC (endpoint/function + file path)
 - Use `blueprint.subtasks[].dependencies` (array of subtask IDs)
 - Include `complexity_score` (1-10) and `risk_level` (low|medium|high)
 - Include `security_critical` (true for auth/crypto/validation)
 - Include `test_strategy` with unit/integration/e2e keys
+- Map every `VCn:` to ≥1 planned test case (prefer test name contains `vc<n>`)
+- Recommended format: `path/to/test_file.ext::test_name_or_symbol`
 - Include `aag_contract` (one-line pseudocode: Actor -> Action -> Goal)

 AAG Contract format (REQUIRED per subtask):
