Skip to content

feat(#125): retry loop fix — targeted retry prompts, QA leniency, fix hints#126

Merged
Abernaughty merged 7 commits intomainfrom
feature/125-retry-loop-fix
Apr 6, 2026
Merged

feat(#125): retry loop fix — targeted retry prompts, QA leniency, fix hints#126
Abernaughty merged 7 commits intomainfrom
feature/125-retry-loop-fix

Conversation

@Abernaughty
Copy link
Copy Markdown
Owner

@Abernaughty Abernaughty commented Apr 6, 2026

Summary

Fixes the retry loop's inability to solve trivially fixable QA failures (#125). Root cause: the Lead Dev was regenerating code from scratch on every retry because it never saw its previous output, sandbox results, or a targeted "fix, don't rewrite" instruction.

Changes

FailureReport schema (src/agents/qa.py)

  • Added FixComplexity enum (trivial/moderate/complex)
  • Added fix_complexity and exact_fix_hint fields to FailureReport
  • Case-insensitive validator for fix_complexity (matches existing failure_type pattern)
  • Full backward compatibility — old LLM output without new fields still works

Developer retry prompt (src/orchestrator.py)

  • _build_retry_file_context(): reads current files from disk on retry
    • Only reads failure_report.failed_files (not all targets) for context budget
    • Respects workspace_root boundary (path traversal protection)
    • Configurable char limit via MAX_RETRY_FILE_CHARS env var (default 30k)
  • _build_retry_sandbox_context(): formats sandbox stdout/stderr for retry prompt
  • Retry-specific system prompt: "apply MINIMUM change, do NOT rewrite from scratch"
  • Includes exact_fix_hint and fix_complexity when provided by QA
  • First attempt and architectural failures still use the standard prompt (no behavior change)

QA leniency (src/orchestrator.py)

  • When blueprint.acceptance_criteria is empty, QA system prompt receives leniency guidance:
    • Prioritize functional correctness over cosmetic precision
    • Don't invent strict formatting requirements
    • Bias toward PASS if sandbox executed successfully (exit_code=0)
  • QA prompt now requests fix_complexity and exact_fix_hint in its JSON schema

Tests (tests/test_retry_loop.py) — 30 new tests

  • FailureReport new fields: trivial/moderate/complex, case insensitive, backward compat, JSON round-trip
  • FixComplexity enum values and validation
  • _build_retry_file_context: reads files, respects budget, blocks path traversal, uses failed_files over targets, handles missing files
  • _build_retry_sandbox_context: formats output, handles None
  • Developer retry prompt: file context injection, sandbox output injection, standard prompt on first attempt, standard prompt on architectural failure
  • QA leniency: triggers on empty criteria, skipped with criteria present

Testing

  • All new unit tests pass
  • Integration test: re-run Triforce task, expect ≤1 retry (manual verification after merge)

Closes

Closes #125

Summary by CodeRabbit

  • New Features

    • Failure reports now include an estimated fix complexity (trivial, moderate, complex) and an optional exact-fix hint; complexity inputs are parsed case-insensitively, trim whitespace, and tolerate empty/unknown values.
    • Retry attempts now inject current on-disk file context and recent sandbox execution results into retry prompts, with per-retry size limits and path validation.
    • QA reporting is more lenient when acceptance criteria are absent, favoring successful sandbox outcomes.
  • Tests

    • Added comprehensive tests covering retry behavior, file/sandbox context handling, and failure-report fields.

… hints

## Changes

### FailureReport schema (`src/agents/qa.py`)
- Added `FixComplexity` enum (trivial/moderate/complex)
- Added `fix_complexity` and `exact_fix_hint` fields to FailureReport
- Case-insensitive validator for `fix_complexity` (matches `failure_type` pattern)
- Full backward compatibility — old LLM output without new fields still works

### Developer retry prompt (`src/orchestrator.py`)
- New `_build_retry_file_context()`: reads current files from disk on retry
  - Only reads `failure_report.failed_files` (not all targets) for context budget
  - Respects workspace_root boundary (path traversal protection)
  - Configurable char limit via MAX_RETRY_FILE_CHARS env var (default 30k)
- New `_build_retry_sandbox_context()`: formats sandbox stdout for retry prompt
- Retry-specific system prompt: "apply MINIMUM change, do NOT rewrite"
- Includes exact_fix_hint and fix_complexity when provided by QA
- First attempt and architectural failures still use standard prompt

### QA leniency (`src/orchestrator.py`)
- When `blueprint.acceptance_criteria` is empty, QA system prompt gets
  leniency guidance: prioritize functional correctness, don't invent
  strict formatting requirements, bias toward PASS if sandbox succeeded
- QA prompt now requests fix_complexity and exact_fix_hint in JSON schema

### Tests (`tests/test_retry_loop.py`)
- FailureReport new fields: trivial/moderate/complex, case insensitive,
  backward compat, JSON round-trip
- FixComplexity enum values and validation
- _build_retry_file_context: reads files, respects budget, blocks traversal,
  uses failed_files over targets, handles missing files
- _build_retry_sandbox_context: formats output, handles None
- Developer retry prompt: file context injection, sandbox output injection,
  standard prompt on first attempt, standard prompt on architectural failure
- QA leniency: triggers on empty criteria, skipped with criteria

Closes #125
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a FixComplexity enum and two optional fields to FailureReport; implements retry-context builders to inject current on-disk file contents and sandbox results into developer retry prompts; updates qa_node prompt/schema and developer_node retry handling; and adds tests covering models, file/sandbox context builders, and retry prompts.

Changes

Cohort / File(s) Summary
QA Model Extensions
dev-suite/src/agents/qa.py
Add FixComplexity enum (trivial, moderate, complex); add `fix_complexity: FixComplexity
Orchestrator Retry Logic
dev-suite/src/orchestrator.py
Introduce MAX_RETRY_FILE_CHARS; implement _build_retry_file_context() (reads current files with workspace-boundary checks and char-budgeting) and _build_retry_sandbox_context() (formats sandbox exit/test/output summary); treat non-architectural failure_report as retry path and inject a structured RETRY CONTEXT (QA details, CURRENT FILES ON DISK, SANDBOX EXECUTION RESULTS, exact_fix_hint) into developer_node; extend qa_node prompt/schema to require fix_complexity/exact_fix_hint and add leniency instructions when acceptance criteria are empty.
Tests: Retry Loop & Models
dev-suite/tests/test_retry_loop.py
Add tests for FixComplexity parsing/normalization and FailureReport behavior; unit tests for _build_retry_file_context (file reading, missing files marker, max-char skipping, path traversal protection) and _build_retry_sandbox_context; integration-style tests for developer_node retry prompt composition/traces and qa_node leniency behavior.

Sequence Diagram(s)

sequenceDiagram
    participant QA as QA Node
    participant Orch as Orchestrator
    participant Dev as Developer Node
    participant Sand as Sandbox

    Note over Dev,Sand: Initial attempt
    Dev->>Sand: Run code in sandbox
    Sand-->>Orch: execution results (exit code, output, tests)
    Orch->>QA: Provide code + sandbox results for QA
    QA-->>Orch: failure_report (may include fix_complexity, exact_fix_hint)

    Note over Orch,Dev: Retry flow
    Orch->>Orch: detect retry (non-architectural failure_report)
    Orch->>Orch: _build_retry_file_context() — read current files on disk
    Orch->>Orch: _build_retry_sandbox_context() — format sandbox results
    Orch->>Dev: send retry prompt with RETRY CONTEXT (failure_report, CURRENT FILES ON DISK, SANDBOX EXECUTION RESULTS, exact_fix_hint)
    Dev->>Sand: apply targeted fix and re-run
    Sand-->>Orch: new execution results
    Orch->>QA: re-evaluate
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I nibble at traces and hop through the stack,
I stitch file contents back and gently unpack,
Sandbox hums its output, QA whispers the hint,
A tiny, tidy fix — no broad rewrite sprint,
I twitch my whiskers — retry yields a patchy win.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: implementing a retry loop fix with targeted retry prompts, QA leniency, and fix hints to address issue #125.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #125: structured retry context with file contents and sandbox output, failure classification via FixComplexity enum, QA leniency for empty acceptance criteria, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #125: new FailureReport fields, retry context builders, prompt modifications, and related tests with no extraneous changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/125-retry-loop-fix

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
dev-suite/tests/test_retry_loop.py (2)

505-534: QA leniency tests verify the trigger condition but not the actual prompt behavior.

The tests check not bp.acceptance_criteria but don't verify that the leniency text actually appears in the QA system prompt. Consider adding an integration-style test that mocks the QA LLM and inspects the actual prompt content, similar to TestDeveloperRetryPrompt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 505 - 534, Add an
integration-style assertion that verifies the leniency text is actually injected
into the QA system prompt by mocking the QA LLM and capturing the prompt
content; update or add a test (e.g., extend
test_qa_prompt_includes_leniency_for_empty_criteria or create a new test modeled
after TestDeveloperRetryPrompt) to instantiate Blueprint with empty
acceptance_criteria, run the qa_node/prompt-construction flow that builds the
system prompt, and assert the expected leniency string appears in the captured
prompt rather than only checking not bp.acceptance_criteria.

238-248: Path traversal test assertion could be more precise.

The current assertion assert "etc/passwd" not in result or "not found" in result conflates two different behaviors: path traversal blocking and file-not-found. The "not found" message is for missing files, not blocked paths.

Consider a more direct assertion that verifies the path traversal is actually blocked (the function logs a warning and skips the file, resulting in no entry for that path).

💡 Suggested improvement
     def test_blocks_path_traversal(self, tmp_path):
         """Path traversal attempts should be blocked."""
         fr = self._make_failure_report(
             failed_files=["../../etc/passwd"]
         )
         bp = self._make_blueprint()

         result = _build_retry_file_context(fr, bp, tmp_path)

-        # Should not contain the file content (blocked)
-        assert "etc/passwd" not in result or "not found" in result
+        # Path traversal should be blocked: file is skipped entirely
+        # (no entry added, not even "not found" which is for missing files within workspace)
+        assert "etc/passwd" not in result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 238 - 248, The test
currently mixes file-not-found and path-traversal behaviors; update
test_blocks_path_traversal to assert that the traversal path is absent from the
built context instead of allowing a "not found" fallback: call
_build_retry_file_context(fr, bp, tmp_path) and assert that the exact traversal
string (e.g. "../../etc/passwd") and its basename ("etc/passwd") are not present
in result, reflecting that _build_retry_file_context skips/logs and does not
include blocked paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 363-372: The string-prefix path traversal check is unsafe; replace
the vulnerable startswith check with Path.is_relative_to to ensure full_path is
actually inside workspace_root. In the try block that computes full_path =
(workspace_root / file_path).resolve(), change the condition to use
full_path.is_relative_to(workspace_root) (negated to trigger the warning and
continue) and keep the existing except (ValueError, OSError) handling for
resolve errors; reference the variables full_path, workspace_root, and file_path
when making this change.

In `@dev-suite/tests/test_retry_loop.py`:
- Around line 10-25: Remove the two unused stdlib imports (textwrap and
pathlib.Path) from the top import block and reorder the imports into proper
groups (stdlib, third-party, local) and alphabetically within each group; keep
only the actually used imports from unittest.mock (MagicMock, patch), pytest,
and the local symbols used in the test (Blueprint, FailureReport, FailureType,
FixComplexity, GraphState, WorkflowStatus, _build_retry_file_context,
_build_retry_sandbox_context, developer_node, SandboxResult) so linting no
longer flags unused or unsorted imports in the test file.

---

Nitpick comments:
In `@dev-suite/tests/test_retry_loop.py`:
- Around line 505-534: Add an integration-style assertion that verifies the
leniency text is actually injected into the QA system prompt by mocking the QA
LLM and capturing the prompt content; update or add a test (e.g., extend
test_qa_prompt_includes_leniency_for_empty_criteria or create a new test modeled
after TestDeveloperRetryPrompt) to instantiate Blueprint with empty
acceptance_criteria, run the qa_node/prompt-construction flow that builds the
system prompt, and assert the expected leniency string appears in the captured
prompt rather than only checking not bp.acceptance_criteria.
- Around line 238-248: The test currently mixes file-not-found and
path-traversal behaviors; update test_blocks_path_traversal to assert that the
traversal path is absent from the built context instead of allowing a "not
found" fallback: call _build_retry_file_context(fr, bp, tmp_path) and assert
that the exact traversal string (e.g. "../../etc/passwd") and its basename
("etc/passwd") are not present in result, reflecting that
_build_retry_file_context skips/logs and does not include blocked paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e73dbbf-d988-4551-99aa-f954d4ec025d

📥 Commits

Reviewing files that changed from the base of the PR and between ef51a8a and 7afb41a.

📒 Files selected for processing (3)
  • dev-suite/src/agents/qa.py
  • dev-suite/src/orchestrator.py
  • dev-suite/tests/test_retry_loop.py

- Removed unused `textwrap` import
- Removed unused `Path` import
- Fixed assertion in test_handles_no_output_summary to check "Output:" not "Output"
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
dev-suite/tests/test_retry_loop.py (1)

10-23: ⚠️ Potential issue | 🟡 Minor

Fix import ordering to satisfy Ruff I001.

CI is still failing on this import block formatting/order.

Proposed import ordering
 from unittest.mock import MagicMock, patch
 
 import pytest
 
 from src.agents.architect import Blueprint
 from src.agents.qa import FailureReport, FailureType, FixComplexity
 from src.orchestrator import (
-    GraphState,
-    WorkflowStatus,
     _build_retry_file_context,
     _build_retry_sandbox_context,
+    GraphState,
+    WorkflowStatus,
     developer_node,
 )
 from src.sandbox.e2b_runner import SandboxResult
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 10 - 23, Reorder the import
block in test_retry_loop.py to satisfy Ruff I001 by grouping and alphabetizing:
place stdlib imports first (from unittest.mock import MagicMock, patch), then
third-party imports (import pytest), then local package imports sorted
alphabetically — e.g., from src.agents.architect import Blueprint; from
src.agents.qa import FailureReport, FailureType, FixComplexity; from
src.orchestrator import GraphState, WorkflowStatus, _build_retry_file_context,
_build_retry_sandbox_context, developer_node; and finally from
src.sandbox.e2b_runner import SandboxResult; ensure each group is separated by a
blank line and names within each from-import are alphabetized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/tests/test_retry_loop.py`:
- Around line 504-534: Replace the current truthiness-only assertions with calls
into qa_node to capture the actual system message sent to the QA LLM: mock the
QA LLM client used by qa_node (or dependency like qa_llm.call/send) so you can
invoke qa_node with the Blueprint instances and inspect the constructed system
prompt; in test_qa_prompt_includes_leniency_for_empty_criteria assert that the
system message produced by qa_node contains the leniency block/text when
Blueprint.acceptance_criteria is empty, and in
test_qa_prompt_no_leniency_with_criteria assert that the same system message
does NOT contain the leniency text when Blueprint.acceptance_criteria is
non-empty—use the qa_node entry point and the Blueprint class names to locate
the code to exercise and the mocked LLM to capture the system prompt.
- Around line 158-177: The helper functions _make_blueprint and
_make_failure_report incorrectly use the "or" fallback (e.g., target_files or
["app.py"]) which coerces explicit empty lists to the default; change the
signatures to accept None as the default (e.g., target_files=None,
failed_files=None) and inside each function use an explicit None check to decide
whether to use the default list (use target_files if target_files is not None
else ["app.py"], and similarly for failed_files) so that passing [] remains a
valid, preserved value.
- Around line 236-247: The test's assertion is too weak — it only checks for the
string "etc/passwd" which may not appear even if traversal succeeded; update
test_blocks_path_traversal to assert that the literal traversal input
("../../etc/passwd") is not present in result and also assert that common passwd
content markers (e.g., "root:" or other colon-separated user entries) are not
present; reference the helper _build_retry_file_context and failure report
created by _make_failure_report so the test fails if file contents from
/etc/passwd are injected rather than merely checking for the path substring.

---

Duplicate comments:
In `@dev-suite/tests/test_retry_loop.py`:
- Around line 10-23: Reorder the import block in test_retry_loop.py to satisfy
Ruff I001 by grouping and alphabetizing: place stdlib imports first (from
unittest.mock import MagicMock, patch), then third-party imports (import
pytest), then local package imports sorted alphabetically — e.g., from
src.agents.architect import Blueprint; from src.agents.qa import FailureReport,
FailureType, FixComplexity; from src.orchestrator import GraphState,
WorkflowStatus, _build_retry_file_context, _build_retry_sandbox_context,
developer_node; and finally from src.sandbox.e2b_runner import SandboxResult;
ensure each group is separated by a blank line and names within each from-import
are alphabetized.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ac267907-49ad-425d-bae0-3303897585d8

📥 Commits

Reviewing files that changed from the base of the PR and between 7afb41a and 721e1d3.

📒 Files selected for processing (1)
  • dev-suite/tests/test_retry_loop.py

Comment on lines +236 to +247
def test_blocks_path_traversal(self, tmp_path):
"""Path traversal attempts should be blocked."""
fr = self._make_failure_report(
failed_files=["../../etc/passwd"]
)
bp = self._make_blueprint()

result = _build_retry_file_context(fr, bp, tmp_path)

# Should not contain the file content (blocked)
assert "etc/passwd" not in result or "not found" in result

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Path-traversal assertion is too weak and can false-pass.

Line 246 checks for "etc/passwd" text, but leaked file contents typically won’t contain that path string. The test can pass even if traversal succeeds.

Stronger traversal test pattern
     def test_blocks_path_traversal(self, tmp_path):
         """Path traversal attempts should be blocked."""
-        fr = self._make_failure_report(
-            failed_files=["../../etc/passwd"]
-        )
+        secret = tmp_path.parent / "secret.txt"
+        secret.write_text("TOP_SECRET")
+        fr = self._make_failure_report(failed_files=["../secret.txt"])
         bp = self._make_blueprint()
 
         result = _build_retry_file_context(fr, bp, tmp_path)
 
-        # Should not contain the file content (blocked)
-        assert "etc/passwd" not in result or "not found" in result
+        # Must not leak outside-workspace content
+        assert "TOP_SECRET" not in result
+        assert result == ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 236 - 247, The test's
assertion is too weak — it only checks for the string "etc/passwd" which may not
appear even if traversal succeeded; update test_blocks_path_traversal to assert
that the literal traversal input ("../../etc/passwd") is not present in result
and also assert that common passwd content markers (e.g., "root:" or other
colon-separated user entries) are not present; reference the helper
_build_retry_file_context and failure report created by _make_failure_report so
the test fails if file contents from /etc/passwd are injected rather than merely
checking for the path substring.

…leniency tests

Fixes:
- Ruff I001: reorder imports alphabetically (_build_* before GraphState)
- Path traversal: use Path.is_relative_to() instead of str.startswith()
- QA leniency tests: mock QA LLM and verify actual prompt content
- Helper methods: use `is not None` instead of `or` for empty list args
- Path traversal test: assert path is completely absent from result
- Added qa_node import for leniency tests
- Used f-string for retry context format string
…n groups

Ruff isort sorts uppercase names before underscore-prefixed names:
GraphState, WorkflowStatus before _build_retry_file_context.
Also removes blank line between `import pytest` and `from src.*`
(ruff treats both as third-party group).
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/tests/test_retry_loop.py`:
- Around line 10-24: The import block in tests/test_retry_loop.py is
misformatted (I001) because the multi-line import from src.orchestrator is not
sorted/grouped correctly; reorder and format the imports so they follow standard
grouping (stdlib, third-party, local) and alphabetize the names in the
multi-line import list (ensure GraphState, WorkflowStatus,
_build_retry_file_context, _build_retry_sandbox_context, developer_node, qa_node
are sorted), or simply run ruff check --fix tests/test_retry_loop.py to auto-fix
the import ordering and formatting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30e7a5fe-4451-4bf5-97af-98c90cfbd8dd

📥 Commits

Reviewing files that changed from the base of the PR and between 721e1d3 and 16bf843.

📒 Files selected for processing (2)
  • dev-suite/src/orchestrator.py
  • dev-suite/tests/test_retry_loop.py

Comment on lines +10 to +24
from unittest.mock import MagicMock, patch

import pytest

from src.agents.architect import Blueprint
from src.agents.qa import FailureReport, FailureType, FixComplexity
from src.orchestrator import (
_build_retry_file_context,
_build_retry_sandbox_context,
GraphState,
WorkflowStatus,
developer_node,
qa_node,
)
from src.sandbox.e2b_runner import SandboxResult
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix import block formatting to resolve CI failure.

The pipeline reports I001: import block is un-sorted or un-formatted. The imports from src.orchestrator are likely causing the issue due to their ordering within the multi-line import.

🧹 Proposed fix
 from unittest.mock import MagicMock, patch
 
 import pytest
 
 from src.agents.architect import Blueprint
 from src.agents.qa import FailureReport, FailureType, FixComplexity
 from src.orchestrator import (
+    GraphState,
+    WorkflowStatus,
     _build_retry_file_context,
     _build_retry_sandbox_context,
-    GraphState,
-    WorkflowStatus,
     developer_node,
     qa_node,
 )
 from src.sandbox.e2b_runner import SandboxResult

Run ruff check --fix tests/test_retry_loop.py to auto-format.

🧰 Tools
🪛 GitHub Actions: CI

[error] 10-24: Ruff check failed: import block is un-sorted or un-formatted. (I001) Help: Organize imports. Found 1 error; 1 fixable with --fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 10 - 24, The import block in
tests/test_retry_loop.py is misformatted (I001) because the multi-line import
from src.orchestrator is not sorted/grouped correctly; reorder and format the
imports so they follow standard grouping (stdlib, third-party, local) and
alphabetize the names in the multi-line import list (ensure GraphState,
WorkflowStatus, _build_retry_file_context, _build_retry_sandbox_context,
developer_node, qa_node are sorted), or simply run ruff check --fix
tests/test_retry_loop.py to auto-fix the import ordering and formatting.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
dev-suite/tests/test_retry_loop.py (2)

10-23: ⚠️ Potential issue | 🟡 Minor

Fix import grouping/order to clear Ruff I001.

Line 10 through Line 23 still trips Ruff I001 in CI. Please auto-format this import block (group separation + sorting) and commit the result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 10 - 23, Reformat the import
block in test_retry_loop.py to satisfy Ruff I001: group standard library imports
(from unittest.mock import MagicMock, patch) first, then third-party (import
pytest), then local project imports (from src.agents.architect import Blueprint;
from src.agents.qa import FailureReport, FailureType, FixComplexity; from
src.orchestrator import GraphState, WorkflowStatus, _build_retry_file_context,
_build_retry_sandbox_context, developer_node, qa_node; from
src.sandbox.e2b_runner import SandboxResult), and sort names within each group
alphabetically; ensure a blank line separates each group and commit the change.

236-247: ⚠️ Potential issue | 🟠 Major

Strengthen the path-traversal test to avoid false passes.

Line 246 currently checks only for "etc/passwd" text, which can pass even if out-of-workspace content is injected. Assert on leaked content (and optionally exact empty result) instead.

Suggested test hardening
     def test_blocks_path_traversal(self, tmp_path):
         """Path traversal attempts should be blocked."""
-        fr = self._make_failure_report(
-            failed_files=["../../etc/passwd"]
-        )
+        secret = tmp_path.parent / "secret.txt"
+        secret.write_text("TOP_SECRET")
+        fr = self._make_failure_report(failed_files=["../secret.txt"])
         bp = self._make_blueprint()
 
         result = _build_retry_file_context(fr, bp, tmp_path)
 
-        # Path traversal is blocked: file is skipped entirely
-        assert "etc/passwd" not in result
+        # Must not leak out-of-workspace file content
+        assert "TOP_SECRET" not in result
+        assert result == ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_retry_loop.py` around lines 236 - 247, The test
test_blocks_path_traversal currently only asserts "etc/passwd" substring absence
which can miss leaked content; update the assertion to ensure no real passwd
content can be returned by checking that _build_retry_file_context(fr, bp,
tmp_path) yields an empty result (assert result == {} or the appropriate empty
type) and also assert that known passwd markers like "root:" and any absolute
path patterns (e.g., "/etc/passwd") are not present in result; locate the test
function test_blocks_path_traversal and replace the final loose substring check
with these stronger assertions referring to the _build_retry_file_context call
and the fr/bp/tmp_path setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@dev-suite/tests/test_retry_loop.py`:
- Around line 10-23: Reformat the import block in test_retry_loop.py to satisfy
Ruff I001: group standard library imports (from unittest.mock import MagicMock,
patch) first, then third-party (import pytest), then local project imports (from
src.agents.architect import Blueprint; from src.agents.qa import FailureReport,
FailureType, FixComplexity; from src.orchestrator import GraphState,
WorkflowStatus, _build_retry_file_context, _build_retry_sandbox_context,
developer_node, qa_node; from src.sandbox.e2b_runner import SandboxResult), and
sort names within each group alphabetically; ensure a blank line separates each
group and commit the change.
- Around line 236-247: The test test_blocks_path_traversal currently only
asserts "etc/passwd" substring absence which can miss leaked content; update the
assertion to ensure no real passwd content can be returned by checking that
_build_retry_file_context(fr, bp, tmp_path) yields an empty result (assert
result == {} or the appropriate empty type) and also assert that known passwd
markers like "root:" and any absolute path patterns (e.g., "/etc/passwd") are
not present in result; locate the test function test_blocks_path_traversal and
replace the final loose substring check with these stronger assertions referring
to the _build_retry_file_context call and the fr/bp/tmp_path setup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7cb2f6c1-290b-4d95-84dc-8dd03c8af947

📥 Commits

Reviewing files that changed from the base of the PR and between 16bf843 and d031ed4.

📒 Files selected for processing (1)
  • dev-suite/tests/test_retry_loop.py

CI runs from dev-suite/ where src/ is a local package, so ruff
detects src.* as first-party. This means:
- src.sandbox sorts before src.orchestrator (single-line before multi-line)
- Blank line before multi-line from src.orchestrator import (...)

Verified by running ruff --fix with src/ directory present.
test_failure_report_included_in_retry was asserting "PREVIOUS ATTEMPT
FAILED" which was the old retry format. The new #125 retry prompt uses
"RETRY CONTEXT" and "QA FAILURE REPORT" instead.

The architect escalation test still uses "PREVIOUS ATTEMPT FAILED"
correctly — that path was not changed by #125.
@Abernaughty Abernaughty merged commit b9bae42 into main Apr 6, 2026
3 checks passed
@Abernaughty Abernaughty deleted the feature/125-retry-loop-fix branch April 6, 2026 05:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

investigate: Retry loop fails to fix trivially solvable QA failures — improve agent problem-solving feedback loop

1 participant