Executive Summary
gh-aw-firewall is a highly mature agentic workflow repository, among the most advanced outside the gh-aw factory itself, with 28 agentic workflows (26 compiled, 2 not yet compiled) spanning security, CI/CD, documentation, and multi-engine smoke testing. However, three actionable gaps stand out: no issue triage/labeling agent, no meta-agent to audit workflow health, and a missing Firewall Escape Test Agent that is referenced in the security review workflow but doesn't yet exist.
Patterns Learned from Pelis Agent Factory
From crawling the full Pelis blog series and exploring the githubnext/agentics reference repo, the key patterns are:
| Pattern | Description | Present here? |
|---|---|---|
| Specialization | Many focused workflows vs one monolithic agent | ✅ Yes, 28 specialized workflows |
| Multi-engine | Different AI models for different tasks | ✅ Yes: claude, codex, copilot |
| Meta-agents | Agents that monitor other agents (Audit Workflows, Workflow Health Manager) | ❌ Missing |
| Cascade workflows | Issues → downstream PR chains via issue-monster | ✅ Partial: issue-monster exists |
| Cache-memory | Cross-run persistent state (e.g., issue-duplication-detector) | ✅ Yes |
| skip-if-match | Preventing duplicate outputs | ⚠️ Partially broken: duplicates observed |
| Observability | Metrics Collector, Portfolio Analyst | ❌ Missing |
| Issue triage | Automated labeling + triage comments | ❌ Missing |
| Code quality agents | Continuous Simplicity, Refactoring, Style | ❌ Missing |
| Breaking change detection | Alerting on backward-incompatible changes | ❌ Missing |
| Daily malicious code scan | Supply chain defense | ❌ Missing |
Current Agentic Workflow Inventory
| Workflow | Purpose | Trigger | Engine | Assessment |
|---|---|---|---|---|
| build-test-{bun,cpp,deno,dotnet,go,java,node,rust} | Build & test PRs in 8 ecosystems | PR opened/sync | copilot | ✅ Excellent coverage |
| ci-cd-gaps-assessment | Daily CI/CD gap analysis | Schedule daily | copilot | ✅ Active, creating discussions |
| ci-doctor | Investigate CI failures, open issues | workflow_run failed | copilot | ✅ Core workflow |
| cli-flag-consistency-checker | Weekly CLI flag consistency check | Schedule weekly | copilot | ✅ Good hygiene |
| dependency-security-monitor | Daily CVE monitoring + dep PRs | Schedule daily | copilot | ✅ Very active (3 open PRs) |
| doc-maintainer | Daily docs sync with code changes | Schedule daily | copilot | ✅ Good coverage |
| issue-duplication-detector | Detect duplicate issues | Issue opened | copilot | ✅ Uses cache-memory |
| issue-monster | Dispatch issues to Copilot SWE agent | Issue opened + hourly | copilot | ✅ Core orchestrator |
| pelis-agent-factory-advisor | This workflow | Schedule daily | copilot | ⚠️ UNCOMPILED |
| plan | /plan slash command | Discussion/issue comment | copilot | ✅ Interactive |
| secret-digger-claude/codex/copilot | Hourly secret scanning (3 engines) | Hourly cron | all 3 | ⚠️ Codex + Copilot failing |
| security-guard | PR security review | PR opened/sync | claude | ✅ Excellent for this repo |
| security-review | Daily comprehensive security review | Schedule daily | copilot | ✅ Very thorough |
| smoke-{chroot,claude,codex,copilot} | End-to-end smoke tests | PR + schedule | all 3 + copilot | ✅ Multi-engine, excellent |
| test-coverage-improver | Weekly test coverage PRs | Schedule weekly | copilot | ⚠️ UNCOMPILED |
| update-release-notes | Enhance release notes on publish | Release published | copilot | ✅ Good |
Immediate Issues to Address
These are operational problems with existing workflows that need fixing now.
1. Two workflows are uncompiled (pelis-agent-factory-advisor, test-coverage-improver)
- These will not run because GitHub Actions executes the .lock.yml files, not the .md files
- Run gh aw compile .github/workflows/test-coverage-improver.md and gh aw compile .github/workflows/pelis-agent-factory-advisor.md, followed by the post-processing script
2. Duplicate discussions accumulating
- [CI/CD Assessment] has two open issues (#1113 and #1109), [Security Review] has two (#1112 and #1108), and [Pelis Agent Factory Advisor] has two (#1111 and #1106)
- The skip-if-match queries in these workflows may need tuning; discussion titles include date suffixes, which prevent deduplication
3. Secret Digger failing for Codex + Copilot engines (#1107, #1105)
- Three secret scanners run in parallel every hour; the codex and copilot variants are failing and need investigation
4. Three open dependency PRs stacking without merge (#1114, #1110, #1104)
- dependency-security-monitor is creating PRs faster than they're being merged; consider adding auto-merge for patch-level safe updates or a stale-PR cleanup
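The "patch-level safe updates" idea in item 4 depends on telling a patch bump apart from a minor or major one. The following is a minimal sketch of that classification, assuming dependency versions follow plain MAJOR.MINOR.PATCH numbering; the function names are hypothetical and are not part of dependency-security-monitor.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    core = version[1:] if version.startswith("v") else version
    # Drop any pre-release or build suffix such as '-rc.1' or '+build5'.
    core = core.split("-", 1)[0].split("+", 1)[0]
    parts = core.split(".")
    if len(parts) != 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def is_patch_update(old: str, new: str) -> bool:
    """True only when major and minor are unchanged and patch increases."""
    old_v, new_v = parse_version(old), parse_version(new)
    return old_v[:2] == new_v[:2] and new_v[2] > old_v[2]


if __name__ == "__main__":
    # Hypothetical version pairs, for illustration only.
    for old, new in [("1.2.3", "1.2.4"), ("1.2.3", "1.3.0"), ("v2.0.1", "v3.0.0")]:
        print(old, "->", new, "patch-level:", is_patch_update(old, new))
```

A gate like this would only narrow the set of candidate PRs; whether a patch release is actually safe to merge still depends on CI results and on the maintainers' policy.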
Actionable Recommendations
P0: Implement Immediately
P0.1: Issue Triage Agent
What: Automatically label incoming issues with appropriate categories (bug, security, enhancement, documentation, question, good-first-issue)
Why: Currently 10 open issues have zero labels, making the issue tracker hard to navigate. The issue-monster dispatches issues but skips unlabeled/un-triaged ones. A triage agent feeds better quality issues into the cascade. From the factory: issue triage is the "hello world" of agentic workflows with immediate, clear value.
How: Add a new issue-triage.md workflow triggered on issues: [opened] with safe-outputs: add-labels and add-comment. Uses codebase context to label issues by analyzing title + body.
Effort: Low
---
on:
  issues:
    types: [opened]
permissions:
  issues: read
  contents: read
tools:
  github:
    toolsets: [issues, labels]
safe-outputs:
  add-labels:
    allowed: [bug, security, enhancement, documentation, question, good-first-issue, firewall, proxy, docker, ci]
  add-comment: {}
timeout-minutes: 5
---
# Issue Triage Agent
Analyze issue #${{ github.event.issue.number }} in ${{ github.repository }}...
P1: Plan for Near-Term
P1.1: Firewall Escape Test Agent
What: A dedicated daily agent that attempts to escape the AWF network firewall using known techniques and reports findings as a discussion
Why: The security-review.md workflow already references this agent ("Read the Firewall Escape Test Agent's Report") but it doesn't exist, which leaves a gap in the security review pipeline. For a security firewall repository, continuous adversarial escape testing is uniquely domain-relevant. This workflow would try known bypass techniques (DNS tunneling, HTTP CONNECT abuse, IPv6 bypass, localhost tricks) and report on which ones are properly blocked.
How: A daily scheduled workflow using bash: true that runs actual awf commands with various bypass attempts inside the container, checks squid logs, and reports success/failure per technique.
Effort: Medium
Unique to this repo: No other repository type can benefit from this as directly as a network firewall tool. Each test run validates real security invariants.
P1.2: Workflow Health Monitor (Meta-Agent)
What: A weekly meta-agent that reviews all other agentic workflow runs and creates a health report with issues for unhealthy agents
Why: The factory learned that meta-agents are incredibly valuable. Currently there's no observability on the 28 workflows themselves: nobody is watching the watchers. The duplicate discussion problem (#1111/#1106, etc.) would be caught automatically. Secret Digger failures (#1107, #1105) linger as issues but there's no systematic health check.
How: Weekly scheduled workflow using agentic-workflows tool to inspect recent runs of all workflows, identify failure rates, duplicate outputs, and cost anomalies. Creates issues for unhealthy workflows.
Effort: Low to Medium
---
on:
  schedule: weekly
tools:
  agentic-workflows:
  github:
    toolsets: [default, actions]
  cache-memory: true
safe-outputs:
  create-discussion:
    title-prefix: "[Workflow Health] "
  create-issue:
    title-prefix: "[Workflow Health] "
    labels: [agentic-workflows]
    max: 5
---
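To make "identify failure rates" concrete, here is a minimal sketch of the aggregation such a health monitor could perform once it has a list of recent runs. The input shape (workflow name plus a conclusion string such as "success" or "failure") and the 50% threshold are assumptions chosen for illustration; this is not the output format of the agentic-workflows tool, and the sample runs are made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Run = Tuple[str, str]  # (workflow name, run conclusion)


def failure_rates(runs: Iterable[Run]) -> Dict[str, float]:
    """Return the fraction of failed runs for each workflow."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for workflow, conclusion in runs:
        totals[workflow] += 1
        if conclusion == "failure":
            failures[workflow] += 1
    return {name: failures[name] / totals[name] for name in totals}


def unhealthy(runs: Iterable[Run], threshold: float = 0.5) -> List[str]:
    """List workflows whose failure rate meets or exceeds the threshold."""
    rates = failure_rates(runs)
    return sorted(name for name, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    # Hypothetical run history, not real data from this repository.
    sample = [
        ("secret-digger-codex", "failure"),
        ("secret-digger-codex", "failure"),
        ("secret-digger-claude", "success"),
        ("ci-doctor", "success"),
        ("ci-doctor", "failure"),
        ("ci-doctor", "success"),
    ]
    print(unhealthy(sample))
```

Each name returned by unhealthy would be a candidate for one of the "[Workflow Health]" issues, subject to the max: 5 cap in the frontmatter above.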
P1.3: Breaking Change Checker
What: On each PR, detect backward-incompatible CLI changes (removed flags, changed defaults, renamed options, Docker API changes)
Why: AWF is a distributed CLI tool consumed by users who script it. Breaking changes in --allow-domains semantics, flag names, or Docker compose configuration need early detection. The factory uses this pattern with a 100% causal chain merge rate. Recent PRs adding --build-local, changing --image-tag behavior, and adding API proxy ports are exactly the type of changes this catches.
How: PR-triggered workflow that diffs src/cli.ts, src/types.ts, and containers/ against base branch, identifies potentially breaking changes, and comments on the PR.
Effort: Low
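One part of this check, detecting removed flags, can be sketched without an agent at all. The example below assumes long-form flags appear literally in the CLI source as text like --allow-domains; the sample strings are invented for illustration and are not the contents of src/cli.ts, and the helper names are hypothetical.

```python
import re
from typing import List, Set

# Matches long-form flags such as --allow-domains or --image-tag.
# The look-behind avoids matching inside longer dash runs like '---'.
FLAG_PATTERN = re.compile(r"(?<![\w-])--[a-z][a-z0-9]*(?:-[a-z0-9]+)*")


def extract_flags(source: str) -> Set[str]:
    """Collect every long-form flag mentioned in a source file's text."""
    return set(FLAG_PATTERN.findall(source))


def removed_flags(base_source: str, head_source: str) -> List[str]:
    """Flags present on the base branch but absent from the PR head."""
    return sorted(extract_flags(base_source) - extract_flags(head_source))


if __name__ == "__main__":
    # Invented snippets standing in for the base and head versions of a CLI file.
    base = ".option('--allow-domains <list>')\n.option('--image-tag <tag>')"
    head = ".option('--allow-domains <list>')\n.option('--build-local')"
    print(removed_flags(base, head))
```

A textual diff like this catches removals and renames only. Changed defaults or altered semantics would still need the agent to read the surrounding code.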
P2: Consider for Roadmap
P2.1: Daily Malicious Code Scan
What: Daily scan of recent commits for suspicious patterns: obfuscated code, unusual network calls, hardcoded credentials, suspicious shell commands
Why: AWF runs as root with NET_ADMIN capability and accesses docker.sock. A supply chain compromise here would be particularly dangerous. The factory runs this daily in gh-aw. For a security-critical tool, this defensive layer is especially important.
Effort: Low (based on existing secret-digger pattern, just different analysis focus)
P2.2: Sub Issue Closer
What: Automatically close sub-issues when parent issues are resolved
Why: As issue-monster creates more Copilot SWE agent tasks, sub-issue tracking will accumulate stale closed/merged items. From the factory: "keeps the issue tracker clean."
Effort: Low
P2.3: Changeset Generator
What: On merging to main, analyze commits since last release and auto-generate a PR with version bump + CHANGELOG entry
Why: update-release-notes improves notes after a release is published, but there's no automation for preparing releases. The factory's Changeset workflow had a 78% merge rate across 28 proposed PRs. Given AWF releases container images via GHCR, having well-tracked version bumps matters.
Effort: Medium
P2.4: Fix skip-if-match for Discussion-Creating Workflows
What: Update ci-cd-gaps-assessment, security-review, and pelis-agent-factory-advisor to use better deduplication to avoid accumulating stale duplicate discussions/issues
Why: There are currently 6 open duplicate issues (#1113/#1109, #1112/#1108, #1111/#1106). The skip-if-match queries need to match on the title prefix regardless of any date suffix.
Effort: Low (just adjust the skip-if-match queries in each workflow)
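The deduplication idea can be illustrated by grouping open items on their bracketed title prefix and ignoring whatever follows it, including date suffixes. This is a sketch of the matching logic only, not gh-aw's skip-if-match implementation; the function names are hypothetical and the sample titles are shortened versions of the ones reported in this issue.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Captures the text inside a leading [...] prefix, e.g. "Security Review".
PREFIX_PATTERN = re.compile(r"^\[([^\]]+)\]")


def title_prefix(title: str) -> Optional[str]:
    """Return the bracketed prefix of a title, or None if it has none."""
    match = PREFIX_PATTERN.match(title.strip())
    return match.group(1) if match else None


def duplicate_groups(open_items: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each prefix that has more than one open item to its item numbers."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for number, title in open_items.items():
        prefix = title_prefix(title)
        if prefix is not None:
            groups[prefix].append(number)
    return {p: sorted(nums) for p, nums in groups.items() if len(nums) > 1}


if __name__ == "__main__":
    open_items = {
        1113: "[CI/CD Assessment] Gap Assessment, March 2026",
        1109: "[CI/CD Assessment] Gap Assessment",
        1112: "[Security Review] Daily Security Review 2026-03-01",
        1108: "[Security Review] Daily Security Review 2026-02-28",
    }
    print(duplicate_groups(open_items))
```

Matching on the prefix alone means a new report is skipped while any earlier one with the same prefix is still open, which is the behavior the date suffixes currently defeat.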
P3: Future Ideas
P3.1: Portfolio Analyst (Token Cost Optimizer)
What: Weekly analysis of workflow token usage and costs across all 28 workflows, identifying expensive agents and optimization opportunities
Why: With 28 workflows running daily/hourly/weekly, token costs accumulate. The factory found some agents were "way too chatty" with LLM calls. Secret-digger alone runs 3× per hour.
Effort: Low (read-only analysis)
P3.2: Weekly Issue & PR Summary
What: Weekly digest of repository activity (open issues, PR status, workflow health), posted as a discussion
Why: With automated agents creating many issues/PRs, maintainers need a curated weekly digest to stay informed without reading every individual output.
Effort: Low
P3.3: Contribution Guidelines Checker
What: On new PRs from external contributors, check that contribution guidelines (conventional commits, scope, PR title format) are followed and comment with guidance
Why: AWF enforces strict conventional commits (with a limited scope allowlist: cli, docker, squid, proxy, ci, deps). External contributors frequently get PR title check failures. An early-comment agent reduces frustration.
Effort: Low
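The title check itself is mechanical and could run before any agent is involved. Below is a minimal sketch that validates a PR title against the scope allowlist named above. The list of allowed types is an assumption (common Conventional Commits types), since the report does not say which types AWF accepts, and the function name is hypothetical.

```python
import re
from typing import List

# Scope allowlist taken from the text above.
ALLOWED_SCOPES = {"cli", "docker", "squid", "proxy", "ci", "deps"}
# Assumption: a typical set of Conventional Commits types; AWF's real list may differ.
ALLOWED_TYPES = {"feat", "fix", "docs", "chore", "refactor", "test", "ci"}

# type, optional (scope), optional '!' breaking marker, then ': subject'
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def check_title(title: str) -> List[str]:
    """Return a list of problems with a PR title; an empty list means it passes."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        return ["title does not match 'type(scope): subject'"]
    problems = []
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")
    scope = match.group("scope")
    if scope is not None and scope not in ALLOWED_SCOPES:
        problems.append(f"scope '{scope}' is not in the allowlist")
    return problems


if __name__ == "__main__":
    for title in ["fix(squid): handle empty ACL", "feat(api): add port", "Update README"]:
        print(title, "->", check_title(title) or "ok")
```

The agent's value would then lie in the comment it writes: turning each problem string into concrete guidance for the contributor.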
Maturity Assessment
Current Level: 4/5 (Advanced Factory)
This is one of the most sophisticated agentic workflow setups outside the gh-aw factory itself. Strengths:
- ✅ 28 agentic workflows (26 compiled) across all major categories
- ✅ Multi-engine support (Claude, Codex, Copilot)
- ✅ Domain-specific workflows (security-guard, smoke tests, secret-digger × 3)
- ✅ Good cascade design (ci-doctor → issues → issue-monster → PRs)
- ✅ Cache-memory usage for stateful agents
Target Level: 4.5/5 (add meta-monitoring and triage)
Gap Analysis:
- Add issue triage (P0): improves the quality of issues entering the issue-monster cascade
- Add workflow health monitor (P1): closes the observability gap for 28 workflows
- Fix uncompiled workflows (operational): pelis-agent-factory-advisor and test-coverage-improver aren't running
- Build the escape test agent (P1): unique to this repo's security mission
Comparison with Best Practices
| Best Practice | This Repo | Notes |
|---|---|---|
| Issue triage | ❌ | Missing; all auto-created issues unlabeled |
| Fault investigation | ✅ | ci-doctor is excellent |
| Security compliance | ✅✅ | Above average: security-guard, security-review, secret-digger ×3 |
| Documentation sync | ✅ | doc-maintainer + cli-flag-consistency-checker |
| Meta-agent monitoring | ❌ | No workflow health manager or audit workflows |
| Release automation | ⚠️ | update-release-notes exists but no changeset generation |
| Code quality agents | ❌ | No simplicity/refactoring/style agents |
| Interactive/ChatOps | ✅ | /plan slash command |
| Multi-engine testing | ✅✅ | Unique strength: smoke tests on 4 configs |
| Observability/metrics | ❌ | No portfolio analyst or metrics collector |
What this repo does uniquely well: The triple-engine secret digger (running hourly on claude/codex/copilot) and the four-way smoke testing matrix are standout patterns not seen in the factory itself. The security-guard PR reviewer using Claude is particularly well-suited to this security-critical codebase.
Domain opportunity: A Firewall Escape Test Agent is uniquely valuable here, since no other repository type can leverage this pattern. It would turn the firewall into its own test subject, continuously verifying security invariants.
Generated by Pelis Agent Factory Advisor · 2026-03-02
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.