
incident-management: tighten IR template structure and pipeline runbook#424

Open
frameworks-volunteer wants to merge 2 commits into security-alliance:develop from frameworks-volunteer:matta/ir-template-tightening

@frameworks-volunteer
Contributor

Summary

This PR is a first pass on the recently added Incident Response Template section.

The goal is not to expand the section broadly, but to make it clearer, tighter, and more operationally credible without adding filler or speculative content.

This pass focuses on three things:

  1. clarifying the distinction between framework guidance, templates, runbooks, and playbooks
  2. reducing a few over-absolute statements
  3. upgrading the weakest runbook in the set (build-pipeline-compromise) into something more responder-oriented

What changed

1) Clarified content taxonomy

Added concise framing so readers can understand what each layer is for:

  • incident-management/overview.mdx
    • clarifies that the section now contains both:
      • framework guidance
      • operational templates
  • incident-management/playbooks/overview.mdx
    • reframes playbooks as reference material, not drop-in internal operating procedures
    • points readers to the template/runbook sections for copy-and-adapt operational docs
  • incident-response-template/overview.mdx
    • clarifies that the broader incident-management pages explain concepts/practices
    • clarifies that the template section is intended to be copied/customized for internal use
    • distinguishes:
      • policy / roles / communications / contacts
      • templates
      • runbooks
  • incident-response-template/templates/overview.mdx
    • clarifies when to use templates vs runbooks vs policy pages
  • incident-response-template/runbooks/overview.mdx
    • clarifies that runbooks are operational procedures, distinct from framework playbooks and blank templates

2) Tightened a few absolute statements

  • incident-response-template/incident-response-policy.mdx
    • changed "Monitor for at least a week" to "Monitor based on residual risk, blast radius, and incident type"
  • incident-response-template/roles-and-staffing.mdx
    • changed "These people should be reachable 24/7" to "There should be a 24/7 escalation path to these people"

These changes are meant to make the guidance more realistic and less doctrinal.

3) Upgraded the build pipeline compromise runbook

incident-response-template/runbooks/build-pipeline-compromise.mdx was previously a thin stub. This PR upgrades it into a more credible example runbook by adding:

  • better identification criteria
  • scope questions
  • differentiation from adjacent incident classes
  • immediate actions that reflect actual responder priorities:
    • freeze the pipeline
    • preserve evidence
    • rotate credentials by blast radius
    • stop trusting recent outputs
  • investigation questions focused on access path, permissions, credential exposure, and affected outputs
  • containment / recovery options:
    • rebuild from a known-good commit using a clean pipeline
    • roll back to the last known-good release
    • keep service paused until trust is re-established
  • a verification gate before normal delivery resumes
  • a concise hardening checklist after the incident
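To illustrate the "stop trusting recent outputs" step, here is a minimal Python sketch. It assumes the team can export a list of build outputs with timezone-aware build timestamps and has an estimate of when the compromise began; `Artifact`, `split_by_trust`, the 24-hour default margin, and the release names are all hypothetical and not part of the runbook itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Artifact:
    # Hypothetical record of one pipeline output (image, bundle, package, ...).
    name: str
    built_at: datetime  # must be timezone-aware so comparisons are unambiguous


def split_by_trust(artifacts, suspected_start, margin=timedelta(hours=24)):
    """Partition artifacts around a conservative cutoff.

    Anything built at or after (suspected_start - margin) is treated as untrusted.
    The margin widens the window because compromise start times are usually estimates.
    Artifacts before the cutoff are only "not excluded by time"; they still need review.
    """
    cutoff = suspected_start - margin
    before, untrusted = [], []
    for artifact in artifacts:
        if artifact.built_at >= cutoff:
            untrusted.append(artifact)
        else:
            before.append(artifact)
    return before, untrusted


if __name__ == "__main__":
    start = datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc)
    artifacts = [
        Artifact("release-1.4.0", datetime(2026, 3, 1, tzinfo=timezone.utc)),
        Artifact("release-1.4.1", datetime(2026, 3, 9, 18, 0, tzinfo=timezone.utc)),
        Artifact("release-1.5.0", datetime(2026, 3, 12, tzinfo=timezone.utc)),
    ]
    before, untrusted = split_by_trust(artifacts, start)
    print([a.name for a in untrusted])  # ['release-1.4.1', 'release-1.5.0']
```

The margin is a deliberately conservative choice: outputs built shortly before the estimated start time are also treated as untrusted, since that estimate is rarely exact.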

What this PR does not do

Intentionally out of scope for this first pass:

  • broad content expansion
  • adding new Web3-specific runbooks just to fill gaps
  • renaming sections or restructuring the sidebar deeply
  • inventing protocol-specific operational steps without high confidence

I would rather leave gaps visible than fill them with weak or speculative guidance.

Why this scope

The Incident Response Template addition is already valuable, but right now it mixes:

  • framework/reference material
  • internal templates
  • runbooks

This first pass tries to make that structure easier to understand, while also strengthening one page that felt materially underdeveloped.

Follow-up ideas (not included here)

Possible future passes, if useful:

  • strengthen frontend-compromise and dependency-attack
  • add battle-tested Web3-native scenarios only where confidence is high
  • revisit naming/IA if the team wants clearer labels than the current playbook/runbook/template split

@github-actions

github-actions bot commented Mar 21, 2026

built with Refined Cloudflare Pages Action

⚡ Cloudflare Pages Deployment

Name | Status | Preview | Last Commit
frameworks | ✅ Ready (View Log) | Visit Preview | 56913d1

@frameworks-volunteer
Contributor Author

Second pass update

This second pass keeps the same philosophy as the first one:

  • no broad expansion
  • no filler
  • no speculative Web3-specific guidance
  • only tighten pages where the operational value is clear and high-confidence

What changed in this pass

This pass focuses on the two runbooks that still felt materially underpowered:

  • incident-response-template/runbooks/frontend-compromise.mdx
  • incident-response-template/runbooks/dependency-attack.mdx

1) Strengthened frontend-compromise

This page now better reflects how frontend incidents actually behave in practice, especially in Web3, where a frontend compromise often becomes a user-signing or approval-theft incident very quickly.

Changes include:

  • clearer identification and scope questions
  • stronger focus on stopping service quickly
  • explicit emphasis on warning users early and clearly
  • preserving evidence before cleanup
  • tighter framing around identifying the real trust-boundary failure:
    • DNS
    • CDN/hosting
    • dependency
    • build pipeline
  • improved recovery conditions before restoring service
  • more practical affected-user support guidance

The goal here was to make the page more useful during the first minutes of an actual incident, not just more complete on paper.

2) Strengthened dependency-attack

This page was still too close to a stub. It now better distinguishes between a generic vulnerable package and a dependency incident that may have affected real build outputs, releases, or users.

Changes include:

  • better scope questions:
    • production vs build-only exposure
    • build-time vs runtime execution
    • possible credential / artifact impact
  • clearer differentiation from:
    • frontend compromise
    • build pipeline compromise
  • stronger immediate actions:
    • freeze releases
    • identify the exact package/version path
    • stop trusting recent outputs
    • preserve evidence
  • improved investigation questions
  • more credible containment / recovery options
  • a verification gate before resuming normal delivery
  • tighter prevention guidance focused on dependency discipline and build trust
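For the "identify the exact package/version path" step, here is a minimal Python sketch that lists every install path at which a named npm package resolved to a suspect version. It assumes an npm `package-lock.json` with a top-level `packages` map (lockfile versions 2 and 3); it does not handle v1 lockfiles, pnpm or Yarn lockfiles, or npm aliases, and `find_package_paths` is a hypothetical helper rather than an existing tool.

```python
import json
import sys


def find_package_paths(lockfile, name, suspect_versions):
    """Return (install path, version) pairs where `name` resolved to a suspect version.

    Assumes an npm lockfile with a top-level "packages" map (lockfileVersion 2 or 3).
    Keys look like "node_modules/a/node_modules/b"; the root project uses the key "".
    """
    suspect = set(suspect_versions)
    hits = []
    for path, meta in lockfile.get("packages", {}).items():
        if "node_modules/" not in path:
            continue  # skip the root entry and workspace folders
        # The text after the last "node_modules/" is the package name,
        # including any "@scope/" prefix for scoped packages.
        installed_name = path.rsplit("node_modules/", 1)[1]
        version = meta.get("version")
        if installed_name == name and version in suspect:
            hits.append((path, version))
    return hits


if __name__ == "__main__":
    # Usage: python find_suspect.py package-lock.json <package> <version> [<version> ...]
    with open(sys.argv[1], encoding="utf-8") as fh:
        lock = json.load(fh)
    for path, version in find_package_paths(lock, sys.argv[2], sys.argv[3:]):
        print(f"{path}\t{version}")
```

An example invocation (script and package names hypothetical) would be `python find_suspect.py package-lock.json example-pkg 1.0.1`. Reporting the full nested path matters because a transitive copy of a package can resolve to a different version than the top-level one.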

What I intentionally did not change

Still intentionally out of scope:

  • adding new runbooks just to close every possible gap
  • speculative guidance for scenarios that need deeper expertise or stronger repo context
  • touching pages that did not clearly benefit from high-confidence tightening

For example, I left key-compromise unchanged in this pass rather than make lower-confidence edits.

Why this is the last pass

At this point, the highest-value weak spots in the imported IR template section have been addressed without turning the PR into a broad rewrite.

This keeps the contribution focused on:

  • clearer information architecture
  • more realistic wording
  • stronger responder-oriented runbooks where they were obviously too thin

@mattaereal requested a review from @scode2277 on March 22, 2026 00:54
@mattaereal self-assigned this on Mar 22, 2026
- [ ] Consider using a private registry
- [ ] Pin exact versions for critical packages
- [ ] Review dependency changes in PRs
- [ ] Use deterministic install commands in CI (`npm ci`, `pnpm install --frozen-lockfile`, etc.)
Collaborator

In CI environments, `pnpm install` runs with `--frozen-lockfile` by default.

Comment on lines +28 to +29
- **Playbooks** elsewhere in the framework, which are reference guidance
- **Templates**, which are blank working documents to fill out during or after an incident
Collaborator

@scode2277 Mar 24, 2026

Suggested change
- **Playbooks** elsewhere in the framework, which are reference guidance
- **Templates**, which are blank working documents to fill out during or after an incident
- **[Playbooks](/incident-management/playbooks/overview)**, which are reference guidance for handling specific types of security incidents
- **[Templates](/incident-management/incident-response-template/templates/overview)**, which are blank working documents to fill out during or after an incident

Comment on lines +28 to +30
- use a template when you need to create a new incident record or post-mortem
- use a runbook when you need scenario-specific response steps
- use the policy pages when you need process, roles, or communication expectations
Collaborator

Suggested change
- use a template when you need to create a new incident record or post-mortem
- use a runbook when you need scenario-specific response steps
- use the policy pages when you need process, roles, or communication expectations
- use a [template](/incident-management/incident-response-template/templates/overview) when you need to create a new incident record or post-mortem
- use a [runbook](/incident-management/incident-response-template/runbooks/overview) when you need scenario-specific response steps
- use the [policy page](/incident-management/incident-response-template/incident-response-policy) when you need process, roles, or communication expectations

Comment on lines +83 to +84
- **Incident Management pages** explain concepts and practices
- **Incident Response Template pages** are meant to be copied, customized, and used internally
Collaborator

Suggested change
- **Incident Management pages** explain concepts and practices
- **Incident Response Template pages** are meant to be copied, customized, and used internally
- **[Incident Management](/incident-management/overview)** pages explain concepts and practices
- **[Incident Response Templates](/incident-management/incident-response-template/overview)** are meant to be copied, customized, and used internally

Comment on lines +88 to +90
- **Policy / roles / communications / contacts** define your operating model
- **Templates** are blank working documents to fill out during or after incidents
- **Runbooks** are scenario-specific response procedures
Collaborator

Suggested change
- **Policy / roles / communications / contacts** define your operating model
- **Templates** are blank working documents to fill out during or after incidents
- **Runbooks** are scenario-specific response procedures
- **[Policy](/incident-management/incident-response-template/incident-response-policy) / [roles and staffing](/incident-management/incident-response-template/roles-and-staffing) / [communications](/incident-management/incident-response-template/communications) / [contacts](/incident-management/incident-response-template/contacts)** define your operating model
- **[Templates](/incident-management/incident-response-template/templates/overview)** are blank working documents to fill out during or after incidents
- **[Runbooks](/incident-management/incident-response-template/runbooks/overview)** are scenario-specific response procedures

Comment on lines +28 to +35
This section contains two different kinds of content:

- **Framework guidance**: explanatory pages on communication, detection/response, lessons learned, and reference playbooks
- **Operational templates**: copy-and-adapt incident response documents, templates, and runbooks for internal team use

Use the framework guidance to understand the discipline. Use the incident response template section to build your own
operational documentation.

Collaborator

Suggested change
This section contains two different kinds of content:
- **Framework guidance**: explanatory pages on communication, detection/response, lessons learned, and reference playbooks
- **Operational templates**: copy-and-adapt incident response documents, templates, and runbooks for internal team use
Use the framework guidance to understand the discipline. Use the incident response template section to build your own
operational documentation.
This framework contains two different kinds of content:
- **Incident Management core knowledge**: explanatory pages on communication, detection/response, lessons learned, and reference playbooks
- **Incident Response templates**: copy-and-adapt incident response documents, templates, and runbooks for internal team use
Use the core knowledge to understand the discipline. Use the incident response template section to build your own
operational documentation.

@scode2277
Collaborator

Left some comments @mattaereal

Labels

None yet

Projects

None yet

3 participants