## Problem statement
When developing extension models, AI agents frequently believe the model is correct after unit tests pass, but subtle corner-case bugs survive into pushes and extension publishes. These bugs are typically discovered only during manual smoke testing against live APIs — after the code has already been committed or published. Examples from real development sessions:
- Content-Type mismatches: the v2 API required `application/vnd.api+json` but the model used `application/json` — unit tests with stubbed fetch didn't catch this
- Stale bundle caching: source fixes weren't reflected at runtime because `.swamp/bundles/*.js` wasn't cleared — agents didn't know about this caching layer
- API validation quirks: Honeycomb boards require `type: "flexible"` in the body, not just a `name` — only discovered during live create
- `delete_protected` defaults: Honeycomb creates environments with `delete_protected: true`, making delete fail unless update is called first
- Read-only resource guards: attempting create/update/delete on read-only resources like `dataset-definitions` or `auth` should be rejected before making API calls
These are the kinds of issues that unit tests with mocked responses can't catch, but a structured smoke-test protocol would.
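To make the first failure mode concrete, here is a minimal sketch of why a stubbed-fetch unit test passes while the live API rejects the same request. The URL, function names, and the synchronous fetch stand-ins are all hypothetical; only the mismatched media types come from the examples above.

```typescript
// Model code under test: builds the request with the wrong media type.
function createBoardRequest(name: string) {
  return {
    method: "POST",
    // Bug: the v2 API wants application/vnd.api+json.
    headers: { "Content-Type": "application/json" } as Record<string, string>,
    body: JSON.stringify({ name }),
  };
}

// A typical unit-test stub returns success without inspecting headers at all.
function stubFetch(_url: string, _init: ReturnType<typeof createBoardRequest>) {
  return { ok: true, status: 201 };
}

// A header-aware stand-in for the live API rejects the wrong media type.
function liveFetch(_url: string, init: ReturnType<typeof createBoardRequest>) {
  return init.headers["Content-Type"] === "application/vnd.api+json"
    ? { ok: true, status: 201 }
    : { ok: false, status: 415 };
}

const req = createBoardRequest("smoke-test-board");
const unitResult = stubFetch("https://api.example.com/boards", req);
const liveResult = liveFetch("https://api.example.com/boards", req);
console.log(unitResult.status, liveResult.status); // 201 415
```

The stub never looks at `init.headers`, so no amount of assertion on its return value can surface the mismatch — only a real (or header-validating) endpoint can.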
## Proposed solution
A `swamp-smoke-test` skill that agents can invoke (or that hooks trigger automatically) before `git push`, `swamp extension push`, or similar publish actions. The skill would:
- Discover the extension's method surface: Parse the model to enumerate all methods × resource types × argument combinations
- Generate a smoke-test plan: For each method, identify:
  - Safe read-only operations (GET/list) that can run against live APIs without side effects
  - CRUD cycle candidates: resources that can be safely created, updated, and deleted (with unique test names to avoid collisions)
  - Error-path tests: missing required args, read-only resource rejection, invalid auth
  - Corner cases specific to the API: required fields beyond `name`, default flags that block deletion, etc.
- Execute the plan: Run each test via `swamp model method run`, verify success/failure matches expectations
- Report results: Produce a structured summary table (method × resource × result) suitable for PR descriptions
- Clean up: Ensure all created test resources are deleted, even if intermediate steps fail
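The create → update → delete step with guaranteed cleanup could be sketched as follows. The `MethodRunner` interface, the method names, and the result shape are placeholders for illustration, not the real swamp API; the unique-name scheme and the cleanup-in-`finally` guarantee are the points being demonstrated.

```typescript
// Hypothetical adapter around `swamp model method run`.
interface MethodRunner {
  run(method: string, args: Record<string, unknown>): { ok: boolean; status: number };
}

interface SmokeResult {
  method: string;
  resource: string;
  result: "pass" | "fail";
}

function crudCycle(runner: MethodRunner, resource: string): SmokeResult[] {
  // Unique name so the test never collides with pre-existing resources.
  const name = `smoke-test-${resource}-${Date.now()}`;
  const results: SmokeResult[] = [];
  const record = (method: string, ok: boolean) =>
    results.push({ method, resource, result: ok ? "pass" : "fail" });

  const created = runner.run("create", { resource, name });
  record("create", created.ok);
  try {
    if (created.ok) {
      record("update", runner.run("update", { resource, name, description: "smoke" }).ok);
    }
  } finally {
    // Cleanup runs even if an intermediate step throws or fails.
    if (created.ok) {
      record("delete", runner.run("delete", { resource, name }).ok);
    }
  }
  return results;
}

// Demo against a trivially-succeeding fake runner.
const demo = crudCycle({ run: () => ({ ok: true, status: 200 }) }, "board");
console.log(demo);
```

The returned `SmokeResult[]` is also the natural input for the method × resource × result summary table.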
## Key design considerations
- The skill should be API-aware but not API-specific — it reads the model's method schemas and resource registry to generate tests, rather than hard-coding per-service knowledge
- It should never touch pre-existing resources — all created resources use unique names (e.g. `smoke-test-{resource}-{timestamp}`)
- It should handle permission errors gracefully — a 401 on `slos` because the key lacks permission is not a test failure; it's an expected constraint
- Bundle cache clearing (`.swamp/bundles/`) should be part of the pre-test setup
- The skill could optionally integrate with git hooks to block pushes when smoke tests fail
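The graceful-permission-handling rule above amounts to a three-way classification of each test outcome. A minimal sketch (function name and status-code mapping are assumptions, not a spec):

```typescript
type Outcome = "pass" | "expected-constraint" | "fail";

// expectedOk is true for happy-path tests, false for error-path tests
// (missing required args, writes to read-only resources, invalid auth).
function classify(status: number, expectedOk: boolean): Outcome {
  // The key lacking permission is a constraint of the environment,
  // not a bug in the model — report it separately, never as a failure.
  if (status === 401 || status === 403) return "expected-constraint";
  const ok = status >= 200 && status < 300;
  return ok === expectedOk ? "pass" : "fail";
}

console.log(classify(200, true));  // "pass"
console.log(classify(401, true));  // "expected-constraint" — key lacks permission
console.log(classify(200, false)); // "fail" — e.g. a read-only resource accepted a write
```

Distinguishing `expected-constraint` from `fail` keeps the summary table honest: a scoped API key shouldn't turn a green run red, but a read-only guard that silently succeeds should.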
## Alternatives considered
- Manual smoke testing: Current approach — works but is tedious, error-prone, and depends on the agent remembering to do it
- Enhanced unit tests: Better mocks could catch some issues, but can't catch Content-Type mismatches, bundle caching, or API validation quirks that only surface with real HTTP calls
- CI-based integration tests: Would require live API credentials in CI, which adds secret management complexity
## Additional context
This was motivated by developing the `@bixu/honeycomb` extension, where multiple bugs survived unit tests and were only caught during manual smoke testing sessions. The pattern of "agent thinks it's done → smoke test reveals bugs → fix → re-test" repeated across several sessions. A skill that codifies this testing protocol would catch these issues earlier and more consistently.