Add smoke-test skill for extension models #633

@bixu

Problem statement

When developing extension models, AI agents frequently conclude that a model is correct once unit tests pass, yet subtle corner-case bugs survive into pushes and extension publishes. These bugs are typically discovered only during manual smoke testing against live APIs, after the code has already been committed or published. Examples from real development sessions:

  • Content-Type mismatches: The v2 API required application/vnd.api+json but the model sent application/json; unit tests with a stubbed fetch didn't catch this
  • Stale bundle caching: Source fixes weren't reflected at runtime because .swamp/bundles/*.js wasn't cleared; agents didn't know about this caching layer
  • API validation quirks: Honeycomb boards require type: "flexible" in the body, not just a name; this was only discovered during a live create
  • delete_protected defaults: Honeycomb creates environments with delete_protected: true, so delete fails unless an update is called first
  • Read-only resource guards: Attempts to create/update/delete read-only resources like dataset-definitions or auth should be rejected before any API call is made

These are the kinds of issues that unit tests with mocked responses can't catch, but a structured smoke-test protocol would.
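
As an illustration of the last example, a read-only guard could look roughly like the sketch below. It is a minimal sketch only: `READ_ONLY_RESOURCES`, `Method`, and `assertWritable` are hypothetical names invented for this issue and are not part of any swamp API; the two resource names are the ones mentioned above.

```typescript
// Hypothetical names throughout; nothing here is an existing swamp API.
type Method = "get" | "list" | "create" | "update" | "delete";

// The two read-only resources named in the examples above.
const READ_ONLY_RESOURCES: ReadonlySet<string> = new Set<string>([
  "dataset-definitions",
  "auth",
]);

const MUTATING_METHODS: ReadonlySet<Method> = new Set<Method>([
  "create",
  "update",
  "delete",
]);

// Throws before any HTTP request is attempted when a mutating method
// targets a read-only resource. Reads always pass through.
function assertWritable(resource: string, method: Method): void {
  if (MUTATING_METHODS.has(method) && READ_ONLY_RESOURCES.has(resource)) {
    throw new Error(`${resource} is read-only; refusing to ${method}`);
  }
}
```

A smoke test for this guard would call each mutating method on each read-only resource and expect a rejection without any network traffic.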

Proposed solution

A swamp-smoke-test skill that agents can invoke (or that hooks trigger automatically) before git push, swamp extension push, or similar publish actions. The skill would:

  1. Discover the extension's method surface: Parse the model to enumerate all methods × resource types × argument combinations
  2. Generate a smoke-test plan: For each method, identify:
    • Safe read-only operations (GET/list) that can run against live APIs without side effects
    • CRUD cycle candidates: resources that can be safely created, updated, and deleted (with unique test names to avoid collisions)
    • Error-path tests: missing required args, read-only resource rejection, invalid auth
    • Corner cases specific to the API: required fields beyond name, default flags that block deletion, etc.
  3. Execute the plan: Run each test via swamp model method run, verify success/failure matches expectations
  4. Report results: Produce a structured summary table (method × resource × result) suitable for PR descriptions
  5. Clean up: Ensure all created test resources are deleted, even if intermediate steps fail
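
The five steps above could be sketched as follows. This is a rough illustration under stated assumptions, not a proposed implementation: every type and function name (`Step`, `Runner`, `buildPlan`, `runPlan`, `toTable`) is hypothetical, and the runner is injected and synchronous so the sketch stays self-contained; a real runner would presumably wrap `swamp model method run`.

```typescript
// Hypothetical shapes; none of these types or functions come from swamp.
type Kind = "read" | "create" | "update" | "delete";

interface Step {
  method: Kind;
  resource: string;
  expect: "success" | "failure";
}

interface Result extends Step {
  passed: boolean;
}

// Performs one step and reports whether it succeeded (assumed synchronous).
type Runner = (step: Step) => boolean;

// Steps 1-2: read-only resources get a read plus an error-path check;
// writable resources get a full CRUD cycle.
function buildPlan(resources: { name: string; readOnly: boolean }[]): Step[] {
  const plan: Step[] = [];
  for (const r of resources) {
    if (r.readOnly) {
      plan.push({ method: "read", resource: r.name, expect: "success" });
      plan.push({ method: "create", resource: r.name, expect: "failure" });
    } else {
      const cycle: Kind[] = ["create", "read", "update", "delete"];
      for (const method of cycle) {
        plan.push({ method, resource: r.name, expect: "success" });
      }
    }
  }
  return plan;
}

// Steps 3 and 5: run every step, track what was created, and always
// attempt to delete leftovers, even if a step throws.
function runPlan(plan: Step[], run: Runner): Result[] {
  const results: Result[] = [];
  const live = new Set<string>(); // created but not yet deleted
  try {
    for (const step of plan) {
      let ok = false;
      try {
        ok = run(step);
      } catch (_e) {
        ok = false; // a thrown error counts as a failed step
      }
      if (ok && step.method === "create") live.add(step.resource);
      if (ok && step.method === "delete") live.delete(step.resource);
      results.push({ ...step, passed: ok === (step.expect === "success") });
    }
  } finally {
    for (const resource of Array.from(live)) {
      try {
        run({ method: "delete", resource, expect: "success" });
      } catch (_e) {
        // best-effort cleanup; nothing more to do here
      }
    }
  }
  return results;
}

// Step 4: render a method × resource × result table for a PR description.
function toTable(results: Result[]): string {
  const rows = results.map(
    (r) => `| ${r.method} | ${r.resource} | ${r.passed ? "pass" : "FAIL"} |`
  );
  return ["| method | resource | result |", "| --- | --- | --- |", ...rows].join("\n");
}
```

Keeping cleanup in a `finally` block means a failed delete in the middle of the plan is retried at the end rather than leaving a test resource behind.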

Key design considerations

  • The skill should be API-aware but not API-specific — it reads the model's method schemas and resource registry to generate tests, rather than hard-coding per-service knowledge
  • It should never touch pre-existing resources — all created resources use unique names (e.g. smoke-test-{resource}-{timestamp})
  • It should handle permission errors gracefully: a 401 on slos because the key lacks permission is an expected constraint, not a test failure
  • Bundle cache clearing (.swamp/bundles/) should be part of the pre-test setup
  • The skill could optionally integrate with git hooks to block pushes when smoke tests fail
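
Two of these considerations, unique naming and permission handling, could be sketched like this. The helper names (`smokeName`, `isSmokeResource`, `classify`) are hypothetical, and treating 403 the same way as 401 is an assumption that goes beyond the slos example above.

```typescript
// Hypothetical helpers; the pattern follows smoke-test-{resource}-{timestamp}.
const SMOKE_PREFIX = "smoke-test-";

function smokeName(resource: string, now: Date = new Date()): string {
  return `${SMOKE_PREFIX}${resource}-${now.getTime()}`;
}

// Cleanup should only ever delete names this returns true for, so that
// pre-existing resources are never touched.
function isSmokeResource(name: string): boolean {
  return name.startsWith(SMOKE_PREFIX);
}

type Outcome = "pass" | "fail" | "skipped";

// A permission error on a step that was expected to succeed is reported as
// "skipped" rather than "fail". Including 403 here is an assumption.
// When failure was expected (e.g. an invalid-auth check), a 401 is a pass.
function classify(status: number, expectSuccess: boolean): Outcome {
  if (expectSuccess && (status === 401 || status === 403)) return "skipped";
  const ok = status >= 200 && status < 300;
  return ok === expectSuccess ? "pass" : "fail";
}
```

Reporting a third "skipped" outcome keeps the summary table honest: a reader can see which resources were not exercised because of key permissions, without those rows blocking a push.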

Alternatives considered

  • Manual smoke testing: Current approach — works but is tedious, error-prone, and depends on the agent remembering to do it
  • Enhanced unit tests: Better mocks could catch some issues, but can't catch Content-Type mismatches, bundle caching, or API validation quirks that only surface with real HTTP calls
  • CI-based integration tests: Would require live API credentials in CI, which adds secret management complexity

Additional context

This was motivated by developing the @bixu/honeycomb extension, where multiple bugs survived unit tests and were only caught during manual smoke testing sessions. The pattern of "agent thinks it's done → smoke test reveals bugs → fix → re-test" repeated across several sessions. A skill that codifies this testing protocol would catch these issues earlier and more consistently.

Labels: documentation
