v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay#53

Open
jpr5 wants to merge 13 commits into main from feat/v1.6.0-subspec1

jpr5 (Contributor) commented Mar 20, 2026

Summary

Major feature release adding 8 capabilities to llmock. Rebased on latest main (includes parallel test expansion + provider error format fixes).

Provider Endpoints

  • Bedrock Streaming: invoke-with-response-stream (AWS Event Stream binary) + Converse API
  • Vertex AI: Routes to existing Gemini handler
  • Ollama: /api/chat, /api/generate, /api/tags (NDJSON streaming)
  • Cohere: /v2/chat (typed SSE events)
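The endpoint list above implies path-based dispatch to per-provider handlers. The sketch below is a minimal illustration, not llmock's actual router: the route table, the `routeProvider` helper, and the handler names are hypothetical, and the Bedrock and Vertex AI path shapes are assumed from the public AWS Bedrock Runtime and Vertex AI REST conventions.

```typescript
// Hypothetical handler identifiers; llmock's real dispatch may differ.
type HandlerName =
  | "bedrockInvokeStream"
  | "bedrockConverse"
  | "bedrockConverseStream"
  | "gemini"
  | "ollamaChat"
  | "ollamaGenerate"
  | "ollamaTags"
  | "cohereChat";

interface Route {
  method: "GET" | "POST";
  pattern: RegExp; // capture group 1, when present, is the model ID
  handler: HandlerName;
}

const ROUTES: Route[] = [
  // AWS Bedrock Runtime paths (model IDs may arrive URL-encoded, e.g. "%3A" for ":")
  { method: "POST", pattern: /^\/model\/([^/]+)\/invoke-with-response-stream$/, handler: "bedrockInvokeStream" },
  { method: "POST", pattern: /^\/model\/([^/]+)\/converse$/, handler: "bedrockConverse" },
  { method: "POST", pattern: /^\/model\/([^/]+)\/converse-stream$/, handler: "bedrockConverseStream" },
  // Vertex AI publisher-model paths reuse the Gemini handler
  {
    method: "POST",
    pattern: /^\/v1\/projects\/[^/]+\/locations\/[^/]+\/publishers\/google\/models\/([^/:]+):(?:generateContent|streamGenerateContent)$/,
    handler: "gemini",
  },
  // Ollama (NDJSON) and Cohere v2 (SSE)
  { method: "POST", pattern: /^\/api\/chat$/, handler: "ollamaChat" },
  { method: "POST", pattern: /^\/api\/generate$/, handler: "ollamaGenerate" },
  { method: "GET", pattern: /^\/api\/tags$/, handler: "ollamaTags" },
  { method: "POST", pattern: /^\/v2\/chat$/, handler: "cohereChat" },
];

function routeProvider(method: string, path: string): { handler: HandlerName; model?: string } | undefined {
  for (const route of ROUTES) {
    if (route.method !== method) continue;
    const match = route.pattern.exec(path);
    if (!match) continue;
    const rawModel = match[1];
    return rawModel === undefined
      ? { handler: route.handler }
      : { handler: route.handler, model: decodeURIComponent(rawModel) };
  }
  return undefined;
}
```

Returning `undefined` for unknown paths leaves the caller free to fall through to the existing OpenAI, Anthropic, and Gemini routes.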

Infrastructure

  • Chaos Testing — Probabilistic drop/malformed/disconnect, three precedence levels
  • Prometheus Metrics — Opt-in /metrics, counters, cumulative histograms, gauges
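For the cumulative histograms mentioned above, Prometheus expects each `_bucket{le="..."}` series to count every observation less than or equal to its bound, ending with a `+Inf` bucket that equals `_count`. A minimal sketch of that exposition format; the `Histogram` class and the metric name `llmock_request_duration_seconds` are hypothetical, not llmock's metrics module.

```typescript
// A minimal cumulative histogram rendered in the Prometheus text format.
class Histogram {
  private readonly name: string;
  private readonly help: string;
  private readonly bounds: number[];
  private readonly counts: number[];
  private sum = 0;
  private count = 0;

  constructor(name: string, help: string, bounds: number[]) {
    this.name = name;
    this.help = help;
    this.bounds = [...bounds].sort((a, b) => a - b);
    this.counts = this.bounds.map(() => 0);
  }

  observe(value: number): void {
    this.sum += value;
    this.count += 1;
    // Cumulative: every bucket whose upper bound is >= value is incremented.
    this.bounds.forEach((bound, i) => {
      if (value <= bound) this.counts[i] += 1;
    });
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} histogram`];
    this.bounds.forEach((bound, i) => lines.push(`${this.name}_bucket{le="${bound}"} ${this.counts[i]}`));
    // The +Inf bucket always equals the total observation count.
    lines.push(`${this.name}_bucket{le="+Inf"} ${this.count}`);
    lines.push(`${this.name}_sum ${this.sum}`, `${this.name}_count ${this.count}`);
    return lines.join("\n") + "\n";
  }
}
```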

Record-and-Replay

  • Proxy-on-miss — Real API responses saved as fixtures
  • Stream collapsing — 6 functions (SSE, NDJSON, EventStream)
  • Strict mode (503) — Catch missing fixtures in CI
  • Auth safety — Forwarded but redacted in journal, never in fixtures
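The record-and-replay bullets combine into a small decision per request. A sketch under stated assumptions: `decideOutcome`, `redactForJournal`, the per-provider upstream map, and the list of auth header names are hypothetical, and checking the record upstream before strict mode is an assumed ordering.

```typescript
// Hypothetical outcome type for a single request.
type Outcome =
  | { kind: "fixture" }
  | { kind: "proxy"; upstream: string }
  | { kind: "error"; status: 404 | 503 };

// Assumed shape: upstream base URL per provider key.
interface RecordSettings {
  providers: Record<string, string | undefined>;
}

function decideOutcome(
  matched: boolean,
  provider: string,
  record: RecordSettings | undefined,
  strict: boolean,
): Outcome {
  if (matched) return { kind: "fixture" };
  // Proxy-on-miss: forward to the real API; the response is then saved as a fixture.
  const upstream = record?.providers[provider];
  if (upstream) return { kind: "proxy", upstream };
  // Strict mode turns a missing fixture into a 503 so CI fails loudly.
  return { kind: "error", status: strict ? 503 : 404 };
}

// Assumed list of credential-bearing headers.
const AUTH_HEADERS = new Set(["authorization", "x-api-key", "api-key", "x-goog-api-key"]);

// The journal keeps header names but masks credentials; fixtures omit headers entirely.
function redactForJournal(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = AUTH_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : value;
  }
  return out;
}
```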

Quality

  • 1243 tests across 37 files
  • Build/format/lint clean, zero external dependencies

Review Fixes Applied

All 20 findings from the code review have been addressed:

Critical (6 fixed)

  1. HandlerDefaults type extracted — Replaced 12+ inline type declarations with shared interface. All handlers now have access to record/strict/registry fields.
  2. Systematic defaults type narrowing — Fixed across handleBedrock, handleResponses, handleMessages, handleGemini, handleEmbeddings.
  3. Chaos metrics registry gap — Fixed by item 1 (all handlers now receive registry via HandlerDefaults).
  4. Recorder binary relay corruption — Uses raw Buffer for EventStream relay instead of UTF-8 string.
  5. collapseOllamaNDJSON tool_calls — Now accumulates message.tool_calls from stream chunks.
  6. buildFixtureResponse Ollama tool_calls — Checks toolCalls before empty content, with null guard on tc.function.

Important (6 fixed)

  1. SKILL.md: --strict returns 503, not 404
  2. Recorder: X-LLMock-Record-Error header on write failure
  3. ChaosAction type deduplicated (moved to types.ts)
  4. bedrock.ts docstring updated (no longer overclaims /converse)
  5. README provider list updated (all 8 providers)
  6. types.ts file header updated

Suggestions (6 fixed)

  1. Chaos rates clamped to [0,1] after merging
  2. RecordProviderKey string union for typed provider keys
  3. collapseCohereSS renamed to collapseCohereSSE
  4. Recorder auth header comment fixed
  5. SKILL.md proxy log level corrected (warn, not info)
  6. OllamaMessage.role typed as union

Tests added (9 new)

  1. Bedrock strict mode (503)
  2. Ollama NDJSON tool_calls collapse
  3. writeNDJSONStream latency
  4. Cohere streaming tool calls
  5. Recorder binary EventStream relay integrity

Additional CR fixes

  • rawServer leak in recorder.test.ts (afterEach cleanup)
  • Unused imports removed across 9 handler files

Commits (8)

  1. Original: housekeeping
  2. Original: all source code (24 files)
  3. Original: all tests (14 files)
  4. Original: skill and README updates
  5. refactor: HandlerDefaults type, ChaosAction dedup, RecordProviderKey, OllamaMessage.role, unused imports
  6. fix: Binary relay, Ollama tool_calls, chaos clamping, collapseCohereSSE rename, recorder improvements
  7. test: Strict mode, tool_calls collapse, latency, binary relay, afterEach cleanup
  8. docs: Status code, log level, provider list fixes

pkg-pr-new bot commented Mar 20, 2026

npm i https://pkg.pr.new/CopilotKit/llmock/@copilotkit/llmock@53

commit: f858fd2

jpr5 force-pushed the feat/v1.6.0-subspec1 branch from f858fd2 to 37cdc75 (March 20, 2026 18:24)
jpr5 changed the title from "v1.6.0 Sub-spec 1: Bedrock Streaming, Ollama, Cohere, Vertex AI, Chaos Testing, Prometheus Metrics" to "v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay" (Mar 20, 2026)
jpr5 force-pushed the feat/v1.6.0-subspec1 branch from dd5effc to 00efb4b (March 20, 2026 22:23)
jpr5 added 4 commits March 20, 2026 15:56
New provider endpoints: Ollama, Cohere, Vertex AI, Bedrock Converse,
Bedrock streaming (invoke-with-response-stream).

New features: Prometheus metrics (/metrics), record-and-replay proxy,
strict mode (503 on no-match), stream collapse, AWS EventStream binary
framing, NDJSON writer, auth header redaction.

Wire up missing imports (proxyAndRecord, createMetricsRegistry,
normalizePathLabel), add RecordConfig type, expand MockServerOptions
and ServerInstance.defaults with metrics/strict/record fields, add
optional registry param to applyChaos for chaos counter tracking,
add enableRecording/disableRecording to LLMock class, remove unused
RecordConfig imports, deduplicate ChaosConfig import in CLI.

Tests for: Bedrock streaming, Bedrock Converse, AWS EventStream binary
framing, Ollama chat/generate, Cohere v2 chat, Vertex AI, metrics
endpoint, record-and-replay proxy, stream collapse, strict mode, and
multi-provider recording.
jpr5 force-pushed the feat/v1.6.0-subspec1 branch from 00efb4b to 1feef78 (March 20, 2026 22:58)
Restore --record, --strict, --metrics, --provider-* CLI flags that were
lost during commit regrouping. Restore getter-based defaults (get chaos(),
get record(), get strict()) for live config propagation. Remove direct
defaults mutation in setChaos/clearChaos/enableRecording/disableRecording
since getters read from the options object directly.
jpr5 force-pushed the feat/v1.6.0-subspec1 branch from f7e8ef7 to 711a600 (March 21, 2026 00:18)
jpr5 (Contributor, Author) commented Mar 21, 2026

PR (53) Review — CopilotKit/llmock (2026-03-21)

Critical Issues

(1) handleBedrock defaults type mismatch: bedrock.ts:247 declares its defaults parameter as { latency: number; chunkSize: number; logger: Logger; chaos?: ChaosConfig }, but the function body accesses defaults.record (line 320) and defaults.strict (lines 342–347), and passes defaults to proxyAndRecord (line 328), which expects record? and logger. The declared type is missing record?: RecordConfig, strict?: boolean, and registry?: MetricsRegistry. It works at runtime only because JavaScript ignores type annotations, but the type contract is incorrect and would fail under strict TypeScript settings.

(2) Systematic defaults type narrowing across 5 handlers — The same bug as (1) is replicated in handleResponses (responses.ts:502), handleMessages (messages.ts:434), handleGemini (gemini.ts:382), and handleEmbeddings (embeddings.ts:43). All five handlers access defaults.record and defaults.strict despite those properties not existing on their declared type. Contrast with handleBedrockStream (line 556), handleConverse, handleConverseStream, handleOllama, handleOllamaGenerate, and handleCohere, which all correctly declare the full type.

(3) applyChaos registry gap — chaos metrics silently lost for 5 handlers — Because handleBedrock, handleResponses, handleMessages, handleGemini, and handleEmbeddings do not have registry in their narrowed defaults type, their calls to applyChaos pass undefined for the registry parameter. The llmock_chaos_triggered_total Prometheus counter is never incremented for these endpoints even when metrics are enabled. Chaos metrics are incomplete — only the newer handlers (Ollama, Cohere, Bedrock streaming, Converse) report chaos events.
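Finding (3) follows from optional chaining on an optional registry: a call that receives `undefined` is a silent no-op rather than an error. A minimal sketch with a hypothetical `SimpleRegistry` (a stand-in for llmock's `MetricsRegistry`, whose real API may differ) and an assumed `path` label.

```typescript
type ChaosAction = "drop" | "malformed" | "disconnect";

// Stand-in for llmock's MetricsRegistry; the method name is hypothetical.
interface MetricsRegistry {
  incrementCounter(name: string, labels: Record<string, string>): void;
}

class SimpleRegistry implements MetricsRegistry {
  private readonly counters = new Map<string, number>();

  incrementCounter(name: string, labels: Record<string, string>): void {
    // Sort label names so one label set always maps to one series.
    // (Label values are not escaped in this sketch.)
    const labelText = Object.entries(labels)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${k}="${v}"`)
      .join(",");
    const series = `${name}{${labelText}}`;
    this.counters.set(series, (this.counters.get(series) ?? 0) + 1);
  }

  render(): string {
    return Array.from(this.counters)
      .map(([series, value]) => `${series} ${value}`)
      .join("\n");
  }
}

// With an undefined registry the optional call is skipped silently,
// which is how the five narrowed handlers lost their chaos counts.
function recordChaos(action: ChaosAction, path: string, registry?: MetricsRegistry): void {
  registry?.incrementCounter("llmock_chaos_triggered_total", { action, path });
}
```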

(4) Recorder binary relay corrupts Bedrock EventStream data: recorder.ts line 67 declares upstreamBody: string, and line 210 converts binary buffers via rawBuffer.toString() (UTF-8 by default). Line 177 relays via res.end(upstreamBody). Binary EventStream frames contain CRC32 checksums and binary-encoded lengths that are corrupted by a UTF-8 round trip. The collapse path (line 95) correctly uses the raw buffer, but the direct relay path sends corrupted data to the client.
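To see why finding (4) matters, the sketch below encodes an AWS Event Stream frame (total length, headers length, prelude CRC32, headers, payload, message CRC32) and checks it before and after a UTF-8 string round trip. `encodeFrame`, `verifyFrame`, and the table-driven `crc32` are illustrative helpers, not llmock code; only string-typed headers (type 7) are supported.

```typescript
import { Buffer } from "node:buffer";

// Table-driven CRC-32 (reflected polynomial 0xEDB88320), the checksum Event Stream uses.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]!) & 0xff]! ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Layout: [total len u32][headers len u32][prelude CRC u32][headers][payload][message CRC u32]
function encodeFrame(headers: Record<string, string>, payload: Uint8Array): Buffer {
  const parts: Buffer[] = [];
  for (const [name, value] of Object.entries(headers)) {
    const n = Buffer.from(name, "utf8");
    const v = Buffer.from(value, "utf8");
    const h = Buffer.alloc(1 + n.length + 1 + 2 + v.length);
    let o = h.writeUInt8(n.length, 0);
    o += n.copy(h, o);
    o = h.writeUInt8(7, o); // header value type 7 = string
    o = h.writeUInt16BE(v.length, o);
    v.copy(h, o);
    parts.push(h);
  }
  const headerBytes = Buffer.concat(parts);
  const total = 12 + headerBytes.length + payload.length + 4;
  const frame = Buffer.alloc(total);
  frame.writeUInt32BE(total, 0);
  frame.writeUInt32BE(headerBytes.length, 4);
  frame.writeUInt32BE(crc32(frame.subarray(0, 8)), 8);
  headerBytes.copy(frame, 12);
  frame.set(payload, 12 + headerBytes.length);
  frame.writeUInt32BE(crc32(frame.subarray(0, total - 4)), total - 4);
  return frame;
}

// Client-side check: declared length and both CRCs must match.
function verifyFrame(frame: Buffer): boolean {
  if (frame.length < 16 || frame.readUInt32BE(0) !== frame.length) return false;
  if (frame.readUInt32BE(8) !== crc32(frame.subarray(0, 8))) return false;
  return frame.readUInt32BE(frame.length - 4) === crc32(frame.subarray(0, frame.length - 4));
}
```

Relaying the raw `Buffer` unchanged, as the fix does, keeps both checksums valid; decoding to a string and re-encoding does not, because invalid UTF-8 bytes are replaced with U+FFFD.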

(5) collapseOllamaNDJSON ignores tool_calls: stream-collapse.ts lines 270–297 only extract the message.content and response fields and never handle message.tool_calls. Ollama streaming tool call responses would not be collapsed correctly in the recorder path. The test suite also has no coverage for this case.
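A sketch of collapsing an Ollama NDJSON stream that also keeps `message.tool_calls`, assuming the chunk shapes Ollama documents for `/api/chat` (text in `message.content`, tool calls with object `arguments`) and `/api/generate` (text in `response`). `collapseOllamaNDJSONSketch` is hypothetical, and serializing arguments to a JSON string is an assumed fixture convention.

```typescript
interface OllamaToolCall {
  function?: { name?: string; arguments?: Record<string, unknown> };
}

interface OllamaChunk {
  message?: { role?: string; content?: string; tool_calls?: OllamaToolCall[] };
  response?: string; // /api/generate streams text here instead of message.content
  done?: boolean;
}

interface CollapsedOllama {
  content: string;
  toolCalls: { name: string; arguments: string }[];
}

function collapseOllamaNDJSONSketch(body: string): CollapsedOllama {
  const result: CollapsedOllama = { content: "", toolCalls: [] };
  for (const line of body.split("\n")) {
    if (line.trim() === "") continue; // NDJSON: one JSON object per non-empty line
    const chunk = JSON.parse(line) as OllamaChunk;
    result.content += chunk.message?.content ?? chunk.response ?? "";
    for (const tc of chunk.message?.tool_calls ?? []) {
      // Null guard: skip entries without a usable function object.
      const fn = tc?.function;
      if (!fn || typeof fn.name !== "string") continue;
      result.toolCalls.push({ name: fn.name, arguments: JSON.stringify(fn.arguments ?? {}) });
    }
  }
  return result;
}
```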

(6) buildFixtureResponse returns empty TextResponse for Ollama tool calls: in recorder.ts, buildFixtureResponse checks content before toolCalls. For Ollama responses that include both message.content: "" (empty string) and message.tool_calls: [...], the empty string is non-null, so the text content path matches first, returning { content: "" } and silently discarding the tool calls.
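The ordering fix for finding (6) fits in a few lines. A sketch with a simplified, hypothetical `FixtureResponse` union (llmock's real fixture types carry more fields):

```typescript
interface ToolCall {
  name: string;
  arguments: string;
}

// Simplified, hypothetical fixture shapes.
type FixtureResponse = { toolCalls: ToolCall[] } | { content: string };

function toFixtureResponse(
  content: string | null | undefined,
  toolCalls: ToolCall[] | undefined,
): FixtureResponse | undefined {
  // Tool calls first: Ollama pairs them with content: "", which is non-null
  // and would otherwise win a content-first check.
  if (toolCalls && toolCalls.length > 0) return { toolCalls };
  if (content != null) return { content };
  return undefined;
}
```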


Important Issues

(7) SKILL.md documents wrong status code for --strict: skills/write-fixtures/SKILL.md line 434 states "--strict returns a 404 error for unmatched requests." The code actually returns 503 (const strictStatus = defaults.strict ? 503 : 404 in every handler). types.ts line 233 correctly documents 503.

(8) Recorder filesystem write failures not propagated — When fs.writeFileSync fails in the recorder (disk full, permissions, etc.), the error is caught and logged but the HTTP response to the client still succeeds. The client has no indication that the fixture was not saved.
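One way to surface a failed fixture write without failing the relayed response, combined with the UUID filenames adopted later in this PR. `saveFixture` and the `<provider>-<uuid>.json` naming scheme are hypothetical; only the `X-LLMock-Record-Error` header name comes from the fix list.

```typescript
import { randomUUID } from "node:crypto";
import { writeFileSync } from "node:fs";
import { join } from "node:path";

interface SaveResult {
  path?: string;
  headers: Record<string, string>; // extra headers to add to the client response
}

function saveFixture(dir: string, provider: string, fixture: unknown): SaveResult {
  // A UUID avoids depending on a shared module-level counter in concurrent tests.
  const path = join(dir, `${provider}-${randomUUID()}.json`);
  try {
    writeFileSync(path, JSON.stringify(fixture, null, 2));
    return { path, headers: {} };
  } catch (err) {
    // The relayed response still succeeds, but the client can see the write failed.
    const message = err instanceof Error ? err.message : String(err);
    // Header values must not contain CR/LF.
    return { headers: { "X-LLMock-Record-Error": message.replace(/[\r\n]+/g, " ") } };
  }
}
```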

(9) Duplicated ChaosAction union: JournalEntry.response.chaosAction in types.ts line 159 inlines "drop" | "malformed" | "disconnect" rather than importing the ChaosAction type from chaos.ts. If a new chaos action is added to one location, the other could silently diverge.

(10) bedrock.ts module docstring incomplete — The docstring says "AWS Bedrock Claude invoke endpoint support" and describes only the non-streaming /model/{modelId}/invoke format, but the file also exports handleBedrockStream, buildBedrockStreamTextEvents, and buildBedrockStreamToolCallEvents for the streaming endpoint.

(11) README stale provider list — Line 48 says "(OpenAI, Claude, Gemini)" but this PR adds Bedrock, Vertex AI, Ollama, and Cohere support.

(12) types.ts file header stale — Line 1 comment says "OpenAI Chat Completion request types (subset we care about)" but the file now defines types for chaos, recording, metrics, streaming profiles, fixture matching, journal entries, and server options.


Suggestions

(13) Extract shared HandlerDefaults type — Replace 12+ inline type declarations across all handler functions with a single exported interface. This fixes (1)–(3) and prevents future divergence:

export interface HandlerDefaults {
  latency: number;
  chunkSize: number;
  logger: Logger;
  chaos?: ChaosConfig;
  registry?: MetricsRegistry;
  strict?: boolean;
  record?: RecordConfig;
}
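With the interface above in place, handler code can read `strict` and `logger` without the type errors that the narrowed inline declarations would produce. A self-contained sketch; `Logger`, `ChaosConfig`, `MetricsRegistry`, and `RecordConfig` are minimal stand-ins with assumed shapes, and `noMatchStatus` is a hypothetical helper built around the `strictStatus` expression quoted in finding (7).

```typescript
// Minimal stand-ins (assumed shapes) so the example compiles on its own.
interface Logger {
  warn(message: string): void;
}
interface ChaosConfig {
  dropRate?: number;
}
interface MetricsRegistry {
  incrementCounter(name: string, labels: Record<string, string>): void;
}
interface RecordConfig {
  providers: Record<string, string | undefined>;
}

interface HandlerDefaults {
  latency: number;
  chunkSize: number;
  logger: Logger;
  chaos?: ChaosConfig;
  registry?: MetricsRegistry;
  strict?: boolean;
  record?: RecordConfig;
}

// Hypothetical "no fixture matched" path. Under a narrowed inline type such as
// { latency; chunkSize; logger; chaos? }, reading defaults.strict is a type error.
function noMatchStatus(defaults: HandlerDefaults, path: string): 404 | 503 {
  const strictStatus = defaults.strict ? 503 : 404;
  if (defaults.strict) defaults.logger.warn(`STRICT: no fixture matched ${path}`);
  return strictStatus;
}
```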

(14) ChaosConfig lacks range validation — No validation that probability values are in [0, 1]. A header like x-llmock-chaos-drop: 50 silently sets a 5000% drop rate (always triggers). Consider a validation helper at server startup or header parse time.
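A sketch of the validation suggested here, applied after merging override levels as the fix does. The rate field names, the `x-llmock-chaos-malformed` and `x-llmock-chaos-disconnect` headers, the example level order, and rolling each action independently in a fixed order are all assumptions; only `x-llmock-chaos-drop` appears in the review.

```typescript
type ChaosAction = "drop" | "malformed" | "disconnect";
type ChaosRates = Partial<Record<ChaosAction, number>>;

const ACTIONS: readonly ChaosAction[] = ["drop", "malformed", "disconnect"];

// Non-finite input (e.g. Number("abc") is NaN) becomes 0; everything else lands in [0, 1].
function clamp01(x: number): number {
  return Number.isFinite(x) ? Math.min(1, Math.max(0, x)) : 0;
}

// Later levels override earlier ones (an assumed order such as defaults < fixture < headers);
// clamping happens once, after merging.
function mergeChaos(...levels: (ChaosRates | undefined)[]): Record<ChaosAction, number> {
  const merged: Record<ChaosAction, number> = { drop: 0, malformed: 0, disconnect: 0 };
  for (const level of levels) {
    for (const action of ACTIONS) {
      const rate = level?.[action];
      if (rate !== undefined) merged[action] = rate;
    }
  }
  for (const action of ACTIONS) merged[action] = clamp01(merged[action]);
  return merged;
}

function chaosFromHeaders(headers: Record<string, string | undefined>): ChaosRates {
  const rates: ChaosRates = {};
  for (const action of ACTIONS) {
    const raw = headers[`x-llmock-chaos-${action}`];
    if (raw !== undefined) rates[action] = Number(raw);
  }
  return rates;
}

// Each action is rolled independently in a fixed order; `rand` is injectable for tests.
function rollChaos(rates: Record<ChaosAction, number>, rand: () => number = Math.random): ChaosAction | undefined {
  for (const action of ACTIONS) {
    if (rand() < rates[action]) return action;
  }
  return undefined;
}
```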

(15) RecordConfig provider keys are untyped: Record<string, string | undefined> accepts any string key, but the system only recognizes specific provider names. Consider a string union.
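A sketch of the suggested string union, derived from a constant array so the same list can back a runtime guard (for example when parsing CLI flags). The key names and the `RecordConfig` shape are hypothetical; the real `RecordProviderKey` may differ.

```typescript
// Hypothetical key list; llmock's actual RecordProviderKey members may differ.
const RECORD_PROVIDER_KEYS = ["openai", "anthropic", "gemini", "vertexai", "bedrock", "ollama", "cohere"] as const;
type RecordProviderKey = (typeof RECORD_PROVIDER_KEYS)[number];

interface RecordConfig {
  // Upstream base URL per provider; unknown keys are now compile-time errors.
  providers: Partial<Record<RecordProviderKey, string>>;
}

// Runtime guard for untyped input such as CLI flag names.
function isRecordProviderKey(key: string): key is RecordProviderKey {
  return (RECORD_PROVIDER_KEYS as readonly string[]).includes(key);
}

const cfg: RecordConfig = { providers: { openai: "https://api.openai.com" } };
// const bad: RecordConfig = { providers: { opneai: "..." } }; // type error: unknown key
```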

(16) collapseCohereSS naming inconsistency — Missing trailing "E" for "SSE" consistency with collapseOpenAISSE, collapseAnthropicSSE, collapseGeminiSSE. This is a public export from index.ts.
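For reference, a minimal collapse of Cohere v2 typed SSE events into text and tool calls. The event payload shapes (`content-delta` text at `delta.message.content.text`, tool calls at `delta.message.tool_calls`) are assumptions about Cohere's v2 streaming format, and `collapseCohereSSESketch` is not llmock's `collapseCohereSSE`.

```typescript
// Assumed Cohere v2 stream payload shapes (only the fields used here).
interface CohereDeltaMessage {
  content?: { text?: string };
  tool_calls?: { id?: string; function?: { name?: string; arguments?: string } };
}
interface CohereEvent {
  type?: string;
  delta?: { message?: CohereDeltaMessage };
}

interface CohereCollapsed {
  content: string;
  toolCalls: { id: string | undefined; name: string; arguments: string }[];
}

function collapseCohereSSESketch(body: string): CohereCollapsed {
  const out: CohereCollapsed = { content: "", toolCalls: [] };
  // SSE events are separated by a blank line; payloads are on "data:" lines.
  for (const block of body.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data === "") continue;
    let event: CohereEvent;
    try {
      event = JSON.parse(data) as CohereEvent;
    } catch {
      continue; // ignore non-JSON payloads
    }
    const message = event.delta?.message;
    if (event.type === "content-delta") {
      out.content += message?.content?.text ?? "";
    } else if (event.type === "tool-call-start") {
      const call = message?.tool_calls;
      out.toolCalls.push({ id: call?.id, name: call?.function?.name ?? "", arguments: call?.function?.arguments ?? "" });
    } else if (event.type === "tool-call-delta") {
      // Argument fragments append to the most recently started call.
      const last = out.toolCalls[out.toolCalls.length - 1];
      if (last) last.arguments += message?.tool_calls?.function?.arguments ?? "";
    }
  }
  return out;
}
```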

(17) recorder.ts misleading comment — Line 151 says auth headers are "in the match/response, not headers" but they are simply excluded entirely from fixtures.

(18) SKILL.md log level incorrect — Line 444 says "every proxy hit logs at info level" but the code (recorder.ts line 50) uses logger.warn.

(19) OllamaMessage.role typed as string — Should be "system" | "user" | "assistant" | "tool" to match other provider message types.

(20) Missing test coverage gaps — No tests for: non-streaming Bedrock strict/record mode with full defaults, Ollama NDJSON tool_calls collapse, writeNDJSONStream latency option, Cohere tool_calls streaming, recorder binary CRC integrity of relayed EventStream frames.

jpr5 (Contributor, Author) left a comment

See most recent comment for the full review.

jpr5 added 5 commits March 20, 2026 23:52
…licate ChaosAction

- Create shared HandlerDefaults interface replacing 12+ inline type declarations
- All handlers now have access to record, strict, registry fields (fixes silent undefined access)
- Move ChaosAction type to types.ts to eliminate inline duplication in JournalEntry
- Add RecordProviderKey string union for typed provider keys
- Type OllamaMessage.role as union instead of bare string
- Remove unused imports across all handler files
- Fix bedrock.ts docstring to not overclaim /converse endpoints

…amping

- Use raw Buffer for binary EventStream relay instead of UTF-8 string (prevents CRC corruption)
- buildFixtureResponse checks toolCalls before empty content for Ollama responses
- Add null guard on tc.function in Ollama tool_calls extraction
- collapseOllamaNDJSON accumulates message.tool_calls from stream chunks
- Rename collapseCohereSS to collapseCohereSSE for naming consistency
- Clamp chaos rates to [0,1] after merging all override levels
- Add X-LLMock-Record-Error header when fixture write fails
- Fix auth header comment in recorder

…ary relay

- Bedrock strict mode returns 503 for unmatched requests
- Ollama NDJSON tool_calls collapse (single, priority, multiple)
- writeNDJSONStream with non-zero latency
- Cohere streaming tool calls with fixture-provided IDs
- Recorder binary EventStream relay integrity with afterEach cleanup
- collapseCohereSSE rename in test references

Five handlers (handleBedrock, handleGemini, handleMessages,
handleResponses, handleEmbeddings) were missing the registry
argument, causing chaos metrics to not be recorded for those
endpoints.
jpr5 force-pushed the feat/v1.6.0-subspec1 branch from 96efe89 to 2983185 (March 21, 2026 07:00)
jpr5 added 3 commits March 21, 2026 00:01
The recorder's buildFixtureResponse had no handler for the Converse
format ({ output: { message: { content: [...] } } }), causing recorded
fixtures to silently be saved as error responses. Add handler for both
text and toolUse content blocks.
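A sketch of the Converse handling this commit describes, mapping `{ output: { message: { content: [...] } } }` blocks to a fixture. `converseToFixture` and the `FixtureResponse` union are hypothetical, and preferring tool calls over text when both are present mirrors the Ollama fix above but is an assumed policy here.

```typescript
// Simplified Bedrock Converse response: content blocks are text or toolUse.
type ConverseBlock = { text: string } | { toolUse: { toolUseId: string; name: string; input?: unknown } };

interface ConverseResponse {
  output?: { message?: { role?: string; content?: ConverseBlock[] } };
}

// Hypothetical fixture shapes; llmock's real types differ in detail.
type FixtureResponse =
  | { content: string }
  | { toolCalls: { id: string; name: string; arguments: string }[] }
  | { error: { message: string } };

function converseToFixture(body: ConverseResponse): FixtureResponse {
  const blocks = body.output?.message?.content;
  // Unrecognized shapes fall back to an error fixture (the pre-fix outcome for every Converse response).
  if (!Array.isArray(blocks)) return { error: { message: "Unrecognized Converse response" } };
  const toolCalls = blocks.flatMap((block) =>
    "toolUse" in block
      ? [{ id: block.toolUse.toolUseId, name: block.toolUse.name, arguments: JSON.stringify(block.toolUse.input ?? {}) }]
      : [],
  );
  if (toolCalls.length > 0) return { toolCalls };
  return { content: blocks.map((block) => ("text" in block ? block.text : "")).join("") };
}
```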

- Replace module-level mutable recordCounter with crypto.randomUUID()
  so concurrent test scenarios cannot produce clashing or order-dependent fixture filenames
- Pass original request body string to proxyAndRecord in the OpenAI
  completions path, preserving formatting fidelity to upstream

- Recorder: proxy preserves original request body formatting
- Recorder: Ollama empty content + tool_calls priority in buildFixtureResponse
- Recorder: UUID-based filename format
- Chaos: rate clamping (>1 clamps to 1, negative clamps to 0)
- Metrics: chaos counter incremented on Anthropic endpoint (was broken)