feat(proxy): add cache diagnostic logging for Anthropic requests#1891
alex-alecu wants to merge 6 commits into main from
Conversation
Add logCacheDiagnostics() that computes a structured diagnostic payload for Anthropic chat_completions requests with tools. Logs prefix hash, breakpoint position, message structure, and body hash to enable detection of prefix drift causing cache misses.
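A minimal sketch of how such a diagnostic payload could be computed. The function name and exact field set here are assumptions (the real implementation lives in `src/lib/providers/cache-debug.ts`); the idea is to hash the stable message prefix so two requests that should share a cache prefix produce the same hash, and to record where the `cache_control` breakpoint landed:

```typescript
import { createHash } from 'node:crypto';

interface ChatMessage {
  role: string;
  content: unknown;
}

// Hypothetical shape of the [CacheDiag] payload.
export function buildCacheDiagnostics(messages: ChatMessage[]) {
  // Hash everything up to the final message: if this hash changes between
  // turns that share a prefix, something mutated the body (prefix drift).
  const prefix = JSON.stringify(messages.slice(0, -1));
  const prefixHash = createHash('sha256').update(prefix).digest('hex').slice(0, 16);

  // Find the last message carrying an ephemeral cache_control breakpoint.
  let breakpointIndex = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const content = messages[i].content;
    if (
      Array.isArray(content) &&
      content.some((part) => (part as Record<string, any>)?.cache_control?.type === 'ephemeral')
    ) {
      breakpointIndex = i;
      break;
    }
  }

  return {
    prefixHash,
    breakpointIndex,
    breakpointRole: breakpointIndex >= 0 ? messages[breakpointIndex].role : '<none>',
    messageCount: messages.length,
  };
}
```

Logging this payload before forwarding upstream lets two consecutive turns be diffed by `prefixHash` alone instead of eyeballing full request bodies.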
Wire logCacheDiagnostics() into the gateway route handler after all body mutations (tracking IDs, reasoning dedup, cache breakpoints) and before forwarding upstream, so the [CacheDiag] log captures the exact request state sent to the provider.
Add [CacheDiag:response] log blocks in processTokenData() for Anthropic chat_completions with tools. Logs cache hit/write/input tokens and cost from both generation lookup and inline-only paths, enabling correlation with the pre-request [CacheDiag] prefix hash to diagnose cache misses.
Add test-cache-diag.ts script that makes multi-turn streaming requests with realistic tools and system prompt to exercise the [CacheDiag] logging path. Wire up the server-only shim in the script runner so CLI scripts can import modules that transitively depend on server-only.
```
upstreamId: usageStats.upstream_id,
inputTokens: usageStats.inputTokens,
cacheHitTokens: usageStats.cacheHitTokens,
cacheWriteTokens: usageStats.cacheWriteTokens,
```
WARNING: Inline fallback always reports zero cache writes
On the source: 'inline' path this logs usageStats.cacheWriteTokens, but processOpenRouterUsage() still hardcodes that field to 0 and OpenRouterUsage does not read prompt_tokens_details.cache_write_tokens. When fetchGeneration() misses, this diagnostic will hide non-zero cache writes instead of surfacing them.
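A sketch of the fix the reviewer is pointing at. The `RawUsage` shape and `parseCacheTokens` name are assumptions for illustration; the review comment names `prompt_tokens_details.cache_write_tokens` as the field `processOpenRouterUsage()` currently drops:

```typescript
// Hypothetical inline-usage shape; the real parsing lives in processOpenRouterUsage().
interface RawUsage {
  prompt_tokens: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
    cache_write_tokens?: number;
  };
}

export function parseCacheTokens(usage: RawUsage) {
  return {
    inputTokens: usage.prompt_tokens,
    cacheHitTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
    // Instead of hardcoding 0, surface the write count when the provider sends it,
    // so the inline fallback path stops masking non-zero cache writes.
    cacheWriteTokens: usage.prompt_tokens_details?.cache_write_tokens ?? 0,
  };
}
```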
```
    ((part as Record<string, unknown>).cache_control as Record<string, unknown>).type ===
      'ephemeral'
);
breakpointContentLength = JSON.stringify(content).length;
```
WARNING: contentLen can be wrong when no breakpoint is found
breakpointContentLength is updated before you know whether the current message actually has cache_control. If a request reaches this logger without any breakpoint, the payload ends up with index: -1 / role: '<none>' but a non-zero contentLen copied from the last inspected message, which makes the diagnostic misleading.
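A sketch of the fix: move the length assignment inside the branch that confirms the breakpoint, so a request with no `cache_control` reports `contentLen: 0` alongside `index: -1`. Message shapes are simplified assumptions:

```typescript
type Part = { cache_control?: { type?: string } };
type Msg = { role: string; content: Part[] | string };

export function findBreakpoint(messages: Msg[]) {
  let index = -1;
  let role = '<none>';
  let contentLen = 0;
  messages.forEach((msg, i) => {
    const parts = Array.isArray(msg.content) ? msg.content : [];
    const hasBreakpoint = parts.some((p) => p.cache_control?.type === 'ephemeral');
    if (hasBreakpoint) {
      index = i;
      role = msg.role;
      // Assign only when a breakpoint is actually present, so a miss never
      // carries a stale contentLen from the last inspected message.
      contentLen = JSON.stringify(msg.content).length;
    }
  });
  return { index, role, contentLen };
}
```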
Code Review Summary
Status: 4 Issues Found | Recommendation: Address before merge
Other Observations (not in diff): N/A
Files Reviewed: 1 file
Reviewed by gpt-5.4-20260305 · 479,970 tokens
…cking Add 'compare' mode that runs the same test against both amazon-bedrock and anthropic providers back-to-back. Extract cache_write_tokens from the SSE stream (now available from OpenRouter). Accept optional provider arg to force routing via provider.only.
Cache diagnostic test results (2026-04-02)

Test 1: Gateway proxy — Bedrock vs Anthropic direct
Ran Amazon Bedrock (
Anthropic direct (
Both providers return identical zero cache tokens.

Test 2: Direct OpenRouter API (bypassing our gateway)
Sent
4,297 tokens is well above the 1,024-token minimum for Opus models (and above 4,096, in case Opus 4.6 has a higher threshold like Opus 4.5).

Test 3: OpenRouter Messages API (
| Test | Input tokens | Provider | Cache Creation | Cache Read |
|---|---|---|---|---|
| Turn 1 | 2,111 | Anthropic | 0 | 0 |
| Turn 2 (same prefix) | 2,123 | Anthropic | 0 | 0 |
Conclusion
OpenRouter does not forward cache_control annotations to upstream providers in either the Chat Completions or Messages API formats. This was tested:
- Through our gateway and directly against OpenRouter's API
- With both per-block cache_control (on content parts) and top-level cache_control
- With both Bedrock and Anthropic direct routing
- With prompts from 1,531 to 4,297 input tokens
The zero cache tokens are NOT caused by our gateway code — addCacheBreakpoints() correctly sets the breakpoints, but they are stripped or ignored by OpenRouter before reaching the upstream provider.
Open question: If production sessions show cache savings, they must come from a different mechanism (e.g. OpenRouter's internal prompt_cache_key field, session_id sticky routing, or Vercel AI SDK provider metadata path). Further investigation needed to identify which production sessions have non-zero cache_hit_tokens and what API path/provider they used.
…shape The old harness used tool_choice:none and synthetic messages, never producing tool-result follow-ups. This kept prompts below Opus's 4096-token cache minimum and prevented the breakpoint from landing on a tool message — both of which are the norm in production. Rewrite to use a realistic fixture (system prompt + 14 tools from no_tool_request.json), tool_choice:auto with a local tool executor, and provider.order instead of provider.only.
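The multi-turn shape the rewritten harness relies on can be sketched as follows. The types and helper name here are illustrative assumptions, not the harness's actual code; the point is that with `tool_choice: 'auto'` each assistant turn's tool calls are executed locally and appended as `tool` messages, so later cache breakpoints can land on a tool message as they do in production:

```typescript
type ToolCall = { id: string; name: string; args: string };
type Turn =
  | { role: 'user' | 'tool'; content: string; tool_call_id?: string }
  | { role: 'assistant'; content: string; tool_calls?: ToolCall[] };

// Append an assistant turn and the locally executed results of its tool calls,
// producing the history for the next request in the loop.
export function appendToolResults(
  history: Turn[],
  assistant: Extract<Turn, { role: 'assistant' }>,
  execute: (call: ToolCall) => string,
): Turn[] {
  const next = [...history, assistant];
  for (const call of assistant.tool_calls ?? []) {
    // Each tool result references its call id, matching the chat_completions shape.
    next.push({ role: 'tool', content: execute(call), tool_call_id: call.id });
  }
  return next;
}
```

Growing the history this way also pushes prompts past the cache minimum naturally, instead of padding with synthetic messages.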
```
}

const authToken = generateApiToken(user[0]);
const baseUrl = process.env.NEXT_PUBLIC_BASE_URL || 'http://localhost:3000';
```
WARNING: Local server selection still ignores .dev-port
This script is meant to hit the local gateway, but it still falls back to http://localhost:3000 instead of reading the assigned port from .dev-port. On repos that run the dev server on a different port, the harness can talk to the wrong instance and produce misleading cache diagnostics.
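A sketch of the suggested resolution order: explicit env var first, then the port recorded in `.dev-port`, then the 3000 default. The `.dev-port` filename comes from the review comment; its exact location and contents (a bare port number) are assumptions:

```typescript
import { existsSync, readFileSync } from 'node:fs';

export function resolveBaseUrl(devPortFile = '.dev-port'): string {
  // An explicit override always wins.
  if (process.env.NEXT_PUBLIC_BASE_URL) return process.env.NEXT_PUBLIC_BASE_URL;
  // Otherwise prefer the port the dev server actually claimed.
  if (existsSync(devPortFile)) {
    const port = parseInt(readFileSync(devPortFile, 'utf8').trim(), 10);
    if (Number.isInteger(port) && port > 0) return `http://localhost:${port}`;
  }
  return 'http://localhost:3000';
}
```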
```
}

function resolveWorkspacePath(relativePath: string): string {
  return path.resolve(WORKSPACE_ROOT, relativePath);
```
WARNING: Tool paths can escape the workspace root
path.resolve(WORKSPACE_ROOT, relativePath) accepts .. segments and absolute paths, so a model-issued read_file, list_files, or search_files call can traverse outside the repo. Because the harness sends tool output back to OpenRouter, a bad tool call here can exfiltrate arbitrary local files from the machine running the script.
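A sketch of a containment check for `resolveWorkspacePath`: resolve, then verify the result still lives under the workspace root, rejecting both `..` traversal and absolute paths. The function name here is a hypothetical variant; the root is passed explicitly so the check is testable:

```typescript
import path from 'node:path';

export function resolveWorkspacePathSafe(root: string, relativePath: string): string {
  const resolved = path.resolve(root, relativePath);
  // path.relative gives the path from root to resolved; if it starts with '..'
  // (or is absolute, on Windows cross-drive cases), resolved escaped the root.
  const rel = path.relative(root, resolved);
  if (rel.startsWith('..') || path.isAbsolute(rel)) {
    throw new Error(`Path escapes workspace root: ${relativePath}`);
  }
  return resolved;
}
```

With this guard, a model-issued `read_file` on `../../.ssh/id_rsa` fails before any file I/O, so nothing outside the repo can be echoed back to OpenRouter.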
Summary
- Add src/lib/providers/cache-debug.ts and wire it into src/app/api/openrouter/[...path]/route.ts after provider-specific request mutations, so Anthropic chat_completions requests with tools log breakpoint placement, prompt cache key presence, prefix hash, and serialized body shape for the exact payload sent upstream.
- Extend src/lib/processUsage.ts so the same Anthropic tool requests log cache hit/write tokens, inference provider, and cost reconciliation from generation lookup or inline fallback.
- Add src/scripts/openrouter/test-cache-diag.ts and fix the server-only shim in src/scripts/index.ts, which let us validate the actual provider behavior locally: default routing resolves to Amazon Bedrock, Bedrock hits on the first cache-bearing follow-up, and explicit Anthropic misses that first follow-up before hitting on later turns.

Verification
- curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/users/sign_in -> 200
- pnpm script:run openrouter test-cache-diag "cache-diag-20260402161604@example.com" -> pass; default routing used Amazon Bedrock with cache hit rates 0%, 75%, 96%, 84%, 97%, 90%
- pnpm script:run openrouter test-cache-diag "cache-diag-20260402161604@example.com" anthropic/claude-opus-4.6 compare -> pass; preferred amazon-bedrock matched default, preferred anthropic showed 0%, 0%, 96%, 84%, 97%, 90%
- Checked /private/tmp/kilo-dev-server.log for [CacheDiag] and [CacheDiag:response] entries -> promptCacheKey: true on all requests; default routing and preferred Bedrock logged inferenceProvider: "Amazon Bedrock"; preferred Anthropic missed the first cache-bearing follow-up and then hit on subsequent turns

Visual Changes
N/A
Reviewer Notes
- Gated to chat_completions requests with tools and wrapped so diagnostics never break request handling.
- The harness uses src/lib/utils/testdata/no_tool_request.json and a tool-result follow-up loop, which matches the production cache breakpoint shape better than the older tool_choice: "none" false-negative path.