feat(proxy): add cache diagnostic logging for Anthropic requests#1891
alex-alecu wants to merge 6 commits into main from
Conversation
Add logCacheDiagnostics() that computes a structured diagnostic payload for Anthropic chat_completions requests with tools. Logs prefix hash, breakpoint position, message structure, and body hash to enable detection of prefix drift causing cache misses.
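A minimal sketch of how such a diagnostic payload could be computed. The function name and exact field set here are assumptions (the real implementation lives in `src/lib/providers/cache-debug.ts`); the idea is to hash the stable message prefix so two requests that should share a cache prefix produce the same hash, and to record where the `cache_control` breakpoint landed:

```typescript
import { createHash } from 'node:crypto';

interface ChatMessage {
  role: string;
  content: unknown;
}

// Hypothetical shape of the [CacheDiag] payload.
export function buildCacheDiagnostics(messages: ChatMessage[]) {
  // Hash everything up to the final message: if this hash changes between
  // turns that share a prefix, something mutated the body (prefix drift).
  const prefix = JSON.stringify(messages.slice(0, -1));
  const prefixHash = createHash('sha256').update(prefix).digest('hex').slice(0, 16);

  // Find the last message carrying an ephemeral cache_control breakpoint.
  let breakpointIndex = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const content = messages[i].content;
    if (
      Array.isArray(content) &&
      content.some((part) => (part as Record<string, any>)?.cache_control?.type === 'ephemeral')
    ) {
      breakpointIndex = i;
      break;
    }
  }

  return {
    prefixHash,
    breakpointIndex,
    breakpointRole: breakpointIndex >= 0 ? messages[breakpointIndex].role : '<none>',
    messageCount: messages.length,
  };
}
```

Logging this payload before forwarding upstream lets two consecutive turns be diffed by `prefixHash` alone instead of eyeballing full request bodies.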
Wire logCacheDiagnostics() into the gateway route handler after all body mutations (tracking IDs, reasoning dedup, cache breakpoints) and before forwarding upstream, so the [CacheDiag] log captures the exact request state sent to the provider.
Add [CacheDiag:response] log blocks in processTokenData() for Anthropic chat_completions with tools. Logs cache hit/write/input tokens and cost from both generation lookup and inline-only paths, enabling correlation with the pre-request [CacheDiag] prefix hash to diagnose cache misses.
Add test-cache-diag.ts script that makes multi-turn streaming requests with realistic tools and system prompt to exercise the [CacheDiag] logging path. Wire up the server-only shim in the script runner so CLI scripts can import modules that transitively depend on server-only.
```
upstreamId: usageStats.upstream_id,
inputTokens: usageStats.inputTokens,
cacheHitTokens: usageStats.cacheHitTokens,
cacheWriteTokens: usageStats.cacheWriteTokens,
```
WARNING: Inline fallback always reports zero cache writes
On the source: 'inline' path this logs usageStats.cacheWriteTokens, but processOpenRouterUsage() still hardcodes that field to 0 and OpenRouterUsage does not read prompt_tokens_details.cache_write_tokens. When fetchGeneration() misses, this diagnostic will hide non-zero cache writes instead of surfacing them.
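A sketch of the fix the reviewer is pointing at. The `RawUsage` shape and `parseCacheTokens` name are assumptions for illustration; the review comment names `prompt_tokens_details.cache_write_tokens` as the field `processOpenRouterUsage()` currently drops:

```typescript
// Hypothetical inline-usage shape; the real parsing lives in processOpenRouterUsage().
interface RawUsage {
  prompt_tokens: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
    cache_write_tokens?: number;
  };
}

export function parseCacheTokens(usage: RawUsage) {
  return {
    inputTokens: usage.prompt_tokens,
    cacheHitTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
    // Instead of hardcoding 0, surface the write count when the provider sends it,
    // so the inline fallback path stops masking non-zero cache writes.
    cacheWriteTokens: usage.prompt_tokens_details?.cache_write_tokens ?? 0,
  };
}
```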
```
    ((part as Record<string, unknown>).cache_control as Record<string, unknown>).type ===
      'ephemeral'
);
breakpointContentLength = JSON.stringify(content).length;
```
WARNING: contentLen can be wrong when no breakpoint is found
breakpointContentLength is updated before you know whether the current message actually has cache_control. If a request reaches this logger without any breakpoint, the payload ends up with index: -1 / role: '<none>' but a non-zero contentLen copied from the last inspected message, which makes the diagnostic misleading.
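A sketch of the fix: move the length assignment inside the branch that confirms the breakpoint, so a request with no `cache_control` reports `contentLen: 0` alongside `index: -1`. Message shapes are simplified assumptions:

```typescript
type Part = { cache_control?: { type?: string } };
type Msg = { role: string; content: Part[] | string };

export function findBreakpoint(messages: Msg[]) {
  let index = -1;
  let role = '<none>';
  let contentLen = 0;
  messages.forEach((msg, i) => {
    const parts = Array.isArray(msg.content) ? msg.content : [];
    const hasBreakpoint = parts.some((p) => p.cache_control?.type === 'ephemeral');
    if (hasBreakpoint) {
      index = i;
      role = msg.role;
      // Assign only when a breakpoint is actually present, so a miss never
      // carries a stale contentLen from the last inspected message.
      contentLen = JSON.stringify(msg.content).length;
    }
  });
  return { index, role, contentLen };
}
```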
Code Review Summary
Status: 4 Issues Found | Recommendation: Address before merge
Other Observations (not in diff): N/A
Files Reviewed: 1 file
Reviewed by gpt-5.4-20260305 · 479,970 tokens
…cking Add 'compare' mode that runs the same test against both amazon-bedrock and anthropic providers back-to-back. Extract cache_write_tokens from the SSE stream (now available from OpenRouter). Accept optional provider arg to force routing via provider.only.
Cache diagnostic test results (2026-04-02)

Test 1: Gateway proxy — Bedrock vs Anthropic direct
Ran Amazon Bedrock (
Anthropic direct (
Both providers return identical zero cache tokens.

Test 2: Direct OpenRouter API (bypassing our gateway)
Sent
4,297 tokens is well above the 1,024-token minimum for Opus models (and above 4,096, in case Opus 4.6 has a higher threshold like Opus 4.5).

Test 3: OpenRouter Messages API (
| Test | Input tokens | Provider | Cache Creation | Cache Read |
|---|---|---|---|---|
| Turn 1 | 2,111 | Anthropic | 0 | 0 |
| Turn 2 (same prefix) | 2,123 | Anthropic | 0 | 0 |
Conclusion
OpenRouter does not forward cache_control annotations to upstream providers in either the Chat Completions or Messages API formats. This was tested:
- Through our gateway and directly against OpenRouter's API
- With both per-block cache_control (on content parts) and top-level cache_control
- With both Bedrock and Anthropic direct routing
- With prompts from 1,531 to 4,297 input tokens
The zero cache tokens are NOT caused by our gateway code — addCacheBreakpoints() correctly sets the breakpoints, but they are stripped or ignored by OpenRouter before reaching the upstream provider.
Open question: If production sessions show cache savings, they must come from a different mechanism (e.g. OpenRouter's internal prompt_cache_key field, session_id sticky routing, or Vercel AI SDK provider metadata path). Further investigation needed to identify which production sessions have non-zero cache_hit_tokens and what API path/provider they used.
…shape The old harness used tool_choice:none and synthetic messages, never producing tool-result follow-ups. This kept prompts below Opus's 4096-token cache minimum and prevented the breakpoint from landing on a tool message — both of which are the norm in production. Rewrite to use a realistic fixture (system prompt + 14 tools from no_tool_request.json), tool_choice:auto with a local tool executor, and provider.order instead of provider.only.
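The multi-turn shape the rewritten harness relies on can be sketched as follows. The types and helper name here are illustrative assumptions, not the harness's actual code; the point is that with `tool_choice: 'auto'` each assistant turn's tool calls are executed locally and appended as `tool` messages, so later cache breakpoints can land on a tool message as they do in production:

```typescript
type ToolCall = { id: string; name: string; args: string };
type Turn =
  | { role: 'user' | 'tool'; content: string; tool_call_id?: string }
  | { role: 'assistant'; content: string; tool_calls?: ToolCall[] };

// Append an assistant turn and the locally executed results of its tool calls,
// producing the history for the next request in the loop.
export function appendToolResults(
  history: Turn[],
  assistant: Extract<Turn, { role: 'assistant' }>,
  execute: (call: ToolCall) => string,
): Turn[] {
  const next = [...history, assistant];
  for (const call of assistant.tool_calls ?? []) {
    // Each tool result references its call id, matching the chat_completions shape.
    next.push({ role: 'tool', content: execute(call), tool_call_id: call.id });
  }
  return next;
}
```

Growing the history this way also pushes prompts past the cache minimum naturally, instead of padding with synthetic messages.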
```
}

const authToken = generateApiToken(user[0]);
const baseUrl = process.env.NEXT_PUBLIC_BASE_URL || 'http://localhost:3000';
```
WARNING: Local server selection still ignores .dev-port
This script is meant to hit the local gateway, but it still falls back to http://localhost:3000 instead of reading the assigned port from .dev-port. On repos that run the dev server on a different port, the harness can talk to the wrong instance and produce misleading cache diagnostics.
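A sketch of the suggested resolution order: explicit env var first, then the port recorded in `.dev-port`, then the 3000 default. The `.dev-port` filename comes from the review comment; its exact location and contents (a bare port number) are assumptions:

```typescript
import { existsSync, readFileSync } from 'node:fs';

export function resolveBaseUrl(devPortFile = '.dev-port'): string {
  // An explicit override always wins.
  if (process.env.NEXT_PUBLIC_BASE_URL) return process.env.NEXT_PUBLIC_BASE_URL;
  // Otherwise prefer the port the dev server actually claimed.
  if (existsSync(devPortFile)) {
    const port = parseInt(readFileSync(devPortFile, 'utf8').trim(), 10);
    if (Number.isInteger(port) && port > 0) return `http://localhost:${port}`;
  }
  return 'http://localhost:3000';
}
```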
```
}

function resolveWorkspacePath(relativePath: string): string {
  return path.resolve(WORKSPACE_ROOT, relativePath);
```
WARNING: Tool paths can escape the workspace root
path.resolve(WORKSPACE_ROOT, relativePath) accepts .. segments and absolute paths, so a model-issued read_file, list_files, or search_files call can traverse outside the repo. Because the harness sends tool output back to OpenRouter, a bad tool call here can exfiltrate arbitrary local files from the machine running the script.
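A sketch of a containment check for `resolveWorkspacePath`: resolve, then verify the result still lives under the workspace root, rejecting both `..` traversal and absolute paths. The function name here is a hypothetical variant; the root is passed explicitly so the check is testable:

```typescript
import path from 'node:path';

export function resolveWorkspacePathSafe(root: string, relativePath: string): string {
  const resolved = path.resolve(root, relativePath);
  // path.relative gives the path from root to resolved; if it starts with '..'
  // (or is absolute, on Windows cross-drive cases), resolved escaped the root.
  const rel = path.relative(root, resolved);
  if (rel.startsWith('..') || path.isAbsolute(rel)) {
    throw new Error(`Path escapes workspace root: ${relativePath}`);
  }
  return resolved;
}
```

With this guard, a model-issued `read_file` on `../../.ssh/id_rsa` fails before any file I/O, so nothing outside the repo can be echoed back to OpenRouter.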
Summary
- Add src/lib/providers/cache-debug.ts and wire it into src/app/api/openrouter/[...path]/route.ts after provider-specific request mutations, so Anthropic chat_completions requests with tools log breakpoint placement, prompt cache key presence, prefix hash, and serialized body shape for the exact payload sent upstream.
- Extend src/lib/processUsage.ts so the same Anthropic tool requests log cache hit/write tokens, inference provider, and cost reconciliation from generation lookup or inline fallback.
- Add src/scripts/openrouter/test-cache-diag.ts and fix the server-only shim in src/scripts/index.ts, which let us validate the actual provider behavior locally: default routing resolves to Amazon Bedrock, Bedrock hits on the first cache-bearing follow-up, and explicit Anthropic misses that first follow-up before hitting on later turns.

Verification
- curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/users/sign_in -> 200
- pnpm script:run openrouter test-cache-diag "cache-diag-20260402161604@example.com" -> pass; default routing used Amazon Bedrock with cache hit rates 0%, 75%, 96%, 84%, 97%, 90%
- pnpm script:run openrouter test-cache-diag "cache-diag-20260402161604@example.com" anthropic/claude-opus-4.6 compare -> pass; preferred amazon-bedrock matched default, preferred anthropic showed 0%, 0%, 96%, 84%, 97%, 90%
- Checked /private/tmp/kilo-dev-server.log for [CacheDiag] and [CacheDiag:response] entries -> promptCacheKey: true on all requests; default routing and preferred Bedrock logged inferenceProvider: "Amazon Bedrock"; preferred Anthropic missed the first cache-bearing follow-up and then hit on subsequent turns

Visual Changes
N/A
Reviewer Notes
- Gated to chat_completions requests with tools and wrapped so diagnostics never break request handling.
- The harness uses src/lib/utils/testdata/no_tool_request.json and a tool-result follow-up loop, which matches the production cache breakpoint shape better than the older tool_choice: "none" false-negative path.