
feat(proxy): add cache diagnostic logging for Anthropic requests#1891

Open
alex-alecu wants to merge 6 commits into main from feat/cache-debug-logging

Conversation


@alex-alecu alex-alecu commented Apr 2, 2026

Summary

  • Add pre-request cache diagnostics in src/lib/providers/cache-debug.ts and wire them into src/app/api/openrouter/[...path]/route.ts after the provider-specific request mutations. For Anthropic chat_completions requests with tools, this logs breakpoint placement, prompt cache key presence, prefix hash, and the serialized body shape of the exact payload sent upstream.
  • Add post-response cache diagnostics in src/lib/processUsage.ts so the same Anthropic tool requests log cache hit/write tokens, inference provider, and cost reconciliation from generation lookup or inline fallback.
  • Add a production-shaped cache repro harness in src/scripts/openrouter/test-cache-diag.ts and fix the server-only shim in src/scripts/index.ts. Together these let us validate actual provider behavior locally: default routing resolves to Amazon Bedrock, Bedrock hits the cache on the first cache-bearing follow-up, and explicit Anthropic routing misses that first follow-up before hitting on later turns.

Verification

  • curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/users/sign_in -> 200
  • pnpm script:run openrouter test-cache-diag "cache-diag-20260402161604@example.com" -> pass; default routing used Amazon Bedrock with cache hit rates 0%, 75%, 96%, 84%, 97%, 90%
  • pnpm script:run openrouter test-cache-diag "cache-diag-20260402161604@example.com" anthropic/claude-opus-4.6 compare -> pass; preferred amazon-bedrock matched default, preferred anthropic showed 0%, 0%, 96%, 84%, 97%, 90%
  • Reviewed /private/tmp/kilo-dev-server.log for [CacheDiag] and [CacheDiag:response] entries -> promptCacheKey: true on all requests; default routing and preferred Bedrock logged inferenceProvider:"Amazon Bedrock"; preferred Anthropic missed the first cache-bearing follow-up and then hit on subsequent turns

Visual Changes

N/A

Reviewer Notes

  • Logging is gated to Anthropic chat_completions requests with tools and wrapped so diagnostics never break request handling.
  • The new harness uses a realistic system prompt + tool fixture from src/lib/utils/testdata/no_tool_request.json and a tool-result follow-up loop, which matches the production cache breakpoint shape better than the older tool_choice: "none" false-negative path.
  • The main reviewer question is whether the current provider-specific behavior is acceptable: Bedrock hits immediately, while Anthropic appears to miss the first cache-bearing follow-up but behaves normally after that.
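The gating and error isolation described in the first reviewer note can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `shouldLogCacheDiagnostics` and `safeDiag` are made-up names, and the model/tools field checks are assumptions about the request body shape.

```typescript
// Sketch of gating diagnostics to Anthropic chat_completions requests with
// tools, and isolating any logging failure from the request path.
// All identifiers here are illustrative, not the PR's actual names.
type ChatBody = { model?: string; tools?: unknown[] };

function shouldLogCacheDiagnostics(requestPath: string, body: ChatBody): boolean {
  return (
    requestPath.endsWith('/chat/completions') &&
    typeof body.model === 'string' &&
    body.model.startsWith('anthropic/') &&
    Array.isArray(body.tools) &&
    body.tools.length > 0
  );
}

// Wrap the diagnostic call so a logging bug can never break request handling.
function safeDiag(fn: () => void): void {
  try {
    fn();
  } catch {
    // Swallow: diagnostics are best-effort by design.
  }
}
```

The wrapper is what makes the "diagnostics never break request handling" guarantee hold even if the payload computation throws.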

Add logCacheDiagnostics() that computes a structured diagnostic payload
for Anthropic chat_completions requests with tools. Logs prefix hash,
breakpoint position, message structure, and body hash to enable
detection of prefix drift causing cache misses.

Wire logCacheDiagnostics() into the gateway route handler after all body
mutations (tracking IDs, reasoning dedup, cache breakpoints) and before
forwarding upstream, so the [CacheDiag] log captures the exact request
state sent to the provider.

Add [CacheDiag:response] log blocks in processTokenData() for Anthropic
chat_completions with tools. Logs cache hit/write/input tokens and cost
from both generation lookup and inline-only paths, enabling correlation
with the pre-request [CacheDiag] prefix hash to diagnose cache misses.

Add test-cache-diag.ts script that makes multi-turn streaming requests
with realistic tools and system prompt to exercise the [CacheDiag] logging
path. Wire up the server-only shim in the script runner so CLI scripts
can import modules that transitively depend on server-only.
const roleCounts: Record<string, number> = {};
let totalContentBytes = 0;
for (const msg of messages) {
  roleCounts[msg.role] = (roleCounts[msg.role] || 0) + 1;

Check failure

Code scanning / CodeQL

Remote property injection (High)

A property name to write to depends on a user-provided value.
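A common remediation for this class of CodeQL alert is to tally the counts in a `Map` (or a null-prototype object) so a user-supplied role string such as `__proto__` cannot collide with `Object.prototype`. A sketch of that approach, not necessarily the fix this PR will take:

```typescript
// Count message roles without writing user-controlled keys onto a plain
// object, avoiding the remote-property-injection pattern CodeQL flags.
function countRoles(messages: Array<{ role: string }>): Map<string, number> {
  const roleCounts = new Map<string, number>();
  for (const msg of messages) {
    roleCounts.set(msg.role, (roleCounts.get(msg.role) ?? 0) + 1);
  }
  return roleCounts;
}
```

Since the counts here only feed a diagnostic log, a `Map` (serialized via `Object.fromEntries` at log time) keeps the output shape without the injectable write.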
upstreamId: usageStats.upstream_id,
inputTokens: usageStats.inputTokens,
cacheHitTokens: usageStats.cacheHitTokens,
cacheWriteTokens: usageStats.cacheWriteTokens,

WARNING: Inline fallback always reports zero cache writes

On the source: 'inline' path this logs usageStats.cacheWriteTokens, but processOpenRouterUsage() still hardcodes that field to 0 and OpenRouterUsage does not read prompt_tokens_details.cache_write_tokens. When fetchGeneration() misses, this diagnostic will hide non-zero cache writes instead of surfacing them.
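If the inline usage payload does carry `prompt_tokens_details.cache_write_tokens` (the field name comes from this review thread, so treat the shape as an assumption about OpenRouter's payload rather than a documented contract), the fallback could read it with a guarded accessor along these lines:

```typescript
// Guarded read of cache-write tokens from an inline usage object.
// The prompt_tokens_details.cache_write_tokens shape is an assumption
// taken from the review discussion, not a verified OpenRouter contract.
function inlineCacheWriteTokens(usage: unknown): number {
  if (typeof usage !== 'object' || usage === null) return 0;
  const details = (usage as Record<string, unknown>).prompt_tokens_details;
  if (typeof details !== 'object' || details === null) return 0;
  const value = (details as Record<string, unknown>).cache_write_tokens;
  return typeof value === 'number' ? value : 0;
}
```

Defaulting to 0 only when the field is genuinely absent (rather than hardcoding 0) is what lets the diagnostic surface non-zero cache writes on generation-lookup misses.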

((part as Record<string, unknown>).cache_control as Record<string, unknown>).type ===
'ephemeral'
);
breakpointContentLength = JSON.stringify(content).length;

WARNING: contentLen can be wrong when no breakpoint is found

breakpointContentLength is updated before you know whether the current message actually has cache_control. If a request reaches this logger without any breakpoint, the payload ends up with index: -1 / role: '<none>' but a non-zero contentLen copied from the last inspected message, which makes the diagnostic misleading.
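One way to avoid the stale `contentLen` is to compute the length only for the message that actually carries the breakpoint, so the "no breakpoint" payload stays internally consistent. A sketch with illustrative types and names (not the PR's actual code):

```typescript
// Locate the cache breakpoint and record its content length only when a
// breakpoint actually exists; otherwise return a fully "empty" payload.
type Part = { cache_control?: { type?: string } };
type Msg = { role: string; content: Part[] };

function findBreakpoint(messages: Msg[]): { index: number; role: string; contentLen: number } {
  // Scan from the end: the last ephemeral cache_control marks the breakpoint.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.content.some((p) => p.cache_control?.type === 'ephemeral')) {
      return { index: i, role: msg.role, contentLen: JSON.stringify(msg.content).length };
    }
  }
  return { index: -1, role: '<none>', contentLen: 0 };
}
```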


kilo-code-bot bot commented Apr 2, 2026

Code Review Summary

Status: 4 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 4 |
| SUGGESTION | 0 |
Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| src/lib/processUsage.ts | 867 | Inline-only cache diagnostics still always log cacheWriteTokens as 0. |
| src/lib/providers/cache-debug.ts | 48 | breakpoint.contentLen can still report the last inspected message even when no cache breakpoint exists. |
| src/scripts/openrouter/test-cache-diag.ts | 160 | The harness still defaults to http://localhost:3000 instead of reading .dev-port. |
| src/scripts/openrouter/test-cache-diag.ts | 392 | Tool path resolution accepts .. and absolute paths, so model-issued reads can escape the workspace. |
Other Observations (not in diff)

N/A

Files Reviewed (1 file)
  • src/scripts/openrouter/test-cache-diag.ts - 2 issues



Reviewed by gpt-5.4-20260305 · 479,970 tokens

…cking

Add 'compare' mode that runs the same test against both amazon-bedrock
and anthropic providers back-to-back. Extract cache_write_tokens from the
SSE stream (now available from OpenRouter). Accept optional provider arg
to force routing via provider.only.
@alex-alecu

Cache diagnostic test results (2026-04-02)

Test 1: Gateway proxy — Bedrock vs Anthropic direct

Ran test-cache-diag with compare mode through our gateway. Realistic system prompt + 8 tool definitions, 3 turns.

Amazon Bedrock (provider.only: ['amazon-bedrock']):

| Turn | Msgs | Input | Cache Write | Cache Read | Hit Rate |
| --- | --- | --- | --- | --- | --- |
| 1 | 2 | 2,794 | 0 | 0 | 0% |
| 2 | 4 | 2,862 | 0 | 0 | 0% |
| 3 | 6 | 2,928 | 0 | 0 | 0% |

Anthropic direct (provider.only: ['anthropic']):

| Turn | Msgs | Input | Cache Write | Cache Read | Hit Rate |
| --- | --- | --- | --- | --- | --- |
| 1 | 2 | 2,794 | 0 | 0 | 0% |
| 2 | 4 | 2,862 | 0 | 0 | 0% |
| 3 | 6 | 2,928 | 0 | 0 | 0% |

Both providers return identical zero cache tokens.

Test 2: Direct OpenRouter API (bypassing our gateway)

Sent cache_control: { type: 'ephemeral' } on the system message content part directly to https://openrouter.ai/api/v1/chat/completions, no gateway involved:

| Test | Input tokens | Provider | Cache Write | Cache Read |
| --- | --- | --- | --- | --- |
| Turn 1 (4,297 tokens, system with cache_control) | 4,297 | Anthropic | 0 | 0 |
| Turn 2 (same prefix + new user msg) | 4,309 | Anthropic | 0 | 0 |
4,297 tokens is well above the 1,024 minimum for Opus models (and above 4,096 in case Opus 4.6 has a higher threshold like Opus 4.5).

Test 3: OpenRouter Messages API (/api/v1/messages)

Tested with top-level cache_control: { type: 'ephemeral' } through the Messages API:

| Test | Input tokens | Provider | Cache Creation | Cache Read |
| --- | --- | --- | --- | --- |
| Turn 1 | 2,111 | Anthropic | 0 | 0 |
| Turn 2 (same prefix) | 2,123 | Anthropic | 0 | 0 |

Conclusion

OpenRouter does not forward cache_control annotations to upstream providers in either the Chat Completions or Messages API formats. This was tested:

  • Through our gateway and directly against OpenRouter's API
  • With both per-block cache_control (on content parts) and top-level cache_control
  • With both Bedrock and Anthropic direct routing
  • With prompts from 1,531 to 4,297 input tokens

The zero cache tokens are NOT caused by our gateway code — addCacheBreakpoints() correctly sets the breakpoints, but they are stripped or ignored by OpenRouter before reaching the upstream provider.

Open question: If production sessions show cache savings, they must come from a different mechanism (e.g. OpenRouter's internal prompt_cache_key field, session_id sticky routing, or Vercel AI SDK provider metadata path). Further investigation needed to identify which production sessions have non-zero cache_hit_tokens and what API path/provider they used.
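For reference, the per-block annotation exercised in Test 2 can be sketched as a minimal body builder. This is a sketch of the request shape only (placeholder text, and the model slug taken from this PR's tests); it is not the harness's actual code:

```typescript
// Build a Chat Completions body with a per-block ephemeral cache_control
// annotation on the system content part, as exercised in Test 2 above.
// systemText/userText are placeholder inputs.
function buildCachedBody(systemText: string, userText: string) {
  return {
    model: 'anthropic/claude-opus-4.6',
    messages: [
      {
        role: 'system',
        content: [
          { type: 'text', text: systemText, cache_control: { type: 'ephemeral' } },
        ],
      },
      { role: 'user', content: userText },
    ],
  };
}
```

Sending this body to https://openrouter.ai/api/v1/chat/completions on two consecutive turns with an identical system prefix is the repro: if the annotation were forwarded, turn 2 should report non-zero cache reads.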

…shape

The old harness used tool_choice:none and synthetic messages, never
producing tool-result follow-ups. This kept prompts below Opus's
4096-token cache minimum and prevented the breakpoint from landing on
a tool message — both of which are the norm in production.

Rewrite to use a realistic fixture (system prompt + 14 tools from
no_tool_request.json), tool_choice:auto with a local tool executor,
and provider.order instead of provider.only.
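The routing change mentioned in the commit message above is the difference between these two request-body fragments: `provider.order` expresses a preference while still allowing fallback, whereas `provider.only` pins routing to the listed providers. A sketch of the two shapes as used in this PR's tests:

```typescript
// provider.order: prefer Amazon Bedrock but allow OpenRouter to fall back.
const preferBedrock = {
  model: 'anthropic/claude-opus-4.6',
  provider: { order: ['amazon-bedrock'] },
};

// provider.only: restrict routing to Anthropic direct, no fallback.
const onlyAnthropic = {
  model: 'anthropic/claude-opus-4.6',
  provider: { only: ['anthropic'] },
};
```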
'x-forwarded-for': '127.0.0.1',
'x-kilocode-taskid': taskId,
},
body: JSON.stringify(body),

Check warning

Code scanning / CodeQL

File data in outbound network request (Medium, test)

Outbound network request depends on file data.
}

const authToken = generateApiToken(user[0]);
const baseUrl = process.env.NEXT_PUBLIC_BASE_URL || 'http://localhost:3000';

WARNING: Local server selection still ignores .dev-port

This script is meant to hit the local gateway, but it still falls back to http://localhost:3000 instead of reading the assigned port from .dev-port. On repos that run the dev server on a different port, the harness can talk to the wrong instance and produce misleading cache diagnostics.
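A sketch of the resolution order the reviewer is asking for: prefer the port recorded in `.dev-port`, then the env var, then the localhost default. `resolveBaseUrl` is an illustrative helper that takes the `.dev-port` file contents as an argument (pass `null` when the file is absent), so the file read stays at the call site:

```typescript
// Resolve the local gateway base URL: .dev-port first, then
// NEXT_PUBLIC_BASE_URL, then the http://localhost:3000 default.
function resolveBaseUrl(devPortContent: string | null, envBaseUrl?: string): string {
  const port = devPortContent?.trim();
  if (port && /^\d+$/.test(port)) {
    return `http://localhost:${port}`;
  }
  return envBaseUrl || 'http://localhost:3000';
}
```

At the call site this would be something like `resolveBaseUrl(fs.existsSync('.dev-port') ? fs.readFileSync('.dev-port', 'utf8') : null, process.env.NEXT_PUBLIC_BASE_URL)`.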

}

function resolveWorkspacePath(relativePath: string): string {
  return path.resolve(WORKSPACE_ROOT, relativePath);
}

WARNING: Tool paths can escape the workspace root

path.resolve(WORKSPACE_ROOT, relativePath) accepts .. segments and absolute paths, so a model-issued read_file, list_files, or search_files call can traverse outside the repo. Because the harness sends tool output back to OpenRouter, a bad tool call here can exfiltrate arbitrary local files from the machine running the script.
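A containment check along these lines would address the traversal: resolve first, then verify the result is still inside the workspace root before touching the filesystem. This is a sketch (POSIX paths assumed); `WORKSPACE_ROOT` here is a stand-in for the harness's actual constant:

```typescript
import * as path from 'node:path';

// Illustrative root; the harness defines its own WORKSPACE_ROOT.
const WORKSPACE_ROOT = path.resolve('/tmp/workspace');

// Resolve a tool-supplied path and reject anything that escapes the
// workspace root, covering both `..` segments and absolute paths
// (path.resolve replaces the base entirely when given an absolute path).
function resolveWorkspacePathSafe(relativePath: string): string {
  const resolved = path.resolve(WORKSPACE_ROOT, relativePath);
  if (resolved !== WORKSPACE_ROOT && !resolved.startsWith(WORKSPACE_ROOT + path.sep)) {
    throw new Error(`path escapes workspace: ${relativePath}`);
  }
  return resolved;
}
```

The prefix check runs on the resolved path, so `a/../../etc/passwd` and `/etc/passwd` are both rejected even though neither string starts with `..`.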
