
Add prompt caching audit skills #2

Open

AntoineToussaint wants to merge 1 commit into main from add-cache-audit-skills

Conversation

@AntoineToussaint
Member

Summary

Adds two Claude Code skills for diagnosing prompt caching issues via the TensorZero gateway API:

  • /prompt-audit <function> [variant] — audits tool ordering stability, cache breakpoint configuration, and cache hit rates for a function+variant
  • /cache-status — quick dashboard of caching effectiveness across all functions/variants

Skills are bundled as Python scripts (stdlib only, no dependencies) with allowed-tools: Bash(python3 *) so they run without permission prompts and hide intermediate API calls.
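For reference, the `allowed-tools` restriction lives in the skill file's frontmatter. An illustrative sketch (the `description` text here is hypothetical; only the `allowed-tools` value comes from this PR):

```markdown
---
description: Audit prompt caching for a TensorZero function/variant
allowed-tools: Bash(python3 *)
---
```

Scoping the allowed tool to `python3` invocations is what lets the bundled scripts run without per-call permission prompts.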

Both skills query two gateway endpoints:

  • POST /v1/inferences/list_inferences — to discover recent inferences
  • GET /internal/model_inferences/{id} — to get cache token stats and raw_request for tool/breakpoint analysis

Example output

/prompt-audit cache_test

Full output

Cache Audit: cache_test / anthropic-with-tools

Model: claude-haiku-4-5-anthropic (provider: anthropic)
Sample size: 8 inferences

Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.

Cache Breakpoints: FAIL
Anthropic model with long system prompt (~4446 estimated tokens) but no cache_control breakpoints configured. Add breakpoints via extra_body to enable caching.

Cache Hit Rate: 0.0%

  • Total input tokens: 38,784
  • Cache read tokens: 0
  • Cache write tokens: 0
  • Cache hit rate is low. Check breakpoint configuration and system prompt stability.

Recommendations

  1. Add cache_control breakpoints via extra_body in the variant config, e.g.:
    { pointer = "/system/0/cache_control", value = { type = "ephemeral" } }
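In context, that pointer would sit in the variant's `extra_body` list in the TensorZero TOML config. A hedged sketch (the table path and surrounding keys are illustrative, reusing the function/variant names from this example):

```toml
[functions.cache_test.variants.anthropic-with-tools-cached]
# ...existing variant config (model, templates, ...)
extra_body = [
  { pointer = "/system/0/cache_control", value = { type = "ephemeral" } }
]
```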

Cache Audit: cache_test / anthropic-with-tools-cached

Model: claude-haiku-4-5-anthropic (provider: anthropic)
Sample size: 8 inferences

Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.

Cache Breakpoints: PASS
Cache breakpoints are configured via extra_body.

Cache Hit Rate: 81.4%

  • Total input tokens: 38,784
  • Cache read tokens: 31,584
  • Cache write tokens: 4,512

Recommendations
None — this variant is well-configured for prompt caching.


Cache Audit: cache_test / openai-with-tools

Model: gpt-4o-mini-2024-07-18 (provider: openai)
Sample size: 8 inferences

Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.

Cache Breakpoints: INFO
Provider openai uses automatic prefix caching — no explicit breakpoints needed.

Cache Hit Rate: 61.3%

  • Total input tokens: 29,240
  • Cache read tokens: 17,920
  • Cache write tokens: 0

Recommendations
None — this variant is well-configured for prompt caching.

/cache-status

| Function | Variant | Model | Provider | Inferences | Avg Input Tokens | Cache Hit Rate |
| --- | --- | --- | --- | --- | --- | --- |
| cache_test | anthropic-with-tools | claude-haiku-4-5-anthropic | anthropic | 8 | 4,848 | 0% |
| cache_test_no_tools | anthropic-no-tools | claude-haiku-4-5-anthropic | anthropic | 8 | 3,969 | 0% |
| cache_test_no_tools | anthropic-no-tools-cached | claude-haiku-4-5-anthropic | anthropic | 8 | 3,969 | 0% |
| cache_test | openai-with-tools | gpt-4o-mini-2024-07-18 | openai | 8 | 3,655 | 61% |
| cache_test | anthropic-with-tools-cached | claude-haiku-4-5-anthropic | anthropic | 8 | 4,848 | 81% |

Summary: 2/5 variants with cache hit rate ≥ 50%. 3 variants flagged for attention.
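The reported hit rates are consistent with cache read tokens divided by total input tokens; a quick check against the figures above (the formula is inferred from the numbers, not quoted from the skill's source):

```python
def cache_hit_rate(cache_read_tokens: int, total_input_tokens: int) -> float:
    """Cache hit rate as the percentage of input tokens served from cache."""
    if total_input_tokens == 0:
        return 0.0
    return 100.0 * cache_read_tokens / total_input_tokens


# Figures from the audits above:
print(round(cache_hit_rate(31_584, 38_784), 1))  # anthropic-with-tools-cached -> 81.4
print(round(cache_hit_rate(17_920, 29_240), 1))  # openai-with-tools -> 61.3
```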

Test plan

  • /prompt-audit cache_test — correctly detects missing breakpoints, shows 0% vs 81% cache hit rates
  • /prompt-audit cache_test_no_tools — correctly identifies no tools, detects breakpoint config
  • /cache-status — shows dashboard across all functions, flags low-performing variants
  • Tool ordering check uses raw_request from model inference data (not always-empty provider_tools from list_inferences)
  • Cache breakpoint check uses raw_request (not always-empty extra_body from list_inferences)
  • Skills run without permission prompts via allowed-tools: Bash(python3 *)
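The tool-ordering check described above can be sketched as follows (the raw_request shape — a top-level `tools` list of objects with `name` fields — is an assumption about the provider payload):

```python
def tool_order(raw_request: dict) -> tuple:
    """Extract tool names in the order they appear in one raw_request."""
    return tuple(t.get("name") for t in raw_request.get("tools") or [])


def tools_stable(raw_requests: list) -> bool:
    """True when every sampled inference sent the same tools in the same order.

    Unstable tool ordering changes the prompt prefix between requests,
    which defeats prefix-based prompt caching.
    """
    return len({tool_order(r) for r in raw_requests}) <= 1
```

Running `tools_stable` over the sampled raw_requests yields the Tool Ordering PASS/FAIL verdict shown in the example output.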

🤖 Generated with Claude Code

Two Claude Code skills for diagnosing prompt caching issues via the
TensorZero gateway API:

- prompt-audit: audits tool ordering stability, cache breakpoint
  configuration, and cache hit rates for a function+variant
- cache-status: quick dashboard of caching effectiveness across all
  functions/variants

Skills are bundled as Python scripts (stdlib only) with
allowed-tools: Bash(python3 *) so they run without permission prompts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
