
Add prompt caching audit skills #2

Open

AntoineToussaint wants to merge 1 commit into main from add-cache-audit-skills

Conversation

@AntoineToussaint
Member

Summary

Adds two Claude Code skills for diagnosing prompt caching issues via the TensorZero gateway API:

  • /prompt-audit <function> [variant] — audits tool ordering stability, cache breakpoint configuration, and cache hit rates for a function+variant
  • /cache-status — quick dashboard of caching effectiveness across all functions/variants

Skills are bundled as Python scripts (stdlib only, no dependencies) with allowed-tools: Bash(python3 *) so they run without permission prompts and hide intermediate API calls.
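For reference, the `allowed-tools` restriction lives in the skill file's frontmatter. An illustrative sketch (the `description` text here is hypothetical; only the `allowed-tools` value comes from this PR):

```markdown
---
description: Audit prompt caching for a TensorZero function/variant
allowed-tools: Bash(python3 *)
---
```

Scoping the allowed tool to `python3` invocations is what lets the bundled scripts run without per-call permission prompts.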

Both skills query two gateway endpoints:

  • POST /v1/inferences/list_inferences — to discover recent inferences
  • GET /internal/model_inferences/{id} — to get cache token stats and raw_request for tool/breakpoint analysis

Example output

/prompt-audit cache_test

Full output

Cache Audit: cache_test / anthropic-with-tools

Model: claude-haiku-4-5-anthropic (provider: anthropic)
Sample size: 8 inferences

Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.

Cache Breakpoints: FAIL
Anthropic model with long system prompt (~4446 estimated tokens) but no cache_control breakpoints configured. Add breakpoints via extra_body to enable caching.

Cache Hit Rate: 0.0%

  • Total input tokens: 38,784
  • Cache read tokens: 0
  • Cache write tokens: 0
  • Cache hit rate is low. Check breakpoint configuration and system prompt stability.

Recommendations

  1. Add cache_control breakpoints via extra_body in the variant config, e.g.:
    { pointer = "/system/0/cache_control", value = { type = "ephemeral" } }
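In context, that pointer would sit in the variant's `extra_body` list in the TensorZero TOML config. A hedged sketch (the table path and surrounding keys are illustrative, reusing the function/variant names from this example):

```toml
[functions.cache_test.variants.anthropic-with-tools-cached]
# ...existing variant config (model, templates, ...)
extra_body = [
  { pointer = "/system/0/cache_control", value = { type = "ephemeral" } }
]
```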

Cache Audit: cache_test / anthropic-with-tools-cached

Model: claude-haiku-4-5-anthropic (provider: anthropic)
Sample size: 8 inferences

Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.

Cache Breakpoints: PASS
Cache breakpoints are configured via extra_body.

Cache Hit Rate: 81.4%

  • Total input tokens: 38,784
  • Cache read tokens: 31,584
  • Cache write tokens: 4,512

Recommendations
None — this variant is well-configured for prompt caching.


Cache Audit: cache_test / openai-with-tools

Model: gpt-4o-mini-2024-07-18 (provider: openai)
Sample size: 8 inferences

Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.

Cache Breakpoints: INFO
Provider openai uses automatic prefix caching — no explicit breakpoints needed.

Cache Hit Rate: 61.3%

  • Total input tokens: 29,240
  • Cache read tokens: 17,920
  • Cache write tokens: 0

Recommendations
None — this variant is well-configured for prompt caching.

/cache-status

| Function | Variant | Model | Provider | Inferences | Avg Input Tokens | Cache Hit Rate |
| --- | --- | --- | --- | --- | --- | --- |
| cache_test | anthropic-with-tools | claude-haiku-4-5-anthropic | anthropic | 8 | 4,848 | 0% |
| cache_test_no_tools | anthropic-no-tools | claude-haiku-4-5-anthropic | anthropic | 8 | 3,969 | 0% |
| cache_test_no_tools | anthropic-no-tools-cached | claude-haiku-4-5-anthropic | anthropic | 8 | 3,969 | 0% |
| cache_test | openai-with-tools | gpt-4o-mini-2024-07-18 | openai | 8 | 3,655 | 61% |
| cache_test | anthropic-with-tools-cached | claude-haiku-4-5-anthropic | anthropic | 8 | 4,848 | 81% |

Summary: 2/5 variants with cache hit rate ≥ 50%. 3 variants flagged for attention.
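The reported hit rates are consistent with cache read tokens divided by total input tokens; a quick check against the figures above (the formula is inferred from the numbers, not quoted from the skill's source):

```python
def cache_hit_rate(cache_read_tokens: int, total_input_tokens: int) -> float:
    """Cache hit rate as the percentage of input tokens served from cache."""
    if total_input_tokens == 0:
        return 0.0
    return 100.0 * cache_read_tokens / total_input_tokens


# Figures from the audits above:
print(round(cache_hit_rate(31_584, 38_784), 1))  # anthropic-with-tools-cached -> 81.4
print(round(cache_hit_rate(17_920, 29_240), 1))  # openai-with-tools -> 61.3
```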

Test plan

  • /prompt-audit cache_test — correctly detects missing breakpoints, shows 0% vs 81% cache hit rates
  • /prompt-audit cache_test_no_tools — correctly identifies no tools, detects breakpoint config
  • /cache-status — shows dashboard across all functions, flags low-performing variants
  • Tool ordering check uses raw_request from model inference data (not always-empty provider_tools from list_inferences)
  • Cache breakpoint check uses raw_request (not always-empty extra_body from list_inferences)
  • Skills run without permission prompts via allowed-tools: Bash(python3 *)
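The tool-ordering check described above can be sketched as follows (the raw_request shape — a top-level `tools` list of objects with `name` fields — is an assumption about the provider payload):

```python
def tool_order(raw_request: dict) -> tuple:
    """Extract tool names in the order they appear in one raw_request."""
    return tuple(t.get("name") for t in raw_request.get("tools") or [])


def tools_stable(raw_requests: list) -> bool:
    """True when every sampled inference sent the same tools in the same order.

    Unstable tool ordering changes the prompt prefix between requests,
    which defeats prefix-based prompt caching.
    """
    return len({tool_order(r) for r in raw_requests}) <= 1
```

Running `tools_stable` over the sampled raw_requests yields the Tool Ordering PASS/FAIL verdict shown in the example output.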

🤖 Generated with Claude Code

Two Claude Code skills for diagnosing prompt caching issues via the
TensorZero gateway API:

- prompt-audit: audits tool ordering stability, cache breakpoint
  configuration, and cache hit rates for a function+variant
- cache-status: quick dashboard of caching effectiveness across all
  functions/variants

Skills are bundled as Python scripts (stdlib only) with
allowed-tools: Bash(python3 *) so they run without permission prompts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
