Open
Conversation
Two Claude Code skills for diagnosing prompt caching issues via the TensorZero gateway API: - prompt-audit: audits tool ordering stability, cache breakpoint configuration, and cache hit rates for a function+variant - cache-status: quick dashboard of caching effectiveness across all functions/variants Skills are bundled as Python scripts (stdlib only) with allowed-tools: Bash(python3 *) so they run without permission prompts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds two Claude Code skills for diagnosing prompt caching issues via the TensorZero gateway API:
/prompt-audit <function> [variant]— audits tool ordering stability, cache breakpoint configuration, and cache hit rates for a function+variant/cache-status— quick dashboard of caching effectiveness across all functions/variantsSkills are bundled as Python scripts (stdlib only, no dependencies) with
allowed-tools: Bash(python3 *)so they run without permission prompts and hide intermediate API calls.Both skills query two gateway endpoints:
POST /v1/inferences/list_inferences— to discover recent inferencesGET /internal/model_inferences/{id}— to get cache token stats andraw_requestfor tool/breakpoint analysisExample output
/prompt-audit cache_testFull output
Cache Audit: cache_test / anthropic-with-tools
Model: claude-haiku-4-5-anthropic (provider: anthropic)
Sample size: 8 inferences
Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.
Cache Breakpoints: FAIL
Anthropic model with long system prompt (~4446 estimated tokens) but no
cache_controlbreakpoints configured. Add breakpoints viaextra_bodyto enable caching.Cache Hit Rate: 0.0%
Recommendations
cache_controlbreakpoints viaextra_bodyin the variant config, e.g.:{ pointer = "/system/0/cache_control", value = { type = "ephemeral" } }Cache Audit: cache_test / anthropic-with-tools-cached
Model: claude-haiku-4-5-anthropic (provider: anthropic)
Sample size: 8 inferences
Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.
Cache Breakpoints: PASS
Cache breakpoints are configured via
extra_body.Cache Hit Rate: 81.4%
Recommendations
None — this variant is well-configured for prompt caching.
Cache Audit: cache_test / openai-with-tools
Model: gpt-4o-mini-2024-07-18 (provider: openai)
Sample size: 8 inferences
Tool Ordering: PASS
Tool definitions are stable across all 8 inferences.
Cache Breakpoints: INFO
Provider
openaiuses automatic prefix caching — no explicit breakpoints needed.Cache Hit Rate: 61.3%
Recommendations
None — this variant is well-configured for prompt caching.
/cache-statusSummary: 2/5 variants with cache hit rate ≥ 50%. 3 variants flagged for attention.
Test plan
/prompt-audit cache_test— correctly detects missing breakpoints, shows 0% vs 81% cache hit rates/prompt-audit cache_test_no_tools— correctly identifies no tools, detects breakpoint config/cache-status— shows dashboard across all functions, flags low-performing variantsraw_requestfrom model inference data (not always-emptyprovider_toolsfromlist_inferences)raw_request(not always-emptyextra_bodyfromlist_inferences)allowed-tools: Bash(python3 *)🤖 Generated with Claude Code