add anthropic caching to system prompt in agent #1655
Greptile Overview

Greptile Summary

Implemented Anthropic prompt caching for agent system prompts by creating a new `prependSystemMessage()` helper in `v3AgentHandler.ts`.

Key Changes:
- Extracted a single `prependSystemMessage()` function shared by `execute()` and `stream()`
- Moved the system prompt from the `system` property into the `messages` array with `providerOptions.anthropic.cacheControl` set to ephemeral

Benefits:
- Lower token costs and latency for Claude models via server-side caching of the system prompt
- Safe for all providers; non-Anthropic models ignore the Anthropic provider options
Confidence Score: 5/5
Important Files Changed
- `v3AgentHandler.ts`
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant V3AgentHandler
    participant prependSystemMessage
    participant LLMClient
    participant AnthropicAPI

    Client->>V3AgentHandler: execute() or stream()
    V3AgentHandler->>V3AgentHandler: prepareAgent()
    V3AgentHandler->>V3AgentHandler: buildAgentSystemPrompt()
    alt execute() flow
        V3AgentHandler->>prependSystemMessage: prependSystemMessage(systemPrompt, messages)
        prependSystemMessage->>prependSystemMessage: Add system message with cache control
        prependSystemMessage-->>V3AgentHandler: messages with cached system prompt
        V3AgentHandler->>LLMClient: generateText({messages})
    else stream() flow
        V3AgentHandler->>prependSystemMessage: prependSystemMessage(systemPrompt, messages)
        prependSystemMessage->>prependSystemMessage: Add system message with cache control
        prependSystemMessage-->>V3AgentHandler: messages with cached system prompt
        V3AgentHandler->>LLMClient: streamText({messages})
    end
    LLMClient->>AnthropicAPI: Request with cached system prompt
    Note over AnthropicAPI: Caches system prompt<br/>with ephemeral cache control
    AnthropicAPI-->>LLMClient: Response with cached_input_tokens
    LLMClient-->>V3AgentHandler: Result with usage metrics
    V3AgentHandler-->>Client: AgentResult with cached token count
```
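The PR diff itself isn't reproduced here, but based on the flow above, a minimal sketch of what `prependSystemMessage()` plausibly looks like, assuming the Vercel AI SDK's message types and its documented `providerOptions.anthropic.cacheControl` shape:

```typescript
import type { ModelMessage } from "ai";

// Prepend the system prompt as a message that carries Anthropic's ephemeral
// cache control. Non-Anthropic providers ignore `providerOptions.anthropic`,
// so the same message array is safe to send to any model.
function prependSystemMessage(
  systemPrompt: string,
  messages: ModelMessage[],
): ModelMessage[] {
  return [
    {
      role: "system",
      content: systemPrompt,
      providerOptions: {
        anthropic: { cacheControl: { type: "ephemeral" } },
      },
    },
    ...messages,
  ];
}
```

With ephemeral cache control, Anthropic caches the request prefix up to the marked block for a short TTL (five minutes by default), so repeated agent turns reuse the cached system prompt instead of re-processing it.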
No issues found across 1 file
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
```mermaid
sequenceDiagram
    participant Agent as V3AgentHandler
    participant LLM as LLMClient
    participant Provider as Model Provider (Claude/Generic)

    Note over Agent,Provider: Text Generation or Streaming Flow
    Agent->>Agent: NEW: prependSystemMessage(systemPrompt, messages)
    Note right of Agent: Constructs message with<br/>providerOptions.anthropic.cacheControl: "ephemeral"
    Agent->>LLM: CHANGED: generateText() or streamText()
    Note right of Agent: System prompt is now prepended to 'messages' array<br/>instead of being passed as 'system' property.
    LLM->>Provider: Send API Request
    alt Provider is Anthropic (Claude)
        Provider->>Provider: Identify ephemeral cache breakpoint
        alt Cache Hit
            Provider-->>LLM: Response (with cached_input_tokens)
        else Cache Miss
            Provider->>Provider: Store system prompt in cache
            Provider-->>LLM: Response (standard usage)
        end
    else Other Provider (OpenAI/etc)
        Provider->>Provider: Ignore anthropic providerOptions
        Provider-->>LLM: Standard Response
    end
    LLM-->>Agent: Return Result / Stream Content
```
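To make the "CHANGED" step concrete, here is a hedged sketch of the call-site change, reusing the `prependSystemMessage()` helper sketched earlier. The AI SDK's `system` property is a plain string, so moving the prompt into `messages` is what makes per-message cache control possible:

```typescript
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import type { ModelMessage } from "ai";

const model = anthropic("claude-3-5-sonnet-20241022");

async function runAgentStep(systemPrompt: string, messages: ModelMessage[]) {
  // Before: prompt passed as the plain `system` string, with no way to
  // attach a cache breakpoint:
  //   await generateText({ model, system: systemPrompt, messages });

  // After: prompt travels inside `messages` with ephemeral cache control.
  return generateText({
    model,
    messages: prependSystemMessage(systemPrompt, messages),
  });
}
```

Cache reads are billed at a fraction of the base input-token price, which is where the cost savings come from; the first (cold) call pays a small cache-write surcharge.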
Add Anthropic Prompt Caching for Agent System Prompts
Summary
Added Anthropic's ephemeral cache control to system prompts in `v3AgentHandler.ts`. Refactored into a single `prependSystemMessage()` function used by both `execute()` and `stream()`.

Why

Reduces token costs and latency for Claude models by caching the system prompt server-side. The `providerOptions` are ignored by non-Anthropic models, so this is safe for all providers.

Limitations
Currently limited to the system prompt only. Extending to conversation messages requires restructuring message processing to insert cache breakpoints at stable boundaries.
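For illustration only, not part of this PR: extending caching to conversation history would mean marking the last message of the stable prefix as a breakpoint, since Anthropic caches everything up to a `cache_control` marker and allows up to four breakpoints per request. A hypothetical helper:

```typescript
import type { ModelMessage } from "ai";

// Hypothetical: mark the message at index `stableUpTo` as a cache breakpoint
// so the provider caches the whole prefix ending there. The prefix must be
// identical across calls for the cache to hit, which is why breakpoints
// belong at stable boundaries (e.g. the end of the previous turn).
function withCacheBreakpoint(
  messages: ModelMessage[],
  stableUpTo: number,
): ModelMessage[] {
  return messages.map((message, i) =>
    i === stableUpTo
      ? {
          ...message,
          providerOptions: {
            ...message.providerOptions,
            anthropic: { cacheControl: { type: "ephemeral" } },
          },
        }
      : message,
  );
}
```

In an agent loop where tool calls append messages each turn, choosing `stableUpTo` requires knowing which prefix will recur on the next call, which is the message-processing restructuring the limitation refers to.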
Test Plan
- Verify `cached_input_tokens` appears in usage metrics with Claude
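A sketch of that check, reusing the setup from the earlier sketches. Exact field names depend on the AI SDK version; the names below are assumptions based on v5-style usage reporting and Anthropic provider metadata:

```typescript
const result = await generateText({
  model,
  messages: prependSystemMessage(systemPrompt, messages),
});

// Assumed v5 naming: tokens served from the prompt cache.
console.log("cached input tokens:", result.usage.cachedInputTokens);

// Raw Anthropic counters: cacheCreationInputTokens on the first (cold) call,
// cacheReadInputTokens on subsequent hits within the cache TTL.
console.log(result.providerMetadata?.anthropic);
```

Running the same call twice in quick succession should show the cached count move from zero on the cold call to roughly the system-prompt size on the second, confirming a hit.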
Summary by cubic

Added Anthropic ephemeral caching to the agent’s system prompt to lower Claude token usage and reduce latency. Safe for non-Anthropic models; the options are ignored.
New Features
- Anthropic ephemeral cache control applied to the agent system prompt

Refactors
- System-prompt handling in `execute()` and `stream()` consolidated into a single `prependSystemMessage()` function
Written for commit 6e2eef6.