design: add Token-Aware Context Management proposal #758
opieter-aws wants to merge 1 commit into strands-agents:main
One thing worth thinking about: this design is TS-specific, and the Python ConversationManager API is shaped pretty differently (separate reduce_context and apply_management methods, exception is required, no wrapper pattern today). Totally fine for a TS design doc, but wanted to flag it early so we don't end up with two divergent architectures for the same feature.
I think it'd be helpful to either:
- Scope this explicitly as TS-only and note that a parallel Python design will follow, or
- Propose shared abstractions both SDKs can converge on (e.g., agree that the wrapper pattern is the direction for Python too, that reduce_context should decouple from requiring an exception, etc.)
The roadmap currently targets Python first since that's where most of the customer demand is, so my recommendation would be to land the Python design first and let it inform the TS implementation.
I wanted to share some thoughts on the scope, and I'm happy to discuss if you see it differently. My main concern is that ThresholdConversationManager ends up as a bag of unrelated concerns.
I noticed ThresholdConversationManager bundles three roadmap items (#555, #1296, #298) that were originally scoped separately because they have pretty different risk profiles and use cases:
- Externalization (#1296): pure cost reducer, no LLM calls, low risk, shippable today
- Proactive compression (#555): involves LLM calls, has a cost break-even curve, needs careful tuning
- In-loop management (#298): architectural change to when hooks fire
The nice thing is that these don't actually need a wrapper to compose; they hook into different events independently:
- Externalization hooks AfterToolCallEvent
- Proactive compression hooks BeforeModelCallEvent
- In-loop management falls out naturally since both hooks already fire within the agent loop
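To make the "no wrapper needed" point concrete, here is a rough sketch of two features registered independently on the two events. Only the event names come from this thread; `HookRegistry`, `LoopState`, `externalizeLargeResults`, and `compressAboveLimit` are hypothetical stand-ins, not the SDK's hook API, and the "compression" here just drops old messages where a real implementation would call an LLM to summarize.

```typescript
// Hypothetical minimal hook registry; the real SDK's hook API may differ.
type EventName = "AfterToolCallEvent" | "BeforeModelCallEvent";

interface LoopState {
  messages: string[]; // simplified: one string per message
  lastToolResult?: string;
}

type Hook = (state: LoopState) => void;

class HookRegistry {
  private hooks = new Map<EventName, Hook[]>();

  on(event: EventName, hook: Hook): void {
    const list = this.hooks.get(event) ?? [];
    list.push(hook);
    this.hooks.set(event, list);
  }

  emit(event: EventName, state: LoopState): void {
    for (const hook of this.hooks.get(event) ?? []) hook(state);
  }
}

// Externalization: swap an oversized tool result for a reference. No LLM call.
function externalizeLargeResults(maxChars: number, store: Map<string, string>): Hook {
  return (state) => {
    const result = state.lastToolResult;
    if (result !== undefined && result.length > maxChars) {
      const key = `ref-${store.size + 1}`;
      store.set(key, result);
      state.lastToolResult = `[externalized: ${key}]`;
    }
  };
}

// Proactive compression stand-in: keep only the newest messages once a limit is passed.
function compressAboveLimit(maxMessages: number): Hook {
  return (state) => {
    if (state.messages.length > maxMessages) {
      state.messages = state.messages.slice(-maxMessages);
    }
  };
}

// Each feature registers on its own event; neither knows about the other.
const registry = new HookRegistry();
const store = new Map<string, string>();
registry.on("AfterToolCallEvent", externalizeLargeResults(20, store));
registry.on("BeforeModelCallEvent", compressAboveLimit(2));

const state: LoopState = { messages: ["m1", "m2", "m3"], lastToolResult: "x".repeat(50) };
registry.emit("AfterToolCallEvent", state);
registry.emit("BeforeModelCallEvent", state);
```

Dropping either `registry.on(...)` line leaves the other feature working unchanged, which is the property that would let them ship and be tuned separately.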
Keeping them separate would make it easier to ship incrementally (land the easy wins first), iterate on compression without touching externalization, and let users opt into just what they need.
My suggestion would be to focus this design on proactive compression (#555) as a standalone piece. Once all three are stable and proven independently, we could always offer a convenience flag or preset (e.g., contextManagement: "auto") that turns on sensible defaults for all of them — but as sugar on top of independently shippable pieces rather than the starting point.
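As a sketch of what "sugar on top" could look like: the preset below only expands to options for features that are each configurable on their own. Every name here (`ContextFeatures`, `resolveContextManagement`, `AUTO_PRESET`) and both default values are assumptions for illustration; only the `contextManagement: "auto"` idea comes from the paragraph above.

```typescript
// Hypothetical per-feature options; field names and defaults are illustrative only.
interface ExternalizationOptions {
  maxResultChars: number;
}

interface CompressionOptions {
  threshold: number; // assumed to be a fraction of the context window
}

interface ContextFeatures {
  externalization?: ExternalizationOptions;
  compression?: CompressionOptions;
}

// Users pass either the preset name or an explicit per-feature selection.
type ContextManagementOption = "auto" | ContextFeatures;

const AUTO_PRESET: Required<ContextFeatures> = {
  externalization: { maxResultChars: 10000 },
  compression: { threshold: 0.8 },
};

// The preset is pure expansion: it adds no behavior that the
// individual features do not already provide.
function resolveContextManagement(option?: ContextManagementOption): ContextFeatures {
  if (option === undefined) return {};
  if (option === "auto") {
    return {
      externalization: { ...AUTO_PRESET.externalization },
      compression: { ...AUTO_PRESET.compression },
    };
  }
  return option;
}

const auto = resolveContextManagement("auto");
const onlyExternalization = resolveContextManagement({
  externalization: { maxResultChars: 5000 },
});
```

With this shape, opting into externalization alone is just a matter of omitting `compression`; no sentinel value is needed.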
What do you think?
threshold?: number

/** Tool result externalization config. When provided, enables externalization. */
externalization?: ToolResultExternalizationConfig
Because externalization is optional but proactive compression is mandatory, a user who only wants externalization has to set threshold: 1.0 as a workaround. To me this is a code smell: these should be separate concerns rather than bundled in one manager.
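A small sketch of the workaround, under some assumptions: that `threshold` is a fraction of the context window, that compression triggers at or above it, and that an unset threshold falls back to some default (0.8 here is made up). The field inside `ToolResultExternalizationConfig` and the `compressionActive` helper are hypothetical too.

```typescript
// Field name is hypothetical; the design doc defines the real config.
interface ToolResultExternalizationConfig {
  maxResultChars: number;
}

// Mirrors the two fields in the diff above.
interface BundledConfig {
  threshold?: number;
  externalization?: ToolResultExternalizationConfig;
}

// Assumed default for illustration only; not taken from the design doc.
const ASSUMED_DEFAULT_THRESHOLD = 0.8;

// Assumed trigger rule: compress once usage reaches the threshold fraction.
function compressionActive(config: BundledConfig, usedFraction: number): boolean {
  return usedFraction >= (config.threshold ?? ASSUMED_DEFAULT_THRESHOLD);
}

// What a user wants to express: externalization only.
const intended: BundledConfig = { externalization: { maxResultChars: 10000 } };

// What they have to write so compression stays out of the way.
const workaround: BundledConfig = {
  threshold: 1.0,
  externalization: { maxResultChars: 10000 },
};

const intendedCompresses = compressionActive(intended, 0.95);
const workaroundCompresses = compressionActive(workaround, 0.95);
const workaroundAtFullWindow = compressionActive(workaround, 1.0);
```

Under these assumptions the workaround does not really turn compression off; it only defers it until the window is full, so "externalization only" cannot be expressed cleanly in the bundled shape.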
Description
Adds design document 0006: Token-Aware Context Management. This proposes a ThresholdConversationManager that wraps any inner conversation manager to add proactive compression, tool result externalization, and in-loop context management.
This is a follow-up on the roadmap in 0003-context-management.md.