
design: add Token-Aware Context Management proposal#758

Open
opieter-aws wants to merge 1 commit into strands-agents:main from opieter-aws:main

Conversation


@opieter-aws opieter-aws commented Apr 14, 2026

Description

Adds design document 0006: Token-Aware Context Management. This proposes a ThresholdConversationManager that wraps any inner conversation manager to add proactive compression, tool result externalization, and in-loop context management.

This follows up on the roadmap in 0003-context-management.md.

Related Issues

Type of Change

  • New content
  • Content update/revision
  • Structure/organization improvement
  • Typo/formatting fix
  • Bug fix
  • Other (please describe):

Checklist

  • I have read the CONTRIBUTING document
  • My changes follow the project's documentation style
  • I have tested the documentation locally using npm run dev
  • Links in the documentation are valid and working

@github-actions
Contributor

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-758/docs/user-guide/quickstart/overview/

Updated at: 2026-04-14T07:54:57.293Z

Member

@lizradway lizradway Apr 14, 2026



One thing worth thinking about: this design is TS-specific, and the Python ConversationManager API is shaped quite differently (separate reduce_context and apply_management methods, reduce_context requires an exception, no wrapper pattern today). That is fine for a TS design doc, but I wanted to flag it early so we don't end up with two divergent architectures for the same feature.

I think it'd be helpful to either:

  1. Scope this explicitly as TS-only and note that a parallel Python design will follow, or
  2. Propose shared abstractions both SDKs can converge on (e.g., agree that the wrapper pattern is the direction for Python too, that reduce_context should decouple from requiring an exception, etc.)
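To make option 2 a bit more concrete, here is a rough TypeScript sketch of what a shared shape could look like. Everything in it is an assumption for illustration: `ConversationManager`, `DropOldestHalfManager`, `ThresholdWrapper`, and the character-based budget are hypothetical names and choices, not the existing API of either SDK. It only shows the two ideas named above: a wrapper around an inner manager, and a `reduceContext` that does not require an exception.

```typescript
// Hypothetical types for illustration; neither SDK defines these exact names.
type Message = { role: string; text: string };

interface ConversationManager {
  // Called after each turn for routine housekeeping.
  applyManagement(messages: Message[]): void;
  // Called to shrink context; the error is optional, so proactive callers can omit it.
  reduceContext(messages: Message[], error?: Error): void;
}

// Toy inner manager: drops the oldest half of the history when asked to reduce.
class DropOldestHalfManager implements ConversationManager {
  applyManagement(_messages: Message[]): void {
    // No routine housekeeping in this toy manager.
  }
  reduceContext(messages: Message[], _error?: Error): void {
    messages.splice(0, Math.floor(messages.length / 2));
  }
}

// Wrapper: adds a proactive trigger on top of any inner manager.
class ThresholdWrapper implements ConversationManager {
  private readonly inner: ConversationManager;
  private readonly maxChars: number;

  constructor(inner: ConversationManager, maxChars: number) {
    this.inner = inner;
    this.maxChars = maxChars;
  }

  applyManagement(messages: Message[]): void {
    const used = messages.reduce((n, m) => n + m.text.length, 0);
    if (used > this.maxChars) {
      this.inner.reduceContext(messages); // proactive path: no exception involved
    }
    this.inner.applyManagement(messages);
  }

  reduceContext(messages: Message[], error?: Error): void {
    this.inner.reduceContext(messages, error); // reactive path is delegated unchanged
  }
}
```

If both SDKs agreed on something along these lines, the Python side would mainly need `reduce_context` to accept a missing exception; the wrapper itself adds no new requirements on the inner manager.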

The roadmap currently targets Python first since that's where most of the customer demand is, so my recommendation would be to land the Python design first and let it inform the TS implementation.

Member

@lizradway lizradway Apr 14, 2026


I wanted to share some thoughts on the scope, because I have some concerns about ThresholdConversationManager becoming a bag of unrelated concerns. Happy to discuss if you see it differently.

I noticed ThresholdConversationManager bundles three roadmap items (#555, #1296, #298) that were originally scoped separately because they have pretty different risk profiles and use cases:

  • Externalization (#1296): pure cost reducer, no LLM calls, low risk, shippable today
  • Proactive compression (#555): involves LLM calls, has a cost break-even curve, needs careful tuning
  • In-loop management (#298): architectural change to when hooks fire

The nice thing is that these don't actually need a wrapper to compose. They hook into different events independently:

  • Externalization hooks AfterToolCallEvent
  • Proactive compression hooks BeforeModelCallEvent
  • In-loop management falls out naturally since both hooks already fire within the agent loop
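As a rough illustration of the list above, the sketch below registers the two features as independent hooks. The event names `AfterToolCallEvent` and `BeforeModelCallEvent` are the ones mentioned here, but their fields, the `HookRegistry` class, the 4-characters-per-token estimate, and both `register...` helpers are hypothetical stand-ins, not the SDK's actual hook API. Dropping old messages stands in for LLM-based summarization.

```typescript
// Hypothetical, simplified types: these are NOT the SDK's definitions.
type Message = { role: "user" | "assistant" | "tool"; text: string };

interface AfterToolCallEvent { result: Message }
interface BeforeModelCallEvent { messages: Message[]; contextWindow: number }

// Minimal hook registry standing in for the agent's event system.
class HookRegistry {
  private afterToolCall: Array<(e: AfterToolCallEvent) => void> = [];
  private beforeModelCall: Array<(e: BeforeModelCallEvent) => void> = [];

  onAfterToolCall(cb: (e: AfterToolCallEvent) => void): void { this.afterToolCall.push(cb); }
  onBeforeModelCall(cb: (e: BeforeModelCallEvent) => void): void { this.beforeModelCall.push(cb); }

  emitAfterToolCall(e: AfterToolCallEvent): void { for (const cb of this.afterToolCall) cb(e); }
  emitBeforeModelCall(e: BeforeModelCallEvent): void { for (const cb of this.beforeModelCall) cb(e); }
}

// Crude token estimate (about 4 characters per token); an assumption for illustration.
const estimateTokens = (messages: Message[]): number =>
  Math.ceil(messages.reduce((n, m) => n + m.text.length, 0) / 4);

// Feature 1: externalization. Replaces oversized tool results with a reference.
function registerExternalization(hooks: HookRegistry, store: Map<string, string>, maxChars: number): void {
  hooks.onAfterToolCall((e) => {
    if (e.result.text.length > maxChars) {
      const key = `result-${store.size + 1}`;
      store.set(key, e.result.text);
      e.result.text = `[externalized: ${key}]`;
    }
  });
}

// Feature 2: proactive compression. Trims once usage crosses a fraction of the window.
// A real implementation would summarize with an LLM; dropping old messages is a stand-in.
function registerProactiveCompression(hooks: HookRegistry, threshold: number): void {
  hooks.onBeforeModelCall((e) => {
    while (e.messages.length > 1 && estimateTokens(e.messages) > threshold * e.contextWindow) {
      e.messages.shift();
    }
  });
}

// Usage: opt in to externalization only; compression is simply never registered.
const hooks = new HookRegistry();
const store = new Map<string, string>();
registerExternalization(hooks, store, 20);
const event: AfterToolCallEvent = { result: { role: "tool", text: "a".repeat(100) } };
hooks.emitAfterToolCall(event);
// event.result.text is now "[externalized: result-1]"
```

Neither helper knows about the other, so each can be shipped, tuned, or disabled on its own.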

Keeping them separate would make it easier to ship incrementally (land the easy wins first), iterate on compression without touching externalization, and let users opt into just what they need.

My suggestion would be to focus this design on proactive compression (#555) as a standalone piece. Once all three are stable and proven independently, we could always offer a convenience flag or preset (e.g., contextManagement: "auto") that turns on sensible defaults for all of them — but as sugar on top of independently shippable pieces rather than the starting point.

What do you think?

threshold?: number

/** Tool result externalization config. When provided, enables externalization. */
externalization?: ToolResultExternalizationConfig
Member

@lizradway lizradway Apr 14, 2026


Because externalization is optional but proactive compression is mandatory, a user who only wants externalization has to set threshold: 1.0 as a workaround. To me this is a code smell: these should be separate concerns rather than bundled in one manager.
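A small sketch of the difference, under some assumptions: `threshold` and `externalization` are the fields from the diff, while `maxResultChars`, `BundledConfig`, `SeparatedConfig`, `proactiveCompression`, and `enabledFeatures` are hypothetical names made up for illustration (the real fields of `ToolResultExternalizationConfig` are not shown in this excerpt).

```typescript
// Hypothetical stand-in; the real ToolResultExternalizationConfig fields are not shown here.
interface ToolResultExternalizationConfig { maxResultChars: number }

// Bundled shape, following the fields in the diff: compression is always on,
// so "externalization only" has to be spelled as a threshold that never trips.
interface BundledConfig {
  threshold?: number;
  externalization?: ToolResultExternalizationConfig;
}
const externalizationOnlyBundled: BundledConfig = {
  threshold: 1.0, // workaround: effectively disables proactive compression
  externalization: { maxResultChars: 4000 },
};

// Separated shape (hypothetical): each feature is enabled by the presence of its own config.
interface SeparatedConfig {
  proactiveCompression?: { threshold: number };
  externalization?: ToolResultExternalizationConfig;
}
const externalizationOnlySeparated: SeparatedConfig = {
  externalization: { maxResultChars: 4000 },
};

// Lists which features a separated config turns on.
function enabledFeatures(config: SeparatedConfig): string[] {
  const features: string[] = [];
  if (config.proactiveCompression) features.push("proactiveCompression");
  if (config.externalization) features.push("externalization");
  return features;
}
```

In the separated shape, leaving out `proactiveCompression` expresses the intent directly, with no magic threshold value.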

