@danielmillerp danielmillerp commented Feb 8, 2026

What this does

This PR replaces LangGraph's direct-Postgres checkpointer with an HTTP-proxy checkpointer that routes all checkpoint operations through the agentex backend API.

Why

LangGraph agents need to persist state (checkpoints) between messages. The built-in approach (AsyncPostgresSaver) has each agent pod open its own Postgres connection pool. This doesn't scale — more agent pods means more connections, and we'd hit limits as we grow LangGraph usage. This is the same problem we already solved for Temporal: agents shouldn't talk to the DB directly. Instead, they go through the backend API, which manages a shared connection pool.

How it works

We implemented LangGraph's BaseCheckpointSaver abstract class — the same interface that AsyncPostgresSaver implements — but instead of running SQL queries, each method makes an HTTP POST to the backend:

| BaseCheckpointSaver method | Backend endpoint called |
| --- | --- |
| aget_tuple() | POST /checkpoints/get-tuple |
| aput() | POST /checkpoints/put |
| aput_writes() | POST /checkpoints/put-writes |
| alist() | POST /checkpoints/list |
| adelete_thread() | POST /checkpoints/delete-thread |
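
As a rough illustration of the method-to-endpoint mapping above, here is a minimal sketch. It is not the PR's HttpCheckpointSaver: the class name `SketchCheckpointSaver`, the injected `post` callable, and the request body shape (`{"thread_id": ...}`) are all assumptions made for this example, and only two of the five methods are shown.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical transport signature: (path, JSON body) -> parsed JSON response.
PostFn = Callable[[str, dict[str, Any]], Awaitable[Any]]


class SketchCheckpointSaver:
    """Illustrative stand-in, not the PR's HttpCheckpointSaver."""

    def __init__(self, post: PostFn) -> None:
        self._post = post

    async def aget_tuple(self, config: dict[str, Any]) -> Any:
        # Index with [] so a missing key raises instead of yielding None.
        configurable = config["configurable"]
        # The body shape here is an assumption, not the backend's schema.
        return await self._post(
            "/checkpoints/get-tuple",
            {"thread_id": configurable["thread_id"]},
        )

    async def adelete_thread(self, thread_id: str) -> None:
        await self._post("/checkpoints/delete-thread", {"thread_id": thread_id})


async def demo() -> list[str]:
    """Drive the saver with a fake transport and record the paths it hits."""
    calls: list[str] = []

    async def fake_post(path: str, body: dict[str, Any]) -> Any:
        calls.append(path)
        return None

    saver = SketchCheckpointSaver(fake_post)
    await saver.aget_tuple({"configurable": {"thread_id": "t-1"}})
    await saver.adelete_thread("t-1")
    return calls
```

Injecting the transport keeps the sketch runnable without a backend; in the PR the equivalent role is played by the AsyncAgentex client.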

The HttpCheckpointSaver uses the existing AsyncAgentex httpx client (with EnvAuth for the agent API key), so authentication works the same way as every other SDK→backend call.

Serialization stays in the SDK — complex Python objects are serialized via LangGraph's serde, then base64-encoded for JSON transport. The backend just stores and retrieves the raw data.
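
To make the serialize-then-base64 step concrete, here is a small sketch of the round trip. It uses `pickle` purely as a stand-in serializer so the example runs with the standard library alone; the PR uses LangGraph's serde, and the payload field names (`type`, `blob`) are assumptions for illustration.

```python
import base64
import json
import pickle
from typing import Any


def encode_for_transport(obj: Any) -> dict[str, str]:
    # Stand-in serializer: pickle. The PR uses LangGraph's serde instead.
    raw = pickle.dumps(obj)
    # Raw bytes are not valid JSON, so base64-encode them to ASCII text.
    return {"type": "pickle", "blob": base64.b64encode(raw).decode("ascii")}


def decode_from_transport(payload: dict[str, str]) -> Any:
    # Reverse the two steps: base64 text -> bytes -> Python object.
    raw = base64.b64decode(payload["blob"])
    return pickle.loads(raw)


# A value that plain JSON cannot represent faithfully (tuple and set).
checkpoint = {"step": 3, "pending": (1, 2), "seen": {"a", "b"}}

wire = json.dumps(encode_for_transport(checkpoint))  # JSON-safe string
restored = decode_from_transport(json.loads(wire))
```

The backend in this scheme only ever sees the opaque `blob` string, which matches the description above of storing and retrieving raw data without interpreting it.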

What changed

  • New: HttpCheckpointSaver class — the HTTP-proxy checkpointer implementation
  • New: create_checkpointer() factory — returns an HttpCheckpointSaver wired up with the SDK client
  • Removed: langgraph-checkpoint-postgres and psycopg[binary] dependencies (no longer needed)
  • Added: langgraph-checkpoint (base types only — BaseCheckpointSaver, CheckpointTuple, etc.)

Impact on agents

Zero code changes. The create_checkpointer() API is unchanged — agents just call it and get back a checkpointer. The only difference is they no longer need DATABASE_URL in their environment.

Companion PR

Test plan

  • Pyright strict mode passes (0 errors)
  • Ruff lint + format passes
  • Manually tested end-to-end with langgraph agent: sent messages, confirmed checkpoints stored and conversation history restored across messages

🤖 Generated with Claude Code


async def _post(self, path: str, body: dict[str, Any]) -> Any:
    """POST JSON to the backend and return parsed response."""
    resp = self._client._client.post(  # noqa: SLF001
Collaborator

What is the double _client call here? Can we remove that?

Contributor Author

It's stored as a class variable!


@override
async def aget_tuple(self, config: RunnableConfig) -> CheckpointTuple | None:
    configurable = config["configurable"]  # type: ignore[reportTypedDictNotRequiredAccess]
Collaborator

Do we want to make these [""] lookups safe by using get() so they don't raise in the code? Not sure if this would be caught anywhere, but I thought I'd call it out.

Contributor Author

These are needed by LangGraph constructs; we should fail loudly if they aren't there.

Contributor Author

Using get() would produce None, which would silence the error.

Collaborator

I guess my point is that this will throw a KeyError that won't be surfaced nicely to the user, versus adding handlers for this ourselves to help the user fix it.

Not blocking, but maybe something to think about if this could be an easy mistake when creating a LangGraph agent using our SDK.

Contributor Author

I see, that makes sense. Sure, I can wrap it!
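
One way the wrapping discussed in this thread could look is sketched below. It still fails loudly, but with a message that tells the agent author what to fix. The names `CheckpointConfigError` and `require_thread_id` are hypothetical and not taken from the PR; the sketch assumes the required key is `thread_id` under `configurable`.

```python
from typing import Any


class CheckpointConfigError(ValueError):
    """Hypothetical error type for a malformed RunnableConfig."""


def require_thread_id(config: dict[str, Any]) -> str:
    """Return the thread id, or raise an error that names the missing key."""
    try:
        return config["configurable"]["thread_id"]
    except KeyError as exc:
        # Re-raise with guidance instead of a bare KeyError.
        raise CheckpointConfigError(
            f"config is missing required key {exc.args[0]!r}; "
            'pass config={"configurable": {"thread_id": "..."}} '
            "when invoking the graph"
        ) from exc
```

Chaining with `from exc` keeps the original KeyError in the traceback, so nothing is silenced.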

@danielmillerp danielmillerp left a comment

comments

@danielmillerp danielmillerp force-pushed the dm/checkpointer branch 8 times, most recently from 61d9316 to 4cb3d7f on February 12, 2026 02:55
Replace direct Postgres checkpointing with HTTP-proxied checkpoint
operations through the agentex backend API. Agents no longer need
DATABASE_URL or direct DB connections for LangGraph state persistence.

- Add HttpCheckpointSaver that proxies through AsyncAgentex client
- Add create_checkpointer() factory using the HTTP checkpointer
- Replace langgraph-checkpoint-postgres dep with langgraph-checkpoint
- Export checkpointer module from adk package

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>