Skip to content

feat(sdk): 59787 merge events#155

Open
namrataghadi-galileo wants to merge 23 commits intomainfrom
feature/59787-merge-events
Open

feat(sdk): 59787 merge events#155
namrataghadi-galileo wants to merge 23 commits intomainfrom
feature/59787-merge-events

Conversation

@namrataghadi-galileo
Copy link
Copy Markdown
Contributor

@namrataghadi-galileo namrataghadi-galileo commented Mar 31, 2026

Summary

  • add an optional SDK-side merged event-creation mode so local and server ControlExecutionEvents can be created together after evaluation
  • keep event creation separate from evaluation by reconstructing events after evaluation results are available, instead of sending full event payloads over the wire
  • reduce wire transfer for the merged path by keeping server evaluation responses focused on evaluation semantics and reconstructing server events in the SDK
  • preserve event identity consistency during reconstruction by using ControlDefinition.observability_identity() for composite conditions
  • keep this PR focused on event creation only by continuing to use the existing built-in SDK/server ingestion path

Behavior

  • default behavior remains unchanged: local events are reconstructed in the SDK and enqueued through the existing SDK observability pipeline, while server-side evaluation still builds and ingests its own events on the server
  • when merge_events=True is enabled for the initialized SDK session, the SDK switches to the merged event-creation path: it reconstructs local and server events after evaluation, sends X-Agent-Control-Merge-Events: true to the server, and enqueues one merged batch through the existing observability pipeline
  • server-side merge mode skips final server ingestion but still returns the same evaluation semantics, so the SDK can merge results and reconstruct the final event batch locally
  • trace/span correlation is preserved in both paths through the tracing resolver and reconstructed control metadata

What changed

  • added shared SDK event reconstruction helpers in evaluation_events.py
  • updated evaluation.py to support:
  • local reconstruction + default enqueue path
  • optional merged event-creation mode gated by initialized session state
  • SDK reconstruction of server events from the lean server response plus cached control definitions
  • fallback to default behavior when the request does not match the active initialized session
  • updated /Users/namrataghadi/code/agentcontrol/agent-control/sdks/python/src/agent_control/init.py and _state.py to add a session-scoped merge_events mode
  • updated evaluation.py so the server:
  • still ingests events in the default path
  • skips final ingestion when X-Agent-Control-Merge-Events: true is set
  • only builds server observability events when observability is enabled

Testing

  • SDK and server tests were updated to cover:
  • default local enqueue behavior
  • merged event-creation behavior
  • provider-backed trace/span propagation
  • fallback when merged mode is requested from the wrong initialized session/client
  • server merge-mode skip-ingest behavior
  • server no-op behavior when observability is disabled

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 91.26984% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
sdks/python/src/agent_control/evaluation.py 91.54% 6 Missing ⚠️
sdks/python/src/agent_control/evaluation_events.py 89.79% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Contributor

@lan17 lan17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are still two blockers before we merge this. I left inline notes on the partial sink integration and the unconditional reconstruction work on the evaluation hot path.

)


async def check_evaluation(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still leaves the merged-event path only partially integrated across the public SDK surface. If a caller registers set_control_event_sink() but uses the public check_evaluation() helper, we just POST and return the parsed result without setting X-Agent-Control-Merge-Events or reconstructing/emitting any events. I think we need to either wire the sink behavior through this helper too, or explicitly scope the sink API away from this entry point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you’re right. check_evaluation() currently bypasses the merged-event path even when a control event sink is registered, which makes the public SDK surface inconsistent. I think the right fix is to wire the sink behavior through this helper as well: when a sink is present, resolve trace/span IDs, send X-Agent-Control-Merge-Events: true, reconstruct the server events from the lightweight response plus cached control definitions, and emit them through the sink before returning the parsed result. That keeps the default behavior unchanged while making merged-event handling consistent across both public evaluation entry points.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer valid. I have re-worked this PR to only contain merge model support

local_control_lookup = {
control.id: control.control for control in applicable_local_controls
}
local_events = build_control_execution_events(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid reconstructing events unless something will actually consume them? We now always build local events here, and we do the same for server events later, even when merged emission is off and even when OSS observability is disabled. Since evaluation is a latency-sensitive hot path, I don't think we should pay this parsing/allocation cost unless either the existing observability path is active or a merged-event sink is registered.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure .

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m tightening the hot path so we only reconstruct events when they’ll actually be consumed: local events only when the default SDK observability path is enabled or a sink is registered, and server events only when the merged-event path is active.

Copy link
Copy Markdown
Contributor

@lan17 lan17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issue:

  • server/src/agent_control_server/endpoints/evaluation.py:242-263 now trusts X-Agent-Control-Merge-Events from the caller and skips server-side observability ingestion solely on that header. A direct API caller can set the header and suppress control-execution events without reconstructing/re-emitting them anywhere. The server needs to treat merged mode as a trusted SDK flow, or otherwise keep the default ingestion path for untrusted callers.

Comment on lines +417 to +418
if not merged_emission_enabled:
enqueue_observability_events(local_events)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When merged_emission_enabled is true, local_events are built but not enqueued here. If the subsequent server POST raises (timeout, connection error, HTTP error), the function exits without ever calling enqueue_observability_events(local_events) — the local controls ran successfully but their observability records are silently lost.

Consider wrapping the server call in try/except and enqueuing local_events before re-raising, so locally-evaluated controls always produce observability records regardless of server availability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks . Implemented. In the merged local+server path, check_evaluation_with_local() now preserves SDK-local observability even if the subsequent server call fails. The server POST is wrapped in try/except, and if local controls already produced local_events, those events are enqueued before the exception is re-raised. That keeps the original failure behavior for evaluation while avoiding silent loss of local control records.

I also added a regression test covering the case where merged mode is enabled, local evaluation succeeds, the server POST raises, and only the reconstructed local events are enqueued before the error bubbles up.

timestamp=now,
evaluator_name=evaluator_name,
selector_path=selector_path,
error_message=match.result.error if not matched else None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expression uses a single matched boolean for all three categories (matches, errors, non_matches), but the server-side _build_observability_events handles them differently: matches keep result.error, non-matches force None. When merge mode reconstructs server events through this helper, the error_message values will diverge from what the server would have produced.

This existed before for local-only events, but merge mode now routes server events through this path too, making parity with the server more important. Consider passing error_message per-category to match server behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The shared SDK event builder now matches the server’s category-specific error_message behavior: matches and errors keep result.error, while non_matches always set error_message=None, so merged-mode reconstruction stays aligned with the server path.

Comment on lines -41 to -42
if not isinstance(trace_id, str) or not isinstance(span_id, str):
return None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isinstance(trace_id, str) / isinstance(span_id, str) guard was removed here. TypedDict provides no runtime enforcement, so a buggy provider returning e.g. int or bytes now passes the truthiness check and flows into HTTP headers and event fields that expect strings.

The removal looks incidental to the PR's goals — consider restoring it as a low-cost defensive check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I restored the defensive runtime type check in trace_context.py, so providers returning non-string trace_id or span_id are rejected again before those values can flow into headers or event fields.

@namrataghadi-galileo
Copy link
Copy Markdown
Contributor Author

@lan17 Fixed. The server no longer trusts X-Agent-Control-Merge-Events by itself. Merged mode is now established during init(..., merge_events=True) / initAgent, and the server records that enablement for the initialized (authenticated client, agent) session. On /evaluation, the server only skips observability ingestion if that same client previously initialized that same agent with merge-events enabled; otherwise it keeps the default ingestion path. Since merge_events defaults to False, existing behavior remains unchanged unless merged mode is explicitly turned on.

Copy link
Copy Markdown
Contributor

@lan17 lan17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed current head 9af1755.

The original header-trust issue looks fixed, but there is still one blocker:

  • server/src/agent_control_server/merge_event_sessions.py:28-40 scopes merge-enable state by (client.api_key, agent_name). For session-cookie auth, _authenticate_via_cookie() returns AuthenticatedClient(api_key="", ...) in server/src/agent_control_server/auth.py:118-121, so every cookie-authenticated browser session collapses to the same key ("", agent_name). If one logged-in UI session initializes an agent with merge events enabled, any other logged-in UI session can send X-Agent-Control-Merge-Events: true and suppress server-side ingestion for that agent. The trusted-session key needs to use a per-session or otherwise user-specific identifier, not the empty api_key sentinel.

@lan17
Copy link
Copy Markdown
Contributor

lan17 commented Apr 2, 2026

Stepping back from the line-level issues, I think the direction here is reasonable, but the architecture is taking on too much implicit session and state coupling too quickly. Separating evaluation semantics from observability event delivery is the right idea, and extracting shared event reconstruction logic is a real improvement over duplicating emission behavior in multiple places. Where this starts to feel brittle is that the behavior of a single evaluation request now depends on prior init(...)/initAgent side effects, cached control definitions in the SDK, special headers, and server-side trust state. That is a lot of hidden coordination for a latency-sensitive path, and it makes correctness and debuggability harder than it should be.

For a v1, I would be comfortable with a narrower version of this. I would keep merged event creation scoped only to the local-first SDK path, keep check_evaluation() on the plain server-owned flow, and keep the built-in queue as the only delivery mechanism for now. In other words, make one explicit opt-in contract that says the SDK owns creation of the merged batch for this initialized session, and avoid broadening the public surface area until that protocol is solid. That still gets the main benefit of this work without committing us to a more general session-plus-delivery framework yet.

What I do not think we should do yet is spread this across multiple entry points, multiple trust mechanisms, and alternate delivery sinks at the same time. Once merge mode works in a way that is explicit, robust across auth modes and deployment topologies, and easy to reason about end to end, then adding a configurable sink is a straightforward follow-on. Right now #160 feels like it is layering a flexible delivery abstraction on top of a control plane that is still settling.

So my recommendation would be: land a smaller, clearer version of merged creation first, prove that the ownership model is correct, and only then generalize it. I think the core idea is good. I just do not think we should lock in this much flexibility before the underlying contract is simpler and more stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants