Conversation
…re/59787-merge-events
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
lan17
left a comment
There was a problem hiding this comment.
I think there are still two blockers before we merge this. I left inline notes on the partial sink integration and the unconditional reconstruction work on the evaluation hot path.
| ) | ||
|
|
||
|
|
||
| async def check_evaluation( |
There was a problem hiding this comment.
This still leaves the merged-event path only partially integrated across the public SDK surface. If a caller registers set_control_event_sink() but uses the public check_evaluation() helper, we just POST and return the parsed result without setting X-Agent-Control-Merge-Events or reconstructing/emitting any events. I think we need to either wire the sink behavior through this helper too, or explicitly scope the sink API away from this entry point.
There was a problem hiding this comment.
Good catch — you’re right. check_evaluation() currently bypasses the merged-event path even when a control event sink is registered, which makes the public SDK surface inconsistent. I think the right fix is to wire the sink behavior through this helper as well: when a sink is present, resolve trace/span IDs, send X-Agent-Control-Merge-Events: true, reconstruct the server events from the lightweight response plus cached control definitions, and emit them through the sink before returning the parsed result. That keeps the default behavior unchanged while making merged-event handling consistent across both public evaluation entry points.
There was a problem hiding this comment.
This comment is no longer valid. I have re-worked this PR to only contain merge model support
| local_control_lookup = { | ||
| control.id: control.control for control in applicable_local_controls | ||
| } | ||
| local_events = build_control_execution_events( |
There was a problem hiding this comment.
Can we avoid reconstructing events unless something will actually consume them? We now always build local events here, and we do the same for server events later, even when merged emission is off and even when OSS observability is disabled. Since evaluation is a latency-sensitive hot path, I don't think we should pay this parsing/allocation cost unless either the existing observability path is active or a merged-event sink is registered.
There was a problem hiding this comment.
I’m tightening the hot path so we only reconstruct events when they’ll actually be consumed: local events only when the default SDK observability path is enabled or a sink is registered, and server events only when the merged-event path is active.
lan17
left a comment
There was a problem hiding this comment.
Blocking issue:
server/src/agent_control_server/endpoints/evaluation.py:242-263now trustsX-Agent-Control-Merge-Eventsfrom the caller and skips server-side observability ingestion solely on that header. A direct API caller can set the header and suppress control-execution events without reconstructing/re-emitting them anywhere. The server needs to treat merged mode as a trusted SDK flow, or otherwise keep the default ingestion path for untrusted callers.
| if not merged_emission_enabled: | ||
| enqueue_observability_events(local_events) |
There was a problem hiding this comment.
When merged_emission_enabled is true, local_events are built but not enqueued here. If the subsequent server POST raises (timeout, connection error, HTTP error), the function exits without ever calling enqueue_observability_events(local_events) — the local controls ran successfully but their observability records are silently lost.
Consider wrapping the server call in try/except and enqueuing local_events before re-raising, so locally-evaluated controls always produce observability records regardless of server availability.
There was a problem hiding this comment.
Thanks . Implemented. In the merged local+server path, check_evaluation_with_local() now preserves SDK-local observability even if the subsequent server call fails. The server POST is wrapped in try/except, and if local controls already produced local_events, those events are enqueued before the exception is re-raised. That keeps the original failure behavior for evaluation while avoiding silent loss of local control records.
I also added a regression test covering the case where merged mode is enabled, local evaluation succeeds, the server POST raises, and only the reconstructed local events are enqueued before the error bubbles up.
| timestamp=now, | ||
| evaluator_name=evaluator_name, | ||
| selector_path=selector_path, | ||
| error_message=match.result.error if not matched else None, |
There was a problem hiding this comment.
This expression uses a single matched boolean for all three categories (matches, errors, non_matches), but the server-side _build_observability_events handles them differently: matches keep result.error, non-matches force None. When merge mode reconstructs server events through this helper, the error_message values will diverge from what the server would have produced.
This existed before for local-only events, but merge mode now routes server events through this path too, making parity with the server more important. Consider passing error_message per-category to match server behavior.
There was a problem hiding this comment.
Fixed. The shared SDK event builder now matches the server’s category-specific error_message behavior: matches and errors keep result.error, while non_matches always set error_message=None, so merged-mode reconstruction stays aligned with the server path.
| if not isinstance(trace_id, str) or not isinstance(span_id, str): | ||
| return None |
There was a problem hiding this comment.
The isinstance(trace_id, str) / isinstance(span_id, str) guard was removed here. TypedDict provides no runtime enforcement, so a buggy provider returning e.g. int or bytes now passes the truthiness check and flows into HTTP headers and event fields that expect strings.
The removal looks incidental to the PR's goals — consider restoring it as a low-cost defensive check.
There was a problem hiding this comment.
Fixed. I restored the defensive runtime type check in trace_context.py, so providers returning non-string trace_id or span_id are rejected again before those values can flow into headers or event fields.
|
@lan17 Fixed. The server no longer trusts X-Agent-Control-Merge-Events by itself. Merged mode is now established during init(..., merge_events=True) / initAgent, and the server records that enablement for the initialized (authenticated client, agent) session. On /evaluation, the server only skips observability ingestion if that same client previously initialized that same agent with merge-events enabled; otherwise it keeps the default ingestion path. Since merge_events defaults to False, existing behavior remains unchanged unless merged mode is explicitly turned on. |
lan17
left a comment
There was a problem hiding this comment.
Re-reviewed current head 9af1755.
The original header-trust issue looks fixed, but there is still one blocker:
server/src/agent_control_server/merge_event_sessions.py:28-40scopes merge-enable state by(client.api_key, agent_name). For session-cookie auth,_authenticate_via_cookie()returnsAuthenticatedClient(api_key="", ...)inserver/src/agent_control_server/auth.py:118-121, so every cookie-authenticated browser session collapses to the same key("", agent_name). If one logged-in UI session initializes an agent with merge events enabled, any other logged-in UI session can sendX-Agent-Control-Merge-Events: trueand suppress server-side ingestion for that agent. The trusted-session key needs to use a per-session or otherwise user-specific identifier, not the emptyapi_keysentinel.
|
Stepping back from the line-level issues, I think the direction here is reasonable, but the architecture is taking on too much implicit session and state coupling too quickly. Separating evaluation semantics from observability event delivery is the right idea, and extracting shared event reconstruction logic is a real improvement over duplicating emission behavior in multiple places. Where this starts to feel brittle is that the behavior of a single evaluation request now depends on prior For a v1, I would be comfortable with a narrower version of this. I would keep merged event creation scoped only to the local-first SDK path, keep What I do not think we should do yet is spread this across multiple entry points, multiple trust mechanisms, and alternate delivery sinks at the same time. Once merge mode works in a way that is explicit, robust across auth modes and deployment topologies, and easy to reason about end to end, then adding a configurable sink is a straightforward follow-on. Right now So my recommendation would be: land a smaller, clearer version of merged creation first, prove that the ownership model is correct, and only then generalize it. I think the core idea is good. I just do not think we should lock in this much flexibility before the underlying contract is simpler and more stable. |
Summary
Behavior
What changed
Testing