feat: ground truth optimization path by andrewklatzke · Pull Request #122 · launchdarkly/python-server-sdk-ai

andrewklatzke · 2026-04-03T22:13:05Z

Requirements

I have added test coverage for new or changed functionality
I have followed the repository's pull request submission guidelines
I have validated my changes against all supported platform versions

Describe the solution you've provided

Implements the "ground truth" path for SDK optimizations. The existing path optimizes through a form of "chaos testing": inputs are randomly selected, and each output is judged solely against the acceptance statements and judges.

This path requires the user to pass additional data (ground_truth_responses), and the result of each call is also compared against its expected response. In the original mode, we iterate until we reach a passing result; in this mode, we run all N samples to collect a set of pass/fail metrics and use those instead. The pass/fail results, scores, and rationale from that set of samples are then passed to the LLM for optimization. Once a new variation is generated, it runs through all N samples again to confirm that they all pass.
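The batch loop described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `SampleResult`, `run_ground_truth_loop`, and the `evaluate` / `propose` callbacks are hypothetical names standing in for the agent call, the judges, and the LLM-driven variation step.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple


@dataclass
class SampleResult:
    """Outcome of judging one ground-truth sample (hypothetical shape)."""
    user_input: str
    passed: bool
    rationale: str


async def run_ground_truth_loop(
    instructions: str,
    samples: List[Tuple[str, str]],  # (user_input, expected_response) pairs
    evaluate: Callable[[str, str, str], Awaitable[SampleResult]],
    propose: Callable[[str, List[SampleResult]], Awaitable[str]],
    max_attempts: int,
) -> Tuple[str, bool]:
    """Return (instructions, succeeded) after at most max_attempts batches."""
    for attempt in range(max_attempts):
        # Evaluate every sample; do not stop at the first failure.
        results = [await evaluate(instructions, u, e) for u, e in samples]
        if all(r.passed for r in results):
            return instructions, True
        # Feed the full pass/fail history into the next variation,
        # unless there is no attempt left to evaluate it.
        if attempt + 1 < max_attempts:
            instructions = await propose(instructions, results)
    return instructions, False
```

The key difference from the original mode is that a single passing sample never ends the run; only a batch in which every sample passes does.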

Describe alternatives you've considered

We discussed doing this in other ways, such as:

Sampling the ground truth options and running only a subset until it passes (does not ensure that every entry passes, and can overfit to a single item)
Confirming that it passes on only some subset of the items (has the same issue as above)

Additional context

Implementation when pulling from a config looks like:

    # Run inside an async function. `client`, `context_builder`, and the two
    # handlers are assumed to be defined elsewhere.
    options = OptimizationFromConfigOptions(
        project_key="default",
        context_choices=[context_builder("user-123")],
        handle_agent_call=handle_agent_call,
        handle_judge_call=handle_judge_call,
        base_url="https://ld-stg.launchdarkly.com/",
    )

    result = await client.optimize_from_config("ground-truth-optimization", options)

Result from a simple optimization:

Note

Medium Risk
Adds a new multi-sample optimization loop and changes judge prompting to incorporate optional ground-truth expected responses, which can affect optimization outcomes and run behavior. Also introduces stricter validation (missing instructions/model fallback) that may change failure modes for existing consumers.

Overview
Adds a ground-truth optimization mode that evaluates an agent against an ordered list of samples on each attempt and succeeds only when all samples pass; otherwise it generates a new variation and re-runs the full batch, up to max_attempts.

Introduces GroundTruthSample / GroundTruthOptimizationOptions, exports them publicly, and updates optimize_from_config to auto-detect groundTruthResponses and dispatch to the new ground-truth run (zipping groundTruthResponses + userInputOptions + variableChoices, with length validation).
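As a rough illustration of the zipping and length validation mentioned above, the sketch below pairs the three lists index by index. The `Sample` dataclass, its field names, the `build_samples` helper, and the error message are all assumptions made for this example; they are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Sample:
    """Hypothetical stand-in for GroundTruthSample; field names are assumed."""
    user_input: str
    expected_response: str
    variables: Dict[str, Any]


def build_samples(config: Dict[str, List[Any]]) -> List[Sample]:
    """Zip the three config lists into ordered samples, rejecting mismatched lengths."""
    responses = config["groundTruthResponses"]
    inputs = config["userInputOptions"]
    variables = config["variableChoices"]
    if not (len(responses) == len(inputs) == len(variables)):
        raise ValueError(
            "groundTruthResponses, userInputOptions, and variableChoices "
            "must all have the same length"
        )
    return [
        Sample(user_input=u, expected_response=e, variables=v)
        for u, e, v in zip(inputs, responses, variables)
    ]
```

Validating lengths before zipping matters because `zip` would otherwise silently truncate to the shortest list and drop samples.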

Updates judge evaluation to accept an optional expected_response and inject it into both config-judge templates and acceptance-judge user messages, and tightens runtime validation by erroring on missing agent instructions plus seeding a default model from model_choices when the flag has none. Extensive new tests cover the new dataclasses, batch loop behavior/callbacks, config dispatch, and expected-response injection.
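One way to picture the optional expected_response injection is the sketch below, which appends the expected response to a judge user message only when one is supplied. The function name `build_judge_user_message` and the message wording are hypothetical; the actual prompt text in the PR is not shown here.

```python
from typing import Optional


def build_judge_user_message(
    acceptance_statement: str,
    agent_response: str,
    expected_response: Optional[str] = None,
) -> str:
    """Assemble a judge prompt; the wording here is illustrative only."""
    parts = [
        f"Acceptance statement: {acceptance_statement}",
        f"Agent response: {agent_response}",
    ]
    # Only ground-truth runs supply an expected response, so the original
    # (non-ground-truth) prompt stays unchanged when it is omitted.
    if expected_response is not None:
        parts.append(f"Expected response: {expected_response}")
    return "\n".join(parts)
```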

Reviewed by Cursor Bugbot for commit 44c8c59. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-04-03T22:18:04Z

packages/optimization/src/ldai_optimization/client.py

            config, options, api_client, optimization_id, run_id
        )
+        if isinstance(optimization_options, GroundTruthOptimizationOptions):
+            return await self._run_ground_truth_optimization(agent_config, optimization_options)


optimize_from_config silently returns incompatible types

Medium Severity

optimize_from_config now returns either a single OptimizationContext or a List[OptimizationContext] depending on whether the remote config contains ground truth responses. The docstring still says it returns "OptimizationContext from the final iteration". Existing callers accessing attributes like result.completion_response will get an AttributeError if the remote config is updated to include ground truth, since the return silently changes to a list. The caller has no way to predict the return type before calling since the config is fetched internally.

Additional Locations (1)

packages/optimization/src/ldai_optimization/client.py#L1208-L1209
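Until the return type is settled, a caller could defensively normalize the result to a list. The helper below is a hypothetical caller-side workaround, not part of the SDK, and it assumes the ground-truth path returns a plain Python `list`.

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def as_context_list(result: Union[T, List[T]]) -> List[T]:
    """Wrap a single result in a list; pass an existing list through unchanged."""
    return result if isinstance(result, list) else [result]
```

With this, `as_context_list(result)[-1]` would give the final context in either mode, though a consistent return type in the SDK itself would be the cleaner fix.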


cursor · 2026-04-03T22:18:04Z

packages/optimization/src/ldai_optimization/dataclasses.py

        if len(self.model_choices) < 1:
            raise ValueError("model_choices must have at least 1 model")
+        if len(self.ground_truth_responses) < 1:
+            raise ValueError("ground_truth_responses must have at least 1 sample")


Missing judge_model validation in ground truth options

Low Severity

GroundTruthOptimizationOptions.__post_init__ is missing the judge_model is None validation that OptimizationOptions.__post_init__ has. While judge_model is typed as str, Python doesn't enforce this at runtime. Passing None would not be caught until the bridge OptimizationOptions is constructed internally, producing a confusing error referencing the wrong class.
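A minimal sketch of the suggested check is shown below. `GroundTruthOptionsSketch` is a cut-down, hypothetical stand-in for GroundTruthOptimizationOptions with only the fields needed here, and the judge_model error message is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroundTruthOptionsSketch:
    """Illustrative subset of the options dataclass; not the real class."""
    model_choices: List[str]
    ground_truth_responses: List[str]
    judge_model: str

    def __post_init__(self) -> None:
        if len(self.model_choices) < 1:
            raise ValueError("model_choices must have at least 1 model")
        if len(self.ground_truth_responses) < 1:
            raise ValueError("ground_truth_responses must have at least 1 sample")
        # Type hints are not enforced at runtime, so reject None here rather
        # than letting it surface later from a different class.
        if self.judge_model is None:
            raise ValueError("judge_model must be provided")
```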


feat: ground truth optimization path

44c8c59

andrewklatzke requested a review from jsonbailey April 3, 2026 22:13

andrewklatzke requested a review from a team as a code owner April 3, 2026 22:13

cursor bot reviewed Apr 3, 2026

jsonbailey approved these changes Apr 3, 2026

andrewklatzke wants to merge 1 commit into aklatzke/AIC-1794/optimize-method-from-ld from aklatzke/AIC-1795/optimize-method-ground-truth-path