Merged
# Conflicts:
#	verifiers/scripts/eval.py
#	verifiers/types.py
Cursor Bugbot has reviewed your changes and found 2 potential issues.
This was referenced Feb 14, 2026
snimu added a commit that referenced this pull request on Feb 14, 2026:
Adapt rlm_env.py to fully use the provider-agnostic types introduced in #897 (unified client interface):

- Use flat ToolCall attributes (name, arguments) instead of nested function object dance
- Return ToolMessage objects from _call_sub_tool instead of raw dicts
- Use Client type annotation instead of Any for client parameters
- Pass tool_defs directly to get_model_response instead of via state
- Use typed AssistantMessage access in no_tools_called stop condition
- Simplify _extract_tokens_from_response (remove dead dict code paths)
- Fix SubLLMResult final_content type narrowing for MessageContent union

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
snimu added a commit that referenced this pull request on Feb 15, 2026:
* migrate rlm_env to unified client types

Adapt rlm_env.py to fully use the provider-agnostic types introduced in #897 (unified client interface):

- Use flat ToolCall attributes (name, arguments) instead of nested function object dance
- Return ToolMessage objects from _call_sub_tool instead of raw dicts
- Use Client type annotation instead of Any for client parameters
- Pass tool_defs directly to get_model_response instead of via state
- Use typed AssistantMessage access in no_tools_called stop condition
- Simplify _extract_tokens_from_response (remove dead dict code paths)
- Fix SubLLMResult final_content type narrowing for MessageContent union

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* restore prompt_state tool_defs as safety measure

Restore setting prompt_state["tool_defs"] in _call_sub_llm_api alongside the new direct tool_defs kwarg pass. While both paths resolve equivalently through resolve_optional_args, keeping the state key is safer for any code that may read it downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Description
Based on PR #788, which added provider-agnostic types for the message/client/tool/response flow.
This PR replaces direct calls to native clients (e.g. `AsyncOpenAI`) and types (e.g. OAI-style tool message and response types) with a provider-agnostic `vf.Client` adapter that converts `Messages` and `Tool` definitions to native provider requests and normalizes outputs into a unified `vf.Response` (including usage, tool calls, and optional reasoning content). It adds first-class Anthropic support via `AnthropicMessagesClient`, and integrates interleaved thinking support into the default `OpenAIChatCompletionsClient`.

Client Interface
`vf.Client` is the adapter layer that wraps a native SDK client and standardizes everything into a `vf.Response`. Each client implementation defines four core methods:

- `to_native_prompt` — Convert `vf.Messages` into provider-native prompt format.
- `get_native_response` — Execute the provider-native API call.
- `raise_from_native_response` (optional) — Map/raise provider-specific errors (e.g. overlong prompt).
- `from_native_response` — Convert provider-native output into a unified `vf.Response`.

We intentionally moved to custom unified types (instead of continuing to normalize to OpenAI-only types) because some provider features do not map cleanly to OAI schemas. The current type system is provider-agnostic and supports multimodal/reasoning/tool patterns across clients. We currently implement the following clients:
- `OpenAICompletionsClient`
- `OpenAIChatCompletionsClient`
- `OpenAIChatCompletionsTokenClient`
- `AnthropicMessagesClient`

This architecture is extensible to additional providers/API surfaces (including future OAI Responses-style adapters).
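The four core methods above might be wired together like this. This is a minimal sketch with toy stand-in types (`Response`, `EchoClient` are hypothetical); the real `vf.Client` signatures, async behavior, and field names may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class Response:  # toy stand-in for vf.Response
    content: str
    usage: dict

class Client(ABC):
    """Adapter wrapping a native SDK client (shape assumed for illustration)."""

    @abstractmethod
    def to_native_prompt(self, messages: list[dict]) -> Any: ...

    @abstractmethod
    def get_native_response(self, native_prompt: Any) -> Any: ...

    def raise_from_native_response(self, native_response: Any) -> None:
        # Optional hook: map/raise provider-specific errors
        # (e.g. an overlong-prompt error).
        pass

    @abstractmethod
    def from_native_response(self, native_response: Any) -> Response: ...

    def get_response(self, messages: list[dict]) -> Response:
        # The unified entry point composes the four core methods.
        native_prompt = self.to_native_prompt(messages)
        native_response = self.get_native_response(native_prompt)
        self.raise_from_native_response(native_response)
        return self.from_native_response(native_response)

class EchoClient(Client):
    """Toy 'provider' that upper-cases the last user message."""

    def to_native_prompt(self, messages):
        return messages[-1]["content"]

    def get_native_response(self, native_prompt):
        return {"text": native_prompt.upper(), "tokens": len(native_prompt)}

    def from_native_response(self, native_response):
        return Response(content=native_response["text"],
                        usage={"total_tokens": native_response["tokens"]})

resp = EchoClient().get_response([{"role": "user", "content": "hi"}])
print(resp.content)  # → HI
```

The point of the split is that environments only ever touch `get_response` and the unified `Response`; all provider quirks live in the four overridable hooks.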
Tests
`gpt-4.1-mini` via PI inference and OAI chat completions client

```
uv run vf-eval continuation-quality -n1 -r1 -d -v
uv run vf-eval gsm8k -n1 -r1 -d -v
uv run vf-eval wiki-search -n1 -r1 -d -v
```

`glm-4.7` via PI inference and OAI chat completions client

```
uv run vf-eval continuation-quality -n1 -r1 -d -v -m glm-4.7
uv run vf-eval gsm8k -n1 -r1 -d -v -m glm-4.7
uv run vf-eval wiki-search -n1 -r1 -d -v -m glm-4.7
```

Variety of models via native API

```
uv run vf-eval wiki-search -n1 -r1 -d -v -m deepseek-reasoner -b https://api.deepseek.com/v1 -k DEEPSEEK_API_KEY
uv run vf-eval wiki-search -n1 -r1 -d -v -m kimi-k2.5 -b https://api.moonshot.ai/v1 -k MOONSHOT_API_KEY
```

Against vLLM server

```
uv run inference --model.name Qwen/Qwen3-4B-Thinking-2507 --tensor-parallel-size 2 \
  --tool-call-parser hermes \
  --reasoning-parser deepseek_r1 \
  --enable-auto-tool-choice
uv run vf-eval wiki-search -n1 -r1 -d -v -m Qwen/Qwen3-4B-Thinking-2507 -b http://localhost:8000/v1
```

Type of Change
Testing
`uv run pytest` locally.

Checklist
Note
High Risk
Broad, breaking interface refactor across client invocation, tool schemas, and response types; mistakes can affect all model calls and endpoint/CLI configuration across providers.
Overview
Moves generation to a provider-agnostic client adapter: environments now call `Client.get_response()` and receive a unified `vf.Response`, with typed tool definitions (`tool_defs`/`Tool`) replacing OpenAI-specific `oai_tools` and legacy native response types.

Adds first-class provider selection via `ClientConfig.client_type`/`ClientType`, updates the endpoint registry and CLI to carry `api_client_type` (with a `type` shorthand in registries), and switches built-in Anthropic model aliases to direct `api.anthropic.com` with `ANTHROPIC_API_KEY` while adding DeepSeek endpoints.
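The provider-selection flow described above might look roughly like this. Everything here is a hypothetical sketch (the `ClientType` values, `ClientConfig` fields, and registry layout are assumptions, not the actual verifiers definitions).

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical shapes for illustration; the real ClientType/ClientConfig
# in verifiers may define different members and fields.
class ClientType(str, Enum):
    OPENAI_CHAT = "openai-chat"
    ANTHROPIC_MESSAGES = "anthropic-messages"

@dataclass
class ClientConfig:
    model: str
    client_type: ClientType = ClientType.OPENAI_CHAT

# A registry entry carrying the provider choice via the `type` shorthand:
ENDPOINTS = {
    "claude-opus": {
        "base_url": "https://api.anthropic.com",
        "key_var": "ANTHROPIC_API_KEY",
        "type": ClientType.ANTHROPIC_MESSAGES,
    },
}

# The CLI/registry would resolve the alias into a typed config:
cfg = ClientConfig(model="claude-opus",
                   client_type=ENDPOINTS["claude-opus"]["type"])
```

Dispatching on an enum rather than on string model prefixes keeps the endpoint registry the single source of truth for which adapter handles a given alias.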
Messages, replaces OpenAI-specific test mocking with a newMockClient, and adds focused tests for auth/overlong-prompt error handling, multimodal prompt conversions, interception serialization, and message normalization; docs/reference and eval/development/test docs are updated, andanthropicis added as a dependency.Written by Cursor Bugbot for commit 7ecbd40. This will update automatically on new commits. Configure here.