unified client interface #897

Merged: mikasenghaas merged 79 commits into main from mika/anthropic-client on Feb 14, 2026

Conversation

@mikasenghaas (Member) commented Feb 11, 2026

Description

Builds on PR #788, which added provider-agnostic types for the message/client/tool/response flow.

This PR replaces direct use of native clients (e.g. AsyncOpenAI) and native types (e.g. OAI-style tool-message and response types) with a provider-agnostic vf.Client adapter that converts Messages and Tool definitions into native provider requests and normalizes outputs into a unified vf.Response (including usage, tool calls, and optional reasoning content).

It adds first-class Anthropic support via AnthropicMessagesClient, and integrates interleaved thinking support into the default OpenAIChatCompletionsClient.

Client Interface

vf.Client is the adapter layer that wraps a native SDK client and standardizes everything into a vf.Response.

Each client implementation defines four core methods:

  1. to_native_prompt — Convert vf.Messages into provider-native prompt format.
  2. get_native_response — Execute the provider-native API call.
  3. raise_from_native_response (optional) — Map/raise provider-specific errors (e.g. overlong prompt).
  4. from_native_response — Convert provider-native output into unified vf.Response.
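The four steps above compose into a single get_response call. A minimal sketch of that pipeline, assuming hypothetical method bodies and a simplified stand-in for vf.Response (only the method names come from this PR):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Response:
    """Hypothetical, simplified stand-in for the unified vf.Response."""
    content: str
    usage: dict[str, int] = field(default_factory=dict)


class Client(ABC):
    """Adapter around a native SDK client (method names from the PR)."""

    @abstractmethod
    def to_native_prompt(self, messages: list[dict[str, Any]]) -> Any: ...

    @abstractmethod
    def get_native_response(self, native_prompt: Any) -> Any: ...

    def raise_from_native_response(self, native_response: Any) -> None:
        # Optional hook: map provider-specific failures (e.g. overlong
        # prompt) onto shared exception types. Default: no-op.
        return None

    @abstractmethod
    def from_native_response(self, native_response: Any) -> Response: ...

    def get_response(self, messages: list[dict[str, Any]]) -> Response:
        # Run the four steps in order: convert, call, check, normalize.
        native_prompt = self.to_native_prompt(messages)
        native_response = self.get_native_response(native_prompt)
        self.raise_from_native_response(native_response)
        return self.from_native_response(native_response)


class EchoClient(Client):
    """Toy 'provider' that upper-cases the last user message."""

    def to_native_prompt(self, messages):
        return messages[-1]["content"]

    def get_native_response(self, native_prompt):
        return {"text": native_prompt.upper(), "tokens": len(native_prompt)}

    def from_native_response(self, native_response):
        return Response(content=native_response["text"],
                        usage={"total_tokens": native_response["tokens"]})
```

A real adapter would wrap an actual SDK call in get_native_response; the point of the split is that only the three conversion hooks differ per provider.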

We intentionally moved to custom unified types (instead of continuing to normalize to OpenAI-only types) because some provider features do not map cleanly to OAI schemas. The current type system is provider-agnostic and supports multimodal/reasoning/tool patterns across clients. We currently implement the following clients:

  • OpenAICompletionsClient
  • OpenAIChatCompletionsClient
  • OpenAIChatCompletionsTokenClient
  • AnthropicMessagesClient

This architecture is extensible to additional providers/API surfaces (including future OAI Responses-style adapters).
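One common way to keep such an adapter layer extensible is a string-keyed registry that maps a client-type name to its adapter class. This is a hypothetical sketch of that pattern; the PR's actual selection mechanism (ClientConfig.client_type) may differ in detail, and the registry names here are assumptions:

```python
# Hypothetical registry pattern; names are illustrative, not the PR's exact API.
CLIENT_REGISTRY: dict[str, type] = {}


def register_client(name: str):
    """Class decorator that records an adapter under a client-type name."""
    def wrap(cls):
        CLIENT_REGISTRY[name] = cls
        return cls
    return wrap


@register_client("openai-chat-completions")
class OpenAIChatCompletionsClient: ...


@register_client("anthropic-messages")
class AnthropicMessagesClient: ...


def make_client(client_type: str, **kwargs):
    """Instantiate the adapter registered under client_type."""
    try:
        cls = CLIENT_REGISTRY[client_type]
    except KeyError:
        raise ValueError(f"unknown client_type {client_type!r}; "
                         f"known: {sorted(CLIENT_REGISTRY)}")
    return cls(**kwargs)
```

Adding a new provider then only requires defining one adapter class and registering it; callers never branch on provider names themselves.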

Tests

gpt-4.1-mini via PI inference and OAI chat completions client

  • uv run vf-eval continuation-quality -n1 -r1 -d -v
  • uv run vf-eval gsm8k -n1 -r1 -d -v
  • uv run vf-eval wiki-search -n1 -r1 -d -v

glm-4.7 via PI inference and OAI chat completions client

  • uv run vf-eval continuation-quality -n1 -r1 -d -v -m glm-4.7
  • uv run vf-eval gsm8k -n1 -r1 -d -v -m glm-4.7
  • uv run vf-eval wiki-search -n1 -r1 -d -v -m glm-4.7

Variety of models via native API

  • uv run vf-eval wiki-search -n1 -r1 -d -v -m deepseek-reasoner -b https://api.deepseek.com/v1 -k DEEPSEEK_API_KEY
  • uv run vf-eval wiki-search -n1 -r1 -d -v -m kimi-k2.5 -b https://api.moonshot.ai/v1 -k MOONSHOT_API_KEY

Against vLLM server

  • uv run inference --model.name Qwen/Qwen3-4B-Thinking-2507 --tensor-parallel-size 2 --tool-call-parser hermes --reasoning-parser deepseek_r1 --enable-auto-tool-choice
  • uv run vf-eval wiki-search -n1 -r1 -d -v -m Qwen/Qwen3-4B-Thinking-2507 -b http://localhost:8000/v1

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

High Risk
Broad, breaking interface refactor across client invocation, tool schemas, and response types; mistakes can affect all model calls and endpoint/CLI configuration across providers.

Overview
Moves generation to a provider-agnostic client adapter: environments now call Client.get_response() and receive a unified vf.Response, with typed tool definitions (tool_defs/Tool) replacing OpenAI-specific oai_tools and legacy native response types.

Adds first-class provider selection via ClientConfig.client_type/ClientType, updates the endpoint registry and CLI to carry api_client_type (with type shorthand in registries), and switches built-in Anthropic model aliases to direct api.anthropic.com with ANTHROPIC_API_KEY while adding DeepSeek endpoints.

Updates OpenEnv prompt renderers to return typed Messages, replaces OpenAI-specific test mocking with a new MockClient, and adds focused tests for auth/overlong-prompt error handling, multimodal prompt conversions, interception serialization, and message normalization; docs/reference and eval/development/test docs are updated, and anthropic is added as a dependency.
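The typed tool definitions (tool_defs/Tool) that replace OpenAI-specific oai_tools can be pictured as a small provider-agnostic record that renders itself into each provider's wire format. This is a hedged sketch under assumed field names; the PR's actual Tool type may be shaped differently:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Assumed shape of a provider-agnostic tool definition."""
    name: str
    description: str
    parameters: dict  # JSON Schema for the tool's arguments

    def to_openai(self) -> dict:
        # OAI chat-completions style: nested under a "function" wrapper.
        return {"type": "function",
                "function": {"name": self.name,
                             "description": self.description,
                             "parameters": self.parameters}}

    def to_anthropic(self) -> dict:
        # Anthropic Messages style: flat, with an input_schema field.
        return {"name": self.name,
                "description": self.description,
                "input_schema": self.parameters}


search = Tool(name="wiki_search",
              description="Search Wikipedia.",
              parameters={"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]})
```

Keeping one canonical definition and converting at the adapter boundary is what lets environments stop depending on OAI-only schemas.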

Written by Cursor Bugbot for commit 7ecbd40. This will update automatically on new commits.

@mikasenghaas changed the title from "feat(clients): unified client interface with Anthropic support" to "unified client interface" on Feb 11, 2026
@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.
@mikasenghaas mikasenghaas merged commit 191c516 into main Feb 14, 2026
6 checks passed
snimu added a commit that referenced this pull request Feb 14, 2026
Adapt rlm_env.py to fully use the provider-agnostic types introduced
in #897 (unified client interface):

- Use flat ToolCall attributes (name, arguments) instead of nested
  function object dance
- Return ToolMessage objects from _call_sub_tool instead of raw dicts
- Use Client type annotation instead of Any for client parameters
- Pass tool_defs directly to get_model_response instead of via state
- Use typed AssistantMessage access in no_tools_called stop condition
- Simplify _extract_tokens_from_response (remove dead dict code paths)
- Fix SubLLMResult final_content type narrowing for MessageContent union

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
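The "flat ToolCall attributes" change in the first bullet above amounts roughly to the following; the ToolCall dataclass here is an assumed shape for illustration, not the PR's exact definition:

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Assumed flat shape: name/arguments live directly on the call."""
    id: str
    name: str
    arguments: str  # JSON-encoded argument object


call = ToolCall(id="call_1", name="wiki_search",
                arguments='{"query": "unified client"}')

# Before (OAI-style nesting):  call.function.name, call.function.arguments
# After (flat attributes):     call.name, call.arguments
args = json.loads(call.arguments)
```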
snimu added a commit that referenced this pull request Feb 15, 2026
* migrate rlm_env to unified client types

Adapt rlm_env.py to fully use the provider-agnostic types introduced
in #897 (unified client interface):

- Use flat ToolCall attributes (name, arguments) instead of nested
  function object dance
- Return ToolMessage objects from _call_sub_tool instead of raw dicts
- Use Client type annotation instead of Any for client parameters
- Pass tool_defs directly to get_model_response instead of via state
- Use typed AssistantMessage access in no_tools_called stop condition
- Simplify _extract_tokens_from_response (remove dead dict code paths)
- Fix SubLLMResult final_content type narrowing for MessageContent union

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* restore prompt_state tool_defs as safety measure

Restore setting prompt_state["tool_defs"] in _call_sub_llm_api
alongside the new direct tool_defs kwarg pass. While both paths
resolve equivalently through resolve_optional_args, keeping the
state key is safer for any code that may read it downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

3 participants