vllm-agent-sdk-go

Go SDK for building agentic applications backed by a local or self-hosted vLLM OpenAI-compatible server.

  • Package: vllmsdk
  • Default backend: http://127.0.0.1:8000/v1

Install

go get github.com/ethpandaops/vllm-agent-sdk-go

Configuration

The SDK resolves configuration from explicit options first, then environment variables, then defaults.

Environment Variables

  • VLLM_BASE_URL: vLLM server base URL. Default: http://127.0.0.1:8000/v1
  • VLLM_API_KEY: Bearer auth token, only needed if your server enforces auth. Default: none
  • VLLM_MODEL: Model name. No default; must be set via the env var or WithModel()
  • VLLM_AGENT_SESSION_STORE_PATH: Local session store directory. Default: none

Example-only variables (not resolved by the core SDK):

  • VLLM_IMAGE_MODEL: Image-capable model for multimodal examples. Default: QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ
  • VLLM_VISION_MODEL: Vision model for multimodal input examples. Falls back to VLLM_IMAGE_MODEL, then VLLM_MODEL
  • VLLM_IMAGE_OUTPUT_DIR: Directory for saving generated images. Default: none

Option Precedence

All settings follow the same resolution order (see the sketch after this list):

  1. Explicit option (e.g. WithBaseURL(...), WithAPIKey(...), WithModel(...))
  2. Environment variable (VLLM_BASE_URL, VLLM_API_KEY, VLLM_MODEL)
  3. Built-in default (where applicable)
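
For example, a minimal sketch reusing the Query pattern from Quick Start. It assumes the base-URL and API-key options are accepted by Query the same way WithModel is; the endpoint, token, and model name are placeholders:

// Explicit options win over VLLM_BASE_URL, VLLM_API_KEY, and VLLM_MODEL.
for msg, err := range vllmsdk.Query(
	ctx,
	vllmsdk.Text("ping"),
	vllmsdk.WithBaseURL("http://10.0.0.5:8000/v1"),
	vllmsdk.WithAPIKey("example-token"),
	vllmsdk.WithModel("my-model"),
) {
	if err != nil {
		panic(err)
	}
	_ = msg
}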

Developer Workflow

The repo ships a Makefile modeled on its sibling SDKs:

  • make test runs race-enabled package tests with coverage output.
  • make test-integration runs ./integration/... with -tags=integration.
  • make audit runs the aggregate quality gate.

Integration setup:

  • Set VLLM_BASE_URL, or rely on the default http://127.0.0.1:8000/v1.
  • Set VLLM_MODEL to the model served by your vLLM instance.
  • Set VLLM_API_KEY if your vLLM server enforces bearer auth.
  • Integration tests skip when the local vLLM server is unavailable (see the sketch below).
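
The skip behavior can be reproduced in your own integration tests with a quick reachability probe. This is an illustrative sketch, not the SDK's actual test code; skipIfNoVLLM is a hypothetical helper that simply checks the default endpoint above:

//go:build integration

package integration

import (
	"net"
	"testing"
	"time"
)

// skipIfNoVLLM skips a test when nothing answers on the default
// local vLLM address.
func skipIfNoVLLM(t *testing.T) {
	t.Helper()

	conn, err := net.DialTimeout("tcp", "127.0.0.1:8000", 500*time.Millisecond)
	if err != nil {
		t.Skip("local vLLM server unavailable, skipping integration test")
	}
	conn.Close()
}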

Quick Start

package main

import (
	"context"
	"fmt"
	"time"

	vllmsdk "github.com/ethpandaops/vllm-agent-sdk-go"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	// Model resolved from VLLM_MODEL env var, or set explicitly:
	for msg, err := range vllmsdk.Query(
		ctx,
		vllmsdk.Text("Write a two-line haiku about Go concurrency."),
		// vllmsdk.WithModel("QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"),
	) {
		if err != nil {
			panic(err)
		}

		if result, ok := msg.(*vllmsdk.ResultMessage); ok && result.Result != nil {
			fmt.Println(*result.Result)
		}
	}
}

Surface

  • Query(ctx, content, ...opts) and QueryStream(...) return iter.Seq2[Message, error].
  • NewClient() exposes Start, StartWithContent, StartWithStream, Query, ReceiveMessages, ReceiveResponse, Interrupt, SetPermissionMode, SetModel, ListModels, ListModelsResponse, GetMCPStatus, RewindFiles, and Close.
  • Unsupported peer-parity controls such as ReconnectMCPServer, ToggleMCPServer, StopTask, and SendToolResult are present on Client and return a typed UnsupportedControlError.
  • UserMessageContent is the canonical input shape. Use Text(...) for text-only calls and Blocks(...) with ImageInput(...), FileInput(...), AudioInput(...), or VideoInput(...) for multimodal chat-completions requests.
  • WithSDKTools(...) registers high-level in-process tools under mcp__sdk__<name>.
  • WithOnUserInput(...) handles SDK-owned user-input prompts built on top of tool calling.
  • ListModels(...) and ListModelsResponse(...) use vLLM model discovery via /v1/models.
  • StatSession(...), ListSessions(...), and GetSessionMessages(...) operate on the SDK's local persisted session store.

Model Discovery

  • Discovery uses /v1/models.
  • Returned ModelInfo values are projected from the OpenAI-compatible model cards that vLLM serves, so provider-rich metadata is no longer guaranteed.
  • ModelInfo still exposes helper methods such as CostTier(), SupportsToolCalling(), SupportsStructuredOutput(), SupportsReasoning(), SupportsImageInput(), SupportsImageOutput(), SupportsWebSearch(), SupportsPromptCaching(), MaxContextLength(), and parsed pricing helpers.
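
A hedged sketch of capability filtering over discovered models. Only the method names come from this README; the NewClient constructor shape, the ListModels signature, and its return type are assumptions:

client := vllmsdk.NewClient()
defer client.Close()

// ListModels hits /v1/models; the (slice, error) return shape is assumed here.
models, err := client.ListModels(ctx)
if err != nil {
	panic(err)
}

for _, m := range models {
	if m.SupportsToolCalling() && m.SupportsStructuredOutput() {
		fmt.Printf("tool-capable model: %+v (max context: %v)\n", m, m.MaxContextLength())
	}
}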

Image Output

  • Generated images are surfaced as *ImageBlock values inside AssistantMessage.Content.
  • ImageBlock.Decode() returns raw bytes plus media type for data-URL-backed images.
  • ImageBlock.Save(path) writes generated images to disk.
  • Live image-generation coverage is available behind the integration build tag when VLLM_IMAGE_MODEL is set.
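
A hedged sketch of persisting generated images, built on the Query pattern from Quick Start. The prompt and output path are placeholders, and the exact message/block shapes are inferred from the list above:

for msg, err := range vllmsdk.Query(ctx, vllmsdk.Text("Generate a small logo.")) {
	if err != nil {
		panic(err)
	}

	assistant, ok := msg.(*vllmsdk.AssistantMessage)
	if !ok {
		continue
	}

	for _, block := range assistant.Content {
		// Generated images arrive as *ImageBlock values inside Content.
		if img, ok := block.(*vllmsdk.ImageBlock); ok {
			if err := img.Save("out.png"); err != nil { // placeholder path
				panic(err)
			}
		}
	}
}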

Multimodal Input

Multimodal input in this SDK is block-based and targets the vLLM OpenAI-compatible chat surface.

content := vllmsdk.Blocks(
	vllmsdk.TextInput("Compare these two screenshots and the attached spec file."),
	vllmsdk.ImageInput("https://example.com/before.png"),
	vllmsdk.ImageInput("data:image/png;base64,..."),
	vllmsdk.FileInput("spec.pdf", "data:application/pdf;base64,..."),
)

for msg, err := range vllmsdk.Query(ctx, content,
	// vllmsdk.WithModel("QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"),
) {
	_ = msg
	_ = err
}

  • ImageInput(...) accepts a normal URL or a base64 data URL.
  • FileInput(...) accepts a filename plus file_data URL/data URL.
  • AudioInput(...) accepts base64 audio data plus a format.
  • VideoInput(...) accepts a normal URL or a data URL.
  • Responses mode is routed to the vLLM /v1/responses surface when selected.

Session Semantics

Session APIs operate on a local SDK store, not on remote vLLM server sessions.

  • They read from the SDK session store configured with WithSessionStorePath(...) or VLLM_AGENT_SESSION_STORE_PATH.
  • They do not derive from chat session_id.
  • They do not derive from Responses previous_response_id.
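
A hedged sketch of reading the local store. ListSessions is documented above; its package-level placement, signature, and return shape here are assumptions:

// Store path comes from WithSessionStorePath(...) or VLLM_AGENT_SESSION_STORE_PATH.
sessions, err := vllmsdk.ListSessions(ctx)
if err != nil {
	panic(err)
}

for _, s := range sessions {
	fmt.Printf("persisted session: %+v\n", s)
}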

Unsupported Controls

vLLM does not have meaningful backend equivalents for some sibling control-plane methods. The SDK exposes those methods where peer parity matters, but they fail explicitly with UnsupportedControlError instead of faking semantics.
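
Callers can detect these explicitly. A minimal sketch, assuming NewClient() returns a client directly, ReconnectMCPServer takes a context plus a server name, and UnsupportedControlError is a struct error type usable with errors.As (none of those shapes are documented above):

client := vllmsdk.NewClient()
defer client.Close()

err := client.ReconnectMCPServer(ctx, "my-server") // hypothetical arguments
var unsupported *vllmsdk.UnsupportedControlError
if errors.As(err, &unsupported) {
	// Expected against vLLM: this control has no backend equivalent.
	log.Printf("control not supported: %v", err)
}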

Observability

The SDK provides opt-in OpenTelemetry metrics and distributed tracing. When no provider is configured, all recording is a no-op with zero overhead.

Options

  • WithMeterProvider(mp): sets an OTel metric.MeterProvider for SDK metrics
  • WithTracerProvider(tp): sets an OTel trace.TracerProvider for SDK spans
  • WithPrometheusRegisterer(reg): convenience option that builds an OTel MeterProvider backed by a Prometheus Registerer

Metrics

GenAI semantic convention metrics:

  • gen_ai.client.operation.duration (histogram, seconds): duration of query operations
  • gen_ai.client.token.usage (counter): token usage by type (input/output)
  • gen_ai.client.time_to_first_token (histogram, seconds): time to first content token
  • gen_ai.client.time_per_output_token (histogram, seconds): inter-token arrival time

vLLM-specific metrics:

  • vllm.http.requests (counter): HTTP requests by status class and retry
  • vllm.tool.calls (counter): tool calls by name and outcome
  • vllm.tool.duration (histogram, seconds): tool call duration
  • vllm.checkpoint.operations (counter): checkpoint create/restore operations
  • vllm.model.load_errors (counter): model listing errors
  • vllm.hook.duration (histogram, seconds): hook execution duration by event

Spans

  • gen_ai.query (client span): root span per Query/QueryStream call
  • gen_ai.stream (internal span): streaming request, child of the query span
  • http.request (client span): individual HTTP request
  • tool.execute (internal span): tool invocation
  • hook.run (internal span): hook dispatch
  • vllm.list_models (client span): model listing HTTP call
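
To export these spans, pass any OTel trace.TracerProvider through WithTracerProvider. A minimal sketch using the OpenTelemetry Go SDK's stdout exporter; the exporter choice and model name are illustrative:

// import (
//     "go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
//     sdktrace "go.opentelemetry.io/otel/sdk/trace"
// )

exp, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
if err != nil {
	panic(err)
}

tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
defer tp.Shutdown(ctx)

for msg, err := range vllmsdk.Query(ctx,
	vllmsdk.Text("Hello"),
	vllmsdk.WithTracerProvider(tp),
	vllmsdk.WithModel("my-model"),
) {
	_ = msg
	_ = err
}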

Prometheus Example

// Uses github.com/prometheus/client_golang (prometheus, promhttp) and net/http.
reg := prometheus.NewRegistry()

for msg, err := range vllmsdk.Query(ctx,
    vllmsdk.Text("Hello"),
    vllmsdk.WithPrometheusRegisterer(reg),
    vllmsdk.WithModel("my-model"),
) {
    // ...
}

// Serve metrics
http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))

See examples/prometheus_observability for a complete working example.

Examples

Runnable examples live under examples.
