Go SDK for building agentic applications backed by a local or self-hosted vLLM OpenAI-compatible server.
- Package: `vllmsdk`
- Default backend: `http://127.0.0.1:8000/v1`

```bash
go get github.com/ethpandaops/vllm-agent-sdk-go
```

The SDK resolves configuration from explicit options first, then environment variables, then defaults.
| Variable | Description | Default |
|---|---|---|
| `VLLM_BASE_URL` | vLLM server base URL | `http://127.0.0.1:8000/v1` |
| `VLLM_API_KEY` | Bearer auth token (optional, only if your server enforces auth) | (none) |
| `VLLM_MODEL` | Model name | (none — must be set via env or `WithModel()`) |
| `VLLM_AGENT_SESSION_STORE_PATH` | Local session store directory | (none) |
Example-only variables (not resolved by the core SDK):
| Variable | Description | Default |
|---|---|---|
| `VLLM_IMAGE_MODEL` | Image-capable model for multimodal examples | `QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ` |
| `VLLM_VISION_MODEL` | Vision model for multimodal input examples | Falls back to `VLLM_IMAGE_MODEL`, then `VLLM_MODEL` |
| `VLLM_IMAGE_OUTPUT_DIR` | Directory for saving generated images | (none) |
All settings follow the same resolution order:

1. Explicit option (e.g. `WithBaseURL(...)`, `WithAPIKey(...)`, `WithModel(...)`)
2. Environment variable (`VLLM_BASE_URL`, `VLLM_API_KEY`, `VLLM_MODEL`)
3. Built-in default (where applicable)
The repo ships a sibling-style Makefile:

- `make test` runs race-enabled package tests with coverage output.
- `make test-integration` runs `./integration/...` with `-tags=integration`.
- `make audit` runs the aggregate quality gate.
Integration setup:

- Set `VLLM_BASE_URL` or default to `http://127.0.0.1:8000/v1`.
- Set `VLLM_MODEL` to the model served by your vLLM instance.
- Set `VLLM_API_KEY` if your vLLM server enforces bearer auth.
- Integration tests skip when the local vLLM server is unavailable.
```go
package main

import (
	"context"
	"fmt"
	"time"

	vllmsdk "github.com/ethpandaops/vllm-agent-sdk-go"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()

	// Model resolved from VLLM_MODEL env var, or set explicitly:
	for msg, err := range vllmsdk.Query(
		ctx,
		vllmsdk.Text("Write a two-line haiku about Go concurrency."),
		// vllmsdk.WithModel("QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"),
	) {
		if err != nil {
			panic(err)
		}
		if result, ok := msg.(*vllmsdk.ResultMessage); ok && result.Result != nil {
			fmt.Println(*result.Result)
		}
	}
}
```

- `Query(ctx, content, ...opts)` and `QueryStream(...)` return `iter.Seq2[Message, error]`.
- `NewClient()` exposes `Start`, `StartWithContent`, `StartWithStream`, `Query`, `ReceiveMessages`, `ReceiveResponse`, `Interrupt`, `SetPermissionMode`, `SetModel`, `ListModels`, `ListModelsResponse`, `GetMCPStatus`, `RewindFiles`, and `Close`.
- Unsupported peer-parity controls such as `ReconnectMCPServer`, `ToggleMCPServer`, `StopTask`, and `SendToolResult` are present on `Client` and return typed `UnsupportedControlError`s.
- `UserMessageContent` is the canonical input shape. Use `Text(...)` for text-only calls and `Blocks(...)` with `ImageInput(...)`, `FileInput(...)`, `AudioInput(...)`, or `VideoInput(...)` for multimodal chat-completions requests.
- `WithSDKTools(...)` registers high-level in-process tools under `mcp__sdk__<name>`.
- `WithOnUserInput(...)` handles SDK-owned user-input prompts built on top of tool calling.
- `ListModels(...)` and `ListModelsResponse(...)` use vLLM model discovery via `/v1/models`.
- `StatSession(...)`, `ListSessions(...)`, and `GetSessionMessages(...)` operate on the SDK's local persisted session store.
- Discovery uses `/v1/models`.
- Returned `ModelInfo` values are projected from the OpenAI-compatible model cards that vLLM serves, so provider-rich metadata is no longer guaranteed.
- `ModelInfo` still exposes helper methods such as `CostTier()`, `SupportsToolCalling()`, `SupportsStructuredOutput()`, `SupportsReasoning()`, `SupportsImageInput()`, `SupportsImageOutput()`, `SupportsWebSearch()`, `SupportsPromptCaching()`, `MaxContextLength()`, and parsed pricing helpers.
- Generated images are surfaced as `*ImageBlock` values inside `AssistantMessage.Content`.
- `ImageBlock.Decode()` returns raw bytes plus media type for data-URL-backed images.
- `ImageBlock.Save(path)` writes generated images to disk.
- Live image-generation coverage is available behind the integration build tag when `VLLM_IMAGE_MODEL` is set.
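For data-URL-backed images, decoding amounts to splitting the declared media type from the base64 payload. A stdlib sketch of what a `Decode()`-style helper does, using a hypothetical `parseDataURL` (not the SDK's implementation):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// parseDataURL splits a "data:<media>;base64,<payload>" URL into raw
// bytes and the declared media type.
func parseDataURL(s string) ([]byte, string, error) {
	rest, ok := strings.CutPrefix(s, "data:")
	if !ok {
		return nil, "", fmt.Errorf("not a data URL")
	}
	meta, payload, ok := strings.Cut(rest, ",")
	if !ok {
		return nil, "", fmt.Errorf("malformed data URL")
	}
	mediaType := strings.TrimSuffix(meta, ";base64")
	raw, err := base64.StdEncoding.DecodeString(payload)
	if err != nil {
		return nil, "", err
	}
	return raw, mediaType, nil
}

func main() {
	raw, mt, err := parseDataURL("data:image/png;base64,aGVsbG8=")
	if err != nil {
		panic(err)
	}
	fmt.Println(mt, len(raw)) // media type and decoded payload size
}
```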
Multimodal input in this SDK is block-based and targets the vLLM OpenAI-compatible chat surface.
```go
content := vllmsdk.Blocks(
	vllmsdk.TextInput("Compare these two screenshots and the attached spec file."),
	vllmsdk.ImageInput("https://example.com/before.png"),
	vllmsdk.ImageInput("data:image/png;base64,..."),
	vllmsdk.FileInput("spec.pdf", "data:application/pdf;base64,..."),
)

for msg, err := range vllmsdk.Query(ctx, content,
	// vllmsdk.WithModel("QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"),
) {
	_ = msg
	_ = err
}
```

- `ImageInput(...)` accepts a normal URL or a base64 data URL.
- `FileInput(...)` accepts a filename plus a `file_data` URL/data URL.
- `AudioInput(...)` accepts base64 audio data plus a format.
- `VideoInput(...)` accepts a normal URL or a data URL.
- Responses mode is routed to the vLLM `/v1/responses` surface when selected.
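Raw bytes can be wrapped into the data-URL form these helpers accept. A sketch with a hypothetical `toDataURL` helper; the stand-in bytes are not a real PDF:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// toDataURL encodes raw bytes as a base64 data URL with the given
// media type, suitable for data-URL arguments like FileInput's.
func toDataURL(mediaType string, raw []byte) string {
	return "data:" + mediaType + ";base64," + base64.StdEncoding.EncodeToString(raw)
}

func main() {
	pdf := []byte("%PDF-1.7 ...") // stand-in bytes, not a real PDF
	fmt.Println(toDataURL("application/pdf", pdf))
}
```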
Session APIs are local SDK APIs, not remote vLLM server sessions.
- They read from the SDK session store configured with `WithSessionStorePath(...)` or `VLLM_AGENT_SESSION_STORE_PATH`.
- They do not derive from chat `session_id`.
- They do not derive from Responses `previous_response_id`.
vLLM does not have meaningful backend equivalents for some sibling control-plane methods. The SDK exposes those methods where peer parity matters, but they fail explicitly with `UnsupportedControlError` instead of faking semantics.
The SDK provides opt-in OpenTelemetry metrics and distributed tracing. When no provider is configured, all recording is a no-op with zero overhead.
| Option | Description |
|---|---|
| `WithMeterProvider(mp)` | Sets an OTel `metric.MeterProvider` for SDK metrics |
| `WithTracerProvider(tp)` | Sets an OTel `trace.TracerProvider` for SDK spans |
| `WithPrometheusRegisterer(reg)` | Convenience: creates an OTel `MeterProvider` backed by a Prometheus `Registerer` |
GenAI semantic convention metrics:

| Metric | Type | Description |
|---|---|---|
| `gen_ai.client.operation.duration` | Histogram (s) | Duration of query operations |
| `gen_ai.client.token.usage` | Counter | Token usage by type (input/output) |
| `gen_ai.client.time_to_first_token` | Histogram (s) | Time to first content token |
| `gen_ai.client.time_per_output_token` | Histogram (s) | Inter-token arrival time |
vLLM-specific metrics:

| Metric | Type | Description |
|---|---|---|
| `vllm.http.requests` | Counter | HTTP requests by status class and retry |
| `vllm.tool.calls` | Counter | Tool calls by name and outcome |
| `vllm.tool.duration` | Histogram (s) | Tool call duration |
| `vllm.checkpoint.operations` | Counter | Checkpoint create/restore operations |
| `vllm.model.load_errors` | Counter | Model listing errors |
| `vllm.hook.duration` | Histogram (s) | Hook execution duration by event |
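The status-class attribute on `vllm.http.requests` is a coarse bucket of the HTTP status code. A sketch of one plausible bucketing (the SDK's exact label scheme is an assumption):

```go
package main

import "fmt"

// statusClass buckets an HTTP status code into a coarse class label
// usable as a low-cardinality metric attribute ("2xx", "4xx", "5xx").
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

func main() {
	for _, c := range []int{200, 404, 503} {
		fmt.Println(c, statusClass(c))
	}
}
```

Bucketing keeps metric cardinality bounded regardless of how many distinct status codes the server returns.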
| Span | Kind | Description |
|---|---|---|
| `gen_ai.query` | Client | Root span per `Query`/`QueryStream` call |
| `gen_ai.stream` | Internal | Streaming request (child of query) |
| `http.request` | Client | Individual HTTP request |
| `tool.execute` | Internal | Tool invocation |
| `hook.run` | Internal | Hook dispatch |
| `vllm.list_models` | Client | Model listing HTTP call |
```go
reg := prometheus.NewRegistry()

for msg, err := range vllmsdk.Query(ctx,
	vllmsdk.Text("Hello"),
	vllmsdk.WithPrometheusRegisterer(reg),
	vllmsdk.WithModel("my-model"),
) {
	// ...
	_ = msg
	_ = err
}

// Serve metrics
http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
```

See `examples/prometheus_observability` for a complete working example.
Runnable examples live under `examples`.