Skip to content

Latest commit

 

History

History
328 lines (249 loc) · 15.2 KB

File metadata and controls

328 lines (249 loc) · 15.2 KB

Services Reference

LiteLLM

Endpoint URL Auth
Chat completions POST /chat/completions Bearer $LITELLM_MASTER_KEY
Embeddings POST /embeddings Bearer $LITELLM_MASTER_KEY
Image generation POST /images/generations Bearer $LITELLM_MASTER_KEY
Audio transcription POST /audio/transcriptions Bearer $LITELLM_MASTER_KEY
Models list GET /models Bearer $LITELLM_MASTER_KEY
Health check GET /health/liveliness none
MCP server (all tools) POST /mcp/ Bearer $LITELLM_MASTER_KEY
Admin UI GET /ui/ optional basic auth

The admin UI at /ui/ is rate-limited to 30 requests/minute by default (configurable via RATELIMIT_ADMIN in .env). Set LITELLM_UI_BASIC_AUTH=user:password in .env to enable HTTP basic auth on top of that.


Claudebox

Chat (via LiteLLM)

Use claudebox models through the standard LiteLLM chat completions endpoint. Pass workspace via extra headers:

curl http://localhost:4000/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claudebox-sonnet",
    "messages": [{"role": "user", "content": "analyze data.csv and summarize it"}],
    "extra_headers": {"X-Claude-Workspace": "myproject"}
  }'

Available models: claudebox-haiku, claudebox-sonnet, claudebox-opus, claudebox-glm-4.5-air, claudebox-glm-4.7, claudebox-glm-5.1

Direct API endpoints

Base URLs: http://localhost:4000/claudebox/ (OAuth/API key) and http://localhost:4000/claudebox-zai/ (GLM).

All endpoints (except /health) require Authorization: Bearer $CLAUDEBOX_API_TOKEN (or $CLAUDEBOX_ZAI_API_TOKEN for the zai instance).

Method Path Description
GET /claudebox/health Health check — no auth required
GET /claudebox/status Returns which workspaces currently have running Claude processes
POST /claudebox/run Run a prompt through Claude Code
POST /claudebox/run/cancel?workspace=<x> Kill the running Claude process in a workspace
PUT /claudebox/files/<workspace>/<path> Upload a file to a workspace
GET /claudebox/files/<workspace>/<path> Download a file from a workspace
GET /claudebox/files/<workspace> List files in a workspace
GET /claudebox/files List files in the root workspace directory
DELETE /claudebox/files/<workspace>/<path> Delete a file from a workspace

POST /claudebox/run — request body

Field Type Description Default
prompt string The prompt to send to Claude Code (required)
workspace string Subpath under /workspaces for isolation default workspace
model string haiku, sonnet, opus, or full model name account default
systemPrompt string Replace the default system prompt entirely (none)
appendSystemPrompt string Append to the default system prompt without replacing it (none)
jsonSchema string JSON Schema string — Claude returns JSON matching this schema (none)
effort string Reasoning effort: low, medium, high, max (none)
outputFormat string json or json-verbose (includes full tool call history) json
noContinue bool Start a fresh session instead of continuing the previous one false
resume string Resume a specific session by session ID (none)
fireAndForget bool Keep the Claude process running even if the HTTP client disconnects false

Returns 409 Conflict if the workspace already has a running Claude process.

Response format (json)

{
  "type": "result",
  "subtype": "success",
  "isError": false,
  "result": "the response text",
  "numTurns": 3,
  "durationMs": 12400,
  "totalCostUsd": 0.049,
  "sessionId": "abc123-...",
  "usage": {
    "inputTokens": 312,
    "outputTokens": 87,
    "cacheReadInputTokens": 1024
  }
}

Response format (json-verbose)

Same as json but includes a turns array with every tool call, tool result, and assistant message:

{
  "type": "result",
  "subtype": "success",
  "result": "Done. I created data_summary.md with statistics.",
  "turns": [
    {
      "role": "assistant",
      "content": [
        {"type": "tool_use", "id": "toolu_abc", "name": "Bash", "input": {"command": "head data.csv"}}
      ]
    },
    {
      "role": "tool_result",
      "content": [
        {"type": "toolResult", "toolUseId": "toolu_abc", "isError": false, "content": "id,name,value\n1,foo,42\n..."}
      ]
    }
  ],
  "numTurns": 5,
  "totalCostUsd": 0.089,
  "sessionId": "abc123-..."
}

OpenAI-compatible endpoint

Claudebox also speaks OpenAI's chat/completions protocol directly. This is what LiteLLM uses internally, but you can also hit it directly:

Method Path Description
GET /claudebox/openai/v1/models List available models
POST /claudebox/openai/v1/chat/completions Chat completions (streaming + non-streaming)

Custom headers for workspace control:

Header Description
X-Claude-Workspace Workspace subpath to run in
X-Claude-Continue Set to 1, true, or yes to continue the previous session
X-Claude-Append-System-Prompt Text to append to the system prompt for this request

Note: temperature, max_tokens, tools, and other standard OpenAI fields are accepted but silently ignored — Claude Code manages these internally.

MCP server

Claudebox exposes an MCP server at /claudebox/mcp/. 5 tools: claude_run, read_file, write_file, list_files, delete_file. See mcp-tools.md for full parameter reference.

Workspace isolation

Each workspace subpath gets its own directory, file context, and conversation history. Only one Claude process can run per workspace at a time — concurrent requests return 409. Use different workspace names for parallel work:

# these run concurrently without conflicting
curl -X POST http://localhost:4000/claudebox/run \
  -H "Authorization: Bearer $CLAUDEBOX_API_TOKEN" \
  -d '{"prompt": "write a Go HTTP server", "workspace": "go-project"}'

curl -X POST http://localhost:4000/claudebox/run \
  -H "Authorization: Bearer $CLAUDEBOX_API_TOKEN" \
  -d '{"prompt": "write pytest tests", "workspace": "py-tests"}'

Object Storage (hybrids3)

Base URL: http://localhost:4000/storage/

HTTP API

Method Path Auth Description
GET /storage/health none Returns {"status":"ok"}
GET /storage/ master or bucket key List buckets (master sees all, bucket key sees own)
GET /storage/<bucket> public: none / private: key List objects (supports ?prefix= and ?max-keys=)
PUT /storage/<bucket>/<key> bucket key or master key Upload object (MIME auto-detected)
GET /storage/<bucket>/<key> public: none / private: key Download object
HEAD /storage/<bucket>/<key> public: none / private: key Object metadata — no body
DELETE /storage/<bucket>/<key> bucket key or master key Delete object — 204 even if it doesn't exist
POST /storage/presign/<bucket>/<key> bucket key or master key Generate presigned URL
POST /storage/mcp/ per-tool auth_key MCP endpoint

Authentication: pass Authorization: Bearer <key> where <key> is the bucket's private key or $HYBRIDS3_MASTER_KEY.

The uploads bucket is configured as public-read — GET/LIST require no auth. PUT/DELETE always require the bucket key.

Presigned URLs

# generate a presigned URL (expires in 1 hour by default, max 7 days)
curl -X POST "http://localhost:4000/storage/presign/uploads/photo.jpg?expires=3600" \
  -H "Authorization: Bearer $HYBRIDS3_UPLOADS_KEY"

# response for public bucket — plain URL, no expiry
{"url": "http://localhost:4000/storage/uploads/photo.jpg", "expires": null}

# response for private bucket — signed URL with expiry
{"url": "http://localhost:4000/storage/private/doc.pdf?X-Amz-Algorithm=...&X-Amz-Signature=...", "expires": 3600}

# use the presigned URL — no auth header needed
curl "http://localhost:4000/storage/uploads/photo.jpg"

S3-compatible access (boto3)

import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4000/storage",
    aws_access_key_id="uploads",           # bucket name (public_key)
    aws_secret_access_key=HYBRIDS3_UPLOADS_KEY,
    region_name="us-east-1",
    config=Config(signature_version="s3v4"),
)

s3.upload_file("image.png", "uploads", "image.png")
s3.download_file("uploads", "image.png", "local.png")
s3.list_objects_v2(Bucket="uploads", Prefix="images/")
s3.delete_object(Bucket="uploads", Key="image.png")

# generate presigned URL via boto3
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "uploads", "Key": "image.png"},
    ExpiresIn=3600,
)

Response headers

Every response includes X-Request-Id for log correlation and X-Content-Type-Options: nosniff. Upload responses include ETag (MD5 of content). GET/HEAD responses include ETag, Last-Modified, Content-Length, and Content-Type (auto-detected from content).

Concurrency and locking

Each object key has its own async read-write lock. Multiple concurrent reads are allowed. Writes are exclusive — a write blocks all other readers and writers on that key. Requests that can't acquire the lock within 30 seconds, or that hold it for more than 300 seconds, get 503.

TTL

The uploads bucket has TTL configured (default: HYBRIDS3_UPLOADS_TTL, typically 168h / 7 days). Uploading a file resets its expiry clock. A background sweep runs every minute and deletes expired objects.


Browser Cluster (stealthy-auto-browse)

Endpoint URL Auth
Browser API POST /stealthy-auto-browse/ optional bearer token
Screenshot (browser) GET /stealthy-auto-browse/screenshot/browser optional bearer token
Screenshot (desktop) GET /stealthy-auto-browse/screenshot/desktop optional bearer token
MCP server POST /stealthy-auto-browse/mcp/ optional bearer token
Queue health GET /stealthy-auto-browse/__queue/health none
Cluster status GET /stealthy-auto-browse/__queue/status none

Set STEALTHY_AUTO_BROWSE_AUTH_TOKEN in .env to set the bearer auth token. Defaults to lulz-4-security if unset — always change this in production.

Cluster configuration

  • 5 browser replicas by default — set STEALTHY_AUTO_BROWSE_NUM_REPLICAS to change
  • Each replica: 256 MB RAM, up to 1 GB swap
  • HAProxy routes requests to replicas and enforces session stickiness:
    • MCP requests: pinned by Mcp-Session-Id header
    • All other requests: pinned by INSTANCEID cookie, max 1 concurrent request per replica

Browser API request body

{
  "action": "goto",
  "url": "https://example.com"
}

Atomic actions: goto, get_text, get_html, get_interactive_elements, screenshot, system_click, system_type, send_key, click, fill, scroll, mouse_move, wait_for_element, wait_for_text, eval_js, browser_action.

run_script composes multiple actions into a single request — executes them sequentially on the same replica in a single HTTP round-trip:

{
  "action": "run_script",
  "steps": [
    {"action": "goto", "url": "https://example.com"},
    {"action": "wait_for_element", "selector": "h1", "timeout": 5},
    {"action": "get_text"}
  ]
}

Cloudflared (optional, CLOUDFLARED=1)

Disabled by default. Enable by setting CLOUDFLARED=1 in .env.

Quick tunnel (no account needed)

CLOUDFLARED=1

Cloudflare assigns a random *.trycloudflare.com URL and logs it on startup:

docker compose up -d
docker compose logs cloudflared | grep trycloudflare

Named tunnel (fixed domain, requires Cloudflare account)

CLOUDFLARED=1
CLOUDFLARED_CONFIG=/absolute/path/to/config.yml
CLOUDFLARED_CREDS=/absolute/path/to/credentials.json

Example config.yml:

tunnel: <your-tunnel-id>
credentials-file: /etc/cloudflared/credentials.json
ingress:
  - hostname: aigate.yourdomain.com
    service: http://nginx:4000
  - service: http_status:404

Get your tunnel ID and credentials: Cloudflare Tunnel guide