Stop wasting premium requests. Pool your Copilot subscriptions, expose them as APIs, and put every last request to work.
GitHub Copilot subscriptions come with a monthly premium request quota, but most individuals and teams never use it all. Quotas reset at the end of each billing cycle, and unused requests are simply lost. Across a team of 10, 50, or 100 developers, that adds up to thousands of premium requests thrown away every month.
Meanwhile, teams that could benefit from LLM API access — for CI/CD automation, internal tooling, agentic workflows (Claude Code, Codex CLI), or custom chatbots — are forced to procure separate OpenAI or Anthropic API contracts, adding cost, vendor relationships, and compliance overhead on top of the Copilot subscriptions they already pay for.
Copilot LLM Provider solves this by pooling multiple Copilot subscriptions into a unified resource pool and exposing them as standard OpenAI- and Anthropic-compatible API endpoints using the GitHub Copilot SDK.
Clients & Tools Unified Resource Pool Individual Quotas (often underused)
┌────────────────┐ ┌────────────────────────┐ ┌──────────┐
│ OpenAI API │ │ │ │ Dev A │
│ /openai/v1/... │ ───► │ Copilot LLM Provider │ ───► │ 300/1000 │ 70% waste
└────────────────┘ │ │ └──────────┘
│ Round-Robin Balancing │ ┌──────────┐
┌────────────────┐ │ Quota Tracking │ │ Dev B │
│ Anthropic API │ ───► │ Combined: 3000 reqs │ ───► │ 50/1000 │ 95% waste
│/anthropic/v1/..│ │ │ └──────────┘
└────────────────┘ │ ► Near-zero waste │ ┌──────────┐
│ │ │ Dev C │
└────────────────────────┘ ───► │ 120/1000 │ 88% waste
└──────────┘
Each developer's GitHub token is added to the pool. The gateway distributes requests across tokens via round-robin load balancing, tracks per-token quota usage in real time, and ensures no single account is over-utilized. Any existing client library or AI tool works without code changes — just point base_url at the gateway.
| Capability | Why It Matters |
|---|---|
| Multi-Token Pooling | Combine N subscriptions into one pool; round-robin balancing maximizes total quota utilization |
| Real-Time Quota Monitoring | Dashboard shows per-token used/remaining/reset date so you always know where you stand |
| Dual API Compatibility | Drop-in replacement for both OpenAI (/openai/v1/chat/completions) and Anthropic (/anthropic/v1/messages) SDKs — no client code changes needed |
| API Key Governance | Managed keys with per-key model restrictions, usage quotas, and enable/disable controls for multi-team access |
| Session Recording | Full audit trail of every request/response for compliance and debugging |
| Streaming Support | Full SSE streaming in both OpenAI and Anthropic formats |
| MCP Server | Expose Copilot models to Claude Desktop, Claude Code, and other MCP clients |
┌─────────────────────────────────────────────────────────────────────┐
│ Clients & Tools │
│ OpenAI SDK │ Anthropic SDK │ Claude Code │ Codex CLI │ curl │
└──────────┬───────────┬──────────────┬────────────┬──────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌────────────────────────────────────────────────────────────────────┐
│ FastAPI Gateway (main.py) │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Auth Layer │ │ OpenAI API │ │Anthropic │ │ Admin API │ │
│ │ Session/Key │ │/openai/v1/.. │ │ API │ │ /api/admin/ │ │
│ │ Managed Key │ │/openai/v1/mod│ │/anthropic│ │ tokens/keys │ │
│ └──────┬──────┘ └──────┬───────┘ └────┬─────┘ └──────┬───────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Services Layer │ │
│ │ UsageTracker │ SessionStore │ ApiKeyStore │ UserStore │ │
│ └───────────────────────────┬─────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼─────────────────────────────────┐ │
│ │ Token Pool (Round-Robin) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Token A │ │ Token B │ │ Token C │ ... │ │
│ │ │ Provider │ │ Provider │ │ Provider │ │ │
│ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │
│ └───────┼──────────────┼─────────────┼────────────────────────┘ │
│ │ │ │ │
└──────────┼──────────────┼─────────────┼────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────────────┐
│ GitHub Copilot SDK (CopilotClient per token) │
│ JSON-RPC over stdio → Copilot CLI binary process │
└──────────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────────────┐
│ GitHub Copilot Service │
│ gpt-4.1 │ claude-sonnet-4 │ o4-mini │ ... │
└──────────────────────────────────────────────────────────────────────┘
1. Client sends OpenAI/Anthropic format request
2. Auth layer validates session token, managed API key, or legacy key
3. Token Pool selects next GitHub token via round-robin (or explicit selection)
4. CopilotProvider creates a session, sends request, returns response
5. UsageTracker records request (per-model, per-key, per-token, daily)
6. SessionStore persists full request/response for audit
7. API layer converts internal response back to wire format
8. Client receives standard OpenAI/Anthropic response
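The round-robin selection in step 3 can be sketched as follows. Class and field names here are illustrative only, not the actual `token_pool.py` implementation:

```python
from dataclasses import dataclass

@dataclass
class PooledToken:
    alias: str
    enabled: bool = True
    used: int = 0  # requests routed through this token

class TokenPool:
    """Round-robin over pool members, skipping disabled tokens."""
    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._cursor = 0

    def next_token(self) -> PooledToken:
        # Walk at most one full cycle looking for an enabled token.
        for _ in range(len(self._tokens)):
            token = self._tokens[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._tokens)
            if token.enabled:
                token.used += 1
                return token
        raise RuntimeError("no enabled tokens in pool")

pool = TokenPool([PooledToken("dev-a"), PooledToken("dev-b"), PooledToken("dev-c")])
picked = [pool.next_token().alias for _ in range(4)]
# cycles dev-a, dev-b, dev-c, then wraps back to dev-a
```

Disabled tokens are simply skipped on the next pass, which is what lets the pool keep serving when one account errors out or is paused from the Dashboard.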
- Python 3.11+
- Node.js 20+ (for frontend build)
- GitHub account with an active Copilot subscription
- GitHub Personal Access Token (PAT) with Copilot access
# Clone the repository
git clone https://github.com/satomic/copilot-llm-provider.git
cd copilot-llm-provider
# Backend setup
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# Configure
cp .env.example .env
# Edit .env → set GITHUB_TOKEN=ghp_your_token_here
# Start backend
uvicorn src.backend.app.main:app --reload --host 0.0.0.0 --port 8000
# Frontend (separate terminal)
cd src/frontend && npm install && npm run dev

# Build frontend
cd src/frontend && npm ci && npm run build && cd ../..
# Run with built frontend
FRONTEND_DIR=src/frontend/dist uvicorn src.backend.app.main:app --host 0.0.0.0 --port 8000

- Navigate to `http://localhost:8000` in your browser
- Create an admin account (first user becomes admin)
- Go to Settings → manage GitHub tokens and API keys
- Go to Dashboard → verify models are available and quota is loaded
| Variable | Required | Default | Description |
|---|---|---|---|
| `GITHUB_TOKEN` | No* | — | GitHub PAT with Copilot access |
| `API_KEY` | No | — | Legacy API key for server auth |
| `HOST` | No | `0.0.0.0` | Server bind host |
| `PORT` | No | `8000` | Server bind port |
| `CORS_ORIGINS` | No | `*` | Allowed CORS origins (comma-separated) |
| `LOG_LEVEL` | No | `info` | Logging level |
| `FRONTEND_DIR` | No | — | Path to built frontend static files |
* If not set, the SDK falls back to the `GH_TOKEN`/`GITHUB_TOKEN` environment variables, or to stored OAuth credentials from the GitHub CLI.
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/openai/v1",
api_key="your-managed-api-key",
)
response = client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": "Hello!"}],
stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

from anthropic import Anthropic
client = Anthropic(
base_url="http://localhost:8000/anthropic",
api_key="your-managed-api-key",
)
message = client.messages.create(
model="claude-sonnet-4",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8000/anthropic
export ANTHROPIC_API_KEY=your-managed-api-key
claude
# Codex CLI
export OPENAI_BASE_URL=http://localhost:8000/openai/v1
export OPENAI_API_KEY=your-managed-api-key
codex

Note: Legacy non-prefixed routes (`/v1/chat/completions`, `/v1/models`, `/v1/messages`) are also available for backward compatibility.
This project also runs as an MCP server, allowing MCP clients (Claude Desktop, Claude Code, etc.) to use Copilot models as tools.
Available MCP tools:
| Tool | Description |
|---|---|
| `chat` | Send a message to a Copilot model (supports model selection, system prompt, temperature, max_tokens) |
| `list_models` | List all available models with premium/free status and billing multiplier |
| `get_quota` | Check premium request quota for all configured GitHub tokens |
Setup — add to your MCP client config (e.g., claude_desktop_config.json):
{
"mcpServers": {
"copilot-llm-provider": {
"command": "python",
"args": ["-m", "src.backend.app.mcp_server"],
"cwd": "/absolute/path/to/copilot-llm-provider",
"env": {
"GITHUB_TOKEN": "ghp_your_token_here"
}
}
}
}

Or run standalone: `python -m src.backend.app.mcp_server`
Pool multiple GitHub Copilot accounts and distribute requests fairly:
- Round-robin selection ensures even load distribution across tokens
- Per-token status monitoring (active / error / stopped)
- Live quota tracking via the SDK's `account.getQuota()` RPC
- Dynamic token management — add, remove, enable/disable tokens at runtime via Admin API or Dashboard UI
- Explicit token selection via `X-GitHub-Token-Id` header for deterministic routing
Create managed API keys with fine-grained controls:
- Model restrictions — limit which models each key can access
- Usage quotas — set max total and max premium request limits per key
- Enable/disable — instantly revoke access without deleting the key
- Usage tracking — per-key request counts and model breakdown
- Alias tagging — human-readable names for team/service identification
- Real-time Dashboard — server status, model availability, usage charts, daily trends
- Session Recording — every request/response persisted as JSON for audit
- Usage Statistics — per-model, per-API-key, per-token breakdown with daily trends
- Quota Monitoring — premium request entitlement, used, remaining %, reset date per token
- No training data collection — this is a gateway; it does not fine-tune or train models
- Session recording is local — all session data is stored as local JSON files on the server, never transmitted to third parties
- Token masking — GitHub tokens are masked in all API responses and logs (only first 10 + last 4 characters shown)
- No PII extraction — the system does not extract, store, or process personal information beyond what the user sends in prompts
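The masking rule described above (only the first 10 and last 4 characters shown) is easy to sketch; `mask_token` is a hypothetical helper name, not the project's actual function:

```python
def mask_token(token: str, prefix: int = 10, suffix: int = 4) -> str:
    """Show only the first `prefix` and last `suffix` characters."""
    if len(token) <= prefix + suffix:
        return "*" * len(token)  # too short to reveal anything safely
    return token[:prefix] + "****" + token[-suffix:]

masked = mask_token("ghp_abcdefghijklmnopqrstuvwxyz1234")
# → "ghp_abcdef****1234"
```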
- Authentication required — username/password admin accounts with session tokens
- Managed API keys — granular access control with model restrictions and usage limits
- Audit trail — every API request is logged with timestamp, model, API key alias, token alias, client IP, and full request/response content
- No anonymous access — all endpoints (except
/health) require authentication when auth is configured
- Model visibility — only models available through the user's Copilot subscription are exposed
- Billing transparency — each model's billing multiplier is displayed, distinguishing free (x0) from premium (x>0) models
- Usage limits — API keys can be configured with maximum request counts to prevent runaway usage
- Quota awareness — real-time premium request quota monitoring prevents unexpected overage
- This project proxies requests to GitHub Copilot's backend models; it inherits the content policies and limitations of those models
- Output quality and safety depend on the underlying models (GPT-4.1, Claude Sonnet 4, etc.)
- Administrators should implement additional content filtering if required by their organization's policies
copilot-llm-provider/
├── src/
│ ├── backend/
│ │ ├── app/
│ │ │ ├── main.py # FastAPI app, lifespan, routing
│ │ │ ├── core/
│ │ │ │ ├── config.py # Settings (pydantic-settings)
│ │ │ │ ├── auth.py # Multi-method authentication
│ │ │ │ ├── dependencies.py # DI: provider selection, token pool
│ │ │ │ └── runtime_config.py # Dynamic runtime configuration
│ │ │ ├── providers/
│ │ │ │ ├── base.py # Abstract Provider interface
│ │ │ │ └── copilot.py # CopilotProvider + quota fetching
│ │ │ ├── services/
│ │ │ │ ├── token_pool.py # Multi-token round-robin pool
│ │ │ │ ├── session_store.py # Session persistence (JSON files)
│ │ │ │ ├── usage_tracker.py # Per-model/key/token usage stats
│ │ │ │ ├── api_key_store.py # Managed API key CRUD
│ │ │ │ └── user_store.py # Admin user management
│ │ │ ├── api/
│ │ │ │ ├── openai/chat.py # POST /openai/v1/chat/completions
│ │ │ │ ├── openai/models.py # GET /openai/v1/models
│ │ │ │ ├── anthropic/messages.py # POST /anthropic/v1/messages
│ │ │ │ ├── admin.py # Token/key management endpoints
│ │ │ │ ├── sessions.py # Session CRUD + continue-chat
│ │ │ │ └── stats.py # Usage statistics endpoint
│ │ │ └── schemas/
│ │ │ ├── openai.py # OpenAI Pydantic models
│ │ │ └── anthropic.py # Anthropic Pydantic models
│ │ └── tests/
│ └── frontend/
│ └── src/
│ ├── pages/
│ │ ├── DashboardPage.tsx # Usage stats, charts, model list
│ │ ├── PlaygroundPage.tsx # Interactive chat testing
│ │ ├── SessionsPage.tsx # Session audit viewer
│ │ ├── SettingsPage.tsx # Token & API key management
│ │ └── ...
│ ├── contexts/I18nContext.tsx # EN/ZH internationalization
│ └── services/api.ts # Centralized API client
├── docs/ # Documentation
├── presentations/ # Challenge presentation deck
└── AGENTS.md # Multi-agent development architecture
| Layer | Technology |
|---|---|
| SDK | github-copilot-sdk (Python) — Copilot CLI binary via JSON-RPC over stdio |
| Backend | Python 3.11+, FastAPI, Pydantic v2, uvicorn |
| Frontend | React 18, TypeScript 5, Vite, Tailwind CSS |
| Auth | Session tokens + managed API keys + legacy key |
| Persistence | JSON files (zero-dependency, no database required) |
| Deployment | uvicorn with production frontend build |
| Logging | Python logging with JSON structured output |
| Testing | pytest, pytest-asyncio, httpx |
| Linting | ruff (Python), TypeScript strict mode |
MIT License. See LICENSE for details.