Python SDK for the Computer Use Protocol
The official Python SDK for the Computer Use Protocol (CUP), a universal protocol for AI agents to perceive and interact with any desktop UI. This package provides tree capture, action execution, semantic search, and an MCP server for AI agent integration.
pip install computeruseprotocol
# Linux additionally requires system packages
sudo apt install python3-gi gir1.2-atspi-2.0
# Web adapter (Chrome DevTools Protocol, works on any OS)
pip install computeruseprotocol[web]
# Screenshot support (mss; not needed on macOS)
pip install computeruseprotocol[screenshot]
# MCP server for AI agent integration
pip install computeruseprotocol[mcp]

import cup
# Snapshot the foreground window, optimized for LLM context windows
screen = cup.snapshot()
print(screen)
# All windows
screen = cup.snapshot("full")
# Structured CUP envelope (dict) instead of compact text
envelope = cup.snapshot_raw()

Output:
# CUP 0.1.0 | windows | 2560x1440
# app: Spotify
# 63 nodes (280 before pruning)
[e0] win "Spotify" 120,40 1680x1020
[e1] doc "Spotify" 120,40 1680x1020
[e2] btn "Back" 132,52 32x32 [clk]
[e3] btn "Forward" 170,52 32x32 {dis} [clk]
[e7] nav "Main" 120,88 240x972
[e8] lnk "Home" 132,100 216x40 {sel} [clk]
[e9] lnk "Search" 132,148 216x40 [clk]

# Print the foreground window tree (default)
python -m cup
# Filter by app name
python -m cup --scope full --app Discord
# Save JSON envelope to file
python -m cup --json-out tree.json
# Capture from Chrome via CDP
python -m cup --platform web --cdp-port 9222
# Include diagnostics (timing, role distribution, sizes)
python -m cup --verbose

| Platform | Adapter | Tree Capture | Actions |
|---|---|---|---|
| Windows | UIA COM (comtypes) | Stable | Stable |
| macOS | AXUIElement (pyobjc) | Stable | Stable |
| Linux | AT-SPI2 (PyGObject) | Stable | Stable |
| Web | Chrome DevTools Protocol | Stable | Stable |
| Android | Planned | Planned | |
| iOS | Planned | Planned | |
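CUP auto-detects your platform. Platform-specific Python dependencies (comtypes on Windows, pyobjc on macOS) are installed automatically; on Linux, the system packages listed in the install instructions must be installed separately.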
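As a rough illustration of what platform auto-detection can look like, the sketch below maps `sys.platform` to the native adapter names from the table. It is not the code in `cup/_router.py`; the function name `detect_adapter` and the mapping table are assumptions made for this example. The web adapter is left out because it is selected explicitly (for example with `--platform web`).

```python
import sys

# Hypothetical mapping from sys.platform prefixes to the native adapters
# listed in the table. The real dispatch lives in cup/_router.py and may differ.
_ADAPTERS = (
    ("win32", "windows"),
    ("darwin", "macos"),
    ("linux", "linux"),
)


def detect_adapter(platform=None):
    """Return the adapter name for a platform string (defaults to this host)."""
    platform = sys.platform if platform is None else platform
    for prefix, adapter in _ADAPTERS:
        if platform.startswith(prefix):
            return adapter
    raise RuntimeError(f"no native adapter for platform {platform!r}")
```

Calling `detect_adapter()` with no argument inspects the current interpreter; passing a string makes the function easy to exercise in tests.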
cup/
├── __init__.py            # Public API: snapshot, action, find, ...
├── __main__.py            # CLI entry point
├── _base.py               # Abstract PlatformAdapter interface
├── _router.py             # Platform detection & adapter dispatch
├── format.py              # Envelope builder, compact serializer, tree pruning
├── search.py              # Semantic element search with fuzzy matching
├── actions/               # Action execution layer
│   ├── executor.py        # ActionExecutor orchestrator
│   ├── _handler.py        # Abstract ActionHandler interface
│   ├── _keys.py           # Key name mapping utilities
│   ├── _windows.py        # Windows UIA actions
│   ├── _web.py            # Chrome CDP actions
│   ├── _macos.py          # macOS actions (Quartz CGEvents + AX)
│   └── _linux.py          # Linux actions (XTest + AT-SPI2)
├── platforms/             # Platform-specific tree capture
│   ├── windows.py         # Windows UIA adapter
│   ├── macos.py           # macOS AXUIElement adapter
│   ├── linux.py           # Linux AT-SPI2 adapter
│   └── web.py             # Chrome CDP adapter
└── mcp/                   # MCP server integration
    ├── __main__.py        # python -m cup.mcp entry point
    └── server.py          # MCP protocol server
Adding a new platform means implementing PlatformAdapter. See cup/_base.py for the interface.
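To give a feel for the shape of such an adapter, here is a minimal sketch using Python's `abc` module. Everything in it is hypothetical: `SketchAdapter`, `list_windows`, `capture_tree`, and `FakeAndroidAdapter` are stand-in names invented for this example, and the real method names and signatures are whatever `cup/_base.py` defines.

```python
from abc import ABC, abstractmethod


class SketchAdapter(ABC):
    """Hypothetical stand-in for PlatformAdapter; see cup/_base.py for the real one."""

    @abstractmethod
    def list_windows(self):
        """Return the titles of the windows this platform can capture."""

    @abstractmethod
    def capture_tree(self, window):
        """Return a nested dict describing the UI tree of one window."""


class FakeAndroidAdapter(SketchAdapter):
    """Toy adapter returning canned data instead of querying a device."""

    def list_windows(self):
        return ["Settings"]

    def capture_tree(self, window):
        # A real adapter would walk the platform accessibility tree here.
        return {"role": "win", "name": window, "children": []}
```

Because the base class is abstract, forgetting to implement one of its methods fails at instantiation time rather than midway through a capture.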
CUP ships an MCP server for integration with AI agents (Claude, Copilot, etc.).
# Run directly
cup-mcp
# Or via Python
python -m cup.mcp

Add to your MCP client config (e.g., .mcp.json for Claude Code):
{
  "mcpServers": {
    "cup": {
      "command": "cup-mcp",
      "args": []
    }
  }
}

Tools: snapshot, snapshot_app, overview, snapshot_desktop, find, action, open_app, screenshot
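If the client config file already lists other servers, the `cup` entry can be merged in rather than pasted by hand. The helper below is a small sketch using only the standard library; `add_cup_server` is a name made up for this example and is not part of the SDK. It assumes the config file is plain JSON with a top-level `mcpServers` object, as in the snippet above.

```python
import json
from pathlib import Path


def add_cup_server(config_path):
    """Add or overwrite the "cup" entry in an MCP client config file."""
    path = Path(config_path)
    # Start from the existing config, if any, so other servers are preserved.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["cup"] = {"command": "cup-mcp", "args": []}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_cup_server(".mcp.json")` run from a project root creates the file when it is missing and otherwise leaves the other entries untouched.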
CUP is in early development (v0.1.0). Contributions welcome, especially:
- Android adapter (cup/platforms/android.py)
- iOS adapter (cup/platforms/ios.py)
- Tests, especially cross-platform integration tests
- Documentation and examples
For protocol or schema changes, please contribute to computeruseprotocol.
See CONTRIBUTING.md for setup instructions and guidelines.
- API Reference: Session API, actions, envelope format, MCP server
- Protocol Specification: schema, roles, states, actions, compact format
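As a quick taste of the compact format, the sketch below parses element lines like those in the sample output earlier in this README. The grammar is inferred from that sample alone (id, role, quoted name, `x,y`, `WxH`, optional `{states}`, optional `[actions]`) and is an assumption; comma-separated states and actions, and names without embedded quotes, are also assumptions. The Protocol Specification remains the authority, and `parse_compact` is a hypothetical helper, not an SDK function.

```python
import re

# Pattern inferred from the sample output in this README, not from the spec.
LINE_RE = re.compile(
    r'\[(?P<id>\w+)\]\s+'
    r'(?P<role>\w+)\s+'
    r'"(?P<name>[^"]*)"\s+'
    r'(?P<x>-?\d+),(?P<y>-?\d+)\s+'
    r'(?P<w>\d+)x(?P<h>\d+)'
    r'(?:\s+\{(?P<states>[^}]*)\})?'
    r'(?:\s+\[(?P<actions>[^\]]*)\])?'
    r'\s*$'
)

SAMPLE = '''\
# CUP 0.1.0 | windows | 2560x1440
[e0] win "Spotify" 120,40 1680x1020
[e2] btn "Back" 132,52 32x32 [clk]
[e3] btn "Forward" 170,52 32x32 {dis} [clk]
[e8] lnk "Home" 132,100 216x40 {sel} [clk]
'''


def _split(field):
    """Split an optional comma-separated field into a list of tokens."""
    return [item.strip() for item in field.split(",")] if field else []


def parse_compact(text):
    """Parse compact element lines into dicts, skipping '#' header lines."""
    elements = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Unrecognized line: ignore rather than guess.
        elements.append({
            "id": match["id"],
            "role": match["role"],
            "name": match["name"],
            "bounds": (int(match["x"]), int(match["y"]),
                       int(match["w"]), int(match["h"])),
            "states": _split(match["states"]),
            "actions": _split(match["actions"]),
        })
    return elements
```

Calling `parse_compact(SAMPLE)` yields one dict per element line. Hierarchy is not reconstructed here, so this is only suitable for flat lookups such as finding every element that lists `clk` among its actions.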