An open standard for AI agents to interact with web browsers.
v0.1.0: First public release. APIs may evolve based on feedback.
BAP (Browser Agent Protocol) provides a standardized way for AI agents to control web browsers. It uses JSON-RPC 2.0 over WebSocket for communication and includes semantic selectors designed for AI comprehension.
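To make the transport concrete, here is a minimal sketch of JSON-RPC 2.0 request and response handling using only the Python standard library. The method name `action/click` appears elsewhere in this README; the shape of `params` (in particular the selector object) is an assumption for illustration, not the documented wire format, and the helpers `make_request` and `parse_response` are hypothetical names.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC 2.0 uses the id to match responses to requests.
_ids = itertools.count(1)

def make_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request object to a JSON string."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

def parse_response(raw):
    """Return the result of a JSON-RPC 2.0 response, raising on an error object."""
    message = json.loads(raw)
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return message["result"]

# Assumed params shape: the selector layout below is hypothetical.
request = make_request(
    "action/click",
    {"selector": {"type": "role", "role": "button", "name": "Submit"}},
)
print(request)
```

In practice the client SDKs build and send these messages over the WebSocket connection for you; the sketch only shows what the envelope looks like.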
- Semantic Selectors: Use accessibility roles, text content, and labels instead of brittle CSS selectors
- Accessibility-First: Built-in support for accessibility tree inspection
- AI-Optimized: Designed for LLM-based agents with token-efficient observations
- MCP Integration: Works seamlessly with Model Context Protocol
- Composite Actions: Execute multi-step action sequences in a single round-trip (agent/act, agent/observe, agent/extract)
- Element References: Stable element refs (@submitBtn, @e7f3a2) that persist across observations
- Screenshot Annotation: Set-of-Marks style overlays with numbered badges for vision models
- Multi-Context Support: Parallel isolated browser sessions with context/create, context/list, context/destroy
- Human-in-the-Loop Approval: Enterprise workflow for human oversight of sensitive actions
- Frame Support: Explicit frame switching for iframes with frame/list, frame/switch, frame/main
- Streaming Responses: Chunked transfers for large observations with checksum verification
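The idea behind chunked transfers with checksum verification can be sketched as follows. This is an illustration of the general technique only: the chunk fields (`index`, `total`, `data`), the SHA-256 checksum, and the function names are all assumptions, not BAP's actual streaming format.

```python
import hashlib

def split_into_chunks(payload: bytes, chunk_size: int = 1024):
    """Split payload into indexed chunks and compute a checksum of the whole."""
    # max(..., 1) ensures an empty payload still yields one (empty) chunk.
    offsets = range(0, max(len(payload), 1), chunk_size)
    pieces = [payload[start:start + chunk_size] for start in offsets]
    chunks = [
        {"index": i, "total": len(pieces), "data": piece}
        for i, piece in enumerate(pieces)
    ]
    return chunks, hashlib.sha256(payload).hexdigest()

def reassemble(chunks, expected_checksum: str) -> bytes:
    """Reorder chunks, check that none are missing, and verify the checksum."""
    ordered = sorted(chunks, key=lambda chunk: chunk["index"])
    total = ordered[0]["total"] if ordered else 0
    if [chunk["index"] for chunk in ordered] != list(range(total)):
        raise ValueError("missing or duplicate chunks")
    payload = b"".join(chunk["data"] for chunk in ordered)
    if hashlib.sha256(payload).hexdigest() != expected_checksum:
        raise ValueError("checksum mismatch")
    return payload
```

The receiver accepts chunks in any order and only trusts the reassembled observation once the checksum matches, so a dropped or corrupted chunk is detected rather than silently passed to the agent.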
| Package | Description |
|---|---|
| @browseragentprotocol/protocol | Protocol types, schemas, and utilities |
| @browseragentprotocol/logger | Pretty logging utilities with colors and icons |
| @browseragentprotocol/client | TypeScript client SDK |
| @browseragentprotocol/server-playwright | Server implementation using Playwright |
| @browseragentprotocol/mcp | Model Context Protocol integration |
| Package | Description |
|---|---|
| browser-agent-protocol | Python SDK with async/sync APIs |
BAP works with any MCP-compatible client including Claude Code, Claude Desktop, OpenAI Codex, and Google Antigravity.
Claude Code:
claude mcp add --transport stdio bap-browser -- npx @browseragentprotocol/mcp

Claude Code browsing Hacker News with BAP
Claude Desktop (claude_desktop_config.json):
{
  "mcpServers": {
    "bap-browser": {
      "command": "npx",
      "args": ["@browseragentprotocol/mcp"]
    }
  }
}

Claude Desktop browsing Hacker News with BAP
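If the config file already lists other MCP servers, the `bap-browser` entry should be merged in rather than pasted over the whole file. The sketch below shows one way to do that with the standard library; the helper names are hypothetical, and the file location is left to the caller because it varies by platform.

```python
import json
from pathlib import Path

# The server entry shown in the JSON snippet above.
BAP_SERVER = {"command": "npx", "args": ["@browseragentprotocol/mcp"]}

def add_bap_server(config: dict, name: str = "bap-browser") -> dict:
    """Return a copy of config with the BAP entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": BAP_SERVER["command"], "args": list(BAP_SERVER["args"])}
    merged["mcpServers"] = servers
    return merged

def update_config_file(path: Path) -> None:
    """Read the JSON config at path (if any), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_bap_server(config), indent=2) + "\n")
```

Call `update_config_file(Path(...))` with the path to your own `claude_desktop_config.json`; existing servers and unrelated settings are preserved.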
Codex CLI:
codex mcp add bap-browser -- npx @browseragentprotocol/mcp

Codex CLI browsing Hacker News with BAP
Codex Desktop (~/.codex/config.toml):
[mcp_servers.bap-browser]
command = "npx"
args = ["@browseragentprotocol/mcp"]

Codex Desktop browsing Hacker News with BAP
💡 Tip: Codex may default to web search. Be explicit: "Using the bap-browser MCP tools..."
Antigravity (mcp_config.json via "..." → MCP Store → Manage):
{
  "mcpServers": {
    "bap-browser": {
      "command": "npx",
      "args": ["@browseragentprotocol/mcp"]
    }
  }
}
Start the server:
npx @browseragentprotocol/server-playwright
Connect from TypeScript:
import { BAPClient, role } from "@browseragentprotocol/client";
const client = new BAPClient("ws://localhost:9222");
await client.connect();
// Launch browser and navigate
await client.launch({ browser: "chromium", headless: false });
await client.createPage({ url: "https://example.com" });
// Use semantic selectors (AI-friendly)
await client.click(role("button", "Submit"));
await client.fill(role("textbox", "Email"), "user@example.com");
// Get accessibility tree for AI reasoning
const { tree } = await client.accessibility();
await client.close();
Install the Python SDK:
pip install browser-agent-protocol
import asyncio
from browseragentprotocol import BAPClient, role, label

async def main():
    async with BAPClient("ws://localhost:9222") as client:
        # Launch browser and navigate
        await client.launch(browser="chromium", headless=False)
        await client.create_page(url="https://example.com")
        # Use semantic selectors (AI-friendly)
        await client.click(role("button", "Submit"))
        await client.fill(label("Email"), "user@example.com")
        # Get accessibility tree for AI reasoning
        tree = await client.accessibility()

asyncio.run(main())
For synchronous usage (scripts, notebooks):
from browseragentprotocol import BAPClientSync, role

with BAPClientSync("ws://localhost:9222") as client:
    client.launch(browser="chromium", headless=True)
    client.create_page(url="https://example.com")
    client.click(role("button", "Submit"))
BAP uses semantic selectors that are more stable and readable than CSS selectors:
import { role, text, label, testId, ref } from "@browseragentprotocol/client";
// By accessibility role and name
await client.click(role("button", "Submit"));
await client.fill(role("textbox", "Search"), "search query");
// By visible text content
await client.click(text("Sign in"));
// By associated label
await client.fill(label("Email address"), "user@example.com");
// By test ID (for automation)
await client.click(testId("submit-button"));
// By stable element reference (from agent/observe)
await client.click(ref("@submitBtn"));
BAP provides composite methods optimized for AI agents:
// agent/observe - Get AI-optimized page snapshot
const observation = await client.observe({
  includeAccessibility: true,
  includeInteractiveElements: true,
  includeScreenshot: true,
  maxElements: 50,
  annotateScreenshot: true, // Set-of-Marks style
});
// Interactive elements with stable refs
for (const el of observation.interactiveElements) {
  console.log(`${el.ref}: ${el.role} - ${el.name}`);
  // @e1: button - Submit
  // @e2: textbox - Email
}
// agent/act - Execute multi-step sequences atomically
const result = await client.act({
  steps: [
    { action: "action/fill", params: { selector: label("Email"), value: "user@example.com" } },
    { action: "action/fill", params: { selector: label("Password"), value: "secret123" } },
    { action: "action/click", params: { selector: role("button", "Sign In") } },
  ],
});
console.log(`Completed ${result.completed}/${result.total} steps`);
// agent/extract - Extract structured data
const data = await client.extract({
  instruction: "Extract all product names and prices",
  schema: {
    type: "array",
    items: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
      },
    },
  },
});
Note: agent/extract (and bap_extract in MCP) uses heuristic-based extraction (CSS patterns). For complex pages, consider using bap_content to get page content as markdown and extract data yourself.
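As a rough sketch of the "extract data yourself" route, the function below pulls product names and prices out of markdown text and returns them in the same shape as the schema above (`name` as a string, `price` as a number). The markdown layout it expects (list items such as `- Blue Widget - $19.99`) is an assumption; real pages will need their own pattern, and `extract_products` is a hypothetical helper, not part of BAP.

```python
import re

# Assumed layout: list items like "- Blue Widget - $19.99" or "* Gadget: $5".
PRODUCT_LINE = re.compile(
    r"^\s*[-*]\s+(?P<name>.+?)\s*[-:]\s*\$(?P<price>\d+(?:\.\d{1,2})?)\s*$"
)

def extract_products(markdown: str):
    """Return a list of {"name": str, "price": float} found in markdown list items."""
    products = []
    for line in markdown.splitlines():
        match = PRODUCT_LINE.match(line)
        if match:
            products.append(
                {"name": match.group("name"), "price": float(match.group("price"))}
            )
    return products
```

Lines that do not match the pattern (headings, prose, items without a price) are skipped, so the result contains only entries the pattern could parse in full.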
npx @browseragentprotocol/server-playwright [options]
Options:
  -p, --port <number>      WebSocket port (default: 9222)
  -h, --host <host>        Host to bind to (default: localhost)
  -b, --browser <browser>  Browser: chromium, firefox, webkit (default: chromium)
  --headless               Run in headless mode (default: true)
  --no-headless            Run with visible browser window
  -t, --timeout <ms>       Default timeout in milliseconds (default: 30000)
  -d, --debug              Enable debug logging
  --token <token>          Authentication token for client connections
  --help                   Show help
  -v, --version            Show version
# Test connection to a BAP server
bap connect ws://localhost:9222
# Get server info
bap info ws://localhost:9222 --json
# Start the server
npx @browseragentprotocol/server-playwright --port 9222 --no-headless
# Start with debug logging
npx @browseragentprotocol/server-playwright --debug
# Clone the repository
git clone https://github.com/browseragentprotocol/bap.git
cd bap
# Install dependencies
pnpm install
# Build all packages
pnpm build
# Run type checking
pnpm typecheck
# Run linting
pnpm lint
# Install Python SDK in development mode
cd packages/python-sdk && pip install -e .
We welcome contributions! Please open an issue or submit a pull request on GitHub.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.