Web-Use is an intelligent autonomous browsing agent, built to seamlessly navigate websites, interact with dynamic content, perform smart searches, download files, and adapt to ever-changing pages, all with minimal effort from you. Powered by advanced LLMs and the Chrome DevTools Protocol (CDP), it transforms complex web tasks into streamlined, automated workflows that boost productivity and save time.

- Autonomous Web Navigation: Navigate websites, fill forms, and interact with dynamic content without manual intervention
- Multi-LLM Support: Works with Anthropic Claude, Google Gemini, OpenAI, Groq, Ollama, Cerebras, Mistral, and more
- Vision Capability: Understands visual content on pages with scroll-aware bounding boxes for accurate element highlighting
- Semantic Tree: DOM traversal-based tree showing real page structure with roles, ids, classes, and text content
- Web Model Context Protocol (WebMCP): Discovers and uses custom tools exposed by websites
- Efficient Element Interaction: Indexed DOM elements for fast, accurate clicking and typing
- File Operations: Download files and upload content to forms
- State Awareness: Maintains understanding of page state to avoid loops and recover from errors
- Intelligent Waiting: Handles loading states, animations, and user interactions (CAPTCHA, OTP)
- OAuth 2.0 + PKCE: Built-in authenticated workflows for OAuth-protected services with persistent token storage

Web-Use builds a semantic tree of the visible page directly from the real DOM parent-child relationships captured via CDP, rather than reconstructing it from XPaths. This gives the agent accurate structural context around every element.

Each node in the tree is rendered with CSS selector notation showing tag, id, class, and role:

    document [role: document]
    └── nav#main-nav.navbar
        ├── [#0] a.nav-link "Home" → /
        ├── [#1] a.nav-link "About" → /about
        └── [#2] div.dropdown [button] "Products"
    form#checkout-form
    ├── p.hint "Fill in your details below"
    ├── [#3] input#email.form-input "Email"
    ├── [#4] input#name.form-input "Name"
    └── [#5] div.btn-group [button] "Submit"

What's included:

- Interactive elements: buttons, links, inputs, selects, checkboxes, anything clickable, labelled `[#id]`
- Informative elements: headings, paragraphs, list items, labels, table cells, blockquotes, figcaptions, and more
- Structural containers: `nav`, `header`, `footer`, `main`, `section`, `form`, `ul`, `aside`, `dialog`, etc., shown as grouping context
- Roles shown in `[brackets]` when they differ from the tag (e.g. `div [button]`, `span [link]`)
- Text content extracted correctly even when wrapped in inline elements (`em`, `strong`, `span`, `a`, etc.)

Web-Use has built-in support for OAuth 2.0 Authorization Code flow with PKCE, enabling the agent to authenticate with any OAuth provider (Google, GitHub, Microsoft, etc.) without storing passwords.

- A local HTTP server starts on `localhost:PORT`
- The browser navigates to the provider's login page
- The user logs in once, and the provider redirects back with an authorization code
- The code is exchanged for tokens using the PKCE verifier
- `Authorization: Bearer <token>` is injected into every browser request automatically
- Tokens are saved to `~/.web-use/oauth/` and reloaded on future runs, so no login is required again

Example usage:

    import asyncio
    import os

    from src.agent.auth import OAuthConfig

    oauth_config = OAuthConfig(
        client_id=os.getenv('OAUTH_CLIENT_ID'),
        auth_url='https://accounts.google.com/o/oauth2/v2/auth',
        token_url='https://oauth2.googleapis.com/token',
        scopes=['openid', 'email', 'profile'],
        redirect_uri='http://localhost:8765/callback',
    )

    async def setup_auth():
        # `agent` is assumed to be an Agent instance created beforehand
        await agent.browser.ensure_open()

        # Load saved token (silently refreshes if expired)
        token = await agent.browser.oauth.load(oauth_config)

        if token is None:
            # First run: opens login page, user authenticates once
            token = await agent.browser.oauth.authorize(oauth_config)

    asyncio.run(setup_auth())

- First run: the login page opens, the user authenticates, and the token is saved.
- Every run after: the token is loaded from disk and refreshed silently if needed, with no user interaction.

To clear saved tokens:

    await agent.browser.oauth.revoke()

Web-Use supports WebMCP, a protocol that allows websites to expose custom tools and capabilities directly to the agent. When visiting a website with WebMCP support:

- Auto-Discovery: The agent automatically detects available tools
- Dynamic Registration: Tools are added to the agent's toolkit on the fly
- Full Integration: WebMCP tools appear in the browser state with complete schema information
- Seamless Execution: Tools are called like built-in tools with proper parameter validation

For example, on a documentation site that supports WebMCP and exposes a `search_docs` tool, the browser state includes:

    **WebMCP Tools Available:**

    **search_docs** - Search documentation
      - `query` (string) [required]
      - `limit` (integer) [optional]

The agent will automatically use this tool when relevant to the task.

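
As a rough illustration of the parameter validation mentioned above, the sketch below checks a call against a tool description before it would be executed. The dictionary layout and the `validate_args` helper are assumptions made for this example. The real WebMCP schema format and Web-Use's validation logic are not documented here and may differ.

```python
# Hypothetical tool description mirroring the search_docs listing;
# the actual WebMCP schema format may differ.
SEARCH_DOCS = {
    "name": "search_docs",
    "params": {
        "query": {"type": str, "required": True},
        "limit": {"type": int, "required": False},
    },
}


def validate_args(tool: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is valid."""
    problems = []
    for name, spec in tool["params"].items():
        if name not in args:
            if spec["required"]:
                problems.append(f"missing required parameter: {name}")
            continue
        if not isinstance(args[name], spec["type"]):
            problems.append(f"{name} must be {spec['type'].__name__}")
    # Reject arguments the tool never declared.
    for name in args:
        if name not in tool["params"]:
            problems.append(f"unknown parameter: {name}")
    return problems


print(validate_args(SEARCH_DOCS, {"query": "oauth", "limit": 5}))
print(validate_args(SEARCH_DOCS, {"limit": "5"}))
```

Returning a list of problems rather than raising on the first one lets an agent report every issue to the model in a single retry message.
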
Enable WebMCP support:

    agent = Agent(
        config=config,
        llm=llm,
        use_web_mcp=True,
        max_steps=100
    )

Prerequisites:

- Python 3.11 or higher
- UV


Clone the repository:

    git clone https://github.com/CursorTouch/Web-Use.git
    cd Web-Use

Install dependencies:

    uv sync

Set up the `.env` file:

    GOOGLE_API_KEY="<API_KEY_HERE>"

Basic Setup:

    from dotenv import load_dotenv

    from src.agent import Agent
    from src.agent.browser.config import BrowserConfig
    from src.providers.ollama import ChatOllama

    # Load API keys and other settings from .env
    load_dotenv()

    llm = ChatOllama(model='qwen3.5:397b-cloud', temperature=0.5)

    config = BrowserConfig(
        browser='chrome',
        headless=False,
        use_system_profile=True
    )

    agent = Agent(
        config=config,
        llm=llm,
        use_vision=True,
        use_web_mcp=True,
        max_steps=100
    )

    user_query = input('Enter your query: ')
    agent.print_response(user_query)

Execute:

    uv run main.py

Agent parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `config` | `BrowserConfig` | Required | Browser configuration |
| `llm` | `BaseChatLLM` | Required | Language model for reasoning |
| `use_vision` | `bool` | False | Enable screenshot-based visual understanding |
| `use_web_mcp` | `bool` | False | Enable WebMCP tool discovery |
| `max_steps` | `int` | 25 | Maximum actions before timeout |
| `max_consecutive_failures` | `int` | 3 | Retry limit for failed tool calls |
| `include_human_in_loop` | `bool` | False | Allow pausing for human input |
| `keep_alive` | `bool` | False | Keep browser open after task completion |

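
To show how `max_steps` and `max_consecutive_failures` might interact, here is a small self-contained sketch of a bounded action loop. It is an interpretation of the parameter descriptions, not Web-Use's actual control loop. `run_steps` is a hypothetical function, and treating any success as resetting the failure streak is an assumption.

```python
from typing import Callable, Iterable


def run_steps(
    actions: Iterable[Callable[[], bool]],
    max_steps: int = 25,
    max_consecutive_failures: int = 3,
) -> str:
    """Run actions until they are exhausted or one of the limits is hit."""
    failures = 0
    for step, action in enumerate(actions, start=1):
        if step > max_steps:
            return "step limit reached"
        if action():
            failures = 0  # assumption: any success resets the streak
        else:
            failures += 1
            if failures >= max_consecutive_failures:
                return "too many consecutive failures"
    return "completed"


def ok() -> bool:
    return True


def fail() -> bool:
    return False


print(run_steps([ok, ok, ok]))
print(run_steps([fail, fail, ok, fail, fail]))
print(run_steps([fail] * 5))
print(run_steps([ok] * 30))
```

Under these assumptions, scattered failures do not stop a run. Only an unbroken streak of failures or an exhausted step budget does.
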
Browser configuration:

    from src.agent.browser.config import BrowserConfig

    config = BrowserConfig(
        browser='chrome',                  # 'chrome' or 'edge'
        headless=False,                    # Run in headless mode
        use_system_profile=True,           # Use real browser profile with auth
        user_data_dir='/path/to/profile',  # Custom profile directory
        cdp_port=9222,                     # Chrome DevTools Protocol port
        downloads_dir='/Downloads',        # Where to save files
        attach_to_existing=False,          # Connect to running browser
        update_cdp=False,                  # Regenerate CDP protocol files
    )

OAuth configuration:

    from src.agent.auth import OAuthConfig

    config = OAuthConfig(
        client_id='your-client-id',        # From your OAuth app registration
        auth_url='https://...',            # Provider authorization endpoint
        token_url='https://...',           # Provider token endpoint
        scopes=['openid', 'email'],        # Requested OAuth scopes
        redirect_uri='http://localhost:8765/callback',  # Must match app registration
        client_secret=None,                # Optional; not needed with PKCE
    )

Prompt: I want to know the price details of the RTX 4060 laptop gpu from various sellers from amazon.in

Demo video: Amazon.mov

Prompt: Make a twitter post about AI on X

Demo video: Twitter.mov

Prompt: Can you play the trailer of GTA 6 on youtube

Demo video: Youtube.mov

Prompt: Can you go to my github account and visit the Windows MCP

Demo video: Github.mov

This project is licensed under the MIT License; see the LICENSE file for details.

Contributions are welcome! Please see CONTRIBUTING for setup instructions and development guidelines.

Made with ❤️ by Jeomon George and Muhammad Yaseen