Commit b3fad72

Merge pull request #232 from PredicateSystems/planner_executor_agent2
Abstraction for browser automation tasks
2 parents a6a6bf8 + eeb3944 commit b3fad72

18 files changed (+7830, -87 lines)

docs/PLANNER_EXECUTOR_AGENT.md

Lines changed: 1368 additions & 0 deletions

examples/planner-executor/README.md

Lines changed: 164 additions & 0 deletions
@@ -3,11 +3,16 @@
This directory contains examples for the `PlannerExecutorAgent`, a two-tier agent
architecture with separate Planner (7B+) and Executor (3B-7B) models.

> **See also**: [Full User Manual](../../docs/PLANNER_EXECUTOR_AGENT.md) for comprehensive documentation.

## Examples

| File | Description |
|------|-------------|
| `minimal_example.py` | Basic usage with OpenAI models |
| `stepwise_example.py` | Stepwise (ReAct-style) planning for unfamiliar sites |
| `automation_task_example.py` | Using `AutomationTask` for flexible task definition |
| `captcha_example.py` | CAPTCHA handling with different solvers |
| `local_models_example.py` | Using local HuggingFace/MLX models |
| `custom_config_example.py` | Custom configuration (escalation, retry, vision) |
| `tracing_example.py` | Full tracing integration for Predicate Studio |
@@ -23,6 +28,7 @@ architecture with separate Planner (7B+) and Executor (3B-7B) models.
│ • Generates JSON plan │ • Executes each step │
│ • Includes predicates │ • Snapshot-first approach │
│ • Handles replanning │ • Vision fallback │
│ • Stepwise (ReAct) mode │ │
└─────────────────────────────────────────────────────────────┘


@@ -34,6 +40,50 @@ architecture with separate Planner (7B+) and Executor (3B-7B) models.
└─────────────────────────────────────────────────────────────┘
```

## Planning Modes

### Upfront Planning (Default)

The planner generates a complete multi-step plan before execution. Use it for well-known sites.

```python
result = await agent.run(runtime, task)
```
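The diagram above describes the planner's output as a JSON plan whose steps carry predicates, which the executor then runs one by one. The sketch below illustrates that flow under stated assumptions: the plan schema (`steps`, `action`, `target`, `verify`) and the `run_upfront` helper are hypothetical and are not the format `PlannerExecutorAgent` actually uses.

```python
import json

# Hypothetical plan schema for illustration; the real PlannerExecutorAgent format may differ.
PLAN_JSON = """
{
  "steps": [
    {"action": "TYPE", "target": "search box", "text": "laptops",
     "verify": {"predicate": "exists", "args": [".search-results"]}},
    {"action": "CLICK", "target": "first result",
     "verify": {"predicate": "url_contains", "args": ["/product"]}}
  ]
}
"""


def run_upfront(plan_json, execute_step, check):
    """Run every step of a precomputed plan in order.

    execute_step(step) performs the action; check(predicate) returns True or False.
    Returns how many steps completed and passed verification. It stops at the first
    failed check, which is where a real agent would replan.
    """
    plan = json.loads(plan_json)
    for index, step in enumerate(plan["steps"]):
        execute_step(step)
        if not check(step["verify"]):
            return index
    return len(plan["steps"])


executed = []
completed = run_upfront(PLAN_JSON, execute_step=executed.append, check=lambda predicate: True)
print(completed)  # 2
```

Because the whole plan exists before the first action, a failed verification is a natural trigger for replanning.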

### Stepwise Planning (ReAct-style)

The planner decides one action at a time based on the current page state. **Recommended for unfamiliar sites.**

```python
from predicate.agents import PlannerExecutorConfig, StepwisePlanningConfig

# planner, executor, runtime, and task are created as in Quick Start
config = PlannerExecutorConfig(
    stepwise=StepwisePlanningConfig(
        max_steps=30,  # cap on the number of stepwise iterations
        action_history_limit=5,  # recent actions kept as planner context
    ),
)

agent = PlannerExecutorAgent(planner=planner, executor=executor, config=config)
result = await agent.run_stepwise(runtime, task)
```
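To make the ReAct-style loop concrete, here is a minimal, library-free sketch of how `max_steps` and `action_history_limit` could bound a stepwise run. The `observe`, `decide`, and `act` callables and the `"DONE"` sentinel are hypothetical stand-ins for the page snapshot, the planner model, and the executor; this is not the agent's internal implementation.

```python
from collections import deque


def run_stepwise(observe, decide, act, max_steps=30, action_history_limit=5):
    """Hypothetical ReAct-style loop: observe the page, pick ONE action, act, repeat.

    decide(state, history) returns the next action, or "DONE" once the task is complete.
    Only the last `action_history_limit` actions are passed back to decide().
    Returns (finished, steps_taken).
    """
    history = deque(maxlen=action_history_limit)
    for step in range(max_steps):
        state = observe()
        action = decide(state, list(history))
        if action == "DONE":
            return True, step
        act(action)
        history.append(action)
    return False, max_steps


# Toy environment: the "planner" keeps clicking until three clicks have happened.
page = {"clicks": 0}


def observe():
    return dict(page)


def decide(state, history):
    return "DONE" if state["clicks"] >= 3 else "CLICK next"


def act(action):
    page["clicks"] += 1


finished, steps = run_stepwise(observe, decide, act)
print(finished, steps)  # True 3
```

A `deque` with `maxlen` keeps the planner's context bounded: once the limit is reached, the oldest action is dropped automatically on each append.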

### Auto-Fallback (Default Behavior)

By default, `agent.run()` automatically falls back to stepwise planning when upfront planning fails:

```python
# Default: auto_fallback_to_stepwise=True
result = await agent.run(runtime, task)

# Check if fallback was used
if result.fallback_used:
    print("Automatically switched to stepwise planning")

# Disable auto-fallback
config = PlannerExecutorConfig(auto_fallback_to_stepwise=False)
agent = PlannerExecutorAgent(planner=planner, executor=executor, config=config)
```
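A rough sketch of the fallback control flow, assuming an upfront run and a stepwise run can each be reduced to a callable that returns success or failure. `RunResult`, `try_upfront`, and `try_stepwise` are hypothetical names; only the `fallback_used` field mirrors the snippet above.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    success: bool
    fallback_used: bool = False


def run_with_fallback(try_upfront, try_stepwise, auto_fallback_to_stepwise=True):
    """Try upfront planning first; if it fails, optionally retry in stepwise mode.

    try_upfront() and try_stepwise() are hypothetical callables returning True on success.
    """
    if try_upfront():
        return RunResult(success=True)
    if not auto_fallback_to_stepwise:
        return RunResult(success=False)
    return RunResult(success=try_stepwise(), fallback_used=True)


result = run_with_fallback(try_upfront=lambda: False, try_stepwise=lambda: True)
if result.fallback_used:
    print("Automatically switched to stepwise planning")
```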

## Quick Start

```python
@@ -139,3 +189,117 @@ agent = PlannerExecutorAgent(

tracer.close()  # Upload trace to Studio
```

## AutomationTask

Use `AutomationTask` for flexible task definition with built-in recovery:

```python
from predicate.agents import AutomationTask, TaskCategory

# Basic task
task = AutomationTask(
    task_id="search-products",
    starting_url="https://amazon.com",
    task="Search for laptops and add the first result to cart",
    category=TaskCategory.TRANSACTION,
    enable_recovery=True,
)

# Add success criteria
task = task.with_success_criteria(
    {"predicate": "url_contains", "args": ["/cart"]},
    {"predicate": "exists", "args": [".cart-item"]},
)

result = await agent.run(runtime, task)
```
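Success criteria are written as `{"predicate": name, "args": [...]}` dictionaries. The sketch below shows one way such criteria could be evaluated, assuming a hypothetical snapshot that holds the current URL and the set of selectors present on the page, and assuming every criterion must hold. The agent's real predicate evaluation is not shown in this README.

```python
# Hypothetical page snapshot; a real agent would build this from the live browser.
snapshot = {
    "url": "https://example.com/cart",
    "selectors": {".cart-item", ".cart-total"},
}

# Map each predicate name to a check taking (snapshot, *args).
PREDICATES = {
    "url_contains": lambda snap, fragment: fragment in snap["url"],
    "exists": lambda snap, selector: selector in snap["selectors"],
}


def criteria_met(criteria, snap):
    """Return True only if every {"predicate": ..., "args": [...]} entry holds."""
    return all(PREDICATES[c["predicate"]](snap, *c["args"]) for c in criteria)


criteria = [
    {"predicate": "url_contains", "args": ["/cart"]},
    {"predicate": "exists", "args": [".cart-item"]},
]
print(criteria_met(criteria, snapshot))  # True
```

Keeping predicates as plain data makes them easy to serialize into a task definition and to dispatch through a lookup table like `PREDICATES`.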

## Permissions

Grant browser permissions to prevent permission dialogs from interrupting automation:

```python
from predicate import AsyncPredicateBrowser

# Grant permissions to avoid "Allow this site to access your location?" dialogs
permission_policy = {
    "auto_grant": [
        "geolocation",  # Store locators, local inventory
        "notifications",  # Push notification prompts
        "clipboard-read",  # Paste coupon codes
        "clipboard-write",  # Copy product info
    ],
    "geolocation": {"latitude": 47.6762, "longitude": -122.2057},  # Mock location
}

async with AsyncPredicateBrowser(
    permission_policy=permission_policy,
) as browser:
    # Run automation without permission dialogs
    ...
```
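A misspelled permission name or an out-of-range coordinate is easy to miss, so a small pre-flight check can help. This is a hypothetical helper, not part of the `predicate` package: `KNOWN_PERMISSIONS` only lists the four names used above, and the rule that mock coordinates require a `geolocation` grant is an assumption.

```python
# Only the permission names used in the example above (assumption; extend as needed).
KNOWN_PERMISSIONS = {"geolocation", "notifications", "clipboard-read", "clipboard-write"}


def validate_permission_policy(policy):
    """Return a list of human-readable problems in a permission_policy dict (empty if none)."""
    problems = []
    granted = policy.get("auto_grant", [])
    for name in granted:
        if name not in KNOWN_PERMISSIONS:
            problems.append(f"unknown permission: {name!r}")
    location = policy.get("geolocation")
    if location is not None:
        if "geolocation" not in granted:
            problems.append("mock coordinates set but 'geolocation' is not auto-granted")
        latitude = location.get("latitude")
        longitude = location.get("longitude")
        if latitude is None or not -90 <= latitude <= 90:
            problems.append("latitude must be between -90 and 90")
        if longitude is None or not -180 <= longitude <= 180:
            problems.append("longitude must be between -180 and 180")
    return problems


permission_policy = {
    "auto_grant": ["geolocation", "notifications", "clipboard-read", "clipboard-write"],
    "geolocation": {"latitude": 47.6762, "longitude": -122.2057},
}
print(validate_permission_policy(permission_policy))  # []
```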

## CAPTCHA Handling

Configure CAPTCHA solving with different strategies:

```python
from predicate.agents import PlannerExecutorConfig
from predicate.agents.browser_agent import CaptchaConfig
from predicate.captcha_strategies import HumanHandoffSolver, ExternalSolver

# Human handoff: wait up to 120 s for a manual solve
config = PlannerExecutorConfig(
    captcha=CaptchaConfig(
        policy="callback",
        handler=HumanHandoffSolver(timeout_ms=120_000),
    ),
)

# External solver: integrate with 2Captcha, CapSolver, etc.
def solve_captcha(ctx):
    # Placeholder: call your CAPTCHA solving service here
    pass

config = PlannerExecutorConfig(
    captcha=CaptchaConfig(
        policy="callback",
        handler=ExternalSolver(resolver=solve_captcha),
    ),
)
```
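`HumanHandoffSolver(timeout_ms=120_000)` implies waiting up to 120 seconds for a person to solve the challenge. A minimal sketch of such a deadline-bounded wait follows; `is_solved` is a hypothetical callable (for example, a check that the CAPTCHA frame is gone), and this is not the solver's actual implementation.

```python
import time


def wait_for_manual_solve(is_solved, timeout_ms=120_000, poll_interval_s=1.0):
    """Poll is_solved() until it returns True or timeout_ms elapses.

    Returns True if the CAPTCHA was solved in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_ms / 1000  # monotonic clock ignores wall-clock changes
    while time.monotonic() < deadline:
        if is_solved():
            return True
        time.sleep(poll_interval_s)
    return False


# Simulate a person who finishes on the third check.
checks = iter([False, False, True])
solved = wait_for_manual_solve(lambda: next(checks), timeout_ms=5_000, poll_interval_s=0.01)
print(solved)  # True
```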

## Modal/Drawer Dismissal

Automatic modal and drawer dismissal is enabled by default in both upfront and stepwise planning modes.

After successful CLICK actions, the agent automatically detects and dismisses blocking overlays:

```python
from predicate.agents import PlannerExecutorConfig, ModalDismissalConfig

# Default: enabled with common patterns (works in both modes)
config = PlannerExecutorConfig()

# Custom patterns for non-English sites
config = PlannerExecutorConfig(
    modal=ModalDismissalConfig(
        dismiss_patterns=(
            "no thanks", "not now", "close", "skip",  # English
            "nein danke", "schließen",  # German
            "no gracias", "cerrar",  # Spanish
        ),
    ),
)

# Disable modal dismissal
config = PlannerExecutorConfig(
    modal=ModalDismissalConfig(enabled=False),
)
```

This handles common e-commerce scenarios like:
- Amazon's "Add Protection Plan" drawer after Add to Cart
- Cookie consent banners
- Newsletter signup popups
- Promotional overlays
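
The exact matching rule behind `dismiss_patterns` is not documented here. As a sketch, assuming case-insensitive substring matching against the labels of visible buttons (a hypothetical input list), choosing the control to click could look like this:

```python
DEFAULT_PATTERNS = ("no thanks", "not now", "close", "skip")


def find_dismiss_button(button_labels, dismiss_patterns=DEFAULT_PATTERNS):
    """Return the first label containing any dismiss pattern (case-insensitive), else None."""
    patterns = [p.casefold() for p in dismiss_patterns]
    for label in button_labels:
        text = label.casefold()
        if any(p in text for p in patterns):
            return label
    return None


# Hypothetical labels read from a protection-plan drawer
drawer_buttons = ["Add Protection", "No Thanks", "Close"]
print(find_dismiss_button(drawer_buttons))  # No Thanks
```

Substring matching is permissive: `"close"` would also match a label such as "Closeout sale", so a stricter matcher might prefer exact or whole-word matches. `str.casefold()` also folds `ß` to `ss`, so the pattern `"schließen"` matches a label written as "SCHLIESSEN".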
