Hey! Some ideas from Windows-MCP that might be useful here




**Labels:** enhancement, discussion

---

## Hey there!

I've been designing some features for [Windows-MCP](https://github.com/CursorTouch/Windows-MCP) (that Windows automation project with 2M+ users on Claude Desktop) and realized these patterns could be really valuable for CUP too.

Thought I'd share since you're building the cross-platform version of this. These ideas could work across all your target platforms, not just Windows. Here are three things I'm working on:

1. **WaitFor tool** — so agents don't have to spam snapshots or use dumb fixed sleeps
2. **Advanced selector queries** — fuzzy matching, regex, and smarter element disambiguation
3. **Return state on actions** — cuts agent round-trips in half (with some caveats)

None of this is Windows-specific — it's all stuff that could work on macOS, Linux, Web, etc. Just thought it might be worth considering for CUP's roadmap.

---

## Why this might be interesting

### The problem
Right now CUP does full tree re-traversal on every `snapshot()` call, which is expensive (like 50-300 COM calls per window on Windows, similar costs on other platforms). Agents end up either:
- Spamming snapshots after every action to see what changed
- Using fixed `wait(2000)` sleeps and hoping the UI settled
- Struggling to target the right element when there are multiple "Submit" buttons

### What I'm designing for Windows-MCP
I'm working on specs for a bunch of stuff to fix these issues:
- Condition-based waits so agents can say "wait until this button appears" instead of polling
- Smart selectors with fuzzy matching and better disambiguation
- Actions that optionally return the new state so you don't always need a separate snapshot call

(I originally explored event-driven incremental snapshots too, but tabled that — the UI tree changes too fast and too unpredictably for reliable cache invalidation. Turns out full re-traversal is the pragmatic choice for now.)

The cool part: Windows-MCP has 2M+ users already, so I'm designing this based on real production pain points, not theoretical problems.

---

## The ideas (in more detail)

### 1. WaitFor Tool

**The problem:** Agents currently do this nonsense:
```python
action("click", ...)
wait(2000)  # Hope the modal appeared?
snapshot()  # Check if it's there
wait(1000)  # Still loading?
snapshot()  # Check again...
```

**My approach:** Just let agents say what they're waiting for:
```python
wait_for("element_exists", {"name": "Login", "control_type": "Button"})
wait_for("window_active", {"window": "Spotify*"})
wait_for("element_gone", {"name": "Loading..."})  # Wait for spinner to disappear
```

Uses events to wake up fast when the condition is met, with polling as a fallback. Works on any platform that has event systems (which is all of them).
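To make the "events plus polling fallback" idea concrete, here's a minimal sketch. The name `wait_until`, its parameters, and the use of a `threading.Event` as the wake-up signal are all my assumptions for illustration, not an existing Windows-MCP or CUP API. A platform event hook would call `wake.set()`; if no hook is wired up, the loop just degrades to plain polling.

```python
import threading
import time
from typing import Callable, Optional


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 10.0,
               poll_interval_s: float = 0.25,
               wake: Optional[threading.Event] = None) -> bool:
    """Return True once condition() holds, False if timeout_s elapses first.

    `wake` is a hypothetical hook: a platform event listener can set it
    to make the loop re-check immediately instead of waiting for the next tick.
    """
    wake = wake or threading.Event()
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep until an event sets `wake`, or until the next poll tick.
        wake.wait(min(poll_interval_s, remaining))
        wake.clear()
```

The condition is re-checked after every wake-up, so a spurious or unrelated event only costs one extra check.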

---

### 2. Advanced Selector Query Schema

**The problem:** "Click the Submit button" — which one? There are three on this page.

**My approach:** Let agents be way more specific:
```python
{
    "name": "Submit",
    "name_re": "Submit|Send|OK",      # Regex for variations
    "control_type": "Button",
    "automation_id": "btnSubmit",     # Stable ID when available
    "window": "Spotify*",             # Scope to a specific window
    "fuzzy": 0.8,                     # "Submitt" typo? Still matches
    "index": 0                        # First match if multiple remain
}
```

If exact match fails, auto-fallback to fuzzy matching and tell the agent what it found — including the closest matches so the agent can self-correct. This is basically how UiPath and Power Automate do it — proven pattern.

You can also scope searches to a specific window (glob matching), use `automation_id` for stable targeting when available, or use `index` as a last-resort positional disambiguator.
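Here's a rough sketch of how the matching could work over a flat list of elements. Everything here is an assumption for illustration: the function name `find_matches`, elements as plain dicts, `name` and `name_re` treated as alternatives, and `difflib.SequenceMatcher` standing in for whatever fuzzy scorer you'd actually pick.

```python
import re
from difflib import SequenceMatcher
from fnmatch import fnmatchcase


def find_matches(elements, query):
    """Filter element dicts by the query; fall back to fuzzy name matching."""
    def passes_filters(el):
        if "control_type" in query and el.get("control_type") != query["control_type"]:
            return False
        if "automation_id" in query and el.get("automation_id") != query["automation_id"]:
            return False
        # Window scoping uses glob patterns such as "Spotify*".
        if "window" in query and not fnmatchcase(el.get("window", ""), query["window"]):
            return False
        return True

    def name_matches(el):
        name = el.get("name", "")
        if "name" not in query and "name_re" not in query:
            return True
        if "name" in query and name == query["name"]:
            return True
        return "name_re" in query and re.fullmatch(query["name_re"], name) is not None

    candidates = [el for el in elements if passes_filters(el)]
    matches = [el for el in candidates if name_matches(el)]
    used_fuzzy = False

    # Only fall back to fuzzy scoring when the exact/regex pass found nothing.
    if not matches and "fuzzy" in query and "name" in query:
        target = query["name"].lower()
        scored = [(SequenceMatcher(None, target, el.get("name", "").lower()).ratio(), el)
                  for el in candidates]
        scored = [pair for pair in scored if pair[0] >= query["fuzzy"]]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        matches = [el for _, el in scored]
        used_fuzzy = bool(matches)

    # `index` is the last-resort positional disambiguator.
    if "index" in query:
        matches = matches[query["index"]:query["index"] + 1]
    return {"matches": matches, "fuzzy_used": used_fuzzy}
```

Returning `fuzzy_used` alongside the matches is one way to give the agent the diagnostic feedback mentioned above, so it knows the hit was approximate.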

---

### 3. Return State on Actions

**The problem:** Every agent loop is like:
```python
action("click", ...)  # Round-trip 1
snapshot()            # Round-trip 2 to see what happened
action("type", ...)   # Round-trip 3
snapshot()            # Round-trip 4...
```

**My approach:** Optionally return the state with the action result:
```python
action("click", query={"name": "Login"}, return_state=True, settle_ms=300)
# Returns: {"status": "ok", "state": "... snapshot here ..."}
```

Cuts round-trips in half for actions with immediate effects. Optional parameter so it's backward compatible.

**Caveat though:** Some actions take unpredictable time to settle — opening an app, loading a page, waiting for a modal. A fixed `settle_ms` won't always cut it. For those cases you'd pair this with `WaitFor` instead. So `return_state` is best for quick actions (click a button, type text, toggle a checkbox) where you're confident the UI settles fast. For anything heavier, `WaitFor` is the right tool.
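In sketch form, `return_state` is a thin wrapper: run the action, wait `settle_ms`, then snapshot. The helper name `act_with_state` and the two callables are hypothetical stand-ins for the real action and snapshot functions.

```python
import time
from typing import Any, Callable, Dict


def act_with_state(do_action: Callable[[], None],
                   take_snapshot: Callable[[], Any],
                   return_state: bool = False,
                   settle_ms: int = 0) -> Dict[str, Any]:
    """Run an action and optionally attach a snapshot taken after a fixed settle delay."""
    do_action()
    result: Dict[str, Any] = {"status": "ok"}
    if return_state:
        # Fixed delay only; slow or unpredictable transitions need a condition-based wait.
        time.sleep(settle_ms / 1000)
        result["state"] = take_snapshot()
    return result
```

With `return_state` left at its default, the result has the same shape as before, which is what keeps the parameter backward compatible.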

---

## How this could work for CUP

You could do this in phases so each piece delivers value independently:

**Phase 1: WaitFor tool** (2-3 weeks)
- Add `wait_for()` to the platform adapter interface
- Implement for Windows (polling-based, with event hooks where feasible)
- Add the MCP tool
- Biggest bang for the buck — agents immediately stop wasting turns on fixed sleeps

**Phase 2: Selector queries** (2-3 weeks)
- Extend `cup/search.py` with the advanced query fields
- Add `query` parameter to action tools (alongside `element_id`)
- Fuzzy matching + window scoping + diagnostic feedback

**Phase 3: Return state** (1 week)
- Add optional `return_state` and `settle_ms` parameters to action tools
- Document when to use `return_state` vs `WaitFor`

Total: roughly 5-7 weeks if you wanted to do all of it. But each phase works independently.

---

## Why this could be cool for CUP

**For users:**
- Agents get way faster (fewer wasted turns on polling and redundant snapshots)
- More reliable (no more race conditions from fixed sleeps)
- Easier to target elements (fuzzy matching, window scoping, diagnostic feedback)

**For the project:**
- These are patterns designed against real production pain points (2M+ user base)
- Cross-platform from day one — nothing Windows-specific in any of this
- WaitFor alone would be a major differentiator vs other automation protocols

---

## Prior art (this isn't new, just not in CUP yet)

| Feature | Windows-MCP (planned) | UiPath | Power Automate | Playwright | pywinauto |
|---------|-------------|--------|----------------|------------|-----------|
| WaitFor conditions | ✅ | ✅ | ✅ | ✅ | ✅ |
| Fuzzy matching | ✅ | ✅ | ✅ | ❌ | ✅ |
| Return state | ✅ | ❌ | ❌ | ❌ | ❌ |

Basically borrowing the best ideas from enterprise RPA (UiPath, Power Automate) and modern web automation (Playwright) and making them cross-platform.

---

## Potential concerns

**"Breaking changes?"**  
Nah, everything's additive — new tools, optional parameters. Existing code keeps working.

**"This sounds Windows-specific"**  
It's really not. At its simplest, WaitFor is just polling + condition checking, which works on any platform; event hooks are an optimization, not a requirement. Selectors are pure Python filtering on the tree. Return state is just "call snapshot after the action." None of this requires platform-specific hooks.
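To show how platform-neutral the condition check can be, here's a small sketch that evaluates `element_exists` / `element_gone` against a snapshot tree. The tree shape (nested dicts with a `children` list) and the function names are assumptions for illustration, not CUP's actual format.

```python
from typing import Any, Dict, Iterator

Node = Dict[str, Any]


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def element_exists(tree: Node, **fields: Any) -> bool:
    """True if any node in the tree has all the given field values."""
    return any(all(n.get(k) == v for k, v in fields.items()) for n in walk(tree))


def check_condition(kind: str, tree: Node, **fields: Any) -> bool:
    """Evaluate a named wait condition against one snapshot of the tree."""
    if kind == "element_exists":
        return element_exists(tree, **fields)
    if kind == "element_gone":
        return not element_exists(tree, **fields)
    raise ValueError(f"unknown condition: {kind}")
```

A polling loop would call `check_condition` on a fresh snapshot each tick; the only platform-specific part is producing the snapshot.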

**"Return state might be stale"**  
Yep, that's real. Some actions take time (opening apps, loading pages). That's why `return_state` is best for quick actions, and `WaitFor` handles the slow ones. They complement each other.


If this seems interesting, let me know. I just thought these ideas were cool and might be useful for what you're building!

Cheers

