
Commit 4a5a3dd

k4cper-g and claude committed
Add screen numbering to cross-os MCP, fix macOS stale foreground
Cross-OS MCP now has full parity with the main MCP server:

- Screen numbering (1, 2, 3...) for each connected machine
- Added snapshot_app, snapshot_desktop and screenshot, plus the full set of find params
- Every tool accepts a screen by number or name

Fixed macOS foreground detection by using CGWindowListCopyWindowInfo instead of NSWorkspace.frontmostApplication(), which goes stale in long-running processes without an NSRunLoop (e.g., MCP servers).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e83cd54 commit 4a5a3dd
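The commit message says every tool accepts a screen by number or by name, and the README in this commit numbers machines in the order they are listed. A minimal sketch of that lookup, assuming the machines are held in an insertion-ordered dict of friendly name to WebSocket URL. `resolve_screen`, its case-insensitive name matching and its error messages are hypothetical, not the actual mcp_server.py code.

```python
from __future__ import annotations


def resolve_screen(screen: int | str, machines: dict[str, str]) -> tuple[int, str, str]:
    """Map a 1-based screen number or a machine name to (number, name, url).

    Hypothetical helper: `machines` is assumed to be an insertion-ordered
    mapping of friendly name -> WebSocket URL, numbered in listing order.
    """
    names = list(machines)
    # Treat ints and all-digit strings such as "2" as 1-based screen numbers.
    if isinstance(screen, int) or screen.isdecimal():
        number = int(screen)
        if not 1 <= number <= len(names):
            raise ValueError(f"no screen {number}; expected 1..{len(names)}")
        name = names[number - 1]
        return number, name, machines[name]
    # Otherwise match the friendly name, ignoring case.
    for number, name in enumerate(names, start=1):
        if name.lower() == screen.lower():
            return number, name, machines[name]
    raise ValueError(f"unknown screen {screen!r}; known: {', '.join(names)}")


machines = {"windows": "ws://localhost:9800", "mac": "ws://192.168.1.30:9800"}
print(resolve_screen(2, machines))  # (2, 'mac', 'ws://192.168.1.30:9800')
print(resolve_screen("Windows", machines))  # (1, 'windows', 'ws://localhost:9800')
```

Because numbering follows insertion order, reordering the machines in the config changes which machine is "screen 1"; names stay stable, which is presumably why both forms are accepted.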


5 files changed (+445 additions, -106 deletions)


cup/platforms/macos.py

Lines changed: 44 additions & 1 deletion
```diff
@@ -395,7 +395,50 @@ def _cg_window_apps() -> dict[int, str]:
 
 
 def _macos_foreground_app() -> tuple[int, str, str | None]:
-    """Return (pid, app_name, bundle_id) of the frontmost application."""
+    """Return (pid, app_name, bundle_id) of the frontmost application.
+
+    Uses CGWindowListCopyWindowInfo to get fresh data from the window server.
+    NSWorkspace.frontmostApplication() goes stale in long-running processes
+    without an active NSRunLoop (e.g., MCP servers), so we only use it as a
+    fallback.
+    """
+    # CGWindowList returns windows in front-to-back order. The first
+    # layer-0 (normal) window that isn't a system daemon is the frontmost app.
+    try:
+        from Quartz import (
+            CGWindowListCopyWindowInfo,
+            kCGNullWindowID,
+            kCGWindowListOptionOnScreenOnly,
+        )
+
+        cg_windows = CGWindowListCopyWindowInfo(
+            kCGWindowListOptionOnScreenOnly, kCGNullWindowID,
+        )
+        if cg_windows:
+            for w in cg_windows:
+                if w.get("kCGWindowLayer", -1) != 0:
+                    continue
+                pid = w.get("kCGWindowOwnerPID")
+                owner = w.get("kCGWindowOwnerName", "")
+                if not pid or not owner:
+                    continue
+                if owner in _SYSTEM_OWNER_NAMES:
+                    continue
+                # Found the frontmost app — look up bundle ID via NSRunningApplication
+                bundle_id = None
+                try:
+                    from AppKit import NSRunningApplication
+                    ns_app = NSRunningApplication.runningApplicationWithProcessIdentifier_(pid)
+                    if ns_app is not None:
+                        owner = ns_app.localizedName() or owner
+                        bundle_id = ns_app.bundleIdentifier()
+                except Exception:
+                    pass
+                return (pid, owner, bundle_id)
+    except Exception:
+        pass
+
+    # Fallback: NSWorkspace (may be stale without NSRunLoop)
     workspace = NSWorkspace.sharedWorkspace()
     app = workspace.frontmostApplication()
     return (
```
return (

examples/cross-os/README.md

Lines changed: 34 additions & 15 deletions
````diff
@@ -9,14 +9,14 @@ This demonstrates CUP's core value: **one protocol, every OS**. Claude sees iden
 ```
 ┌──────────────────────────────────────────────┐
 │ Claude Code / MCP Client │
-│ "Open Notepad on windows, type hello,
-│ then open Notes on mac and paste it"
+│ "Open Notepad on screen 1, type hello, │
+│ then open Notes on screen 2 and paste it" │
 └──────────────────┬───────────────────────────┘
 │ MCP (stdio)
 ┌──────────────────▼───────────────────────────┐
 │ mcp_server.py (MCP bridge) │
-Exposes snapshot, action, find tools
-for each connected machine
+Screen 1 = windows, Screen 2 = mac
+Tools: snapshot, action, find, screenshot
 └──────┬──────────────────────────┬────────────┘
 │ WebSocket │ WebSocket
 ┌──────▼──────┐ ┌─────▼────────────┐
@@ -26,6 +26,8 @@ This demonstrates CUP's core value: **one protocol, every OS**. Claude sees iden
 └─────────────┘ └──────────────────┘
 ```
 
+Each connected machine is a numbered **screen** (1, 2, 3...). Every tool accepts a `screen` parameter — either the number or the friendly name.
+
 ## Files
 
 | File | Purpose |
@@ -79,23 +81,39 @@ Add to your Claude Code MCP config:
 }
 ```
 
-Replace the paths and IPs for your setup. If you're running Claude Code on your Windows machine, `windows` can point to `localhost`.
+Replace the paths and IPs for your setup. Machines are numbered as screens (1, 2, 3...) in the order listed.
 
 ### 4. Talk to Claude
 
 Now just ask Claude Code naturally:
 
 ```
-"What apps are open on both machines?"
+"What apps are open on all screens?"
+
+"Take a snapshot of screen 1"
 
-"Open Notepad on windows and type 'Hello from Mac', then open TextEdit on mac and type 'Hello from Windows'"
+"Open Notepad on windows and type 'Hello from Mac',
+then open TextEdit on mac and type 'Hello from Windows'"
 
-"Take a snapshot of the foreground window on mac"
+"Click the Submit button on screen 2"
 
-"Click the Submit button on windows"
+"Take a screenshot of screen 1"
 ```
 
-Claude sees the CUP tools (`snapshot_machine`, `act_on_machine`, etc.) and uses them to interact with both machines.
+## Available Tools
+
+| Tool | Description |
+|------|-------------|
+| `list_screens()` | List all connected screens with number, name, OS |
+| `snapshot(screen)` | Capture foreground window's UI tree |
+| `snapshot_app(screen, app)` | Capture a specific app by title |
+| `snapshot_desktop(screen)` | Capture desktop icons/widgets |
+| `overview(screen)` | List open windows (near-instant) |
+| `action(screen, action, ...)` | Click, type, press keys, scroll, etc. |
+| `find(screen, query/role/name/state)` | Search the last tree for elements |
+| `open_app(screen, app_name)` | Open an app by name (fuzzy match) |
+| `screenshot(screen, region_*)` | Capture a PNG screenshot |
+| `snapshot_all(scope)` | Snapshot all screens in parallel |
 
 ## Standalone Agent (alternative)
 
@@ -117,16 +135,16 @@ python agent.py windows=ws://localhost:9800 mac=ws://192.168.1.30:9800
 
 ```
 # Cross-OS text relay
-"Copy the title of the focused window on Windows and type it into the terminal on Mac"
+"Copy the title of the focused window on screen 1 and type it into the terminal on screen 2"
 
 # Parallel app launch
-"Open a text editor on both machines and type today's date in each"
+"Open a text editor on all screens and type today's date in each"
 
 # Cross-OS comparison
-"Take a snapshot of both machines and tell me what apps are running on each"
+"Snapshot all screens and tell me what apps are running on each"
 
 # Multi-step workflow
-"On Windows, open Chrome and navigate to example.com. On Mac, open Safari and navigate to the same URL."
+"On windows, open Chrome and navigate to example.com. On mac, open Safari and navigate to the same URL."
 ```
 
 ## Using the client library directly
@@ -139,6 +157,7 @@ with RemoteSession("ws://192.168.1.10:9800") as win:
     print(win.snapshot(scope="overview"))
     win.open_app("notepad")
     tree = win.snapshot(scope="foreground")
+    png = win.screenshot()  # full screen PNG bytes
 
 # Multiple machines in parallel
 with MultiSession({
@@ -163,4 +182,4 @@ The cup_server uses a simple JSON-RPC protocol over WebSocket:
 {"id": 1, "result": "# CUP 0.1.0 | windows | 1920x1080\n..."}
 ```
 
-Methods: `snapshot`, `action`, `press`, `find`, `overview`, `open_app`, `batch`, `info`
+Methods: `snapshot`, `snapshot_desktop`, `action`, `press`, `find`, `overview`, `open_app`, `screenshot`, `batch`, `info`
````
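The README's `snapshot_all(scope)` tool snapshots all screens in parallel, and cup_remote.py already imports `ThreadPoolExecutor` and `as_completed`. A minimal fan-out sketch under those assumptions: only the tool name comes from the README, while the signature, the per-screen error capture and `fake_snapshot` are hypothetical, and the `scope` argument is left out.

```python
from __future__ import annotations

from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed


def snapshot_all(
    screens: dict[int, str],
    snapshot_one: Callable[[str], str],
) -> dict[int, str]:
    """Snapshot every screen concurrently and return {screen_number: text}."""
    results: dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(screens))) as pool:
        # One task per machine; remember which screen each future belongs to.
        futures = {pool.submit(snapshot_one, name): number for number, name in screens.items()}
        for future in as_completed(futures):
            number = futures[future]
            try:
                results[number] = future.result()
            except Exception as exc:
                # Record the failure so one unreachable machine does not hide the rest.
                results[number] = f"error: {exc}"
    # as_completed yields in completion order; sort back into screen order.
    return dict(sorted(results.items()))


def fake_snapshot(name: str) -> str:
    """Hypothetical stand-in for a per-machine snapshot call."""
    if name == "linux":
        raise ConnectionError("unreachable")
    return f"snapshot of {name}"


print(snapshot_all({1: "windows", 2: "mac", 3: "linux"}, fake_snapshot))
```

Threads suit this workload because each call mostly waits on a WebSocket round trip, so total latency is roughly that of the slowest machine rather than the sum.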

examples/cross-os/cup_remote.py

Lines changed: 21 additions & 0 deletions
```diff
@@ -25,6 +25,7 @@
 
 from __future__ import annotations
 
+import base64
 import json
 import threading
 from concurrent.futures import ThreadPoolExecutor, as_completed
@@ -144,6 +145,26 @@ def find(
     ) -> list[dict]:
         return self._call("find", query=query, role=role, name=name, state=state, limit=limit)
 
+    def snapshot_desktop(self, *, compact: bool = True) -> str | dict:
+        return self._call("snapshot_desktop", compact=compact)
+
+    def screenshot(
+        self,
+        *,
+        region: dict[str, int] | None = None,
+    ) -> bytes:
+        """Capture a screenshot and return PNG bytes."""
+        params: dict[str, Any] = {}
+        if region is not None:
+            params["region_x"] = region["x"]
+            params["region_y"] = region["y"]
+            params["region_w"] = region["w"]
+            params["region_h"] = region["h"]
+        result = self._call("screenshot", **params)
+        if not result.get("success"):
+            raise RuntimeError(result.get("error", "Screenshot failed"))
+        return base64.b64decode(result["data"])
+
     def open_app(self, name: str) -> ActionResult:
         result = self._call("open_app", name=name)
         return ActionResult(**result)
```
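The client flattens an optional `region` dict into four `region_*` parameters, and the server ships the PNG as base64 text because JSON cannot carry raw bytes. A self-contained sketch of that round trip, with both halves in one process and a dummy payload in place of a real capture. The helper names are hypothetical; the `success`/`data`/`error` keys mirror the diff, and `json.dumps` stands in for whatever serialization the WebSocket transport performs.

```python
from __future__ import annotations

import base64
import json
from typing import Any


def region_params(region: dict[str, int] | None) -> dict[str, Any]:
    """Client side: flatten {"x", "y", "w", "h"} into region_* keyword params."""
    if region is None:
        return {}
    return {f"region_{key}": region[key] for key in ("x", "y", "w", "h")}


def encode_result(png_bytes: bytes) -> str:
    """Server side: base64-encode the PNG so it can travel inside JSON text."""
    payload = {"success": True, "data": base64.b64encode(png_bytes).decode("ascii")}
    return json.dumps(payload)


def decode_result(raw: str) -> bytes:
    """Client side: parse the JSON result and recover the PNG bytes."""
    result = json.loads(raw)
    if not result.get("success"):
        raise RuntimeError(result.get("error", "Screenshot failed"))
    return base64.b64decode(result["data"])


fake_png = b"\x89PNG\r\n\x1a\n" + bytes(range(16))  # PNG signature plus dummy bytes
print(region_params({"x": 0, "y": 0, "w": 800, "h": 600}))
# {'region_x': 0, 'region_y': 0, 'region_w': 800, 'region_h': 600}
print(decode_result(encode_result(fake_png)) == fake_png)  # True
```

Base64 produces 4 output characters per 3 input bytes, so a full-screen capture grows by about a third on the wire; the region parameters are the main lever for keeping payloads small.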

examples/cross-os/cup_server.py

Lines changed: 23 additions & 1 deletion
```diff
@@ -14,13 +14,15 @@
 -> {"id": 2, "method": "action", "params": {"element_id": "e5", "action": "click"}}
 <- {"id": 2, "result": {"success": true, "message": "Clicked"}}
 
-Supported methods: snapshot, action, press, find, overview, open_app, info
+Supported methods: snapshot, snapshot_desktop, action, press, find, overview,
+open_app, screenshot, batch, info
 """
 
 from __future__ import annotations
 
 import argparse
 import asyncio
+import base64
 import json
 import platform
 import sys
@@ -104,6 +106,26 @@ def rpc_open_app(self, name: str) -> dict:
         result = self._session.open_app(name)
         return {"success": result.success, "message": result.message, "error": result.error}
 
+    def rpc_snapshot_desktop(self, compact: bool = True) -> str | dict:
+        return self._session.snapshot(scope="desktop", compact=compact)
+
+    def rpc_screenshot(
+        self,
+        region_x: int | None = None,
+        region_y: int | None = None,
+        region_w: int | None = None,
+        region_h: int | None = None,
+    ) -> dict:
+        """Capture screenshot and return as base64-encoded PNG."""
+        region = None
+        if all(v is not None for v in (region_x, region_y, region_w, region_h)):
+            region = {"x": region_x, "y": region_y, "w": region_w, "h": region_h}
+        try:
+            png_bytes = self._session.screenshot(region=region)
+            return {"success": True, "data": base64.b64encode(png_bytes).decode("ascii")}
+        except (ImportError, RuntimeError) as e:
+            return {"success": False, "error": str(e)}
+
     def rpc_batch(self, actions: list[dict]) -> list[dict]:
         results = self._session.batch(actions)
         return [{"success": r.success, "message": r.message, "error": r.error} for r in results]
```
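The server docstring shows the wire format (`{"id", "method", "params"}` in, `{"id", "result"}` out), and this commit adds methods simply by defining `rpc_snapshot_desktop` and `rpc_screenshot`, which suggests requests are routed by name to `rpc_`-prefixed methods. A minimal dispatcher sketch under that assumption; `Handler`, `dispatch` and the `error` response shape are hypothetical and not taken from cup_server.py.

```python
from __future__ import annotations

import json
from typing import Any


class Handler:
    """Hypothetical stand-in for the server class that defines rpc_* methods."""

    def rpc_info(self) -> dict:
        return {"os": "example"}

    def rpc_open_app(self, name: str) -> dict:
        return {"success": True, "message": f"Opened {name}", "error": None}


def dispatch(handler: object, raw: str) -> str:
    """Route one JSON request to handler.rpc_<method>(**params)."""
    request: dict[str, Any] = json.loads(raw)
    req_id = request.get("id")
    method_name = request.get("method")
    # The rpc_ prefix limits which attributes a remote caller can reach.
    method = getattr(handler, f"rpc_{method_name}", None)
    if method is None:
        # Assumed error shape; the documented examples only show "result".
        return json.dumps({"id": req_id, "error": f"unknown method {method_name!r}"})
    try:
        result = method(**(request.get("params") or {}))
    except Exception as exc:
        return json.dumps({"id": req_id, "error": str(exc)})
    return json.dumps({"id": req_id, "result": result})


request = '{"id": 2, "method": "open_app", "params": {"name": "notepad"}}'
print(dispatch(Handler(), request))
```

With name-based routing, exposing a new capability such as `screenshot` only requires a new `rpc_` method on the server and a matching client wrapper, which matches the shape of the changes in this commit.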
