Closed
Description
Summary
Large text surfaces such as code editors, document views, chats, logs, and rich text panes are currently awkward for agent use.
Today, `snapshot -i` may expose these surfaces by dumping a partial text blob, which is expensive in tokens and still incomplete for discovery. At the same time, `get text` is snapshot-derived, so if the snapshot text is truncated, there is no reliable way to expand that same surface into its full visible text.
A concrete example is Android Studio on macOS: the editor is exposed as a `TextView`, but the interactive snapshot shows only a partial code fragment, and important visible content can still be missing.
Problem
We need a cleaner split between:
- discovery: what visible text surfaces exist and which one the agent should inspect next
- extraction: retrieving the text for a chosen surface after discovery
Without that split, snapshots either:
- spend too many tokens on giant text blobs, or
- truncate text in a way that makes the hidden content unreachable
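A minimal TypeScript sketch of the discovery/extraction split, assuming hypothetical names (`TextSurfaceSummary`, `summarize`, `extract`, `PREVIEW_CHARS`) that are not part of any existing API. Discovery keeps only a bounded preview plus an explicit `truncated` flag, so hidden content is signaled rather than silently dropped; extraction goes back to the live source by ref instead of re-reading the summary. The `Map` stands in for a runner query and is an assumption for illustration.

```typescript
// Hypothetical discovery record; field names are illustrative, not an existing API.
interface TextSurfaceSummary {
  ref: string; // element ref, e.g. "@e32"
  role: string; // e.g. "text-view"
  label: string; // semantic label, e.g. "Editor for MainActivity.kt"
  preview: string; // bounded prefix of the visible text
  truncated: boolean; // true when preview omits part of the text
}

// Assumed preview budget in characters.
const PREVIEW_CHARS = 24;

// Discovery: keep only a short preview, but record that more text exists.
function summarize(
  ref: string,
  role: string,
  label: string,
  fullText: string,
  limit: number = PREVIEW_CHARS,
): TextSurfaceSummary {
  const truncated = fullText.length > limit;
  return {
    ref,
    role,
    label,
    preview: truncated ? fullText.slice(0, limit) : fullText,
    truncated,
  };
}

// Extraction: read the full text from the live source by ref.
// The Map stands in for a runner query against the UI; the summary is never reused.
function extract(ref: string, source: Map<string, string>): string {
  const text = source.get(ref);
  if (text === undefined) {
    throw new Error(`unknown ref ${ref}`);
  }
  return text;
}

const body = "package com.example.app\n\nclass MainActivity : AppCompatActivity()";
const live = new Map<string, string>([["@e32", body]]);
const summary = summarize("@e32", "text-view", "Editor for MainActivity.kt", body);
console.log(summary.truncated); // true
console.log(extract("@e32", live) === body); // true
```

The design choice is that the discovery record never has to be "big enough": its only job is to say that a surface exists and whether more text is available behind it.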
Goals
- Keep large visible text surfaces discoverable in `snapshot -i`.
- Make `snapshot -i` summarize those surfaces semantically instead of dumping long text bodies.
- Add a reliable way to expand a selected text surface after discovery.
- Keep the design transferable across macOS, iOS, and Android
Proposed Direction
- Add runner-backed text extraction for element-targeted reads instead of relying only on the stored snapshot node.
- Update `snapshot -i` rendering for large text surfaces (`TextView`, `TextField`, editor-like panes, etc.) to prefer semantic labels plus a short preview.
- Mark truncation explicitly so agents know more content exists.
- Include useful metadata when available, such as `editable`, `scrollable`, `focused`, or similar state.
Example desired shape:
```
@e32 [text-view] "Editor for MainActivity.kt" [editable] [scrollable] [preview:"package com.example..."] [truncated]
```
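A sketch of a renderer that produces this shape, assuming a hypothetical `TextSurfaceNode` input and a 20-character preview budget (both illustrative, not an existing API). Whitespace is collapsed so a multi-line body fits on one line, `JSON.stringify` handles quoting so labels or previews containing quotes stay unambiguous, and `[truncated]` is emitted only when the preview actually cut text.

```typescript
// Hypothetical node shape; real snapshot nodes would carry more fields.
interface TextSurfaceNode {
  ref: string;
  role: string;
  label: string;
  text: string;
  editable?: boolean;
  scrollable?: boolean;
  focused?: boolean;
}

// Render one compact discovery line: ref, role, label, state flags, preview, truncation marker.
function renderNode(node: TextSurfaceNode, previewChars: number = 20): string {
  const parts: string[] = [node.ref, `[${node.role}]`, JSON.stringify(node.label)];
  if (node.editable) parts.push("[editable]");
  if (node.scrollable) parts.push("[scrollable]");
  if (node.focused) parts.push("[focused]");

  // Collapse newlines and runs of spaces so a multi-line body fits on one line.
  const flat = node.text.replace(/\s+/g, " ").trim();
  const truncated = flat.length > previewChars;
  // Drop trailing spaces/dots at the cut so the ellipsis reads cleanly.
  const preview = truncated
    ? flat.slice(0, previewChars).replace(/[\s.]+$/, "") + "..."
    : flat;
  parts.push(`[preview:${JSON.stringify(preview)}]`);
  if (truncated) parts.push("[truncated]");
  return parts.join(" ");
}

console.log(
  renderNode({
    ref: "@e32",
    role: "text-view",
    label: "Editor for MainActivity.kt",
    text: "package com.example.app\n\nimport android.os.Bundle",
    editable: true,
    scrollable: true,
  }),
);
// @e32 [text-view] "Editor for MainActivity.kt" [editable] [scrollable] [preview:"package com.example..."] [truncated]
```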
Acceptance Criteria
- `snapshot -i` shows large visible text surfaces as first-class nodes without dumping the full body by default.
- Agents can follow up with an element-targeted text read and retrieve the visible or full text for that surface.
- Truncation is explicit in discovery output.
- Behavior works consistently enough to support desktop editors on macOS and similar large text surfaces on iOS/Android.
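The follow-up read in these criteria could look like the sketch below from the agent's side. The `Runner` interface, its `readText(ref, scope)` method, and the `"visible" | "full"` scope values are hypothetical: the issue proposes runner-backed extraction but does not define its API. The agent scans the discovery output for lines ending in `[truncated]` and issues one element-targeted read per such ref, so fully shown surfaces cost no extra round-trip.

```typescript
// Hypothetical runner interface: the issue proposes runner-backed reads but defines no API.
type TextScope = "visible" | "full";

interface Runner {
  readText(ref: string, scope: TextScope): Promise<string>;
}

// Find refs of discovery lines that end with an explicit [truncated] marker.
function truncatedRefs(snapshot: string): string[] {
  return snapshot
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /\[truncated\]$/.test(line))
    .map((line) => line.match(/^@e\d+/)?.[0])
    .filter((ref): ref is string => ref !== undefined);
}

// Follow up only on surfaces the snapshot marked as incomplete.
async function expandTruncated(
  snapshot: string,
  runner: Runner,
  scope: TextScope = "visible",
): Promise<Map<string, string>> {
  const expanded = new Map<string, string>();
  for (const ref of truncatedRefs(snapshot)) {
    expanded.set(ref, await runner.readText(ref, scope));
  }
  return expanded;
}

// Usage with a fake runner that echoes its arguments.
const fakeRunner: Runner = {
  async readText(ref, scope) {
    return `${scope} text of ${ref}`;
  },
};

const snapshot = [
  '@e12 [button] "Run"',
  '@e32 [text-view] "Editor for MainActivity.kt" [editable] [preview:"package com.example..."] [truncated]',
].join("\n");

expandTruncated(snapshot, fakeRunner).then((texts) => {
  console.log(texts.get("@e32")); // visible text of @e32
});
```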
Non-Goals
- OCR fallback for now. Agents can use screenshots separately when needed.