DOM element detection enhancements (browser-use research) #388

Merged
jackwener merged 4 commits into main from dom-detection-improvements
Mar 24, 2026

@jackwener
Owner

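Overview

Enhances DOM element detection, based on a study of browser-use's ClickableElementDetector.

Changes

1. Heuristic detection of search elements (dom-snapshot.ts)

  • Add a SEARCH_INDICATORS set that recognizes search-related keywords in class/id
  • Add an isSearchElement() function that detects search buttons and inputs
  • Improve detection of label/span wrappers (the label > span > input pattern)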
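
A minimal sketch of what a class/id keyword heuristic of this kind could look like. The keyword list, the ElementLike type, and the function body are assumptions for illustration; the PR's actual SEARCH_INDICATORS contents and isSearchElement() logic are not shown on this page.

```typescript
// Minimal structural type so the sketch runs outside a browser.
// A real DOM Element also has these three string properties.
interface ElementLike {
  tagName: string;
  id: string;
  className: string;
}

// Hypothetical keyword list; the real set in dom-snapshot.ts may differ.
const SEARCH_INDICATORS = new Set(["search", "magnify", "lookup", "find", "query"]);

// Returns true when the element's class or id contains a search-related keyword.
function isSearchElement(el: ElementLike): boolean {
  const haystack = `${el.className} ${el.id}`.toLowerCase();
  return Array.from(SEARCH_INDICATORS).some((keyword) => haystack.includes(keyword));
}
```

A substring match is simple but can over-match (a class such as research-panel contains "search"); matching on delimited tokens would be a stricter alternative.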

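2. CDP event listener detection (cdp.ts)

  • Add the AXTreeNode interface and the fetchInteractiveNodeIds() method
  • Add the fetchEventListeners() method, which retrieves event listeners through DOMDebugger
  • Add the detectListeners and useAXTree options to SnapshotOptions
  • Add the annotateWithListeners() post-processing function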

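3. Type definition updates (types.ts)

  • Extend the SnapshotOptions interface with the detectListeners and useAXTree options

4. Test coverage

  • Add tests for search element detection (6 new tests)
  • Add tests for CDP event listener detection (4 new tests)
  • 254 tests in total, all passing

Results

  • Elements with event listeners attached are detected more accurately
  • Better detection of events bound dynamically by UI frameworks (Vue/React/Angular)
  • Better recognition of non-standard interaction patterns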

- Add SEARCH_INDICATORS set to detect search-related elements by class/id
- Add hasFormControlDescendant helper to detect form controls within wrapper elements
- Add isSearchElement function to identify search buttons/inputs
- Enhance isInteractive to detect:
  - Labels wrapping form controls (label > span > input pattern)
  - Span elements containing form controls
  - Search-related elements

This improves detection of UI patterns where clickable elements are
wrapped in non-standard containers like labels and spans.

Ref: browser-use ClickableElementDetector research
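
The wrapper detection described above can be sketched roughly as follows. This is an illustrative version only: the NodeLike type, the set of form-control tags, the depth limit of 2, and the isWrapperAroundControl name are all assumptions, not the code from dom-snapshot.ts.

```typescript
// Minimal tree node so the sketch runs without a DOM.
interface NodeLike {
  tagName: string;
  children: NodeLike[];
}

// Assumed set of form-control tags.
const FORM_CONTROLS = new Set(["INPUT", "SELECT", "TEXTAREA"]);

// True if a form control appears within maxDepth levels below el.
// A depth of 2 is just enough to cover label > span > input.
function hasFormControlDescendant(el: NodeLike, maxDepth = 2): boolean {
  if (maxDepth <= 0) return false;
  return el.children.some(
    (child) =>
      FORM_CONTROLS.has(child.tagName.toUpperCase()) ||
      hasFormControlDescendant(child, maxDepth - 1),
  );
}

// Hypothetical helper: treat label/span wrappers around a control as interactive.
function isWrapperAroundControl(el: NodeLike): boolean {
  const tag = el.tagName.toUpperCase();
  return (tag === "LABEL" || tag === "SPAN") && hasFormControlDescendant(el);
}
```

Bounding the search depth keeps the check cheap and avoids marking large containers as interactive merely because a form control sits somewhere deep inside them.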
- Add AXTreeNode interface and fetchInteractiveNodeIds method to CDPBridge
- Add fetchEventListeners method to detect click-related event listeners via DOMDebugger
- Add detectListeners and useAXTree options to SnapshotOptions
- Add annotateWithListeners to mark elements with click listeners in snapshot

This enables more accurate interactivity detection by leveraging Chrome DevTools
Protocol to identify elements that have event listeners attached, beyond what
can be detected from static HTML attributes alone.

Ref: browser-use ClickableElementDetector research
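
For illustration, here is a hedged sketch of how click-related listeners and interactive accessibility nodes could be filtered from CDP responses. DOMDebugger.getEventListeners and Accessibility.getFullAXTree are real CDP methods; everything else (the CdpSend type, the function names, and the event and role lists) is assumed here and is not the PR's fetchEventListeners() or fetchInteractiveNodeIds() implementation.

```typescript
// Subset of the listener objects returned by DOMDebugger.getEventListeners.
interface CdpEventListener {
  type: string;
  useCapture: boolean;
}

// Subset of the AXNode objects returned by Accessibility.getFullAXTree.
interface AXNodeLike {
  ignored: boolean;
  role?: { value?: unknown };
  backendDOMNodeId?: number;
}

// Hypothetical transport: sends one CDP command and resolves with its result.
type CdpSend = (method: string, params?: Record<string, unknown>) => Promise<unknown>;

// Assumed lists; tune them to the interaction patterns you care about.
const CLICK_EVENTS = new Set(["click", "mousedown", "mouseup", "pointerdown", "pointerup"]);
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox", "searchbox"]);

// Keep only the listener types that indicate click-like interactivity.
function clickListenerTypes(listeners: CdpEventListener[]): string[] {
  return listeners.map((l) => l.type).filter((t) => CLICK_EVENTS.has(t));
}

// Collect backend DOM node ids of non-ignored AX nodes with an interactive role.
function interactiveBackendIds(nodes: AXNodeLike[]): number[] {
  const ids: number[] = [];
  for (const node of nodes) {
    const role = node.role?.value;
    if (
      !node.ignored &&
      typeof role === "string" &&
      INTERACTIVE_ROLES.has(role) &&
      node.backendDOMNodeId !== undefined
    ) {
      ids.push(node.backendDOMNodeId);
    }
  }
  return ids;
}

// Thin async wrapper around the CDP call; objectId is a Runtime remote object id.
async function fetchClickListenerTypes(send: CdpSend, objectId: string): Promise<string[]> {
  const res = (await send("DOMDebugger.getEventListeners", { objectId })) as {
    listeners?: CdpEventListener[];
  };
  return clickListenerTypes(res.listeners ?? []);
}
```

Keeping the filtering in pure functions, separate from the CDP round trip, lets the logic be unit-tested without a running browser.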
- Add tests for SEARCH_INDICATORS set in generateSnapshotJs
- Add tests for hasFormControlDescendant function
- Add tests for isSearchElement function
- Add tests for label/span wrapper detection in isInteractive
- Add tests for fetchInteractiveNodeIds (AXTree parsing)
- Add tests for fetchEventListeners (DOM listener extraction)

Total: 254 tests passing
@jackwener jackwener force-pushed the dom-detection-improvements branch from 904b8bd to 9401151 Compare March 24, 2026 15:42
@jackwener
Owner Author

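I have narrowed this PR down to a safer scope myself.

Conclusions from an in-depth review:

  • The search heuristics and the label/span/input wrapper detection in dom-snapshot.ts are sound and are a low-risk gain
  • The CDP listener / AXTree path added in the original PR was never actually wired into the existing snapshot architecture: snapshot() currently returns a serialized string, not a DOM tree object that can be written back to, so a Node-side post-process such as annotateWithListeners() cannot attach to the real data flow. useAXTree and detectListeners also exist only at the type level and never form an end-to-end capability
  • This code also directly causes npm run typecheck to fail

I have already handled this on the current branch:

  • Removed the unfinished CDP listener / AXTree snapshot hooks
  • Removed the corresponding tests and the SnapshotOptions extensions that had no effect
  • Kept the DOM heuristics improvements that hold up (search element + wrapper detection)

This is the simpler and safer approach: merge the heuristics that are verified to work first. If we later want a listener-aware snapshot, it should be a separate PR that redesigns the data flow first (for example, annotate inside the page first, then hand everything to generateSnapshotJs()), rather than patching the serialized snapshot.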
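
A rough sketch of the "annotate first, serialize second" ordering suggested above. This is not the project's generateSnapshotJs(); the PageNode type, the data-has-click-listener attribute name, and the output format are hypothetical, chosen only to show why the marker has to exist before serialization.

```typescript
// Stand-in for a page-side element; a real implementation would use DOM nodes.
interface PageNode {
  tag: string;
  attrs: Record<string, string>;
  children: PageNode[];
}

// Hypothetical marker attribute written during the annotation pass.
const LISTENER_ATTR = "data-has-click-listener";

// Step 1: annotate the live tree while it is still a mutable object.
function annotateInPage(root: PageNode, hasListener: (n: PageNode) => boolean): void {
  if (hasListener(root)) root.attrs[LISTENER_ATTR] = "1";
  root.children.forEach((child) => annotateInPage(child, hasListener));
}

// Step 2: serialize once, reading the marker like any other attribute.
function serialize(node: PageNode, depth = 0): string {
  const mark = node.attrs[LISTENER_ATTR] === "1" ? " [click]" : "";
  const line = `${"  ".repeat(depth)}<${node.tag}>${mark}`;
  return [line, ...node.children.map((child) => serialize(child, depth + 1))].join("\n");
}
```

Because the serializer reads the marker as ordinary element state, nothing has to parse or patch the resulting string afterwards.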

Local verification passed:

  • npx vitest run src/browser/dom-snapshot.test.ts src/browser/cdp.test.ts
  • npm run typecheck
  • npm test
  • npm run build

follow-up commit: 9401151

@jackwener jackwener merged commit 8d2ee03 into main Mar 24, 2026
11 checks passed