
Add WebSpeech/browser ASR support on Windows #11

Open
OIerty wants to merge 55 commits into ouyangyipeng:main from OIerty:toorigin

OIerty commented Mar 30, 2026

Introduce a browser-based WebSpeech ASR mode so the frontend can perform SpeechRecognition and inject the recognized text into the backend.

Key changes:

  • README and api-service/.env.example: document the new webspeech ASR mode and the WEBSPEECH_LANG configuration.
  • api-service: add the /ingest_asr_text endpoint (monitor_router.py) and MonitorService.ingest_external_text to reuse the existing ASR text-handling flow.
  • api-service/services/asr_service.py: add a BrowserSpeechASR placeholder and make create_asr recognize the webspeech/browser modes.
  • app-ui: add a browserAsr service (src/services/browserAsr.ts) that wraps SpeechRecognition/webkitSpeechRecognition, auto-restarts on recoverable disconnects, and posts transcriptions to the backend (ingestAsrText); add API helpers (ingestAsrText, getConfiguredAsrMode, getConfiguredWebspeechLang) in services/api.ts.
  • app-ui/src/App.tsx: tie the browser ASR session lifecycle to monitor start, pause, resume, and stop; read the backend-configured ASR mode and language; and apply cleanup and rollback if start fails, so backend and frontend stay consistent.
  • app-ui/src/components/SettingsPanel.tsx: expose the WebSpeech option and WEBSPEECH_LANG with explanatory hints.

Behavior notes:

  • The frontend performs recognition and POSTs recognized text to /ingest_asr_text; the backend writes it through the same monitor pipeline.
  • Browser ASR auto-restarts on non-fatal disconnects and reports fatal errors, such as permission or audio-capture failures, to the UI.
  • Start, pause, resume, and stop flows include cleanup and rollback to keep backend and frontend sessions consistent.
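To make the factory change concrete, here is a minimal sketch of how create_asr might route the browser modes to a placeholder that captures no audio on the server. Only create_asr and BrowserSpeechASR are named in the PR; LocalASR, WEBSPEECH_MODES, and the ValueError for unknown modes are assumptions, and the real factory presumably supports more modes. The alias set includes edge-webspeech, which the Copilot overview lists.

```python
# Aliases that select browser-side recognition (assumed set: "webspeech" and
# "browser" from the PR description, "edge-webspeech" from the Copilot overview).
WEBSPEECH_MODES = frozenset({"webspeech", "browser", "edge-webspeech"})


class BrowserSpeechASR:
    """Placeholder: the browser does recognition, so the backend captures nothing."""

    def start(self) -> None:
        # No local audio capture; text arrives later via /ingest_asr_text.
        pass

    def stop(self) -> None:
        pass


class LocalASR:
    """Hypothetical stand-in for the existing local-microphone recognizer."""

    def start(self) -> None:
        pass

    def stop(self) -> None:
        pass


def create_asr(mode: str):
    """Return an ASR object for a raw ASR_MODE value (sketch, not the PR's code)."""
    normalized = (mode or "").strip().lower()
    if normalized in WEBSPEECH_MODES:
        return BrowserSpeechASR()
    if normalized == "local":
        return LocalASR()
    raise ValueError(f"unsupported ASR_MODE: {mode!r}")
```

Returning a do-nothing object, rather than None, lets the rest of the monitor lifecycle call start() and stop() without special-casing the browser modes.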
Copilot AI review requested due to automatic review settings March 30, 2026 20:45
gemini-code-assist bot left a comment

Code Review

This pull request introduces a new 'webspeech' ASR mode, enabling the application to use the browser's native SpeechRecognition API for audio transcription. Key changes include the addition of a backend endpoint for ingesting external text, a new BrowserSpeechASR service, and a frontend implementation of the browser ASR session. The review feedback suggests optimizing the frontend by merging configuration-fetching functions to reduce redundant network requests and adding defensive checks in the backend to ensure text ingestion only occurs when the correct ASR mode is active, preventing potential duplicate transcriptions.


Copilot AI left a comment


Pull request overview

This PR introduces a new “webspeech/browser” ASR mode where the frontend uses the browser Web Speech API for recognition and injects recognized text into the FastAPI backend, reusing the existing transcript/alert pipeline.

Changes:

  • Backend: add /ingest_asr_text endpoint and MonitorService.ingest_external_text to accept externally recognized text.
  • Backend: add BrowserSpeechASR placeholder and extend create_asr to recognize webspeech/browser/edge-webspeech modes.
  • Frontend: add a browserAsr service, API helpers to read configured ASR mode/lang, integrate browser ASR lifecycle into monitor start/pause/stop, and expose settings UI for WebSpeech language.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| README.md | Documents the new WebSpeech ASR option and config. |
| api-service/.env.example | Adds webspeech mode and WEBSPEECH_LANG example config. |
| api-service/routers/monitor_router.py | Adds /ingest_asr_text API endpoint. |
| api-service/services/monitor_service.py | Adds external text ingestion hook into the existing ASR flow. |
| api-service/services/asr_service.py | Adds BrowserSpeechASR placeholder and factory support for webspeech modes. |
| app-ui/src/services/browserAsr.ts | Implements Web Speech recognition wrapper with auto-restart and backend injection. |
| app-ui/src/services/api.ts | Adds ingestAsrText plus helpers to parse ASR mode/lang from settings content. |
| app-ui/src/App.tsx | Starts/stops browser ASR alongside monitor lifecycle, with rollback/cleanup on failures. |
| app-ui/src/components/SettingsPanel.tsx | Adds UI options for webspeech and WEBSPEECH_LANG plus hints. |
| app-ui/package-lock.json | Lockfile metadata changes. |
Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


ouyangyipeng (Owner)

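Thanks a lot for this PR! The feature idea is great, but the automated code review tools (Copilot and Gemini) flagged some potentially blocking problems (a synchronous write called from async code) as well as missing mode validation. Please work through the review feedback below; once it's fixed we'll merge right away!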

1. Potentially serious bugs (strongly recommended to fix)

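  • Synchronous blocking inside an async route (raised by Copilot): in api-service/routers/monitor_router.py, the @router.post handler is an async function, but it directly calls the synchronous method ingest_external_text, which performs disk reads and writes. Speech recognition reports text at high frequency, so this can badly block the event loop and make WebSocket and other API responses stall. It should be made asynchronous or moved onto a thread pool.
  • Missing exception handling can break the flow (raised by Copilot): in the frontend's browserAsr.ts, a catch block calls recognition.abort() directly, but that call can itself throw. Without its own try/catch, the error bubbles up and leaves the UI state badly out of sync with the backend.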
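For the blocking-call issue in the first bullet above, one common fix (a sketch, not necessarily what the PR ended up doing) is to hand the synchronous call to a worker thread with asyncio.to_thread (Python 3.9+); Starlette's run_in_threadpool is another option. Here ingest_external_text is a hypothetical stand-in whose time.sleep simulates the disk write, and the heartbeat task shows that the event loop keeps running while ingestion is in progress.

```python
import asyncio
import threading
import time


def ingest_external_text(text: str) -> dict:
    """Hypothetical stand-in for the synchronous MonitorService method."""
    time.sleep(0.05)  # simulates the blocking disk write described in the review
    return {"status": "success", "thread": threading.current_thread().name}


async def ingest_asr_text(text: str) -> dict:
    # Run the blocking call in the default thread pool so the event loop
    # stays free to serve WebSocket frames and other requests meanwhile.
    return await asyncio.to_thread(ingest_external_text, text)


async def main() -> tuple[dict, int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await ingest_asr_text("hello")
    ticks_during_ingest = ticks  # would stay 0 if ingestion blocked the loop
    beat.cancel()
    return result, ticks_during_ingest
```

Running `asyncio.run(main())` should report a nonzero tick count and a worker-thread name rather than MainThread.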

2. Business-logic gaps (preventive)

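  • Cross-mode data pollution / duplicate transcription (raised by both Copilot and Gemini): the backend ingest_external_text path does not validate the current ASR mode. If the backend is configured for local (capturing from the local microphone) but the frontend still calls this endpoint to inject text, transcripts will be duplicated. The exact current ASR mode must be checked before accepting data.
  • Paused state handled incorrectly (raised by Copilot): also in ingest_external_text, there is no check for whether monitoring is paused (is_paused). When the system is paused, the text is silently dropped further down, yet the endpoint still returns success to the frontend, so frontend and backend disagree about what happened.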
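The two checks above could be combined into a single guard like the one below. This is a minimal sketch: MonitorState and its fields are hypothetical, the status strings are borrowed from a later commit in this PR, and the order of the checks is an assumption.

```python
from dataclasses import dataclass, field

# Modes in which the browser, not the backend, does recognition (assumed set).
WEBSPEECH_MODES = frozenset({"webspeech", "browser", "edge-webspeech"})


@dataclass
class MonitorState:
    """Hypothetical subset of MonitorService state relevant to ingestion."""
    asr_mode: str = "local"
    is_running: bool = False
    is_paused: bool = False
    transcript: list = field(default_factory=list)


def ingest_external_text(state: MonitorState, text: str) -> dict:
    """Accept browser-recognized text only when it cannot be duplicated or lost."""
    if not state.is_running:
        return {"status": "not_running"}
    if state.is_paused:
        # Report the pause instead of dropping the text behind a "success".
        return {"status": "paused"}
    if state.asr_mode not in WEBSPEECH_MODES:
        # A local-microphone session already transcribes this audio itself.
        return {"status": "unsupported_asr_mode"}
    cleaned = text.strip()
    if not cleaned:
        return {"status": "empty_text"}
    state.transcript.append(cleaned)
    return {"status": "success"}
```

Returning a distinct status per rejection reason is what later lets the router map each case to its own HTTP code.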

3. Code and performance improvements (recommended)

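  • Redundant frontend network requests (raised by Gemini): when loading configuration, the frontend calls two separate functions to fetch ASR_MODE and WEBSPEECH_LANG, which sends duplicate network requests. Following Gemini's suggestion, merge them into a single getAsrConfig() function that fetches both at once.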

OIerty added 3 commits April 1, 2026 18:49
Keep browser ASR ingestion on a worker thread, preserve pause and mode checks on the backend, and use effective ASR config returned by the monitor start flow. Also harden browser ASR stop handling and remove duplicate settings fetches.

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


OIerty (Author) commented Apr 1, 2026

Oops, looks like I mixed things up.

OIerty and others added 5 commits April 1, 2026 19:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


OIerty and others added 4 commits April 1, 2026 19:45
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Delete the in-file browser speech recognition implementation: AUTO_RESTART_DELAY_MS, getRecognitionConstructor, createBrowserAsrSession and related timers/handlers and event logic. This cleans up the module in preparation for refactoring or moving ASR logic to a centralized/alternative implementation.

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


OIerty and others added 2 commits April 1, 2026 20:24
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add normalize_asr_mode helper to asr_service and use it to consistently parse ASR_MODE from the environment (replacing repeated strip().lower() logic). Update monitor_router to import and use the helper. Strengthen MonitorService guards: return a clear paused status when monitoring is paused, add defensive handling when the ASR instance is None, and return a distinct status for unsupported ASR modes when external text injection is attempted.
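A minimal sketch of such a helper, assuming it simply trims and lower-cases the value and falls back to "local" when ASR_MODE is unset or blank (the real default is not shown in this thread). configured_asr_mode is a hypothetical convenience wrapper that takes the environment as a parameter so it is easy to test.

```python
import os
from typing import Mapping, Optional


def normalize_asr_mode(raw: Optional[str], default: str = "local") -> str:
    """Trim and lower-case an ASR_MODE value, falling back to a default."""
    value = (raw or "").strip().lower()
    return value or default


def configured_asr_mode(env: Mapping[str, str] = os.environ) -> str:
    # One parsing path shared by every caller (e.g. the router and the
    # monitor service) instead of repeated ad-hoc strip().lower() calls.
    return normalize_asr_mode(env.get("ASR_MODE"))
```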

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Remove unused environment parsing helpers (stripInlineEnvComment, getConfiguredAsrMode, getConfiguredWebspeechLang) from api.ts. Refactor browserAsr.ts: normalize formatting/indentation, make options required and trim session token, centralize restart scheduling, add send queue + dedupe logic for final transcripts, improve error reporting via onStatus and handle fatal recognition errors. These changes aim to simplify config handling and make the browser SpeechRecognition session more robust and resilient to start/restart errors.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Improve ingest_asr_text endpoint by importing HTTPException and mapping monitor service result statuses to appropriate HTTP responses (success -> 200, unauthorized -> 401, not_running/paused/unsupported_asr_mode -> 409, empty_text -> 400, other failures -> 500). This surfaces clearer errors to clients instead of always returning the raw result. Also small frontend cleanup in App.tsx: use const for the local text variable in the ASR message handler.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
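The status-to-HTTP mapping this commit describes fits naturally in a lookup table. The sketch below uses only the standard library's http.HTTPStatus; STATUS_TO_HTTP and http_status_for are hypothetical names, and in the FastAPI route a non-200 code would presumably be raised as HTTPException(status_code=..., detail=...).

```python
from http import HTTPStatus

# Mapping stated in the commit message; any unlisted status becomes a 500.
STATUS_TO_HTTP = {
    "success": HTTPStatus.OK,
    "unauthorized": HTTPStatus.UNAUTHORIZED,
    "not_running": HTTPStatus.CONFLICT,
    "paused": HTTPStatus.CONFLICT,
    "unsupported_asr_mode": HTTPStatus.CONFLICT,
    "empty_text": HTTPStatus.BAD_REQUEST,
}


def http_status_for(result: dict) -> int:
    """Translate a MonitorService result dict into an HTTP status code."""
    status = result.get("status")
    return int(STATUS_TO_HTTP.get(status, HTTPStatus.INTERNAL_SERVER_ERROR))
```

Using 409 Conflict for the state-related rejections tells the client the request was well-formed but the server is in the wrong state to accept it.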

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)

api-service/routers/monitor_router.py:78

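  • /stop_monitor now takes a new with_summary query parameter that controls whether a summary is generated, but the endpoint's docstring still only describes "stop recording / stop ASR". Consider documenting the parameter's meaning and default behavior in the docstring (with_summary=true also calls SummaryService), so that the frontend and other callers don't misunderstand the endpoint's semantics.

```python
@router.post("/stop_monitor")
async def stop_monitor(with_summary: bool = True):
    """
    Stop monitoring
    - Stop recording
    - Stop ASR
    """
```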
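Following that suggestion, the docstring might read roughly as below. This is a sketch: the decorator is omitted so it runs without FastAPI, the body is a placeholder rather than the endpoint's real logic, and the wording is illustrative.

```python
async def stop_monitor(with_summary: bool = True) -> dict:
    """
    Stop monitoring.
    - Stop recording
    - Stop ASR

    Query parameters:
    - with_summary (bool, default True): when true, SummaryService is also
      called after stopping to generate a summary; pass false to skip it.
    """
    # Placeholder body: only the documented contract is sketched here.
    return {"with_summary": with_summary}
```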



Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated no new comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Change createBrowserAsrSession to take the options object as the first argument and the optional onStatus callback as the second. Update the call site in App.tsx to pass the options ({ lang, sessionToken }) first and the status callback second. Affects app-ui/src/services/browserAsr.ts and app-ui/src/App.tsx to make option passing explicit and improve parameter ordering.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Comment on lines 99 to +104

```diff
 const res = await fetch(`${API_BASE}/resume_monitor`, { method: "POST" });
 if (!res.ok) throw new Error("继续监控失败");
-return res.json();
+const data = await res.json();
+if (data.status && data.status !== "resumed") {
+  throw new Error(data.message || "继续监控失败");
+}
```

Copilot AI Apr 5, 2026


resumeMonitor!res.ok 时同样只抛出通用错误,可能导致后端返回的具体失败原因(权限/依赖/配置)丢失。建议与 ingestAsrText 的做法一致,尽量解析响应体中的 detail/message 再抛出。

Copilot uses AI. Check for mistakes.
Comment on lines 49 to 54

```python
material_name = request.cite_filename or ""
transcript_service.activate_cite_file(material_name or None)
result = await monitor_service.start(
    course_name=request.course_name,
    material_name=material_name,
)
```

Copilot AI Apr 5, 2026


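activate_cite_file() raises FileNotFoundError when the file does not exist (see transcript_service.py). Because it isn't caught here, start_monitor fails with a 500 and the frontend only sees a generic "failed to start monitoring". Consider catching that exception here and converting it to a 400 (e.g. HTTPException(status_code=400, detail=...)) with a more actionable message (such as "Material file not found, please select it again").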

Handle FileNotFoundError in start_monitor: wrap transcript_service.activate_cite_file and return a 400 HTTPException with a user-facing message when the material file is missing. On the frontend, add extractErrorMessage to parse error response bodies (JSON or plain text) and use it in startMonitor and resumeMonitor to surface server-provided error details instead of generic messages. Improves user feedback for missing materials and other server errors.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
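The backend half of this commit amounts to a try/except around activate_cite_file. In the sketch below, HTTPException is a minimal stand-in for fastapi.HTTPException so the example runs without FastAPI, activate_cite_file is a hypothetical stub that raises FileNotFoundError for any name other than lecture.pdf, prepare_material is a hypothetical wrapper, and the message text is an example rather than the one the PR uses.

```python
from typing import Optional


class HTTPException(Exception):
    """Minimal stand-in for fastapi.HTTPException (status code plus detail)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def activate_cite_file(name: Optional[str]) -> None:
    """Hypothetical stub that raises like transcript_service.activate_cite_file."""
    if name is not None and name != "lecture.pdf":
        raise FileNotFoundError(name)


def prepare_material(cite_filename: Optional[str]) -> str:
    material_name = cite_filename or ""
    try:
        activate_cite_file(material_name or None)
    except FileNotFoundError as exc:
        # Turn a missing file into an actionable 400 instead of an unhandled 500.
        raise HTTPException(
            status_code=400,
            detail="Material file not found, please select it again.",
        ) from exc
    return material_name
```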
OIerty requested a review from Copilot April 5, 2026 09:28

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Add a formatDetail helper inside extractErrorMessage to normalize the response "detail" (handle strings, arrays, and objects, trim empty strings, and JSON-stringify non-string items). Replace the duplicated ad-hoc parsing in ingestAsrText with a call to extractErrorMessage, removing verbose error-parsing logic and improving robustness while preserving the fallback message "浏览器语音文本注入失败".

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
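The normalization this commit describes can be sketched as follows. It is written in Python to match the other examples here, whereas the actual helpers (extractErrorMessage with its inner formatDetail) live in the TypeScript frontend; format_detail and extract_error_message are hypothetical Python counterparts, and the "; " separator for list details and the detail-before-message precedence are assumptions.

```python
import json
from typing import Any, Optional


def format_detail(detail: Any) -> Optional[str]:
    """Normalize an error 'detail' value into one display string, or None."""
    if isinstance(detail, str):
        return detail.strip() or None
    if isinstance(detail, list):
        parts = []
        for item in detail:
            # Trim strings; JSON-encode anything else (e.g. validation dicts).
            text = item.strip() if isinstance(item, str) else json.dumps(item, ensure_ascii=False)
            if text:
                parts.append(text)
        return "; ".join(parts) or None
    if detail is not None:
        return json.dumps(detail, ensure_ascii=False)
    return None


def extract_error_message(body: str, fallback: str) -> str:
    """Prefer JSON detail/message, then the raw text body, then the fallback."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON: use the plain-text body if non-empty
        return body.strip() or fallback
    if isinstance(data, dict):
        return format_detail(data.get("detail")) or format_detail(data.get("message")) or fallback
    return format_detail(data) or fallback
```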

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Move updating of lastSentFinalTranscript and lastSentFinalAt into the sendQueue .then handler so they are only changed after the backend confirms receipt. This prevents the same final transcript from being suppressed by client-side dedupe if the send fails; the prior immediate update before enqueueing was removed.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
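The ordering fix can be illustrated with a simplified, synchronous Python model of the sender. FinalTranscriptSender, the 2-second dedupe window, and the injectable clock are hypothetical; the real browserAsr.ts uses an asynchronous send queue, but the key point is the same: the dedupe fields change only after a send succeeds, so a failed send does not suppress a retry of the same text.

```python
import time
from typing import Callable, Optional


class FinalTranscriptSender:
    """Send final transcripts, suppressing quick repeats of confirmed sends."""

    def __init__(self, send: Callable[[str], None], window_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._window_s = window_s
        self._clock = clock
        self.last_sent_final_transcript: Optional[str] = None
        self.last_sent_final_at = float("-inf")

    def submit(self, transcript: str) -> bool:
        text = transcript.strip()
        if not text:
            return False
        now = self._clock()
        if (text == self.last_sent_final_transcript
                and now - self.last_sent_final_at < self._window_s):
            return False  # the backend already confirmed this exact text
        self._send(text)  # may raise; dedupe state is left untouched if so
        # Record only after the backend confirmed receipt.
        self.last_sent_final_transcript = text
        self.last_sent_final_at = now
        return True
```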

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • app-ui/package-lock.json: Language not supported


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
OIerty (Author) commented Apr 5, 2026


Labels

None yet

Projects

None yet


3 participants