Add WebSpeech/browser ASR support on windows #11
OIerty wants to merge 55 commits into ouyangyipeng:main
Introduce browser-based WebSpeech ASR mode so the frontend can perform SpeechRecognition and inject text into the backend.

Key changes:
- README and api-service/.env.example: document new `webspeech` ASR mode and WEBSPEECH_LANG config.
- api-service: add /ingest_asr_text endpoint (monitor_router.py) and MonitorService.ingest_external_text to reuse existing ASR text handling flow.
- api-service/services/asr_service.py: add BrowserSpeechASR placeholder and make create_asr recognize webspeech/browser modes.
- app-ui: add browserAsr service (src/services/browserAsr.ts) that wraps SpeechRecognition/webkitSpeechRecognition, auto-restarts, and posts transcriptions to the backend (ingestAsrText); add API helpers (ingestAsrText, getConfiguredAsrMode, getConfiguredWebspeechLang) in services/api.ts.
- app-ui/src/App.tsx: integrate browser ASR session lifecycle (start/stop on monitor start/pause/stop), read backend-configured ASR mode and lang, and add rollback/cleanup if start fails.
- app-ui/src/components/SettingsPanel.tsx: expose WebSpeech option and WEBSPEECH_LANG with explanatory hints.

Behavior notes:
- Frontend performs recognition and POSTs recognized text to /ingest_asr_text; backend writes it through the same monitor pipeline.
- The browser ASR auto-restarts on non-fatal disconnects and reports fatal errors (permission/capture) to the UI.
- Start/stop flows include cleanup/rollback to keep backend and frontend sessions consistent.
Code Review
This pull request introduces a new 'webspeech' ASR mode, enabling the application to use the browser's native SpeechRecognition API for audio transcription. Key changes include the addition of a backend endpoint for ingesting external text, a new BrowserSpeechASR service, and a frontend implementation of the browser ASR session. The review feedback suggests optimizing the frontend by merging configuration-fetching functions to reduce redundant network requests and adding defensive checks in the backend to ensure text ingestion only occurs when the correct ASR mode is active, preventing potential duplicate transcriptions.
Pull request overview
This PR introduces a new “webspeech/browser” ASR mode where the frontend uses the browser Web Speech API for recognition and injects recognized text into the FastAPI backend, reusing the existing transcript/alert pipeline.
Changes:
- Backend: add `/ingest_asr_text` endpoint and `MonitorService.ingest_external_text` to accept externally recognized text.
- Backend: add `BrowserSpeechASR` placeholder and extend `create_asr` to recognize `webspeech`/`browser`/`edge-webspeech` modes.
- Frontend: add a `browserAsr` service, API helpers to read configured ASR mode/lang, integrate browser ASR lifecycle into monitor start/pause/stop, and expose settings UI for WebSpeech language.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| README.md | Documents the new WebSpeech ASR option and config. |
| api-service/.env.example | Adds webspeech mode and WEBSPEECH_LANG example config. |
| api-service/routers/monitor_router.py | Adds /ingest_asr_text API endpoint. |
| api-service/services/monitor_service.py | Adds external text ingestion hook into the existing ASR flow. |
| api-service/services/asr_service.py | Adds BrowserSpeechASR placeholder and factory support for webspeech modes. |
| app-ui/src/services/browserAsr.ts | Implements Web Speech recognition wrapper with auto-restart and backend injection. |
| app-ui/src/services/api.ts | Adds ingestAsrText plus helpers to parse ASR mode/lang from settings content. |
| app-ui/src/App.tsx | Starts/stops browser ASR alongside monitor lifecycle, with rollback/cleanup on failures. |
| app-ui/src/components/SettingsPanel.tsx | Adds UI options for webspeech and WEBSPEECH_LANG plus hints. |
| app-ui/package-lock.json | Lockfile metadata changes. |
Files not reviewed (1)
- app-ui/package-lock.json: Language not supported
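Thanks a lot for submitting this PR! The feature idea is great, but the automated code review tools (Copilot and Gemini) pointed out some potential blocking problems (synchronous writes called from async code) and a gap caused by missing mode validation. Please work through the review comments below; once they are fixed we will merge right away!
1. Potentially serious bug (must fix)
2. Business-logic gap (fix preventively)
3. Code and performance optimizations (recommended)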
Keep browser ASR ingestion on a worker thread, preserve pause and mode checks on the backend, and use effective ASR config returned by the monitor start flow. Also harden browser ASR stop handling and remove duplicate settings fetches.
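The "worker thread" part of this commit can be illustrated with a minimal sketch. It assumes the ingestion step is a blocking, synchronous function; `ingest_external_text_sync` below is a hypothetical stand-in, not the PR's actual `MonitorService` method, and the sleep only simulates a blocking write.

```python
import asyncio
import threading
import time


def ingest_external_text_sync(text: str) -> dict:
    """Hypothetical stand-in for a blocking write into the transcript pipeline."""
    time.sleep(0.05)  # simulate blocking file I/O
    return {
        "status": "success",
        "text": text,
        "thread": threading.current_thread().name,
    }


async def ingest_asr_text(text: str) -> dict:
    # asyncio.to_thread (Python 3.9+) runs the blocking call on a worker
    # thread, so the event loop stays free to serve other requests.
    return await asyncio.to_thread(ingest_external_text_sync, text)


if __name__ == "__main__":
    print(asyncio.run(ingest_asr_text("hello from the browser")))
```

Calling the synchronous function directly inside the `async def` handler would stall every other request for the duration of the write, which is the blocking problem the reviewers flagged.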
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
Oops, I think I mixed something up.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Delete the in-file browser speech recognition implementation: AUTO_RESTART_DELAY_MS, getRecognitionConstructor, createBrowserAsrSession and related timers/handlers and event logic. This cleans up the module in preparation for refactoring or moving ASR logic to a centralized/alternative implementation.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add normalize_asr_mode helper to asr_service and use it to consistently parse ASR_MODE from the environment (replacing repeated strip().lower() logic). Update monitor_router to import and use the helper. Strengthen MonitorService guards: return a clear paused status when monitoring is paused, add defensive handling when the ASR instance is None, and return a distinct status for unsupported ASR modes when external text injection is attempted.
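As a rough sketch of what this commit describes: the helper name `normalize_asr_mode` and the status strings come from this PR, while the set of accepted modes is taken from the Copilot overview above. The `guard_external_text` function, its parameters, and the order of its checks are assumptions made for illustration only.

```python
from typing import Optional

# Modes the review overview lists as browser-based; assumed to be the full set.
WEBSPEECH_MODES = {"webspeech", "browser", "edge-webspeech"}


def normalize_asr_mode(raw: Optional[str], default: str = "") -> str:
    """Replace repeated strip().lower() calls with a single helper."""
    if raw is None:
        return default
    return raw.strip().lower()


def guard_external_text(running: bool, paused: bool,
                        asr_mode: Optional[str], text: str) -> str:
    """Hypothetical guard: return a status string before any text is ingested."""
    if not running:
        return "not_running"
    if paused:
        return "paused"
    if normalize_asr_mode(asr_mode) not in WEBSPEECH_MODES:
        return "unsupported_asr_mode"
    if not text.strip():
        return "empty_text"
    return "success"


if __name__ == "__main__":
    print(guard_external_text(True, False, "  WebSpeech ", "hello"))
```

Returning a distinct status per rejection reason lets the router and the frontend tell a paused session apart from a misconfigured ASR mode.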
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 2 comments.
Remove unused environment parsing helpers (stripInlineEnvComment, getConfiguredAsrMode, getConfiguredWebspeechLang) from api.ts. Refactor browserAsr.ts: normalize formatting/indentation, make options required and trim session token, centralize restart scheduling, add send queue + dedupe logic for final transcripts, improve error reporting via onStatus and handle fatal recognition errors. These changes aim to simplify config handling and make the browser SpeechRecognition session more robust and resilient to start/restart errors. Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
Improve ingest_asr_text endpoint by importing HTTPException and mapping monitor service result statuses to appropriate HTTP responses (success -> 200, unauthorized -> 401, not_running/paused/unsupported_asr_mode -> 409, empty_text -> 400, other failures -> 500). This surfaces clearer errors to clients instead of always returning the raw result. Also small frontend cleanup in App.tsx: use const for the local text variable in the ASR message handler. Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
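The status-to-HTTP mapping listed in this commit message can be written as a small lookup table. This is a sketch only: the status strings and codes are the ones named above, while `STATUS_TO_HTTP` and `http_status_for` are hypothetical names. In the real router, a non-200 result would presumably be raised as a FastAPI `HTTPException` rather than returned as a bare integer.

```python
# Mapping taken from the commit message; anything unlisted falls back to 500.
STATUS_TO_HTTP = {
    "success": 200,
    "unauthorized": 401,
    "not_running": 409,
    "paused": 409,
    "unsupported_asr_mode": 409,
    "empty_text": 400,
}


def http_status_for(result: dict) -> int:
    """Translate a monitor-service result dict into an HTTP status code."""
    return STATUS_TO_HTTP.get(result.get("status"), 500)


if __name__ == "__main__":
    for status in ("success", "paused", "empty_text", "something_else"):
        print(status, http_status_for({"status": status}))
```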
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
api-service/routers/monitor_router.py:78
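/stop_monitor now has a new with_summary query parameter that controls whether a summary is generated, but the endpoint's docstring still only describes "stop recording / stop ASR". Consider documenting the parameter's meaning and default behavior in the docstring (with with_summary=true, SummaryService is still called) so that the frontend and other callers do not misread the endpoint's semantics.
@router.post("/stop_monitor")
async def stop_monitor(with_summary: bool = True):
    """
    Stop monitoring
    - Stop recording
    - Stop ASR
    """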
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated no new comments.
Change createBrowserAsrSession to take the options object as the first argument and the optional onStatus callback as the second. Update the call site in App.tsx to pass the options ({ lang, sessionToken }) first and the status callback second. Affects app-ui/src/services/browserAsr.ts and app-ui/src/App.tsx to make option passing explicit and improve parameter ordering.
Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
const res = await fetch(`${API_BASE}/resume_monitor`, { method: "POST" });
if (!res.ok) throw new Error("继续监控失败");
return res.json();
const data = await res.json();
if (data.status && data.status !== "resumed") {
  throw new Error(data.message || "继续监控失败");
}
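resumeMonitor likewise throws only a generic error when !res.ok, so the specific failure reason returned by the backend (permissions/dependencies/configuration) may be lost. Consider following the approach used in ingestAsrText: parse detail/message from the response body where possible before throwing.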
material_name = request.cite_filename or ""
transcript_service.activate_cite_file(material_name or None)
result = await monitor_service.start(
    course_name=request.course_name,
    material_name=material_name,
)
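activate_cite_file() raises FileNotFoundError when the file does not exist (see transcript_service.py). The exception is not caught here, so start_monitor fails with a 500 and the frontend can only show the generic "failed to start monitoring" message. Consider catching the exception here and converting it to a 400 (for example HTTPException(status_code=400, detail=...)) with a more actionable hint, such as "Material file not found, please select it again".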
Handle FileNotFoundError in start_monitor: wrap transcript_service.activate_cite_file and return a 400 HTTPException with a user-facing message when the material file is missing. On the frontend, add extractErrorMessage to parse error response bodies (JSON or plain text) and use it in startMonitor and resumeMonitor to surface server-provided error details instead of generic messages. Improves user feedback for missing materials and other server errors. Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
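The backend half of this commit can be sketched as follows. To stay runnable without FastAPI, `HTTPError` is a local stand-in for `fastapi.HTTPException`, and `activate_cite_file` and `start_monitor` are simplified, hypothetical versions of the real functions; the `known_files` parameter and the error text are assumptions.

```python
from typing import Optional, Set


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def activate_cite_file(name: Optional[str], known_files: Set[str]) -> None:
    """Hypothetical stand-in for transcript_service.activate_cite_file."""
    if name is not None and name not in known_files:
        raise FileNotFoundError(name)


def start_monitor(cite_filename: Optional[str], known_files: Set[str]) -> dict:
    material_name = cite_filename or ""
    try:
        activate_cite_file(material_name or None, known_files)
    except FileNotFoundError as exc:
        # Surface a client error with an actionable message instead of a 500.
        raise HTTPError(400, "Material file not found; please select it again.") from exc
    return {"status": "started", "material_name": material_name}


if __name__ == "__main__":
    print(start_monitor("slides.pdf", {"slides.pdf"}))
```

With a 400 and a specific `detail`, the frontend's error-message extraction has something concrete to show instead of a generic failure string.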
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.
Add a formatDetail helper inside extractErrorMessage to normalize the response "detail" (handle strings, arrays, and objects, trim empty strings, and JSON-stringify non-string items). Replace the duplicated ad-hoc parsing in ingestAsrText with a call to extractErrorMessage, removing verbose error-parsing logic and improving robustness while preserving the fallback message "浏览器语音文本注入失败". Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.
Move updating of lastSentFinalTranscript and lastSentFinalAt into the sendQueue .then handler so they are only changed after the backend confirms receipt. This prevents the same final transcript from being suppressed by client-side dedupe if the send fails; the prior immediate update before enqueueing was removed. Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>