Add WebSpeech/browser ASR support on windows by OIerty · Pull Request #11 · ouyangyipeng/ClassAssistant

OIerty · 2026-03-30T20:45:06Z

Introduce browser-based WebSpeech ASR mode so the frontend can perform SpeechRecognition and inject text into the backend.

Key changes:

README and api-service/.env.example: document the new webspeech ASR mode and WEBSPEECH_LANG configuration.
api-service: add /ingest_asr_text and MonitorService.ingest_external_text to reuse the existing ASR text handling flow.
api-service/services/asr_service.py: add BrowserSpeechASR placeholder and make create_asr recognize webspeech/browser modes.
app-ui: add browserAsr service to wrap SpeechRecognition/webkitSpeechRecognition, auto-restart on recoverable disconnects, and post transcriptions to the backend.
app-ui/src/App.tsx: integrate browser ASR lifecycle on start, pause, resume, and stop; apply cleanup and rollback so backend and frontend stay consistent.
app-ui/src/components/SettingsPanel.tsx: expose the WebSpeech option and WEBSPEECH_LANG with explanatory hints.

Behavior notes:

Frontend performs recognition and POSTs recognized text to /ingest_asr_text; backend writes it through the same monitor pipeline.
Browser ASR auto-restarts on non-fatal disconnects and reports fatal errors like permission or capture issues to the UI.
Start, pause, resume, and stop flows include cleanup and rollback to keep backend and frontend sessions consistent.

Introduce browser-based WebSpeech ASR mode so the frontend can perform SpeechRecognition and inject text into the backend.

Key changes:

README and api-service/.env.example: document new `webspeech` ASR mode and WEBSPEECH_LANG config.
api-service: add /ingest_asr_text endpoint (monitor_router.py) and MonitorService.ingest_external_text to reuse existing ASR text handling flow.
api-service/services/asr_service.py: add BrowserSpeechASR placeholder and make create_asr recognize webspeech/browser modes.
app-ui: add browserAsr service (src/services/browserAsr.ts) that wraps SpeechRecognition/webkitSpeechRecognition, auto-restarts, and posts transcriptions to the backend (ingestAsrText); add API helpers (ingestAsrText, getConfiguredAsrMode, getConfiguredWebspeechLang) in services/api.ts.
app-ui/src/App.tsx: integrate browser ASR session lifecycle (start/stop on monitor start/pause/stop), read backend-configured ASR mode and lang, and add rollback/cleanup if start fails.
app-ui/src/components/SettingsPanel.tsx: expose WebSpeech option and WEBSPEECH_LANG with explanatory hints.

Behavior notes:

Frontend performs recognition and POSTs recognized text to /ingest_asr_text; backend writes it through the same monitor pipeline.
The browser ASR auto-restarts on non-fatal disconnects and reports fatal errors (permission/capture) to the UI.
Start/stop flows include cleanup/rollback to keep backend and frontend sessions consistent.

gemini-code-assist

Code Review

This pull request introduces a new 'webspeech' ASR mode, enabling the application to use the browser's native SpeechRecognition API for audio transcription. Key changes include the addition of a backend endpoint for ingesting external text, a new BrowserSpeechASR service, and a frontend implementation of the browser ASR session. The review feedback suggests optimizing the frontend by merging configuration-fetching functions to reduce redundant network requests and adding defensive checks in the backend to ensure text ingestion only occurs when the correct ASR mode is active, preventing potential duplicate transcriptions.

app-ui/src/services/api.ts

app-ui/src/App.tsx

api-service/services/monitor_service.py

Copilot

Pull request overview

This PR introduces a new “webspeech/browser” ASR mode where the frontend uses the browser Web Speech API for recognition and injects recognized text into the FastAPI backend, reusing the existing transcript/alert pipeline.

Changes:

Backend: add /ingest_asr_text endpoint and MonitorService.ingest_external_text to accept externally recognized text.
Backend: add BrowserSpeechASR placeholder and extend create_asr to recognize webspeech/browser/edge-webspeech modes.
Frontend: add a browserAsr service, API helpers to read configured ASR mode/lang, integrate browser ASR lifecycle into monitor start/pause/stop, and expose settings UI for WebSpeech language.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| README.md | Documents the new WebSpeech ASR option and config. |
| api-service/.env.example | Adds `webspeech` mode and `WEBSPEECH_LANG` example config. |
| api-service/routers/monitor_router.py | Adds `/ingest_asr_text` API endpoint. |
| api-service/services/monitor_service.py | Adds external text ingestion hook into the existing ASR flow. |
| api-service/services/asr_service.py | Adds `BrowserSpeechASR` placeholder and factory support for webspeech modes. |
| app-ui/src/services/browserAsr.ts | Implements Web Speech recognition wrapper with auto-restart and backend injection. |
| app-ui/src/services/api.ts | Adds `ingestAsrText` plus helpers to parse ASR mode/lang from settings content. |
| app-ui/src/App.tsx | Starts/stops browser ASR alongside monitor lifecycle, with rollback/cleanup on failures. |
| app-ui/src/components/SettingsPanel.tsx | Adds UI options for `webspeech` and `WEBSPEECH_LANG` plus hints. |
| app-ui/package-lock.json | Lockfile metadata changes. |

Files not reviewed (1)

app-ui/package-lock.json: Language not supported

api-service/services/monitor_service.py

api-service/routers/monitor_router.py

app-ui/src/services/browserAsr.ts

app-ui/src/App.tsx

ouyangyipeng · 2026-03-31T04:39:41Z

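Thank you very much for submitting this PR! The feature idea is great, but the automated code review tools (Copilot and Gemini) flagged some potential blocking problems (a synchronous write called from async code) and a gap in mode validation. Please work through the review comments below; once they are fixed we will merge right away!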

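1. Potentially serious bugs (must fix)

Synchronous blocking inside an async route (raised by Copilot): In api-service/routers/monitor_router.py, the @router.post handler is an async function, but it directly calls the synchronous method ingest_external_text, which performs disk reads and writes. With speech recognition reporting at high frequency, this can badly block the event loop and make WebSocket or other API responses stall. It needs to be made asynchronous or moved to a thread pool.
Missing exception handling interrupts the flow (raised by Copilot): In the frontend browserAsr.ts, the catch block calls recognition.abort() directly, but that call can itself throw. Without wrapping it in its own try/catch, the error bubbles up and leaves the UI state badly out of sync with the backend.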
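One way to address the event-loop blocking point is to hand the synchronous call to a worker thread. The following is a minimal sketch, not the PR's actual code: `ingest_external_text` here is a stand-in that only sleeps to simulate disk I/O, and `ingest_asr_text_handler` and `demo` are hypothetical names. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import threading
import time


def ingest_external_text(text: str) -> dict:
    """Stand-in for the synchronous, disk-touching service method."""
    time.sleep(0.05)  # simulates blocking file I/O
    return {"status": "success", "thread": threading.current_thread().name}


async def ingest_asr_text_handler(text: str) -> dict:
    # Offload the blocking call so the event loop stays free for other requests.
    return await asyncio.to_thread(ingest_external_text, text)


async def demo() -> tuple:
    """Run the handler while a heartbeat task proves the loop keeps turning."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await ingest_asr_text_handler("hello")
    beat.cancel()
    return result, ticks
```

If the heartbeat counter advances while the handler is waiting, the loop was not blocked. In a FastAPI route the same `await asyncio.to_thread(...)` call would sit inside the `async def` handler.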

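2. Business logic gaps (preventive)

Cross-mode data pollution / duplicate transcription (raised by both Copilot and Gemini): The backend ingest_external_text interface does not check the current ASR mode. If the backend is configured as local (local microphone capture) but the frontend still calls this interface to inject text, the transcript content is duplicated. The exact current ASR mode must be checked before accepting data.
Paused state handled incorrectly (raised by Copilot): Also in ingest_external_text, there is no check for whether monitoring is in the is_paused state. If the system is paused, the text is dropped by the lower layer, yet the interface still returns success to the frontend, so the frontend and backend disagree about what happened.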

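3. Code and performance improvements (recommended)

Redundant frontend network requests (raised by Gemini): When fetching configuration, the frontend calls separate functions for ASR_MODE and WEBSPEECH_LANG, which issues duplicate network requests. Following Gemini's suggestion, merge them into a single getAsrConfig() function that fetches both at once.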

Keep browser ASR ingestion on a worker thread, preserve pause and mode checks on the backend, and use effective ASR config returned by the monitor start flow. Also harden browser ASR stop handling and remove duplicate settings fetches.

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

app-ui/src/services/browserAsr.ts

app-ui/src/App.tsx

api-service/routers/monitor_router.py

api-service/services/asr_service.py

OIerty · 2026-04-01T11:24:16Z

Oops, I seem to have mixed things up.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

app-ui/src/services/browserAsr.ts

app-ui/src/App.tsx

api-service/services/monitor_service.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Delete the in-file browser speech recognition implementation: AUTO_RESTART_DELAY_MS, getRecognitionConstructor, createBrowserAsrSession and related timers/handlers and event logic. This cleans up the module in preparation for refactoring or moving ASR logic to a centralized/alternative implementation.

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

api-service/services/asr_service.py

api-service/services/monitor_service.py

app-ui/src/services/api.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Add normalize_asr_mode helper to asr_service and use it to consistently parse ASR_MODE from the environment (replacing repeated strip().lower() logic). Update monitor_router to import and use the helper. Strengthen MonitorService guards: return a clear paused status when monitoring is paused, add defensive handling when the ASR instance is None, and return a distinct status for unsupported ASR modes when external text injection is attempted.
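The helper and guards described in this commit could look roughly like the sketch below. It is an illustration under assumptions, not the merged code: `MonitorState` is a hypothetical stand-in for whatever MonitorService tracks, treating `webspeech`, `browser`, and `edge-webspeech` as equivalent aliases is inferred from the review summary, and the order of the checks is a guess. The status strings are the ones named elsewhere in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Mode names come from the review summary; treating them as aliases is an assumption.
BROWSER_ASR_MODES = frozenset({"webspeech", "browser", "edge-webspeech"})


def normalize_asr_mode(raw: Optional[str]) -> str:
    """Trim and lower-case an ASR_MODE value; an unset value becomes ''."""
    return (raw or "").strip().lower()


@dataclass
class MonitorState:
    """Hypothetical stand-in for the fields MonitorService would consult."""
    is_running: bool = False
    is_paused: bool = False
    asr_mode: str = ""


def ingest_external_text(state: MonitorState, text: str) -> dict:
    """Accept text only while monitoring is live in a browser ASR mode."""
    if not state.is_running:
        return {"status": "not_running"}
    if state.is_paused:
        return {"status": "paused"}
    if normalize_asr_mode(state.asr_mode) not in BROWSER_ASR_MODES:
        return {"status": "unsupported_asr_mode"}
    cleaned = text.strip()
    if not cleaned:
        return {"status": "empty_text"}
    # The real service would hand `cleaned` to the transcript pipeline here.
    return {"status": "success", "text": cleaned}
```

Returning a distinct status per rejection reason lets the caller tell "paused" apart from "wrong mode" instead of receiving a blanket success.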

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 2 comments.

app-ui/src/services/browserAsr.ts

app-ui/src/services/api.ts

Remove unused environment parsing helpers (stripInlineEnvComment, getConfiguredAsrMode, getConfiguredWebspeechLang) from api.ts. Refactor browserAsr.ts: normalize formatting/indentation, make options required and trim session token, centralize restart scheduling, add send queue + dedupe logic for final transcripts, improve error reporting via onStatus and handle fatal recognition errors. These changes aim to simplify config handling and make the browser SpeechRecognition session more robust and resilient to start/restart errors.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

api-service/routers/monitor_router.py

app-ui/src/App.tsx

app-ui/src/services/api.ts

Improve ingest_asr_text endpoint by importing HTTPException and mapping monitor service result statuses to appropriate HTTP responses (success -> 200, unauthorized -> 401, not_running/paused/unsupported_asr_mode -> 409, empty_text -> 400, other failures -> 500). This surfaces clearer errors to clients instead of always returning the raw result. Also small frontend cleanup in App.tsx: use const for the local text variable in the ASR message handler.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
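The status mapping listed in this commit message can be written as a small lookup table. This is a sketch only: `STATUS_TO_HTTP` and `http_status_for` are hypothetical names, and the function is kept free of FastAPI so it runs on its own. In the router, any non-200 code would presumably be raised as `HTTPException(status_code=..., detail=...)`.

```python
# Status names and codes are taken from the commit message above.
STATUS_TO_HTTP = {
    "success": 200,
    "unauthorized": 401,
    "not_running": 409,
    "paused": 409,
    "unsupported_asr_mode": 409,
    "empty_text": 400,
}


def http_status_for(result: dict) -> int:
    """Translate a service result into an HTTP status; unknown statuses map to 500."""
    return STATUS_TO_HTTP.get(result.get("status"), 500)
```

Using 409 for `not_running`, `paused`, and `unsupported_asr_mode` signals that the request was well formed but conflicts with the current monitor state, whereas 400 is reserved for a bad payload.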

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

api-service/routers/monitor_router.py:78

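/stop_monitor now has a new with_summary query parameter that controls whether a summary is generated, but the endpoint's docstring still only describes "stop recording / stop ASR". Consider documenting the parameter's meaning and default behavior in the docstring (with with_summary=true, SummaryService is still called) so that the frontend and other callers do not misread the endpoint's semantics.

@router.post("/stop_monitor")
async def stop_monitor(with_summary: bool = True):
    """
    Stop monitoring
    - Stop recording
    - Stop ASR
    """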

app-ui/src/services/api.ts

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated no new comments.

Change createBrowserAsrSession to take the options object as the first argument and the optional onStatus callback as the second. Update the call site in App.tsx to pass the options ({ lang, sessionToken }) first and the status callback second. Affects app-ui/src/services/browserAsr.ts and app-ui/src/App.tsx to make option passing explicit and improve parameter ordering.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

app-ui/src/App.tsx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

app-ui/src/services/api.ts

Copilot · 2026-04-05T09:22:20Z

app-ui/src/services/api.ts

  const res = await fetch(`${API_BASE}/resume_monitor`, { method: "POST" });
  if (!res.ok) throw new Error("继续监控失败");
-  return res.json();
+  const data = await res.json();
+  if (data.status && data.status !== "resumed") {
+    throw new Error(data.message || "继续监控失败");
+  }


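resumeMonitor likewise only throws a generic error when !res.ok, so the specific failure reason returned by the backend (permissions, dependencies, configuration) may be lost. Consider parsing detail/message from the response body before throwing, consistent with what ingestAsrText does.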

Copilot · 2026-04-05T09:22:20Z

api-service/routers/monitor_router.py

    material_name = request.cite_filename or ""
    transcript_service.activate_cite_file(material_name or None)
    result = await monitor_service.start(
        course_name=request.course_name,
        material_name=material_name,
    )


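activate_cite_file() raises FileNotFoundError when the file does not exist (see transcript_service.py). It is not caught here, so start_monitor returns a 500 directly and the frontend only sees the generic "启动监控失败" ("failed to start monitoring") message. Consider catching the exception here and converting it to a 400 (for example HTTPException(status_code=400, detail=...)) with a more actionable hint (such as "Material file not found, please select it again").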

Handle FileNotFoundError in start_monitor: wrap transcript_service.activate_cite_file and return a 400 HTTPException with a user-facing message when the material file is missing. On the frontend, add extractErrorMessage to parse error response bodies (JSON or plain text) and use it in startMonitor and resumeMonitor to surface server-provided error details instead of generic messages. Improves user feedback for missing materials and other server errors.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
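The backend half of this change, turning a missing material file into a 400 rather than an unhandled 500, might be sketched as follows. Everything here is illustrative: `HTTPError` stands in for `fastapi.HTTPException` so the example runs without FastAPI, `activate_cite_file` is a fake with one hard-coded known file, and `prepare_material` and the message text are hypothetical.

```python
class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def activate_cite_file(name):
    """Fake of the transcript service call: raises when the material is unknown."""
    known = {"lecture1.pdf"}  # illustrative data only
    if name is not None and name not in known:
        raise FileNotFoundError(name)


def prepare_material(cite_filename):
    """Activate the cited material, converting a missing file into a client error."""
    material_name = cite_filename or ""
    try:
        activate_cite_file(material_name or None)
    except FileNotFoundError as exc:
        # A 400 tells the client the request was wrong, not that the server crashed.
        raise HTTPError(400, "Material file not found; please select it again.") from exc
    return material_name
```

Chaining with `from exc` keeps the original traceback available in server logs while the client only sees the user-facing detail.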

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

app-ui/src/services/api.ts

Add a formatDetail helper inside extractErrorMessage to normalize the response "detail" (handle strings, arrays, and objects, trim empty strings, and JSON-stringify non-string items). Replace the duplicated ad-hoc parsing in ingestAsrText with a call to extractErrorMessage, removing verbose error-parsing logic and improving robustness while preserving the fallback message "浏览器语音文本注入失败".

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

app-ui/src/services/browserAsr.ts

Move updating of lastSentFinalTranscript and lastSentFinalAt into the sendQueue .then handler so they are only changed after the backend confirms receipt. This prevents the same final transcript from being suppressed by client-side dedupe if the send fails; the prior immediate update before enqueueing was removed.

Co-Authored-By: Copilot <198982749+Copilot@users.noreply.github.com>
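The ordering fix is independent of language, so here is the same idea in Python rather than the frontend's TypeScript: dedupe state is written only after the send has succeeded. This is a simplified sketch under assumptions. `FinalTranscriptSender`, its `submit` method, and the two-second `DEDUPE_WINDOW_S` are hypothetical, and the real module's send queue is not modeled.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

DEDUPE_WINDOW_S = 2.0  # assumed window; the PR does not state the real value


class FinalTranscriptSender:
    """Suppress repeats of the last transcript that was confirmed as sent."""

    def __init__(self, send: Callable[[str], Awaitable[None]]) -> None:
        self._send = send
        self._last_sent: Optional[str] = None
        self._last_sent_at = 0.0

    async def submit(self, text: str) -> bool:
        """Send `text` unless it repeats the last confirmed one; True if sent."""
        now = time.monotonic()
        if text == self._last_sent and now - self._last_sent_at < DEDUPE_WINDOW_S:
            return False
        await self._send(text)  # raises on failure, so the lines below are skipped
        # Record the transcript only once the backend has accepted it.
        self._last_sent = text
        self._last_sent_at = time.monotonic()
        return True
```

Had the state been updated before the `await`, a failed send would leave the transcript marked as delivered and a retry inside the window would be dropped silently.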

Copilot

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

app-ui/src/App.tsx

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

OIerty · 2026-04-05T10:37:06Z

https://github.com/OIerty/ClassAssistant/releases/tag/v1.2.1-Windows

Copilot AI review requested due to automatic review settings March 30, 2026 20:45

Copilot started reviewing on behalf of OIerty March 30, 2026 20:45

gemini-code-assist bot reviewed Mar 30, 2026

Copilot AI reviewed Mar 30, 2026

OIerty added 3 commits April 1, 2026 18:49

feat: enhance the monitor service with browser speech recognition and environment variable configuration

8b1fee1

fix

feb8cf4

Fix webspeech merge conflicts

4c0aca2

OIerty requested a review from Copilot April 1, 2026 11:15

Copilot started reviewing on behalf of OIerty April 1, 2026 11:15

Copilot AI reviewed Apr 1, 2026

OIerty and others added 5 commits April 1, 2026 19:25

Apply suggestion from @Copilot

ffffbcf

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @gemini-code-assist[bot]

cd80bb0

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Apply suggestion from @Copilot

bd98b40

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @gemini-code-assist[bot]

8134684

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update app-ui/src/services/browserAsr.ts

90d7109

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

OIerty requested a review from Copilot April 1, 2026 11:34

Copilot started reviewing on behalf of OIerty April 1, 2026 11:34

Copilot AI reviewed Apr 1, 2026

OIerty and others added 4 commits April 1, 2026 19:45

Update app-ui/src/App.tsx

b389c83

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update api-service/services/monitor_service.py

4416d4c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update api-service/services/monitor_service.py

d7bde6b

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

OIerty requested a review from Copilot April 1, 2026 12:08

Copilot started reviewing on behalf of OIerty April 1, 2026 12:09

Copilot AI reviewed Apr 1, 2026

OIerty and others added 2 commits April 1, 2026 20:24

Update app-ui/src/services/api.ts

fad4ba7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot started reviewing on behalf of OIerty April 5, 2026 08:30