Skip to content

fix(proxy): improve 499 handling and add hedge reason types#893

Closed
ding113 wants to merge 1 commit intodevfrom
fix/499-handling-and-hedge-reasons
Closed

fix(proxy): improve 499 handling and add hedge reason types#893
ding113 wants to merge 1 commit intodevfrom
fix/499-handling-and-hedge-reasons

Conversation

@ding113
Copy link
Owner

@ding113 ding113 commented Mar 9, 2026

Summary

Improves client abort (HTTP 499) handling and adds infrastructure for future hedge scheduling support.

Changes

1. Enhanced 499 Handling

  • Conditional session clearing: Only clear session provider binding on first attempt to avoid race condition
  • New reason type: Added client_abort to distinguish from generic system_error
  • Detailed logging: Added debug logging when session binding is cleared

2. Hedge Reason Types (Infrastructure)

  • Added 3 new reason types for future hedge scheduling:
    • hedge_triggered: First-byte timeout triggers hedge (parallel provider attempt)
    • hedge_winner: Hedge race winner (first provider to return first byte)
    • hedge_loser_cancelled: Hedge race loser (cancelled by winner)
  • Added comprehensive documentation for all new reason types
  • Updated i18n translations (zh-CN, en)

Technical Details

Race Condition Fix

Problem: Original implementation unconditionally cleared session binding on 499, which could clear a successful binding if the provider responded after client disconnect.

Solution: Only clear binding on first attempt (attemptCount === 1), preserving session stickiness for retry scenarios.

Type System Updates

  • src/types/message.ts: Added 4 new ProviderChainReason types with detailed comments
  • src/app/v1/_lib/proxy/session.ts: Synced reason types with message.ts
  • messages/*/provider-chain.json: Added translations for all new reasons

Testing

  • ✅ All 3635 tests pass (377 test files)
  • ✅ Type check passes
  • ✅ Lint passes
  • ✅ No regressions detected

Related

  • Addresses code review feedback from previous hedge scheduling exploration
  • Lays groundwork for future hedge scheduling implementation (Phase 4-5 of original plan)
  • Fixes P0 and P1 issues identified in code review

Checklist

  • Code follows project conventions
  • All tests pass
  • Type check passes
  • Lint passes
  • i18n translations added
  • Documentation updated

Greptile Summary

This PR improves HTTP 499 (client abort) handling in the proxy forwarder and lays infrastructure groundwork for future hedge scheduling support. The core behavioral change makes session provider binding cleanup conditional on attemptCount === 1, which correctly avoids clearing a binding that was successfully established by a prior attempt within the same retry loop.

Key changes:

  • Race condition fix: Session binding is now only cleared on the first attempt of a 499, preserving stickiness when the abort happens on a retry.
  • New reason type: client_abort replaces the generic system_error for HTTP 499 entries in the provider decision chain, improving observability.
  • Infrastructure types: hedge_triggered, hedge_winner, and hedge_loser_cancelled are pre-defined for future hedge scheduling but are not yet wired to any logic.
  • Code quality: The 499 handler uses a redundant dynamic import of SessionManager that is already statically available, adding unnecessary async overhead.

All four new reason types are properly synced across types, session, and i18n files (English and Chinese translations).

Confidence Score: 4/5

  • Safe to merge after removing the redundant dynamic import; no correctness issues found.
  • The logic change (conditional session clearing on attemptCount === 1) is well-reasoned and correctly addresses the described race condition. All four new reason types are properly synced across types, session, and i18n files. The only issue is a redundant dynamic import of a module already statically imported, which is a style/cleanup concern rather than a bug. This is easily fixed and improves code clarity and efficiency.
  • src/app/v1/_lib/proxy/forwarder.ts — redundant dynamic import on line 1027 should be removed

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Request arrives] --> B{Session bound?}
    B -- Yes --> C[Attempt 1: session_reuse provider]
    B -- No --> D[Attempt 1: fresh provider selection]
    C --> E{Result?}
    D --> E
    E -- Success --> F[Record chain: request_success / retry_success]
    E -- 499 Client Abort --> G{attemptCount === 1?}
    G -- Yes --> H[clearSessionProvider\nRecord chain: client_abort\nThrow error]
    G -- No --> I[Keep existing session binding\nRecord chain: client_abort\nThrow error]
    E -- Retryable error --> J[Increment attemptCount\nSelect next provider]
    J --> E
Loading

Last reviewed commit: 8c9a8f5

Greptile also left 2 inline comments on this PR.

- Add client_abort reason type with conditional session clearing
- Only clear session binding on first attempt to avoid race condition
- Add hedge_triggered/hedge_winner/hedge_loser_cancelled reason types
- Add detailed documentation for all new reason types
- Update i18n translations (zh-CN, en) for new reasons

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the handling of client aborts (HTTP 499) by introducing a more robust session clearing mechanism that avoids race conditions. It also lays essential groundwork for future hedge scheduling capabilities by defining new reason types, enhancing the system's ability to manage parallel provider attempts and their outcomes.

Highlights

  • Enhanced 499 Handling: Implemented conditional session clearing for HTTP 499 (Client Abort) responses, clearing session provider binding only on the first attempt to prevent race conditions and preserve session stickiness. A new "client_abort" reason type was introduced, replacing "system_error" for better distinction, and debug logging was added for session binding clearance.
  • Hedge Reason Types Infrastructure: Added three new reason types ("hedge_triggered", "hedge_winner", "hedge_loser_cancelled") to support future hedge scheduling, along with comprehensive documentation and updated i18n translations for English and Chinese.
Changelog
  • messages/en/provider-chain.json
    • Added new translation keys for "client_abort", "hedge_triggered", "hedge_winner", and "hedge_loser_cancelled".
  • messages/zh-CN/provider-chain.json
    • Added new Chinese translation keys for "client_abort", "hedge_triggered", "hedge_winner", and "hedge_loser_cancelled".
  • src/app/v1/_lib/proxy/forwarder.ts
    • Modified the 499 handling logic to conditionally clear session provider binding only on the first attempt.
    • Changed the reason for client aborts from "system_error" to "client_abort" in the decision chain.
    • Added debug logging when a session binding is cleared due to client abort.
  • src/app/v1/_lib/proxy/session.ts
    • Extended the "ProviderChainReason" type to include "client_abort", "hedge_triggered", "hedge_winner", and "hedge_loser_cancelled" with descriptive comments.
  • src/types/message.ts
    • Extended the "ProviderChainItem" interface's "reason" type to include "client_abort", "hedge_triggered", "hedge_winner", and "hedge_loser_cancelled" with descriptive comments.
Activity
  • All 3635 tests passed across 377 test files.
  • Type checking passed successfully.
  • Linting passed without issues.
  • No regressions were detected during testing.
  • Addressed code review feedback from previous hedge scheduling exploration.
  • Lays groundwork for future hedge scheduling implementation (Phase 4-5 of original plan).
  • Fixed P0 and P1 issues identified in code review.
  • Code follows project conventions.
  • i18n translations were added.
  • Documentation was updated.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

工作流程概览

此变更为提供者链决策系统添加四个新的原因代码(client_abort、hedge_triggered、hedge_winner、hedge_loser_cancelled),包括翻译资源、类型定义和会话清理逻辑。同时在客户端中止时实现了会话提供者的动态清理功能。

变更

内聚组 / 文件 变更摘要
翻译资源文件
messages/en/provider-chain.json, messages/zh-CN/provider-chain.json
在 reasons 对象中添加了四个新的翻译键:client_abort、hedge_triggered、hedge_winner 和 hedge_loser_cancelled,分别对应英文和中文翻译。
类型定义更新
src/types/message.ts, src/app/v1/_lib/proxy/session.ts
扩展 ProviderChainItem.reason 和 addProviderToChain 方法的 metadata.reason 联合类型,新增四个原因字面量常量。
会话清理实现
src/app/v1/_lib/proxy/forwarder.ts
在客户端中止路径中添加动态导入 SessionManager 并调用 clearSessionProvider 进行会话绑定清理的逻辑,同时将决策链原因改为 "client_abort"。

代码审查工作量评估

🎯 2 (Simple) | ⏱️ ~12 分钟

可能相关的 PR

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed 标题清晰准确地总结了主要变更,包括改进499处理和添加对冲原因类型,与变更集高度相关。
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed 拉取请求描述准确相关,详细说明了代码更改的内容,包括客户端中止处理改进和对冲调度基础设施的添加。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/499-handling-and-hedge-reasons

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added bug Something isn't working area:session area:i18n labels Mar 9, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

🧪 测试结果

测试类型 状态
代码质量
单元测试
集成测试
API 测试

总体结果: ✅ 所有测试通过

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves client abort (HTTP 499) handling by fixing a race condition and introduces new reason types for future hedge scheduling support. The changes are well-structured and address the described issues. I've identified one area for improvement regarding a redundant dynamic import, which can be simplified by using the existing static import for better performance and maintainability.

Comment on lines +1027 to +1028
const { SessionManager } = await import("@/lib/session-manager");
await SessionManager.clearSessionProvider(session.sessionId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

SessionManager is already imported statically at the top of this file (line 26). Using the existing static import is more efficient than this dynamic import and avoids redundant module loading.

Suggested change
const { SessionManager } = await import("@/lib/session-manager");
await SessionManager.clearSessionProvider(session.sessionId);
await SessionManager.clearSessionProvider(session.sessionId);

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/app/v1/_lib/proxy/session.ts (1)

441-460: 这里的 reason 建议直接复用 ProviderChainItem["reason"]

现在这份字面量 union 又和 src/types/message.ts 手工同步了一次;下次再加/删 reason 时,很容易只改一边。这里已经引入了 ProviderChainItem,直接复用类型会稳很多。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/v1/_lib/proxy/session.ts` around lines 441 - 460, The literal union
type for reason should be replaced with ProviderChainItem["reason"] to avoid
duplicate manual syncing; update the declaration of reason in this module to use
ProviderChainItem["reason"] instead of the long string union, and ensure
ProviderChainItem is imported (or referenced) in this file so the type resolves
correctly (keep existing comments as needed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 1025-1028: The current unconditional call to
SessionManager.clearSessionProvider(session.sessionId) can delete a provider
newly written by a concurrent request; change this to a compare-and-delete: pass
the expected provider identifier for this request (e.g., the provider id on the
current session object) and perform an atomic check-and-delete in
SessionManager.clearSessionProvider(expectedSessionId, expectedProvider) (or
implement a new method like clearSessionProviderIfMatches). The implementation
should read the current provider and only delete if it equals the expected
provider using an atomic Redis operation (EVAL Lua script or WATCH/MULTI) so
concurrent successful updates are not removed.

In `@src/types/message.ts`:
- Around line 38-42: The UI branches in LogicTraceTab and provider-chain-popover
still use the old exhaustive switch/if over providerChain.reason and thus treat
the new enum entries as default/pending; update the conditional logic in
src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx
and src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx to
explicitly handle the new reason values ("client_restriction_filtered",
"client_abort", "hedge_triggered", "hedge_winner", "hedge_loser_cancelled"),
mapping each to the correct display state (e.g., client_abort => show
client-aborted state/not counted as provider failure,
client_restriction_filtered => neutral/filtered, hedge_triggered/winner/loser =>
mark as hedged/winner/loser accordingly) instead of falling through to default
so providerChain entries render accurate statuses.

---

Nitpick comments:
In `@src/app/v1/_lib/proxy/session.ts`:
- Around line 441-460: The literal union type for reason should be replaced with
ProviderChainItem["reason"] to avoid duplicate manual syncing; update the
declaration of reason in this module to use ProviderChainItem["reason"] instead
of the long string union, and ensure ProviderChainItem is imported (or
referenced) in this file so the type resolves correctly (keep existing comments
as needed).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: deee3707-bd86-48fc-ae0d-58b63cb9f827

📥 Commits

Reviewing files that changed from the base of the PR and between 26b34a9 and 8c9a8f5.

📒 Files selected for processing (5)
  • messages/en/provider-chain.json
  • messages/zh-CN/provider-chain.json
  • src/app/v1/_lib/proxy/forwarder.ts
  • src/app/v1/_lib/proxy/session.ts
  • src/types/message.ts

Comment on lines +1025 to +1028
// 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定)
if (session.sessionId && attemptCount === 1) {
const { SessionManager } = await import("@/lib/session-manager");
await SessionManager.clearSessionProvider(session.sessionId);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

这里的清理仍然会误删并发请求刚写入的新绑定。

src/lib/session-manager.ts:576-586 里的 clearSessionProvider() 是无条件 DEL session:${sessionId}:providerattemptCount === 1 只避免了“同一转发链里晚到成功响应”的竞态;如果同一 sessionId 下另一个并发请求已经成功更新了绑定,这里仍会把那个新绑定删掉。更稳妥的做法是改成 compare-and-delete,至少校验当前绑定仍然是这次请求对应的 provider。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/v1/_lib/proxy/forwarder.ts` around lines 1025 - 1028, The current
unconditional call to SessionManager.clearSessionProvider(session.sessionId) can
delete a provider newly written by a concurrent request; change this to a
compare-and-delete: pass the expected provider identifier for this request
(e.g., the provider id on the current session object) and perform an atomic
check-and-delete in SessionManager.clearSessionProvider(expectedSessionId,
expectedProvider) (or implement a new method like
clearSessionProviderIfMatches). The implementation should read the current
provider and only delete if it equals the expected provider using an atomic
Redis operation (EVAL Lua script or WATCH/MULTI) so concurrent successful
updates are not removed.

Comment on lines +38 to +42
| "client_restriction_filtered" // Provider skipped due to client restriction (neutral, no circuit breaker)
| "client_abort" // 客户端主动断开连接(HTTP 499),首次尝试时清理 session 绑定,不计入熔断器
| "hedge_triggered" // 首字节超时触发 hedge 调度(并行发起下一个供应商连接)
| "hedge_winner" // hedge 竞速胜出(首个返回首字节的供应商)
| "hedge_loser_cancelled"; // hedge 竞速落败(被胜者取消的供应商连接)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

新增 reason 还没同步到现有的穷举判定里。

src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx:32-52src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx:36-50 还在按旧枚举做条件分支。现在这些值一旦进入 providerChainclient_abort / hedge_* 会落到默认分支,被显示成 pending 或直接不算“实际请求”,日志链路会失真。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/types/message.ts` around lines 38 - 42, The UI branches in LogicTraceTab
and provider-chain-popover still use the old exhaustive switch/if over
providerChain.reason and thus treat the new enum entries as default/pending;
update the conditional logic in
src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx
and src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx to
explicitly handle the new reason values ("client_restriction_filtered",
"client_abort", "hedge_triggered", "hedge_winner", "hedge_loser_cancelled"),
mapping each to the correct display state (e.g., client_abort => show
client-aborted state/not counted as provider failure,
client_restriction_filtered => neutral/filtered, hedge_triggered/winner/loser =>
mark as hedged/winner/loser accordingly) instead of falling through to default
so providerChain entries render accurate statuses.

@github-actions github-actions bot added the size/S Small PR (< 200 lines) label Mar 9, 2026
Comment on lines +1026 to +1027
if (session.sessionId && attemptCount === 1) {
const { SessionManager } = await import("@/lib/session-manager");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SessionManager is already statically imported at the top of the file (line 26), so the dynamic import here is redundant and adds unnecessary async overhead. Use the existing static import directly instead:

Suggested change
if (session.sessionId && attemptCount === 1) {
const { SessionManager } = await import("@/lib/session-manager");
if (session.sessionId && attemptCount === 1) {
await SessionManager.clearSessionProvider(session.sessionId);
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 1026-1027

Comment:
`SessionManager` is already statically imported at the top of the file (line 26), so the dynamic import here is redundant and adds unnecessary async overhead. Use the existing static import directly instead:

```suggestion
            if (session.sessionId && attemptCount === 1) {
              await SessionManager.clearSessionProvider(session.sessionId);
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1025 to +1028
// 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定)
if (session.sessionId && attemptCount === 1) {
const { SessionManager } = await import("@/lib/session-manager");
await SessionManager.clearSessionProvider(session.sessionId);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant dynamic import of an already-available module.

SessionManager is already statically imported at line 26:

import { SessionManager } from "@/lib/session-manager";

The await import(...) call on line 1027 creates an unnecessary dynamic import of the same module, adding async overhead and obscuring the module-level dependency. Simply use the existing static import instead:

Suggested change
// 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定)
if (session.sessionId && attemptCount === 1) {
const { SessionManager } = await import("@/lib/session-manager");
await SessionManager.clearSessionProvider(session.sessionId);
if (session.sessionId && attemptCount === 1) {
await SessionManager.clearSessionProvider(session.sessionId);
logger.debug(
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 1025-1028

Comment:
Redundant dynamic import of an already-available module.

`SessionManager` is already statically imported at line 26:

```ts
import { SessionManager } from "@/lib/session-manager";
```

The `await import(...)` call on line 1027 creates an unnecessary dynamic import of the same module, adding async overhead and obscuring the module-level dependency. Simply use the existing static import instead:

```suggestion
            if (session.sessionId && attemptCount === 1) {
              await SessionManager.clearSessionProvider(session.sessionId);
              logger.debug(
```

How can I resolve this? If you propose a fix, please make it concise.

@github-actions github-actions bot added the size/XS Extra Small PR (< 50 lines) label Mar 9, 2026
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR improves HTTP 499 handling with conditional session clearing and adds infrastructure for future hedge scheduling. The core logic change is sound, but there's one issue to address.

PR Size: XS

  • Lines changed: 39
  • Files changed: 5

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 0 0
Standards 0 1 1 0

High Priority Issues (Should Fix)

1. [GENERAL-STANDARD] Redundant dynamic import of SessionManager

  • File: src/app/v1/_lib/proxy/forwarder.ts:1027
  • Issue: The file already imports SessionManager statically at line 26, but the new code uses a dynamic import at line 1027. This is unnecessary and adds async overhead.

Suggested fix:

// Remove the dynamic import:
// const { SessionManager } = await import("@/lib/session-manager");

// Use the already-imported SessionManager directly:
await SessionManager.clearSessionProvider(session.sessionId);

Medium Priority Issues (Consider Fixing)

2. [GENERAL-i18n] Missing translations for 3 languages

  • File: messages/*/provider-chain.json
  • Issue: The PR adds translations for en and zh-CN, but per CLAUDE.md the project supports 5 languages (zh-CN, zh-TW, en, ja, ru). Missing: zh-TW, ja, ru.

Suggested fix: Add the 4 new keys to:

  • messages/zh-TW/provider-chain.json
  • messages/ja/provider-chain.json
  • messages/ru/provider-chain.json

Review Coverage

  • Logic and correctness - The conditional clearing logic (attemptCount === 1) correctly addresses the race condition
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean, error is properly logged
  • Type safety - New reason types properly synced across files
  • Documentation accuracy - Comments match code behavior
  • Test coverage - Existing tests cover the path
  • Code clarity - Good, clear comments explaining the fix

Positive Notes

  • The race condition fix (only clearing session on first attempt) is well-reasoned
  • New client_abort reason type improves observability vs generic system_error
  • Hedge reason types are properly documented and typed for future use
  • Debug logging added for session binding cleanup

Automated review by Claude AI

@ding113 ding113 closed this Mar 9, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Mar 9, 2026
"client_restriction_filtered": "Client Restricted"
"client_restriction_filtered": "Client Restricted",
"client_abort": "Client Aborted",
"hedge_triggered": "Hedge Triggered",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] [STANDARD-VIOLATION] New provider-chain.reasons keys missing in 3 supported locales

Evidence: messages/en/provider-chain.json:60-63 adds client_abort and hedge_* reason labels, but messages/ja/provider-chain.json, messages/ru/provider-chain.json, and messages/zh-TW/provider-chain.json do not define these keys.

Why this is a problem: CLAUDE.md: **i18n Required** - All user-facing strings must use i18n (5 languages supported). With src/i18n/request.ts:getMessageFallback returning ${namespace}.${key}, these locales will render fallback keys instead of human-readable labels.

Suggested fix (add under reasons in each locale file):

messages/ja/provider-chain.json

"client_restriction_filtered": "クライアント制限",
"client_abort": "クライアント中断",
"hedge_triggered": "ヘッジ起動",
"hedge_winner": "ヘッジ勝者",
"hedge_loser_cancelled": "ヘッジ敗者キャンセル"

messages/ru/provider-chain.json

"client_restriction_filtered": "Клиент ограничен",
"client_abort": "Запрос прерван клиентом",
"hedge_triggered": "Запущен хедж",
"hedge_winner": "Победитель хеджа",
"hedge_loser_cancelled": "Проигравший хедж отменен"

messages/zh-TW/provider-chain.json

"client_restriction_filtered": "客戶端受限",
"client_abort": "客戶端中斷",
"hedge_triggered": "首字節超時觸發 Hedge",
"hedge_winner": "Hedge 競速勝出",
"hedge_loser_cancelled": "Hedge 競速落敗"


// 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定)
if (session.sessionId && attemptCount === 1) {
const { SessionManager } = await import("@/lib/session-manager");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] [TEST-MISSING-CRITICAL] New client-abort session-binding behavior has no unit test coverage

Evidence: src/app/v1/_lib/proxy/forwarder.ts:1026-1036 adds a new branch that clears the session provider binding only when attemptCount === 1:

if (session.sessionId && attemptCount === 1) {
  await SessionManager.clearSessionProvider(session.sessionId);
}

Why this is a problem: CLAUDE.md: **Test Coverage** - All new features must have unit test coverage of at least 80%. This is subtle retry-dependent behavior; without a test, it is easy to regress (e.g. future refactors around retries/hedging).

Suggested fix: add focused unit tests asserting clearSessionProvider is called only on attempt 1. Example test file:

import { beforeEach, describe, expect, test, vi } from "vitest";
import { resolveEndpointPolicy } from "@/app/v1/_lib/proxy/endpoint-policy";

const mocks = vi.hoisted(() => {
  return {
    categorizeErrorAsync: vi.fn(),
    getCircuitState: vi.fn(() => "closed"),
    getProviderHealthInfo: vi.fn(async () => ({
      health: { failureCount: 0 },
      config: { failureThreshold: 3 },
    })),
    recordFailure: vi.fn(async () => {}),
    recordSuccess: vi.fn(),
  };
});

vi.mock("@/lib/logger", () => ({
  logger: {
    debug: vi.fn(),
    info: vi.fn(),
    warn: vi.fn(),
    trace: vi.fn(),
    error: vi.fn(),
    fatal: vi.fn(),
  },
}));

vi.mock("@/lib/circuit-breaker", () => ({
  getCircuitState: mocks.getCircuitState,
  getProviderHealthInfo: mocks.getProviderHealthInfo,
  recordFailure: mocks.recordFailure,
  recordSuccess: mocks.recordSuccess,
}));

vi.mock("@/app/v1/_lib/proxy/errors", async (importOriginal) => {
  const actual = await importOriginal<typeof import("@/app/v1/_lib/proxy/errors")>();
  return {
    ...actual,
    categorizeErrorAsync: mocks.categorizeErrorAsync,
  };
});

import { ProxyForwarder } from "@/app/v1/_lib/proxy/forwarder";
import { ErrorCategory, ProxyError } from "@/app/v1/_lib/proxy/errors";
import { ProxySession } from "@/app/v1/_lib/proxy/session";
import { SessionManager } from "@/lib/session-manager";
import type { Provider } from "@/types/provider";

function createSession(): ProxySession {
  const headers = new Headers();
  const session = Object.create(ProxySession.prototype);

  Object.assign(session, {
    startTime: Date.now(),
    method: "POST",
    requestUrl: new URL("https://example.com/v1/messages"),
    headers,
    originalHeaders: new Headers(headers),
    headerLog: JSON.stringify(Object.fromEntries(headers.entries())),
    request: {
      model: "claude-test",
      log: "(test)",
      message: { model: "claude-test", messages: [{ role: "user", content: "hi" }] },
    },
    userAgent: null,
    context: null,
    clientAbortSignal: null,
    userName: "test-user",
    authState: { success: true, user: null, key: null, apiKey: null },
    provider: null,
    messageContext: null,
    sessionId: "sess-1",
    requestSequence: 1,
    originalFormat: "claude",
    providerType: null,
    originalModelName: null,
    originalUrlPathname: null,
    providerChain: [],
    cacheTtlResolved: null,
    context1mApplied: false,
    specialSettings: [],
    cachedPriceData: undefined,
    cachedBillingModelSource: undefined,
    endpointPolicy: resolveEndpointPolicy("/v1/messages"),
    isHeaderModified: () => false,
  });

  return session as ProxySession;
}

function createProvider(overrides: Partial<Provider> = {}): Provider {
  return {
    id: 1,
    name: "p1",
    providerType: "claude",
    url: "https://provider.example.com",
    key: "k",
    preserveClientIp: false,
    priority: 0,
    providerVendorId: null,
    maxRetryAttempts: 2,
    ...overrides,
  } as unknown as Provider;
}

describe("ProxyForwarder - client abort session binding", () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  test("clears session binding when client aborts on attempt 1", async () => {
    const clearSpy = vi
      .spyOn(SessionManager, "clearSessionProvider")
      .mockResolvedValue(undefined);

    mocks.categorizeErrorAsync.mockResolvedValueOnce(ErrorCategory.CLIENT_ABORT);

    const session = createSession();
    session.setProvider(createProvider({ maxRetryAttempts: 1 }));

    vi.spyOn(ProxyForwarder as any, "doForward").mockRejectedValueOnce(
      new ProxyError("aborted", 499)
    );

    await expect(ProxyForwarder.send(session)).rejects.toBeInstanceOf(ProxyError);
    expect(clearSpy).toHaveBeenCalledWith("sess-1");
  });

  test("does not clear session binding when client aborts on retry attempt", async () => {
    vi.useFakeTimers();
    const clearSpy = vi
      .spyOn(SessionManager, "clearSessionProvider")
      .mockResolvedValue(undefined);

    mocks.categorizeErrorAsync
      .mockResolvedValueOnce(ErrorCategory.PROVIDER_ERROR)
      .mockResolvedValueOnce(ErrorCategory.CLIENT_ABORT);

    const session = createSession();
    session.setProvider(createProvider({ maxRetryAttempts: 2 }));

    vi.spyOn(ProxyForwarder as any, "doForward")
      .mockRejectedValueOnce(new ProxyError("upstream failed", 500))
      .mockRejectedValueOnce(new ProxyError("aborted", 499));

    const promise = ProxyForwarder.send(session);
    await vi.runAllTimersAsync();

    await expect(promise).rejects.toBeInstanceOf(ProxyError);
    expect(clearSpy).not.toHaveBeenCalled();

    vi.useRealTimers();
  });
});

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR is small but changes proxy behavior and adds new user-visible provider-chain reason labels. The diff currently leaves 3 supported locales without translations for the new reason keys, and it adds no unit tests for the new attempt-gated session-binding clear.

PR Size: S

  • Lines changed: 39
  • Files changed: 5

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 1 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 1 0 0
Simplification 0 0 0 0

Critical Issues (Must Fix)

None.

High Priority Issues (Should Fix)

  • messages/en/provider-chain.json:60 - New reason keys are missing in messages/ja/provider-chain.json, messages/ru/provider-chain.json, messages/zh-TW/provider-chain.json (i18n fallback will show raw keys).
  • src/app/v1/_lib/proxy/forwarder.ts:1026 - Missing unit tests for attemptCount === 1 gating of SessionManager.clearSessionProvider on client abort.

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Codex AI

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied PR size label size/S, left 2 inline review comments (i18n: missing client_abort/hedge_* keys in ja/ru/zh-TW; tests: missing unit coverage for the new attemptCount === 1 session-binding clear), and submitted the required “Code Review Summary” via gh pr review 893 --comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:i18n area:session bug Something isn't working size/S Small PR (< 200 lines) size/XS Extra Small PR (< 50 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant