fix(proxy): improve 499 handling and add hedge reason types#893
fix(proxy): improve 499 handling and add hedge reason types#893
Conversation
- Add client_abort reason type with conditional session clearing - Only clear session binding on first attempt to avoid race condition - Add hedge_triggered/hedge_winner/hedge_loser_cancelled reason types - Add detailed documentation for all new reason types - Update i18n translations (zh-CN, en) for new reasons Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the handling of client aborts (HTTP 499) by introducing a more robust session clearing mechanism that avoids race conditions. It also lays essential groundwork for future hedge scheduling capabilities by defining new reason types, enhancing the system's ability to manage parallel provider attempts and their outcomes. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 Walkthrough工作流程概览此变更为提供者链决策系统添加四个新的原因代码(client_abort、hedge_triggered、hedge_winner、hedge_loser_cancelled),包括翻译资源、类型定义和会话清理逻辑。同时在客户端中止时实现了会话提供者的动态清理功能。 变更
代码审查工作量评估🎯 2 (Simple) | ⏱️ ~12 分钟 可能相关的 PR
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🧪 测试结果
总体结果: ✅ 所有测试通过 |
There was a problem hiding this comment.
Code Review
This pull request improves client abort (HTTP 499) handling by fixing a race condition and introduces new reason types for future hedge scheduling support. The changes are well-structured and address the described issues. I've identified one area for improvement regarding a redundant dynamic import, which can be simplified by using the existing static import for better performance and maintainability.
| const { SessionManager } = await import("@/lib/session-manager"); | ||
| await SessionManager.clearSessionProvider(session.sessionId); |
There was a problem hiding this comment.
SessionManager is already imported statically at the top of this file (line 26). Using the existing static import is more efficient than this dynamic import and avoids redundant module loading.
| const { SessionManager } = await import("@/lib/session-manager"); | |
| await SessionManager.clearSessionProvider(session.sessionId); | |
| await SessionManager.clearSessionProvider(session.sessionId); |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/app/v1/_lib/proxy/session.ts (1)
441-460: 这里的reason建议直接复用ProviderChainItem["reason"]。现在这份字面量 union 又和
src/types/message.ts手工同步了一次;下次再加/删 reason 时,很容易只改一边。这里已经引入了ProviderChainItem,直接复用类型会稳很多。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/v1/_lib/proxy/session.ts` around lines 441 - 460, The literal union type for reason should be replaced with ProviderChainItem["reason"] to avoid duplicate manual syncing; update the declaration of reason in this module to use ProviderChainItem["reason"] instead of the long string union, and ensure ProviderChainItem is imported (or referenced) in this file so the type resolves correctly (keep existing comments as needed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 1025-1028: The current unconditional call to
SessionManager.clearSessionProvider(session.sessionId) can delete a provider
newly written by a concurrent request; change this to a compare-and-delete: pass
the expected provider identifier for this request (e.g., the provider id on the
current session object) and perform an atomic check-and-delete in
SessionManager.clearSessionProvider(expectedSessionId, expectedProvider) (or
implement a new method like clearSessionProviderIfMatches). The implementation
should read the current provider and only delete if it equals the expected
provider using an atomic Redis operation (EVAL Lua script or WATCH/MULTI) so
concurrent successful updates are not removed.
In `@src/types/message.ts`:
- Around line 38-42: The UI branches in LogicTraceTab and provider-chain-popover
still use the old exhaustive switch/if over providerChain.reason and thus treat
the new enum entries as default/pending; update the conditional logic in
src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx
and src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx to
explicitly handle the new reason values ("client_restriction_filtered",
"client_abort", "hedge_triggered", "hedge_winner", "hedge_loser_cancelled"),
mapping each to the correct display state (e.g., client_abort => show
client-aborted state/not counted as provider failure,
client_restriction_filtered => neutral/filtered, hedge_triggered/winner/loser =>
mark as hedged/winner/loser accordingly) instead of falling through to default
so providerChain entries render accurate statuses.
---
Nitpick comments:
In `@src/app/v1/_lib/proxy/session.ts`:
- Around line 441-460: The literal union type for reason should be replaced with
ProviderChainItem["reason"] to avoid duplicate manual syncing; update the
declaration of reason in this module to use ProviderChainItem["reason"] instead
of the long string union, and ensure ProviderChainItem is imported (or
referenced) in this file so the type resolves correctly (keep existing comments
as needed).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: deee3707-bd86-48fc-ae0d-58b63cb9f827
📒 Files selected for processing (5)
messages/en/provider-chain.jsonmessages/zh-CN/provider-chain.jsonsrc/app/v1/_lib/proxy/forwarder.tssrc/app/v1/_lib/proxy/session.tssrc/types/message.ts
| // 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定) | ||
| if (session.sessionId && attemptCount === 1) { | ||
| const { SessionManager } = await import("@/lib/session-manager"); | ||
| await SessionManager.clearSessionProvider(session.sessionId); |
There was a problem hiding this comment.
这里的清理仍然会误删并发请求刚写入的新绑定。
src/lib/session-manager.ts:576-586 里的 clearSessionProvider() 是无条件 DEL session:${sessionId}:provider。attemptCount === 1 只避免了“同一转发链里晚到成功响应”的竞态;如果同一 sessionId 下另一个并发请求已经成功更新了绑定,这里仍会把那个新绑定删掉。更稳妥的做法是改成 compare-and-delete,至少校验当前绑定仍然是这次请求对应的 provider。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/v1/_lib/proxy/forwarder.ts` around lines 1025 - 1028, The current
unconditional call to SessionManager.clearSessionProvider(session.sessionId) can
delete a provider newly written by a concurrent request; change this to a
compare-and-delete: pass the expected provider identifier for this request
(e.g., the provider id on the current session object) and perform an atomic
check-and-delete in SessionManager.clearSessionProvider(expectedSessionId,
expectedProvider) (or implement a new method like
clearSessionProviderIfMatches). The implementation should read the current
provider and only delete if it equals the expected provider using an atomic
Redis operation (EVAL Lua script or WATCH/MULTI) so concurrent successful
updates are not removed.
| | "client_restriction_filtered" // Provider skipped due to client restriction (neutral, no circuit breaker) | ||
| | "client_abort" // 客户端主动断开连接(HTTP 499),首次尝试时清理 session 绑定,不计入熔断器 | ||
| | "hedge_triggered" // 首字节超时触发 hedge 调度(并行发起下一个供应商连接) | ||
| | "hedge_winner" // hedge 竞速胜出(首个返回首字节的供应商) | ||
| | "hedge_loser_cancelled"; // hedge 竞速落败(被胜者取消的供应商连接) |
There was a problem hiding this comment.
新增 reason 还没同步到现有的穷举判定里。
src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx:32-52 和 src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx:36-50 还在按旧枚举做条件分支。现在这些值一旦进入 providerChain,client_abort / hedge_* 会落到默认分支,被显示成 pending 或直接不算“实际请求”,日志链路会失真。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/types/message.ts` around lines 38 - 42, The UI branches in LogicTraceTab
and provider-chain-popover still use the old exhaustive switch/if over
providerChain.reason and thus treat the new enum entries as default/pending;
update the conditional logic in
src/app/[locale]/dashboard/logs/_components/error-details-dialog/components/LogicTraceTab.tsx
and src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx to
explicitly handle the new reason values ("client_restriction_filtered",
"client_abort", "hedge_triggered", "hedge_winner", "hedge_loser_cancelled"),
mapping each to the correct display state (e.g., client_abort => show
client-aborted state/not counted as provider failure,
client_restriction_filtered => neutral/filtered, hedge_triggered/winner/loser =>
mark as hedged/winner/loser accordingly) instead of falling through to default
so providerChain entries render accurate statuses.
| if (session.sessionId && attemptCount === 1) { | ||
| const { SessionManager } = await import("@/lib/session-manager"); |
There was a problem hiding this comment.
SessionManager is already statically imported at the top of the file (line 26), so the dynamic import here is redundant and adds unnecessary async overhead. Use the existing static import directly instead:
| if (session.sessionId && attemptCount === 1) { | |
| const { SessionManager } = await import("@/lib/session-manager"); | |
| if (session.sessionId && attemptCount === 1) { | |
| await SessionManager.clearSessionProvider(session.sessionId); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 1026-1027
Comment:
`SessionManager` is already statically imported at the top of the file (line 26), so the dynamic import here is redundant and adds unnecessary async overhead. Use the existing static import directly instead:
```suggestion
if (session.sessionId && attemptCount === 1) {
await SessionManager.clearSessionProvider(session.sessionId);
```
How can I resolve this? If you propose a fix, please make it concise.| // 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定) | ||
| if (session.sessionId && attemptCount === 1) { | ||
| const { SessionManager } = await import("@/lib/session-manager"); | ||
| await SessionManager.clearSessionProvider(session.sessionId); |
There was a problem hiding this comment.
Redundant dynamic import of an already-available module.
SessionManager is already statically imported at line 26:
import { SessionManager } from "@/lib/session-manager";The await import(...) call on line 1027 creates an unnecessary dynamic import of the same module, adding async overhead and obscuring the module-level dependency. Simply use the existing static import instead:
| // 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定) | |
| if (session.sessionId && attemptCount === 1) { | |
| const { SessionManager } = await import("@/lib/session-manager"); | |
| await SessionManager.clearSessionProvider(session.sessionId); | |
| if (session.sessionId && attemptCount === 1) { | |
| await SessionManager.clearSessionProvider(session.sessionId); | |
| logger.debug( |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 1025-1028
Comment:
Redundant dynamic import of an already-available module.
`SessionManager` is already statically imported at line 26:
```ts
import { SessionManager } from "@/lib/session-manager";
```
The `await import(...)` call on line 1027 creates an unnecessary dynamic import of the same module, adding async overhead and obscuring the module-level dependency. Simply use the existing static import instead:
```suggestion
if (session.sessionId && attemptCount === 1) {
await SessionManager.clearSessionProvider(session.sessionId);
logger.debug(
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Code Review Summary
This PR improves HTTP 499 handling with conditional session clearing and adds infrastructure for future hedge scheduling. The core logic change is sound, but there's one issue to address.
PR Size: XS
- Lines changed: 39
- Files changed: 5
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Standards | 0 | 1 | 1 | 0 |
High Priority Issues (Should Fix)
1. [GENERAL-STANDARD] Redundant dynamic import of SessionManager
- File:
src/app/v1/_lib/proxy/forwarder.ts:1027 - Issue: The file already imports
SessionManagerstatically at line 26, but the new code uses a dynamic import at line 1027. This is unnecessary and adds async overhead.
Suggested fix:
// Remove the dynamic import:
// const { SessionManager } = await import("@/lib/session-manager");
// Use the already-imported SessionManager directly:
await SessionManager.clearSessionProvider(session.sessionId);Medium Priority Issues (Consider Fixing)
2. [GENERAL-i18n] Missing translations for 3 languages
- File:
messages/*/provider-chain.json - Issue: The PR adds translations for
enandzh-CN, but per CLAUDE.md the project supports 5 languages (zh-CN, zh-TW, en, ja, ru). Missing: zh-TW, ja, ru.
Suggested fix: Add the 4 new keys to:
messages/zh-TW/provider-chain.jsonmessages/ja/provider-chain.jsonmessages/ru/provider-chain.json
Review Coverage
- Logic and correctness - The conditional clearing logic (attemptCount === 1) correctly addresses the race condition
- Security (OWASP Top 10) - Clean
- Error handling - Clean, error is properly logged
- Type safety - New reason types properly synced across files
- Documentation accuracy - Comments match code behavior
- Test coverage - Existing tests cover the path
- Code clarity - Good, clear comments explaining the fix
Positive Notes
- The race condition fix (only clearing session on first attempt) is well-reasoned
- New
client_abortreason type improves observability vs genericsystem_error - Hedge reason types are properly documented and typed for future use
- Debug logging added for session binding cleanup
Automated review by Claude AI
| "client_restriction_filtered": "Client Restricted" | ||
| "client_restriction_filtered": "Client Restricted", | ||
| "client_abort": "Client Aborted", | ||
| "hedge_triggered": "Hedge Triggered", |
There was a problem hiding this comment.
[HIGH] [STANDARD-VIOLATION] New provider-chain.reasons keys missing in 3 supported locales
Evidence: messages/en/provider-chain.json:60-63 adds client_abort and hedge_* reason labels, but messages/ja/provider-chain.json, messages/ru/provider-chain.json, and messages/zh-TW/provider-chain.json do not define these keys.
Why this is a problem: CLAUDE.md: **i18n Required** - All user-facing strings must use i18n (5 languages supported). With src/i18n/request.ts:getMessageFallback returning ${namespace}.${key}, these locales will render fallback keys instead of human-readable labels.
Suggested fix (add under reasons in each locale file):
messages/ja/provider-chain.json
"client_restriction_filtered": "クライアント制限",
"client_abort": "クライアント中断",
"hedge_triggered": "ヘッジ起動",
"hedge_winner": "ヘッジ勝者",
"hedge_loser_cancelled": "ヘッジ敗者キャンセル"messages/ru/provider-chain.json
"client_restriction_filtered": "Клиент ограничен",
"client_abort": "Запрос прерван клиентом",
"hedge_triggered": "Запущен хедж",
"hedge_winner": "Победитель хеджа",
"hedge_loser_cancelled": "Проигравший хедж отменен"messages/zh-TW/provider-chain.json
"client_restriction_filtered": "客戶端受限",
"client_abort": "客戶端中斷",
"hedge_triggered": "首字節超時觸發 Hedge",
"hedge_winner": "Hedge 競速勝出",
"hedge_loser_cancelled": "Hedge 競速落敗"|
|
||
| // 清理 session provider 绑定(仅在首次尝试时清理,避免清理已成功的绑定) | ||
| if (session.sessionId && attemptCount === 1) { | ||
| const { SessionManager } = await import("@/lib/session-manager"); |
There was a problem hiding this comment.
[HIGH] [TEST-MISSING-CRITICAL] New client-abort session-binding behavior has no unit test coverage
Evidence: src/app/v1/_lib/proxy/forwarder.ts:1026-1036 adds a new branch that clears the session provider binding only when attemptCount === 1:
if (session.sessionId && attemptCount === 1) {
await SessionManager.clearSessionProvider(session.sessionId);
}Why this is a problem: CLAUDE.md: **Test Coverage** - All new features must have unit test coverage of at least 80%. This is subtle retry-dependent behavior; without a test, it is easy to regress (e.g. future refactors around retries/hedging).
Suggested fix: add focused unit tests asserting clearSessionProvider is called only on attempt 1. Example test file:
import { beforeEach, describe, expect, test, vi } from "vitest";
import { resolveEndpointPolicy } from "@/app/v1/_lib/proxy/endpoint-policy";
const mocks = vi.hoisted(() => {
return {
categorizeErrorAsync: vi.fn(),
getCircuitState: vi.fn(() => "closed"),
getProviderHealthInfo: vi.fn(async () => ({
health: { failureCount: 0 },
config: { failureThreshold: 3 },
})),
recordFailure: vi.fn(async () => {}),
recordSuccess: vi.fn(),
};
});
vi.mock("@/lib/logger", () => ({
logger: {
debug: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
trace: vi.fn(),
error: vi.fn(),
fatal: vi.fn(),
},
}));
vi.mock("@/lib/circuit-breaker", () => ({
getCircuitState: mocks.getCircuitState,
getProviderHealthInfo: mocks.getProviderHealthInfo,
recordFailure: mocks.recordFailure,
recordSuccess: mocks.recordSuccess,
}));
vi.mock("@/app/v1/_lib/proxy/errors", async (importOriginal) => {
const actual = await importOriginal<typeof import("@/app/v1/_lib/proxy/errors")>();
return {
...actual,
categorizeErrorAsync: mocks.categorizeErrorAsync,
};
});
import { ProxyForwarder } from "@/app/v1/_lib/proxy/forwarder";
import { ErrorCategory, ProxyError } from "@/app/v1/_lib/proxy/errors";
import { ProxySession } from "@/app/v1/_lib/proxy/session";
import { SessionManager } from "@/lib/session-manager";
import type { Provider } from "@/types/provider";
function createSession(): ProxySession {
const headers = new Headers();
const session = Object.create(ProxySession.prototype);
Object.assign(session, {
startTime: Date.now(),
method: "POST",
requestUrl: new URL("https://example.com/v1/messages"),
headers,
originalHeaders: new Headers(headers),
headerLog: JSON.stringify(Object.fromEntries(headers.entries())),
request: {
model: "claude-test",
log: "(test)",
message: { model: "claude-test", messages: [{ role: "user", content: "hi" }] },
},
userAgent: null,
context: null,
clientAbortSignal: null,
userName: "test-user",
authState: { success: true, user: null, key: null, apiKey: null },
provider: null,
messageContext: null,
sessionId: "sess-1",
requestSequence: 1,
originalFormat: "claude",
providerType: null,
originalModelName: null,
originalUrlPathname: null,
providerChain: [],
cacheTtlResolved: null,
context1mApplied: false,
specialSettings: [],
cachedPriceData: undefined,
cachedBillingModelSource: undefined,
endpointPolicy: resolveEndpointPolicy("/v1/messages"),
isHeaderModified: () => false,
});
return session as ProxySession;
}
function createProvider(overrides: Partial<Provider> = {}): Provider {
return {
id: 1,
name: "p1",
providerType: "claude",
url: "https://provider.example.com",
key: "k",
preserveClientIp: false,
priority: 0,
providerVendorId: null,
maxRetryAttempts: 2,
...overrides,
} as unknown as Provider;
}
describe("ProxyForwarder - client abort session binding", () => {
beforeEach(() => {
vi.clearAllMocks();
});
test("clears session binding when client aborts on attempt 1", async () => {
const clearSpy = vi
.spyOn(SessionManager, "clearSessionProvider")
.mockResolvedValue(undefined);
mocks.categorizeErrorAsync.mockResolvedValueOnce(ErrorCategory.CLIENT_ABORT);
const session = createSession();
session.setProvider(createProvider({ maxRetryAttempts: 1 }));
vi.spyOn(ProxyForwarder as any, "doForward").mockRejectedValueOnce(
new ProxyError("aborted", 499)
);
await expect(ProxyForwarder.send(session)).rejects.toBeInstanceOf(ProxyError);
expect(clearSpy).toHaveBeenCalledWith("sess-1");
});
test("does not clear session binding when client aborts on retry attempt", async () => {
vi.useFakeTimers();
const clearSpy = vi
.spyOn(SessionManager, "clearSessionProvider")
.mockResolvedValue(undefined);
mocks.categorizeErrorAsync
.mockResolvedValueOnce(ErrorCategory.PROVIDER_ERROR)
.mockResolvedValueOnce(ErrorCategory.CLIENT_ABORT);
const session = createSession();
session.setProvider(createProvider({ maxRetryAttempts: 2 }));
vi.spyOn(ProxyForwarder as any, "doForward")
.mockRejectedValueOnce(new ProxyError("upstream failed", 500))
.mockRejectedValueOnce(new ProxyError("aborted", 499));
const promise = ProxyForwarder.send(session);
await vi.runAllTimersAsync();
await expect(promise).rejects.toBeInstanceOf(ProxyError);
expect(clearSpy).not.toHaveBeenCalled();
vi.useRealTimers();
});
});There was a problem hiding this comment.
Code Review Summary
This PR is small but changes proxy behavior and adds new user-visible provider-chain reason labels. The diff currently leaves 3 supported locales without translations for the new reason keys, and it adds no unit tests for the new attempt-gated session-binding clear.
PR Size: S
- Lines changed: 39
- Files changed: 5
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 1 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 1 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Critical Issues (Must Fix)
None.
High Priority Issues (Should Fix)
messages/en/provider-chain.json:60- New reason keys are missing inmessages/ja/provider-chain.json,messages/ru/provider-chain.json,messages/zh-TW/provider-chain.json(i18n fallback will show raw keys).src/app/v1/_lib/proxy/forwarder.ts:1026- Missing unit tests forattemptCount === 1gating ofSessionManager.clearSessionProvideron client abort.
Review Coverage
- Logic and correctness
- Security (OWASP Top 10)
- Error handling
- Type safety
- Documentation accuracy
- Test coverage
- Code clarity
Automated review by Codex AI
There was a problem hiding this comment.
Applied PR size label size/S, left 2 inline review comments (i18n: missing client_abort/hedge_* keys in ja/ru/zh-TW; tests: missing unit coverage for the new attemptCount === 1 session-binding clear), and submitted the required “Code Review Summary” via gh pr review 893 --comment.
Summary
Improves client abort (HTTP 499) handling and adds infrastructure for future hedge scheduling support.
Changes
1. Enhanced 499 Handling
client_abortto distinguish from genericsystem_error2. Hedge Reason Types (Infrastructure)
hedge_triggered: First-byte timeout triggers hedge (parallel provider attempt)hedge_winner: Hedge race winner (first provider to return first byte)hedge_loser_cancelled: Hedge race loser (cancelled by winner)Technical Details
Race Condition Fix
Problem: Original implementation unconditionally cleared session binding on 499, which could clear a successful binding if the provider responded after client disconnect.
Solution: Only clear binding on first attempt (
attemptCount === 1), preserving session stickiness for retry scenarios.Type System Updates
src/types/message.ts: Added 4 newProviderChainReasontypes with detailed commentssrc/app/v1/_lib/proxy/session.ts: Synced reason types with message.tsmessages/*/provider-chain.json: Added translations for all new reasonsTesting
Related
Checklist
Greptile Summary
This PR improves HTTP 499 (client abort) handling in the proxy forwarder and lays infrastructure groundwork for future hedge scheduling support. The core behavioral change makes session provider binding cleanup conditional on
attemptCount === 1, which correctly avoids clearing a binding that was successfully established by a prior attempt within the same retry loop.Key changes:
client_abortreplaces the genericsystem_errorfor HTTP 499 entries in the provider decision chain, improving observability.hedge_triggered,hedge_winner, andhedge_loser_cancelledare pre-defined for future hedge scheduling but are not yet wired to any logic.SessionManagerthat is already statically available, adding unnecessary async overhead.All four new reason types are properly synced across types, session, and i18n files (English and Chinese translations).
Confidence Score: 4/5
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[Request arrives] --> B{Session bound?} B -- Yes --> C[Attempt 1: session_reuse provider] B -- No --> D[Attempt 1: fresh provider selection] C --> E{Result?} D --> E E -- Success --> F[Record chain: request_success / retry_success] E -- 499 Client Abort --> G{attemptCount === 1?} G -- Yes --> H[clearSessionProvider\nRecord chain: client_abort\nThrow error] G -- No --> I[Keep existing session binding\nRecord chain: client_abort\nThrow error] E -- Retryable error --> J[Increment attemptCount\nSelect next provider] J --> ELast reviewed commit: 8c9a8f5