fix(ccusage): avoid RangeError when parsing large transcript JSONL files#875
fix(ccusage): avoid RangeError when parsing large transcript JSONL files#875MumuTW wants to merge 6 commits intoryoppippi:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces full-file JSONL reads with line-by-line streaming in ccusage to skip large Changes
Sequence Diagram(s)sequenceDiagram
participant FS as File System
participant Stream as Stream Reader
participant Parser as Per-line Parser/Validator
participant Aggregator as Usage Aggregator
participant Pricing as PricingFetcher
FS->>Stream: open JSONL (createReadStream)
Stream->>Parser: emit next line
Parser-->>Stream: parsed object or error
alt first-line indicates file-history-snapshot
Parser->>Aggregator: signal skip -> return null
else assistant usage line found
Parser->>Aggregator: update latestUsage (inputTokens, cacheTokens)
Aggregator->>Stream: continue reading
end
Stream->>Aggregator: EOF
Aggregator->>Pricing: request contextLimit (modelId)
Pricing-->>Aggregator: contextLimit or failure
Aggregator->>Caller: compute percentage or return null
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)
1295-1314: Type assignment may not fully narrowinput_tokensto required.The assignment at line 1309 assigns
obj.message.usage(whereinput_tokensis optional pertranscriptUsageSchema) tolatestUsage(whereinput_tokensis required). While the check at line 1307 ensuresinput_tokens != nullat runtime, TypeScript's property narrowing may not fully narrow the parent object type.Consider using a type assertion or explicit object construction to ensure type safety:
💡 Suggested refactor for explicit type construction
if ( obj.type === 'assistant' && obj.message != null && obj.message.usage != null && obj.message.usage.input_tokens != null ) { - latestUsage = obj.message.usage; + latestUsage = { + input_tokens: obj.message.usage.input_tokens, + cache_creation_input_tokens: obj.message.usage.cache_creation_input_tokens, + cache_read_input_tokens: obj.message.usage.cache_read_input_tokens, + }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ccusage/src/data-loader.ts` around lines 1295 - 1314, The assignment of obj.message.usage to latestUsage can leave TypeScript unconvinced that input_tokens is present because transcriptUsageSchema marks it optional; to fix, explicitly construct or cast a value with the required shape before assigning to latestUsage — e.g., after the runtime check (obj.message.usage != null && obj.message.usage.input_tokens != null) create a new object with the needed properties (or use a type assertion to the required type) and assign that to latestUsage; update the logic around transcriptMessageSchema, transcriptUsageSchema, obj, input_tokens, and latestUsage in the try block so the compiler sees a value that definitely satisfies latestUsage's required fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1295-1314: The assignment of obj.message.usage to latestUsage can
leave TypeScript unconvinced that input_tokens is present because
transcriptUsageSchema marks it optional; to fix, explicitly construct or cast a
value with the required shape before assigning to latestUsage — e.g., after the
runtime check (obj.message.usage != null && obj.message.usage.input_tokens !=
null) create a new object with the needed properties (or use a type assertion to
the required type) and assign that to latestUsage; update the logic around
transcriptMessageSchema, transcriptUsageSchema, obj, input_tokens, and
latestUsage in the try block so the compiler sees a value that definitely
satisfies latestUsage's required fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f65d700-2b4a-40b7-bd2d-fc00166209e9
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
|
Follow-up for #829: silenced logger output in JSON mode for ccusage-opencode.\n\nWhat changed:\n- Set when is active in:\n - \n - \n - \n - \n\nValidation:\n- vinext | WARN The field "pnpm.peerDependencyRules" was found in /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/vinext/package.json. This will not take effect. You should configure "pnpm.peerDependencyRules" at the root of the workspace instead.
RUN v4.0.15 /home/opc/.paperclip/instances/default/workspaces/7948d02f-b91e-4189-b9eb-32bf0b5923d2/ccusage/apps/opencode ✓ src/data-loader.ts (2 tests) 4ms Test Files 2 passed (2) |
|
Follow-up for #829: silenced logger output in JSON mode for What changed:
Validation:
Commit: |
|
thanks! lmc |
@ccusage/amp
ccusage
@ccusage/codex
@ccusage/mcp
@ccusage/opencode
@ccusage/pi
commit: |
…ement The latestUsage variable requires input_tokens as a non-optional number, but obj.message.usage has it as optional. Explicitly construct the object after the null check so TypeScript can see the narrowed type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1269-1288: The current fast-path checks the first non-empty line
via readline (createInterface) which forces Node to buffer the entire line and
crashes on huge single-line records; fix by reading a bounded prefix from the
file before creating the readline reader: open transcriptPath with fs (e.g.,
fs.open + filehandle.read or createReadStream with { start: 0, end: N-1 }), read
a small prefix (e.g., 4 KiB), trim leading whitespace, attempt to parse only
that prefix (or regex-extract the initial {"type":...} token) to detect if type
=== "file-history-snapshot", and if so log via logger.debug and return null;
otherwise close the temp handle/stream and then create the original
createReadStream + createInterface and continue as before. Ensure you properly
close file handles/streams (or destroy the temp stream) and preserve the
existing variables firstNonEmptyLineSeen and the rest of the processing flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fc0d4887-4b7a-4c69-9bb9-34c1e89f1600
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
…tion Read only the first 4 KiB of the file to detect file-history-snapshot type instead of using readline, which buffers the entire first line and crashes on huge single-line records (e.g. 734 MB). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)
1299-1320: Keep the new transcript path in the repo’sResultstyle.This adds fresh
try/catchJSON parsing plus repeatedResult.isSuccess(...)checks. Switching the throwable parse toResult.try()and branching onResult.isFailure(contextLimitResult)would match the project’s byethrow conventions and keep the happy path flatter. As per coding guidelines, "Prefer@praha/byethrowResult type over traditional try-catch for functional error handling", "UseResult.try()for wrapping operations that may throw (JSON parsing, etc.)", and "UseResult.isFailure()for checking errors (more readable than!Result.isSuccess())".Also applies to: 1342-1352
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1267-1284: The fast-path currently assumes the first 4KiB begins
with {"type":...} and misses snapshots when "type" is not the first property;
change the probe to extract the first line from prefixBuf (find first newline
CR/LF within PREFIX_SIZE), parse that first-line substring as JSON (safe
try/catch) and read its top-level "type" property (instead of regex anchored to
the start) to detect "file-history-snapshot" and short-circuit (symbols:
PREFIX_SIZE, prefixBuf, readSync, transcriptPath, typeMatch/logger.debug); if no
newline is present in the prefix keep the existing fallback to readline; add a
regression test that writes a snapshot line where "type" is not the first field
to ensure detection still works.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c75f78d-d9c4-4ca0-9955-a44ec79acec1
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
The fast-path regex previously required "type" to be the first key in the JSON object. If a serializer placed another field first (e.g. "version"), the snapshot would slip through to readline and re-trigger the large-line crash. Now we search for "type":"file-history-snapshot" anywhere in the first line of the 4 KiB prefix, matching any key order.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1280-1285: The fast-path currently inspects only
prefix.split('\n',1)[0], which misses cases where the transcript begins with
blank lines; change the logic that computes firstLine (used in the if that tests
/^\s*\{/ and /"type"\s*:\s*"file-history-snapshot"/) to instead scan prefix for
the first non-empty line (first line with any non-whitespace characters) and run
the same regex checks against that line, ensure logger.debug('Skipping
file-history-snapshot transcript file for context tokens') and the early return
remain unchanged, and add a unit/integration regression that supplies a prefix
with leading blank lines followed by a file-history-snapshot record to verify
the fast-path triggers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 97421b2b-47f9-4e0d-bb8c-f6bff4ae3db7
📒 Files selected for processing (1)
apps/ccusage/src/data-loader.ts
Files starting with blank lines before the snapshot record would miss the fast-path check and fall back to readline, re-opening the large-line buffering issue. Use .find() to skip empty leading lines.
|
Addressed the latest CodeRabbit feedback in 95ab9a2 — the snapshot fast-path now uses |
Summary
calculateContextTokensfull-filereadFileparsing with streaming readline-based parsingtype: "file-history-snapshot"file-history-snapshottranscript inputsTesting
pnpm --dir ccusage --filter ccusage testFixes #873
Summary by CodeRabbit
Performance
Bug Fixes
Behavior Changes
Tests