fix(ccusage): avoid RangeError when parsing large transcript JSONL files#875

Open
MumuTW wants to merge 6 commits into ryoppippi:main from MumuTW:fix-ccusage-file-history-snapshot-873

Conversation

@MumuTW

@MumuTW MumuTW commented Mar 6, 2026

Summary

  • replace calculateContextTokens full-file readFile parsing with streaming readline-based parsing
  • skip transcript files early when first non-empty line has type: "file-history-snapshot"
  • add regression test for file-history-snapshot transcript inputs

Testing

  • pnpm --dir ccusage --filter ccusage test

Fixes #873

Summary by CodeRabbit

  • Performance

    • Improved large-transcript processing by switching to streaming parsing, reducing memory use and speeding up reads.
  • Bug Fixes

    • More resilient parsing and error handling; avoids full-file reads for certain transcript types and produces more accurate context-token calculations.
  • Behavior Changes

    • JSON output mode now silences standard log output for quieter machine-readable results.
  • Tests

    • Added tests covering streaming behavior and early-skip transcript scenarios.

@coderabbitai

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces full-file JSONL reads with line-by-line streaming in ccusage to skip large file-history-snapshot files and extract latest assistant usage; moves contextLimit fetch after streaming. Separately, opencode CLI commands now set logger.level = 0 when JSON output is requested.

Changes

Cohort / File(s) and summary:

  • Streaming JSONL parser (apps/ccusage/src/data-loader.ts): rewrote calculateContextTokens to use createReadStream + readline streaming, with a fast-prefix check to early-skip file-history-snapshot files, per-line JSON parse plus transcriptMessageSchema validation, tracking of the latest assistant usage (input/cache tokens), a PricingFetcher contextLimit fetch deferred until after streaming, robust per-line error handling, and a null return when no usable usage is found.
  • CLI JSON output logging changes (apps/opencode/src/commands/daily.ts, weekly.ts, monthly.ts, session.ts): when --json / jsonOutput is set, logger.level = 0 is set early to silence normal logging for JSON-mode output.
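The streaming rewrite summarized above can be sketched roughly as follows. The function name, the Usage shape, and the substring-based snapshot check are simplifications for illustration, not the data loader's actual API (the real code validates lines with transcriptMessageSchema and defers a PricingFetcher call):

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Hypothetical shape of the per-message usage data tracked while streaming.
type Usage = {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
};

// Sketch: stream a transcript JSONL file line by line, bail out early on
// file-history-snapshot transcripts, and keep only the latest assistant usage.
export async function latestAssistantUsage(path: string): Promise<Usage | null> {
  const rl = createInterface({
    input: createReadStream(path, { encoding: "utf8" }),
    crlfDelay: Infinity,
  });
  let firstNonEmptySeen = false;
  let latest: Usage | null = null;
  for await (const line of rl) {
    if (line.trim() === "") continue;
    if (!firstNonEmptySeen) {
      firstNonEmptySeen = true;
      // Simple substring check stands in for the PR's fast-prefix detection.
      if (line.includes('"type":"file-history-snapshot"')) {
        rl.close();
        return null; // snapshot transcripts carry no usable usage data
      }
    }
    try {
      const obj = JSON.parse(line);
      if (obj?.type === "assistant" && obj.message?.usage?.input_tokens != null) {
        latest = obj.message.usage; // later lines overwrite earlier ones
      }
    } catch {
      // tolerate malformed lines instead of aborting the whole file
    }
  }
  return latest;
}
```

Note that readline still buffers each individual line, which is why later commits in this PR add a bounded prefix read before streaming begins.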

Sequence Diagram(s)

sequenceDiagram
  participant FS as File System
  participant Stream as Stream Reader
  participant Parser as Per-line Parser/Validator
  participant Aggregator as Usage Aggregator
  participant Pricing as PricingFetcher

  FS->>Stream: open JSONL (createReadStream)
  Stream->>Parser: emit next line
  Parser-->>Stream: parsed object or error
  alt first-line indicates file-history-snapshot
    Parser->>Aggregator: signal skip -> return null
  else assistant usage line found
    Parser->>Aggregator: update latestUsage (inputTokens, cacheTokens)
    Aggregator->>Stream: continue reading
  end
  Stream->>Aggregator: EOF
  Aggregator->>Pricing: request contextLimit (modelId)
  Pricing-->>Aggregator: contextLimit or failure
  Aggregator->>Caller: compute percentage or return null

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ryoppippi

Poem

🐰 I hop through lines and parse with care,

Sniffing snapshots, skipping bulky lair.
I tally tokens, latest first in sight,
Streamed and steady, I keep memory light.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Out of Scope Changes check: ❓ Inconclusive. The PR primarily addresses streaming/file-history-snapshot fixes (in-scope) but adds logger.level = 0 to four command files. While described as a follow-up to #829, these changes are not mentioned in #873. Resolution: clarify the purpose of the logger.level changes in the daily/monthly/session/weekly commands, or defer them to a separate PR if unrelated to the RangeError fix.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately reflects the main change: restructuring calculateContextTokens to use streaming instead of full-file reads, with a fast-prefix check to skip file-history-snapshot transcripts.
  • Linked Issues check: ✅ Passed. All coding requirements from issue #873 are addressed: stream-based parsing, early skip of file-history-snapshot via a fast-prefix check, robust type detection regardless of key order, and regression tests.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)

1295-1314: Type assignment may not fully narrow input_tokens to required.

The assignment at line 1309 assigns obj.message.usage (where input_tokens is optional per transcriptUsageSchema) to latestUsage (where input_tokens is required). While the check at line 1307 ensures input_tokens != null at runtime, TypeScript's property narrowing may not fully narrow the parent object type.

Consider using a type assertion or explicit object construction to ensure type safety:

💡 Suggested refactor for explicit type construction
 if (
     obj.type === 'assistant' &&
     obj.message != null &&
     obj.message.usage != null &&
     obj.message.usage.input_tokens != null
 ) {
-    latestUsage = obj.message.usage;
+    latestUsage = {
+        input_tokens: obj.message.usage.input_tokens,
+        cache_creation_input_tokens: obj.message.usage.cache_creation_input_tokens,
+        cache_read_input_tokens: obj.message.usage.cache_read_input_tokens,
+    };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ccusage/src/data-loader.ts` around lines 1295 - 1314, The assignment of
obj.message.usage to latestUsage can leave TypeScript unconvinced that
input_tokens is present because transcriptUsageSchema marks it optional; to fix,
explicitly construct or cast a value with the required shape before assigning to
latestUsage — e.g., after the runtime check (obj.message.usage != null &&
obj.message.usage.input_tokens != null) create a new object with the needed
properties (or use a type assertion to the required type) and assign that to
latestUsage; update the logic around transcriptMessageSchema,
transcriptUsageSchema, obj, input_tokens, and latestUsage in the try block so
the compiler sees a value that definitely satisfies latestUsage's required
fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f65d700-2b4a-40b7-bd2d-fc00166209e9

📥 Commits

Reviewing files that changed from the base of the PR and between c40ea6e and 0896c64.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts


@MumuTW
Author

MumuTW commented Mar 6, 2026

Follow-up for #829: silenced logger output in JSON mode for ccusage-opencode.

What changed:

  • Set logger.level = 0 when --json is active in:
    • apps/opencode/src/commands/daily.ts
    • apps/opencode/src/commands/monthly.ts
    • apps/opencode/src/commands/session.ts
    • apps/opencode/src/commands/weekly.ts

Validation:

  • pnpm --filter @ccusage/opencode test
  • bun ./src/index.ts daily --json | jq . (run from apps/opencode)

Commit: 9995939
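The JSON-mode logging guard described above can be sketched as follows; the logger shape and the numeric level convention (0 silences normal output) are stand-ins for the CLI's real logger, and runDaily is a hypothetical command entry point:

```typescript
// Hypothetical stand-in for the CLI's logger; 0 suppresses normal logs.
type Logger = { level: number; info: (msg: string) => void };

function makeLogger(): Logger {
  return {
    level: 3,
    info(msg: string) {
      if (this.level >= 3) console.log(msg);
    },
  };
}

// Sketch of a command entry point: drop the log level before any other
// work so stdout carries only the JSON payload.
export function runDaily(jsonOutput: boolean, logger: Logger): string {
  if (jsonOutput) {
    logger.level = 0; // keep stdout machine-readable for `--json | jq .`
  }
  logger.info("loading usage data..."); // suppressed in JSON mode
  return JSON.stringify({ totals: { inputTokens: 0 } });
}
```

Setting the level early matters because any log line interleaved with the payload would break downstream parsers such as jq.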

@ryoppippi
Owner

thanks! lmc

@pkg-pr-new

pkg-pr-new bot commented Mar 6, 2026

Open in StackBlitz

@ccusage/amp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/amp@875

ccusage

npm i https://pkg.pr.new/ryoppippi/ccusage@875

@ccusage/codex

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/codex@875

@ccusage/mcp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/mcp@875

@ccusage/opencode

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/opencode@875

@ccusage/pi

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/pi@875

commit: 9995939

…ement

The latestUsage variable requires input_tokens as a non-optional number,
but obj.message.usage has it as optional. Explicitly construct the object
after the null check so TypeScript can see the narrowed type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1269-1288: The current fast-path checks the first non-empty line
via readline (createInterface) which forces Node to buffer the entire line and
crashes on huge single-line records; fix by reading a bounded prefix from the
file before creating the readline reader: open transcriptPath with fs (e.g.,
fs.open + filehandle.read or createReadStream with { start: 0, end: N-1 }), read
a small prefix (e.g., 4 KiB), trim leading whitespace, attempt to parse only
that prefix (or regex-extract the initial {"type":...} token) to detect if type
=== "file-history-snapshot", and if so log via logger.debug and return null;
otherwise close the temp handle/stream and then create the original
createReadStream + createInterface and continue as before. Ensure you properly
close file handles/streams (or destroy the temp stream) and preserve the
existing variables firstNonEmptyLineSeen and the rest of the processing flow.
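The bounded-prefix probe this round of review asks for could look roughly like this. PREFIX_SIZE and the start-anchored regex are assumptions for illustration; a later commit in this PR relaxes the regex so "type" need not be the first key:

```typescript
import { openSync, readSync, closeSync } from "node:fs";

// Assumed probe size; large enough to cover any realistic record prefix.
const PREFIX_SIZE = 4096;

// Sketch: read only a bounded prefix instead of letting readline buffer a
// possibly huge single-line record, then test for an initial snapshot record.
export function isSnapshotTranscript(transcriptPath: string): boolean {
  const fd = openSync(transcriptPath, "r");
  try {
    const buf = Buffer.alloc(PREFIX_SIZE);
    const bytesRead = readSync(fd, buf, 0, PREFIX_SIZE, 0);
    const prefix = buf.toString("utf8", 0, bytesRead);
    // Matches only when "type" is the first key, as described in this round.
    return /^\s*\{\s*"type"\s*:\s*"file-history-snapshot"/.test(prefix);
  } finally {
    closeSync(fd); // always release the probe handle before streaming
  }
}
```

When this returns false, the caller would proceed to the createReadStream + createInterface path as before.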

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc0d4887-4b7a-4c69-9bb9-34c1e89f1600

📥 Commits

Reviewing files that changed from the base of the PR and between 9995939 and b9fd7cb.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts

…tion

Read only the first 4 KiB of the file to detect file-history-snapshot
type instead of using readline, which buffers the entire first line
and crashes on huge single-line records (e.g. 734 MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)

1299-1320: Keep the new transcript path in the repo’s Result style.

This adds fresh try/catch JSON parsing plus repeated Result.isSuccess(...) checks. Switching the throwable parse to Result.try() and branching on Result.isFailure(contextLimitResult) would match the project’s byethrow conventions and keep the happy path flatter. As per coding guidelines, "Prefer @praha/byethrow Result type over traditional try-catch for functional error handling", "Use Result.try() for wrapping operations that may throw (JSON parsing, etc.)", and "Use Result.isFailure() for checking errors (more readable than !Result.isSuccess())".

Also applies to: 1342-1352

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1267-1284: The fast-path currently assumes the first 4KiB begins
with {"type":...} and misses snapshots when "type" is not the first property;
change the probe to extract the first line from prefixBuf (find first newline
CR/LF within PREFIX_SIZE), parse that first-line substring as JSON (safe
try/catch) and read its top-level "type" property (instead of regex anchored to
the start) to detect "file-history-snapshot" and short-circuit (symbols:
PREFIX_SIZE, prefixBuf, readSync, transcriptPath, typeMatch/logger.debug); if no
newline is present in the prefix keep the existing fallback to readline; add a
regression test that writes a snapshot line where "type" is not the first field
to ensure detection still works.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2c75f78d-d9c4-4ca0-9955-a44ec79acec1

📥 Commits

Reviewing files that changed from the base of the PR and between b9fd7cb and a3bc6cb.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts

The fast-path regex previously required "type" to be the first key in
the JSON object. If a serializer placed another field first (e.g.
"version"), the snapshot would slip through to readline and re-trigger
the large-line crash.

Now we search for "type":"file-history-snapshot" anywhere in the first
line of the 4 KiB prefix, matching any key order.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1280-1285: The fast-path currently inspects only
prefix.split('\n',1)[0], which misses cases where the transcript begins with
blank lines; change the logic that computes firstLine (used in the if that tests
/^\s*\{/ and /"type"\s*:\s*"file-history-snapshot"/) to instead scan prefix for
the first non-empty line (first line with any non-whitespace characters) and run
the same regex checks against that line, ensure logger.debug('Skipping
file-history-snapshot transcript file for context tokens') and the early return
remain unchanged, and add a unit/integration regression that supplies a prefix
with leading blank lines followed by a file-history-snapshot record to verify
the fast-path triggers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97421b2b-47f9-4e0d-bb8c-f6bff4ae3db7

📥 Commits

Reviewing files that changed from the base of the PR and between a3bc6cb and e16a183.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts

Files starting with blank lines before the snapshot record would miss
the fast-path check and fall back to readline, re-opening the
large-line buffering issue. Use .find() to skip empty leading lines.
@MumuTW
Author

MumuTW commented Mar 8, 2026

Addressed the latest CodeRabbit feedback in 95ab9a2 — the snapshot fast-path now uses .find((l) => l.trim() !== "") to skip leading blank lines before checking for the type field. This prevents files starting with blank lines from falling through to the readline path.
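Taken together, the final fast-path predicate can be sketched as below, given a prefix already obtained with a bounded read (the function name and the 4 KiB figure are illustrative):

```typescript
// Sketch: decide from a bounded prefix (e.g. the first 4 KiB of the file)
// whether the transcript is a file-history-snapshot, tolerating leading
// blank lines and any key order inside the record.
export function prefixIsSnapshot(prefix: string): boolean {
  // Skip empty leading lines, then inspect only the first real line.
  const firstLine = prefix.split(/\r?\n/).find((l) => l.trim() !== "");
  if (firstLine == null || !/^\s*\{/.test(firstLine)) return false;
  // Search anywhere in the line so serializer key order does not matter.
  return /"type"\s*:\s*"file-history-snapshot"/.test(firstLine);
}
```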



Development

Successfully merging this pull request may close these issues.

[BUG] RangeError: Invalid string length caused by file-history-snapshot JSONL files (734MB)

2 participants