fix(ccusage): avoid RangeError when parsing large transcript JSONL files#875

Open
MumuTW wants to merge 6 commits into ryoppippi:main from MumuTW:fix-ccusage-file-history-snapshot-873

Conversation

@MumuTW

@MumuTW MumuTW commented Mar 6, 2026

Summary

  • replace calculateContextTokens full-file readFile parsing with streaming readline-based parsing
  • skip transcript files early when first non-empty line has type: "file-history-snapshot"
  • add regression test for file-history-snapshot transcript inputs

Testing

  • pnpm --dir ccusage --filter ccusage test

Fixes #873

Summary by CodeRabbit

  • Performance

    • Improved large-transcript processing by switching to streaming parsing, reducing memory use and speeding up reads.
  • Bug Fixes

    • More resilient parsing and error handling; avoids full-file reads for certain transcript types and produces more accurate context-token calculations.
  • Behavior Changes

    • JSON output mode now silences standard log output for quieter machine-readable results.
  • Tests

    • Added tests covering streaming behavior and early-skip transcript scenarios.

@coderabbitai

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces full-file JSONL reads with line-by-line streaming in ccusage to skip large file-history-snapshot files and extract latest assistant usage; moves contextLimit fetch after streaming. Separately, opencode CLI commands now set logger.level = 0 when JSON output is requested.

Changes

Cohort / File(s) and summary:

  • Streaming JSONL parser (apps/ccusage/src/data-loader.ts): rewrote calculateContextTokens to use createReadStream + readline streaming, with a fast-prefix check to early-skip file-history-snapshot files, per-line JSON parse plus transcriptMessageSchema validation, tracking of the latest assistant usage (input/cache tokens), a PricingFetcher contextLimit fetch deferred until after streaming, robust per-line error handling, and a null return when no usable usage is found.
  • CLI JSON output logging changes (apps/opencode/src/commands/daily.ts, weekly.ts, monthly.ts, session.ts): when --json / jsonOutput is set, logger.level = 0 is set early to silence normal logging for JSON-mode output.
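The streaming rewrite summarized above can be sketched roughly as follows. The function name, the Usage shape, and the substring-based snapshot check are simplifications for illustration, not the data loader's actual API (the real code validates lines with transcriptMessageSchema and defers a PricingFetcher call):

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Hypothetical shape of the per-message usage data tracked while streaming.
type Usage = {
  input_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
};

// Sketch: stream a transcript JSONL file line by line, bail out early on
// file-history-snapshot transcripts, and keep only the latest assistant usage.
export async function latestAssistantUsage(path: string): Promise<Usage | null> {
  const rl = createInterface({
    input: createReadStream(path, { encoding: "utf8" }),
    crlfDelay: Infinity,
  });
  let firstNonEmptySeen = false;
  let latest: Usage | null = null;
  for await (const line of rl) {
    if (line.trim() === "") continue;
    if (!firstNonEmptySeen) {
      firstNonEmptySeen = true;
      // Simple substring check stands in for the PR's fast-prefix detection.
      if (line.includes('"type":"file-history-snapshot"')) {
        rl.close();
        return null; // snapshot transcripts carry no usable usage data
      }
    }
    try {
      const obj = JSON.parse(line);
      if (obj?.type === "assistant" && obj.message?.usage?.input_tokens != null) {
        latest = obj.message.usage; // later lines overwrite earlier ones
      }
    } catch {
      // tolerate malformed lines instead of aborting the whole file
    }
  }
  return latest;
}
```

Note that readline still buffers each individual line, which is why later commits in this PR add a bounded prefix read before streaming begins.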

Sequence Diagram(s)

sequenceDiagram
  participant FS as File System
  participant Stream as Stream Reader
  participant Parser as Per-line Parser/Validator
  participant Aggregator as Usage Aggregator
  participant Pricing as PricingFetcher

  FS->>Stream: open JSONL (createReadStream)
  Stream->>Parser: emit next line
  Parser-->>Stream: parsed object or error
  alt first-line indicates file-history-snapshot
    Parser->>Aggregator: signal skip -> return null
  else assistant usage line found
    Parser->>Aggregator: update latestUsage (inputTokens, cacheTokens)
    Aggregator->>Stream: continue reading
  end
  Stream->>Aggregator: EOF
  Aggregator->>Pricing: request contextLimit (modelId)
  Pricing-->>Aggregator: contextLimit or failure
  Aggregator->>Caller: compute percentage or return null

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ryoppippi

Poem

🐰 I hop through lines and parse with care,

Sniffing snapshots, skipping bulky lair.
I tally tokens, latest first in sight,
Streamed and steady, I keep memory light.
🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Out of Scope Changes check: ❓ Inconclusive. The PR primarily addresses streaming/file-history-snapshot fixes (in-scope) but adds logger.level = 0 to four command files. While described as a follow-up to #829, these changes are not mentioned in #873. Resolution: clarify the purpose of the logger.level changes in the daily/monthly/session/weekly commands, or defer them to a separate PR if unrelated to the RangeError fix.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately reflects the main change: restructuring calculateContextTokens to use streaming instead of full-file reads, with a fast-prefix check to skip file-history-snapshot transcripts.
  • Linked Issues check: ✅ Passed. All coding requirements from issue #873 are addressed: stream-based parsing, early skip of file-history-snapshot via a fast-prefix check, robust type detection regardless of key order, and regression tests.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)

1295-1314: Type assignment may not fully narrow input_tokens to required.

The assignment at line 1309 assigns obj.message.usage (where input_tokens is optional per transcriptUsageSchema) to latestUsage (where input_tokens is required). While the check at line 1307 ensures input_tokens != null at runtime, TypeScript's property narrowing may not fully narrow the parent object type.

Consider using a type assertion or explicit object construction to ensure type safety:

💡 Suggested refactor for explicit type construction
 if (
     obj.type === 'assistant' &&
     obj.message != null &&
     obj.message.usage != null &&
     obj.message.usage.input_tokens != null
 ) {
-    latestUsage = obj.message.usage;
+    latestUsage = {
+        input_tokens: obj.message.usage.input_tokens,
+        cache_creation_input_tokens: obj.message.usage.cache_creation_input_tokens,
+        cache_read_input_tokens: obj.message.usage.cache_read_input_tokens,
+    };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ccusage/src/data-loader.ts` around lines 1295 - 1314, The assignment of
obj.message.usage to latestUsage can leave TypeScript unconvinced that
input_tokens is present because transcriptUsageSchema marks it optional; to fix,
explicitly construct or cast a value with the required shape before assigning to
latestUsage — e.g., after the runtime check (obj.message.usage != null &&
obj.message.usage.input_tokens != null) create a new object with the needed
properties (or use a type assertion to the required type) and assign that to
latestUsage; update the logic around transcriptMessageSchema,
transcriptUsageSchema, obj, input_tokens, and latestUsage in the try block so
the compiler sees a value that definitely satisfies latestUsage's required
fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f65d700-2b4a-40b7-bd2d-fc00166209e9

📥 Commits

Reviewing files that changed from the base of the PR and between c40ea6e and 0896c64.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts


@MumuTW
Author

MumuTW commented Mar 6, 2026

Follow-up for #829: silenced logger output in JSON mode for ccusage-opencode.

What changed:

  • Set logger.level = 0 when --json is active in:
    • apps/opencode/src/commands/daily.ts
    • apps/opencode/src/commands/monthly.ts
    • apps/opencode/src/commands/session.ts
    • apps/opencode/src/commands/weekly.ts

Validation:

  • pnpm --filter @ccusage/opencode test
  • bun ./src/index.ts daily --json | jq . (run from apps/opencode)

Commit: 9995939
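The JSON-mode logging guard described above can be sketched as follows; the logger shape and the numeric level convention (0 silences normal output) are stand-ins for the CLI's real logger, and runDaily is a hypothetical command entry point:

```typescript
// Hypothetical stand-in for the CLI's logger; 0 suppresses normal logs.
type Logger = { level: number; info: (msg: string) => void };

function makeLogger(): Logger {
  return {
    level: 3,
    info(msg: string) {
      if (this.level >= 3) console.log(msg);
    },
  };
}

// Sketch of a command entry point: drop the log level before any other
// work so stdout carries only the JSON payload.
export function runDaily(jsonOutput: boolean, logger: Logger): string {
  if (jsonOutput) {
    logger.level = 0; // keep stdout machine-readable for `--json | jq .`
  }
  logger.info("loading usage data..."); // suppressed in JSON mode
  return JSON.stringify({ totals: { inputTokens: 0 } });
}
```

Setting the level early matters because any log line interleaved with the payload would break downstream parsers such as jq.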

@ryoppippi
Owner

thanks! lmc

@pkg-pr-new

pkg-pr-new bot commented Mar 6, 2026

Open in StackBlitz

@ccusage/amp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/amp@875

ccusage

npm i https://pkg.pr.new/ryoppippi/ccusage@875

@ccusage/codex

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/codex@875

@ccusage/mcp

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/mcp@875

@ccusage/opencode

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/opencode@875

@ccusage/pi

npm i https://pkg.pr.new/ryoppippi/ccusage/@ccusage/pi@875

commit: 9995939

…ement

The latestUsage variable requires input_tokens as a non-optional number,
but obj.message.usage has it as optional. Explicitly construct the object
after the null check so TypeScript can see the narrowed type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1269-1288: The current fast-path checks the first non-empty line
via readline (createInterface) which forces Node to buffer the entire line and
crashes on huge single-line records; fix by reading a bounded prefix from the
file before creating the readline reader: open transcriptPath with fs (e.g.,
fs.open + filehandle.read or createReadStream with { start: 0, end: N-1 }), read
a small prefix (e.g., 4 KiB), trim leading whitespace, attempt to parse only
that prefix (or regex-extract the initial {"type":...} token) to detect if type
=== "file-history-snapshot", and if so log via logger.debug and return null;
otherwise close the temp handle/stream and then create the original
createReadStream + createInterface and continue as before. Ensure you properly
close file handles/streams (or destroy the temp stream) and preserve the
existing variables firstNonEmptyLineSeen and the rest of the processing flow.
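The bounded-prefix probe this round of review asks for could look roughly like this. PREFIX_SIZE and the start-anchored regex are assumptions for illustration; a later commit in this PR relaxes the regex so "type" need not be the first key:

```typescript
import { openSync, readSync, closeSync } from "node:fs";

// Assumed probe size; large enough to cover any realistic record prefix.
const PREFIX_SIZE = 4096;

// Sketch: read only a bounded prefix instead of letting readline buffer a
// possibly huge single-line record, then test for an initial snapshot record.
export function isSnapshotTranscript(transcriptPath: string): boolean {
  const fd = openSync(transcriptPath, "r");
  try {
    const buf = Buffer.alloc(PREFIX_SIZE);
    const bytesRead = readSync(fd, buf, 0, PREFIX_SIZE, 0);
    const prefix = buf.toString("utf8", 0, bytesRead);
    // Matches only when "type" is the first key, as described in this round.
    return /^\s*\{\s*"type"\s*:\s*"file-history-snapshot"/.test(prefix);
  } finally {
    closeSync(fd); // always release the probe handle before streaming
  }
}
```

When this returns false, the caller would proceed to the createReadStream + createInterface path as before.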

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc0d4887-4b7a-4c69-9bb9-34c1e89f1600

📥 Commits

Reviewing files that changed from the base of the PR and between 9995939 and b9fd7cb.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts

…tion

Read only the first 4 KiB of the file to detect file-history-snapshot
type instead of using readline, which buffers the entire first line
and crashes on huge single-line records (e.g. 734 MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)

1299-1320: Keep the new transcript path in the repo’s Result style.

This adds fresh try/catch JSON parsing plus repeated Result.isSuccess(...) checks. Switching the throwable parse to Result.try() and branching on Result.isFailure(contextLimitResult) would match the project’s byethrow conventions and keep the happy path flatter. As per coding guidelines, "Prefer @praha/byethrow Result type over traditional try-catch for functional error handling", "Use Result.try() for wrapping operations that may throw (JSON parsing, etc.)", and "Use Result.isFailure() for checking errors (more readable than !Result.isSuccess())".

Also applies to: 1342-1352

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1267-1284: The fast-path currently assumes the first 4KiB begins
with {"type":...} and misses snapshots when "type" is not the first property;
change the probe to extract the first line from prefixBuf (find first newline
CR/LF within PREFIX_SIZE), parse that first-line substring as JSON (safe
try/catch) and read its top-level "type" property (instead of regex anchored to
the start) to detect "file-history-snapshot" and short-circuit (symbols:
PREFIX_SIZE, prefixBuf, readSync, transcriptPath, typeMatch/logger.debug); if no
newline is present in the prefix keep the existing fallback to readline; add a
regression test that writes a snapshot line where "type" is not the first field
to ensure detection still works.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2c75f78d-d9c4-4ca0-9955-a44ec79acec1

📥 Commits

Reviewing files that changed from the base of the PR and between b9fd7cb and a3bc6cb.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts

The fast-path regex previously required "type" to be the first key in
the JSON object. If a serializer placed another field first (e.g.
"version"), the snapshot would slip through to readline and re-trigger
the large-line crash.

Now we search for "type":"file-history-snapshot" anywhere in the first
line of the 4 KiB prefix, matching any key order.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1280-1285: The fast-path currently inspects only
prefix.split('\n',1)[0], which misses cases where the transcript begins with
blank lines; change the logic that computes firstLine (used in the if that tests
/^\s*\{/ and /"type"\s*:\s*"file-history-snapshot"/) to instead scan prefix for
the first non-empty line (first line with any non-whitespace characters) and run
the same regex checks against that line, ensure logger.debug('Skipping
file-history-snapshot transcript file for context tokens') and the early return
remain unchanged, and add a unit/integration regression that supplies a prefix
with leading blank lines followed by a file-history-snapshot record to verify
the fast-path triggers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97421b2b-47f9-4e0d-bb8c-f6bff4ae3db7

📥 Commits

Reviewing files that changed from the base of the PR and between a3bc6cb and e16a183.

📒 Files selected for processing (1)
  • apps/ccusage/src/data-loader.ts

Files starting with blank lines before the snapshot record would miss
the fast-path check and fall back to readline, re-opening the
large-line buffering issue. Use .find() to skip empty leading lines.
@MumuTW
Author

MumuTW commented Mar 8, 2026

Addressed the latest CodeRabbit feedback in 95ab9a2 — the snapshot fast-path now uses .find((l) => l.trim() !== "") to skip leading blank lines before checking for the type field. This prevents files starting with blank lines from falling through to the readline path.
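Taken together, the final fast-path predicate can be sketched as below, given a prefix already obtained with a bounded read (the function name and the 4 KiB figure are illustrative):

```typescript
// Sketch: decide from a bounded prefix (e.g. the first 4 KiB of the file)
// whether the transcript is a file-history-snapshot, tolerating leading
// blank lines and any key order inside the record.
export function prefixIsSnapshot(prefix: string): boolean {
  // Skip empty leading lines, then inspect only the first real line.
  const firstLine = prefix.split(/\r?\n/).find((l) => l.trim() !== "");
  if (firstLine == null || !/^\s*\{/.test(firstLine)) return false;
  // Search anywhere in the line so serializer key order does not matter.
  return /"type"\s*:\s*"file-history-snapshot"/.test(firstLine);
}
```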



Development

Successfully merging this pull request may close these issues.

[BUG] RangeError: Invalid string length caused by file-history-snapshot JSONL files (734MB)

2 participants