Summary
FastSkill eval already defines input_tokens / output_tokens on CaseResult and CaseSummary, but the eval runner always sets them to None. aikit-sdk (on goaikit/aikit main, including work such as PR #60 / token usage aggregation) now exposes:
- RunResult.token_usage: Option<TokenUsage> with aggregated input_tokens, output_tokens, optional totals, cache, and reasoning fields.
- AgentEventPayload::TokenUsageLine { usage, source, raw_agent_line_seq } on the streaming callback when RunOptions::emit_token_usage_events is true (the default).
This issue tracks updating gofastskill/fastskill so eval runs record that data in artifacts and traces.
Dependency: latest aikit-sdk
- Requirement: Bump the resolved aikit-sdk revision to the current tip of the main branch of https://github.com/goaikit/aikit (run cargo update -p aikit-sdk and commit Cargo.lock).
- Reference SHA (verify at implementation time): main at e007c0182feefa87ba6a1d405826ae420953ca63 (must be at or after the token-usage merge, e.g. commit 72a69c7d517bb26a1e443d51b273ab126b04e1ce and descendants).
- [dependencies] in Cargo.toml already uses aikit-sdk = { git = "...", branch = "main" }; the important part is refreshing Cargo.lock so CI and local builds pick up the new APIs.
Code changes (FastSkill)
1. src/eval/trace.rs — handle new event variant
- Extend TracePayload with a variant for token usage (e.g. embed usage + source as JSON-friendly structs, or reuse types re-exported from aikit-sdk if appropriate).
- Update agent_events_to_trace to match AgentEventPayload::TokenUsageLine so trace.jsonl includes per-step usage when the SDK emits it.
- Ensure existing consumers (count_raw_json_events, checks on trace) remain correct: token lines should not be counted as raw_json tool commands unless you intentionally change that definition.
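The three points above can be sketched roughly as follows. This is a minimal, std-only illustration, not the real code: the TokenUsage and AgentEventPayload definitions are local stand-ins for the aikit-sdk types, and the field types, the RawJson variants, and the String type used for source are all assumptions that should be checked against the actual SDK and src/eval/trace.rs.

```rust
// Local stand-ins for the aikit-sdk types; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AgentEventPayload {
    // Hypothetical stand-in for the existing raw JSON event variant.
    RawJson(String),
    TokenUsageLine {
        usage: TokenUsage,
        source: String, // assumption: the SDK may use a dedicated type here
        raw_agent_line_seq: u64,
    },
}

// Hypothetical shape of the FastSkill trace payload with the new variant.
#[derive(Debug, Clone, PartialEq)]
pub enum TracePayload {
    RawJson(String),
    TokenUsage {
        input_tokens: u64,
        output_tokens: u64,
        source: String,
    },
}

// Convert SDK events into trace payloads, keeping token usage as its own variant.
pub fn agent_events_to_trace(events: &[AgentEventPayload]) -> Vec<TracePayload> {
    events
        .iter()
        .map(|event| match event {
            AgentEventPayload::RawJson(line) => TracePayload::RawJson(line.clone()),
            AgentEventPayload::TokenUsageLine { usage, source, .. } => TracePayload::TokenUsage {
                input_tokens: usage.input_tokens,
                output_tokens: usage.output_tokens,
                source: source.clone(),
            },
        })
        .collect()
}

// Count only raw JSON entries, so token usage lines never inflate the command count.
pub fn count_raw_json_events(trace: &[TracePayload]) -> usize {
    trace
        .iter()
        .filter(|payload| matches!(payload, TracePayload::RawJson(_)))
        .count()
}
```

Keeping token usage in a separate variant means the raw_json counting logic only needs to keep matching on its own variant, which is one way to leave existing trace checks untouched.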
2. src/eval/runner.rs — populate CaseResult from RunResult
- On successful run_agent_events / Ok(run_result), map run_result.token_usage into CaseResult:
  - Set input_tokens / output_tokens from TokenUsage when Some.
  - Optionally extend CaseResult (and serde JSON) with optional fields for total_tokens, cache, and reasoning if product wants full detail in result.json; otherwise document that only input/output are stored in the flat fields.
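A minimal sketch of that mapping, assuming simplified stand-in structs: the real RunResult, TokenUsage, and CaseResult have more fields, and the u64 / Option<u64> field types plus the helper name apply_token_usage are hypothetical.

```rust
// Stand-in for aikit-sdk's TokenUsage; field types are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub total_tokens: Option<u64>,
}

// Stand-in for aikit-sdk's RunResult, reduced to the field used here.
#[derive(Debug, Clone, Default)]
pub struct RunResult {
    pub token_usage: Option<TokenUsage>,
}

// Stand-in for FastSkill's CaseResult; total_tokens is the optional extension.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct CaseResult {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
    pub total_tokens: Option<u64>,
}

// Hypothetical helper: copy aggregated usage into the case result when present.
// When the SDK reports no usage, the fields are left as they were (None by default).
pub fn apply_token_usage(case: &mut CaseResult, run_result: &RunResult) {
    if let Some(usage) = &run_result.token_usage {
        case.input_tokens = Some(usage.input_tokens);
        case.output_tokens = Some(usage.output_tokens);
        case.total_tokens = usage.total_tokens;
    }
}
```

Leaving the fields as None when token_usage is None keeps result.json identical to today's output for agents that report no usage.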
3. eval report / eval score (if applicable)
- Ensure the human-readable report shows token totals when present.
- Confirm the eval score re-read path still deserializes result.json after any schema extension (backward-compatible Option fields).
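For the report side, one possible shape is a helper that emits a token line only when at least one count is present. This is a sketch under assumptions: the CaseSummary stand-in, its id field, the helper name token_report_line, and the output format are all hypothetical and not taken from the FastSkill source.

```rust
// Stand-in for FastSkill's CaseSummary; the id field is hypothetical.
pub struct CaseSummary {
    pub id: String,
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

// Hypothetical helper: build a report line, or None when no usage was recorded.
pub fn token_report_line(summary: &CaseSummary) -> Option<String> {
    match (summary.input_tokens, summary.output_tokens) {
        // Older results and agents without usage data produce no token line.
        (None, None) => None,
        (input, output) => {
            // Render a missing half as "n/a" rather than a misleading zero.
            let show = |value: Option<u64>| {
                value.map_or_else(|| "n/a".to_string(), |n| n.to_string())
            };
            Some(format!(
                "{}: tokens in={} out={}",
                summary.id,
                show(input),
                show(output)
            ))
        }
    }
}
```

Returning None for results without usage keeps reports for older result.json files unchanged.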
4. Tests
- Update or add CLI/integration tests under tests/cli/eval_tests.rs (or unit tests with a stub runner) so a fake RunResult with token_usage produces non-null token fields in result.json.
- If the trace format gains a new payload type, add a small serialization test in the eval/trace.rs tests.
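The stub-runner option might look something like the sketch below. The AgentRunner trait, StubRunner, and run_case are hypothetical names introduced for illustration only; the real eval runner may not be structured around a trait, and the structs are simplified stand-ins.

```rust
// Stand-ins for the SDK and FastSkill types; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

pub struct RunResult {
    pub token_usage: Option<TokenUsage>,
}

#[derive(Debug, PartialEq)]
pub struct CaseResult {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

// Hypothetical seam that lets tests replace the real agent invocation.
pub trait AgentRunner {
    fn run(&self, prompt: &str) -> Result<RunResult, String>;
}

// Stub that returns a canned RunResult without spawning an agent.
pub struct StubRunner {
    pub usage: Option<TokenUsage>,
}

impl AgentRunner for StubRunner {
    fn run(&self, _prompt: &str) -> Result<RunResult, String> {
        Ok(RunResult { token_usage: self.usage.clone() })
    }
}

// Hypothetical case driver: run the agent and copy usage into the case result.
pub fn run_case(runner: &dyn AgentRunner, prompt: &str) -> Result<CaseResult, String> {
    let run_result = runner.run(prompt)?;
    let usage = run_result.token_usage;
    Ok(CaseResult {
        input_tokens: usage.as_ref().map(|u| u.input_tokens),
        output_tokens: usage.as_ref().map(|u| u.output_tokens),
    })
}
```

A test can then assert on the returned CaseResult directly, and a CLI-level test can make the same assertion against the serialized result.json.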
Acceptance criteria
- Cargo.lock pins aikit-sdk to latest main compatible with TokenUsage / TokenUsageLine.
- fastskill eval run writes non-null input_tokens / output_tokens in per-case result.json when the agent stream includes usage the SDK can aggregate.
- trace.jsonl records token usage events without breaking existing checks that rely on raw_json command counts.
- ... gofastskill/skill references/eval.md).
References
- aikit-sdk: TokenUsage, RunResult::token_usage, AgentEventPayload::TokenUsageLine, aggregate_token_usage in aikit-sdk/src/runner.rs.
- fastskill eval: src/eval/runner.rs, src/eval/trace.rs, src/eval/artifacts.rs (CaseResult).