Purpose: Document how pi_agent_rust validates behavioral compatibility with Pi Agent (TypeScript).
pi_agent_rust must behave identically to the TypeScript reference implementation for all observable behaviors. This document describes the conformance testing approach used to verify this compatibility.
┌─────────────────────────────────────────────────────────────────────┐
│                             Test Layers                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐      │
│  │   Unit Tests    │  │   Conformance   │  │   Integration   │      │
│  │   (src/*.rs)    │  │      Tests      │  │      Tests      │      │
│  │                 │  │   (fixtures)    │  │      (E2E)      │      │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘      │
│           │                    │                    │               │
│    Tests internal      Tests observable        Tests full           │
│  logic in isolation  behavior vs fixtures    agent workflow         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
Location: src/*.rs inline #[cfg(test)] modules
Coverage:
- Message type serialization/deserialization
- SSE parser edge cases
- Truncation algorithms
- Path resolution
- Provider message conversion
- Package manager source parsing/identity + settings updates
- Skills loader + prompt template expansion
Count: 35+ tests
Location: tests/conformance/
Purpose: Verify tool behavior matches TypeScript reference.
Fixture Format:
{
"version": "1.0",
"tool": "read",
"description": "Conformance tests for the read tool",
"cases": [
{
"name": "read_simple_file",
"setup": [
{"type": "create_file", "path": "test.txt", "content": "hello"}
],
"input": {"path": "test.txt"},
"expected": {
"content_exact": "hello",
"details_none": true
}
}
]
}
Fixture Files:
| Tool | File | Cases |
|---|---|---|
| read | read_tool.json | 5+ |
| bash | bash_tool.json | 10+ |
| edit | edit_tool.json | 8+ |
| write | write_tool.json | 6+ |
| grep | grep_tool.json | 8+ |
| find | find_tool.json | 5+ |
| ls | ls_tool.json | 5+ |
| truncation | truncation.json | 10+ |
Location: tests/*.rs
Purpose: End-to-end testing of agent workflows.
Coverage (current):
- tests/rpc_mode.rs: RPC protocol sanity (get_state, prompt streaming events, get_session_stats)
- tests/e2e_cli.rs: headless CLI smoke (print mode, selection paths)
- tests/provider_streaming.rs: VCR-backed provider streaming playback (Anthropic/OpenAI/Gemini/Azure)
- tests/compaction.rs: compaction engine behavior with scripted provider
Planned:
- Fixture-based RPC conformance harness comparing Rust RPC responses/events against the TypeScript reference (legacy_pi_mono_code/pi-mono/packages/coding-agent/docs/rpc.md).
Location: tests/ext_conformance_diff.rs + tests/ext_conformance/
Purpose: Validate extension runtime behavior (registration/events/hostcalls) against the TypeScript reference by running the SAME extension in BOTH the TS oracle (Bun + jiti) and Rust QuickJS runtime, then comparing registration snapshots.
Results (2026-02-05):
| Corpus | Passed | Total | Rate | Notes |
|---|---|---|---|---|
| Official | 60 | 60 | 100% | All pass, test runs in CI |
| Built-in | 4 | 4 | 100% | Pi-mono built-in extensions (diff, files, prompt-url-widget, redraws) |
| Community | 53 | 58 | 91.4% | 53/53 testable pass; 5 TS oracle env failures |
| npm | 47 | 63 | 74.6% | 16 failures: 13 missing npm deps, 3 env issues |
| Third-party | 19 | 23 | 82.6% | 19/19 testable pass; 4 known-unfixable skipped |
Community TS oracle failures (environment issues, not Rust bugs):
- nicobailon-interactive-shell: requires native pty.node module
- nicobailon-interview-tool: missing form/index.html file
- qualisero-background-notify: missing ../../shared module
- qualisero-pi-agent-scip: missing ./dist/extension.js
- qualisero-safe-git: missing ../../shared module
Third-party known-unfixable failures (external dependencies, not Rust bugs):
- kcosr: readFileSync adjacent .md file (VFS is in-memory only)
- marckrenn: imports @marckrenn/pi-sub-shared (private npm package)
- ogulcancelik: readFileSync adjacent .html file (VFS is in-memory only)
- qualisero: imports @sourcegraph/scip-typescript (external npm package)
Built-in pi-mono extensions (source_tier built-in-pi-mono):
- diff: Slash command /diff showing git diff in TUI (uses pi.exec, ctx.ui.custom)
- files: Slash command /files listing session file operations (uses ctx.sessionManager, ctx.ui.custom, pi.exec)
- prompt-url-widget: Widget for PR/issue URLs with gh metadata (uses pi.on, pi.exec, ctx.ui.setWidget, pi.getSessionName, pi.setSessionName)
- redraws: Slash command /tui showing TUI redraw stats (uses ctx.ui.custom, ctx.ui.notify)
Key runtime features enabling conformance:
- In-memory virtual filesystem (__pi_vfs) for node:fs
- CJS-to-ESM transformation shim for CommonJS extensions
- createRequire resolves actual builtin modules
- Virtual module stubs: shell-quote, vscode-languageserver-protocol, @modelcontextprotocol/sdk, glob, uuid, diff, just-bash, bunfig, dotenv
- registerCommand accepts both spec.handler and spec.fn (PiCommand compat)
- Global URL polyfill for QuickJS
- Comprehensive node polyfills: fs, path, os, crypto, url, process, buffer, child_process
Current building blocks:
- Differential test runner (tests/ext_conformance_diff.rs)
- TS oracle harness (tests/ext_conformance/ts_harness/run_extension.ts)
- Vendored artifacts (tests/ext_conformance/artifacts/*)
- Deterministic PiJS scheduler conformance (tests/event_loop_conformance.rs)
This section turns the extension taxonomy (see EXTENSIONS.md §1B) into a
concrete conformance matrix and a test plan. The goal is to ensure every
extension shape has explicit, testable pass/fail criteria and fixture
coverage.
| Extension Shape | Entrypoint / Config | Required Capabilities / I/O | Expected Behaviors (Pass/Fail) | Coverage (Current / Planned) |
|---|---|---|---|---|
| PiJS (JS/TS) | extension.json (pi.ext.manifest.v1) or package manifest; entry .ts/.js | tool (→ read/write/exec), http, session, ui, log | PASS if: registrations match (tools/commands/flags/shortcuts/providers); derived capability matches hostcall method (see EXTENSIONS.md §3.2A); deterministic event ordering per scheduler contract; mock outputs deterministic under fixed spec; errors map to taxonomy (timeout/denied/io/invalid_request/internal). | Current: tests/e2e_extension_registration.rs, tests/extensions_registration.rs, tests/ext_conformance.rs, tests/event_loop_conformance.rs, tests/ext_conformance/event_payloads/event_payloads.json, tests/ext_conformance/mock_specs/*, tests/ext_conformance_fixture_schema.rs. Planned: differential TS↔Rust runner (bd-21dv). |
| WASM Component | extension.json with runtime="wasm"; entry .wasm component | WIT hostcalls → same capability set as PiJS | PASS if: registration + hostcall behavior matches PiJS contract; capability derivation identical to JS; deterministic logs; error taxonomy identical. | Planned: WASM host conformance + parity suite (bd-nom, bd-320). |
| MCP Server | MCP config or CLI args (stdio/http/sse) | MCP protocol (tools list + tool call/response); policy-gated connectors | PASS if: tool schemas discoverable; tool calls execute with deterministic mocks; policy denials surfaced as MCP errors; timeouts handled. | Planned: MCP conformance harness + fixtures (TBD). |
| Skill Pack | SKILL.md + assets | File load only (no hostcalls) | PASS if: frontmatter valid; name/description parsed; injected into system prompt; skill resolution precedence correct. | Current: tests/resource_loader.rs, tests/e2e_cli.rs (skill discovery paths). |
| Prompt Template | .md prompt file (optional frontmatter) | File load only | PASS if: template parse succeeds; parameters substitute deterministically; /template invocation expands correctly. | Current: tests/resource_loader.rs, tests/e2e_cli.rs (template paths). |
| Theme | .json theme file | File load only | PASS if: JSON schema valid; theme resolves/loads; TUI applies without panics. | Current: tests/tui_snapshot.rs + theme loader coverage. |
| Package Source | Package manifest listing resources | Depends on contained resources | PASS if: resource discovery resolves correctly; collisions resolved deterministically; package precedence honored. | Current: tests/package_manager.rs, tests/resource_loader.rs, tests/e2e_cli.rs (package flows). |
- Fixture schemas
  - Validate event payload fixtures: tests/ext_conformance/event_payloads/event_payloads.json
  - Validate mock specs: tests/mock_spec_schema.rs + tests/mock_spec_validation.rs
- Registration parity
  - Rust runtime: tests/extensions_registration.rs + tests/e2e_extension_registration.rs
  - Output: tools/commands/flags/shortcuts/providers must match expected snapshots
- Event conformance
  - Use tests/ext_conformance/event_payloads/event_payloads.json to drive event hooks
  - Validate scheduling/determinism: tests/event_loop_conformance.rs
- Hostcall + capability mapping
  - Exercise tool_call/tool_result/pi.http/pi.exec with mock specs
  - Assert derived capabilities match taxonomy (see EXTENSIONS.md §3.2A)
- Differential TS ↔ Rust (oracle mode)
  - TS harness: tests/ext_conformance/ts_harness/run_extension.ts
  - Rust harness: tests/ext_conformance.rs + conformance comparators
  - Planned runner: bd-21dv (per-extension comparisons + report)
- Resource packs
  - Skills/prompts/themes/packages: tests/resource_loader.rs + tests/e2e_cli.rs
- Pass/Fail Criteria Summary
  - PASS = registration parity + deterministic outputs + error taxonomy compliance
  - FAIL = any mismatch in registration, capability derivation, or normalized output diff
  - SKIP = unsupported capability/shape (must include rationale + tracking bead)
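The PASS/FAIL/SKIP criteria above can be illustrated with a small registration comparison. This is a minimal sketch, not the project's comparator: Verdict, Registration, first_missing, and compare are hypothetical names, only tools and commands are compared, and the failure message merely imitates the style of messages reported elsewhere in this document.

```rust
use std::collections::BTreeSet;

/// Outcome of one scenario, mirroring the PASS/FAIL/SKIP summary.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail(String),
    /// A skip must carry a rationale and a tracking bead (set by the caller).
    Skip { rationale: String, bead: String },
}

/// Hypothetical registration snapshot: names only, sorted for stable diffs.
#[derive(Default)]
struct Registration {
    tools: BTreeSet<String>,
    commands: BTreeSet<String>,
}

/// First expected name that is absent from the actual set, if any.
fn first_missing<'a>(
    expected: &'a BTreeSet<String>,
    actual: &BTreeSet<String>,
) -> Option<&'a String> {
    expected.iter().find(|name| !actual.contains(*name))
}

/// Any registration mismatch is a FAIL; otherwise PASS.
fn compare(expected: &Registration, actual: &Registration) -> Verdict {
    if let Some(name) = first_missing(&expected.tools, &actual.tools) {
        return Verdict::Fail(format!(
            "expected tool '{name}' not found in actual tools: {:?}",
            actual.tools
        ));
    }
    if let Some(name) = first_missing(&expected.commands, &actual.commands) {
        return Verdict::Fail(format!(
            "expected command '{name}' not found in actual commands: {:?}",
            actual.commands
        ));
    }
    if expected.tools != actual.tools || expected.commands != actual.commands {
        return Verdict::Fail("actual registrations contain unexpected entries".to_string());
    }
    Verdict::Pass
}
```

Sorted sets keep the comparison independent of registration order, which is one way to satisfy the requirement that outputs be deterministic.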
All extension-related logs must conform to the ext.log.v1 schema
(see EXTENSIONS.md). The conformance harness records JSONL logs per scenario:
- Harness output: target/ext_conformance/logs/<scenario_id>.jsonl
- Capture output: tests/ext_conformance/capture/<ext>/<scenario>/extension.log.jsonl
Normalization for deterministic diffs:
- Replace ts, pid, host, run_id, session_id, artifact_id, trace_id, span_id with placeholders.
- Normalize absolute paths to <cwd>/....
Deterministic runtime controls (TS oracle + Rust PiJS):
- Patched globals: Date/Date.now, Math.random, process.cwd, process.env.HOME, pi.time.nowMs.
- Env vars: PI_DETERMINISTIC_TIME_MS, PI_DETERMINISTIC_TIME_STEP_MS, PI_DETERMINISTIC_RANDOM, PI_DETERMINISTIC_RANDOM_SEED, PI_DETERMINISTIC_CWD, PI_DETERMINISTIC_HOME.
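To show how the two time variables might interact, here is a minimal sketch of a deterministic clock. DeterministicClock and its methods are hypothetical, and the semantics (return the current value, then advance by the step on every read; a missing or unparsable value counts as 0) are assumptions made for illustration, not documented behavior of the harness.

```rust
/// Hypothetical clock driven by PI_DETERMINISTIC_TIME_MS and
/// PI_DETERMINISTIC_TIME_STEP_MS.
struct DeterministicClock {
    now_ms: u64,
    step_ms: u64,
}

impl DeterministicClock {
    /// Build from raw variable values (assumption: bad or missing values mean 0).
    fn from_vars(time_ms: Option<&str>, step_ms: Option<&str>) -> Self {
        fn parse(value: Option<&str>) -> u64 {
            value.and_then(|s| s.trim().parse::<u64>().ok()).unwrap_or(0)
        }
        Self {
            now_ms: parse(time_ms),
            step_ms: parse(step_ms),
        }
    }

    /// Read both variables from the process environment.
    fn from_env() -> Self {
        let time = std::env::var("PI_DETERMINISTIC_TIME_MS").ok();
        let step = std::env::var("PI_DETERMINISTIC_TIME_STEP_MS").ok();
        Self::from_vars(time.as_deref(), step.as_deref())
    }

    /// Return the current time, then advance by the step (assumed semantics).
    fn now(&mut self) -> u64 {
        let current = self.now_ms;
        self.now_ms = self.now_ms.saturating_add(self.step_ms);
        current
    }
}
```

With a step of 0 the clock is frozen, so every timestamp in a run is identical; a non-zero step keeps event ordering visible in logs while staying reproducible.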
CI consumption:
- Archive target/ext_conformance/logs/** as CI artifacts.
- Diffs should be grouped by event and correlation IDs to speed triage.
Most npm extensions are tier 3+ and therefore #[ignore] by default. To attempt all npm-registry
extensions, include ignored tests:
CARGO_TARGET_DIR=/tmp/pi_target cargo test --test ext_conformance_generated ext_npm_ -- --include-ignored
Snapshot (2026-02-05):
- npm extensions attempted: 63
- passed: 28
- failed: 35
- self-contained subset (conformance_tier <= 2 and has_npm_deps = false): 14/17 passed (82.4%)
Failure summary (one row per failing extension):
| Extension | Category | Detail |
|---|---|---|
| npm/aliou-pi-guardrails | missing_npm_dependency | @aliou/pi-utils-settings |
| npm/aliou-pi-linkup | missing_global_console | console is not defined |
| npm/aliou-pi-processes | relative_import_resolution | ../components/processes-component |
| npm/aliou-pi-synthetic | manifest_mismatch | expected command 'synthetic:quotas' not found in actual commands: [] |
| npm/aliou-pi-toolchain | missing_npm_dependency | @aliou/sh |
| npm/benvargas-pi-ancestor-discovery | missing_node_shim_export | Could not find export 'isAbsolute' in module 'node:path' |
| npm/imsus-pi-extension-minimax-coding-plan-mcp | missing_node_shim_export | Could not find export 'readFile' in module 'node:fs' |
| npm/juanibiapina-pi-files | missing_npm_dependency | @juanibiapina/pi-extension-settings |
| npm/lsp-pi | missing_npm_dependency | vscode-languageserver-protocol/node.js |
| npm/marckrenn-pi-sub-bar | missing_npm_dependency | @marckrenn/pi-sub-shared |
| npm/marckrenn-pi-sub-core | missing_npm_dependency | @marckrenn/pi-sub-shared |
| npm/permission-pi | missing_npm_dependency | shell-quote |
| npm/pi-agentic-compaction | missing_npm_dependency | just-bash |
| npm/pi-amplike | manifest_mismatch | manifest says it registers tools, but no tool defs were captured |
| npm/pi-bash-confirm | manifest_mismatch | expected command 'demo-bash-confirm' not found in actual commands: ["bash-confirm"] |
| npm/pi-brave-search | missing_npm_dependency | @mozilla/readability |
| npm/pi-ghostty-theme-sync | missing_node_shim_export | Could not find export 'createHash' in module 'node:crypto' |
| npm/pi-mermaid | missing_npm_dependency | beautiful-mermaid |
| npm/pi-messenger | missing_node_shim_export | Could not find export 'isAbsolute' in module 'node:path' |
| npm/pi-multicodex | missing_virtual_module_export | Could not find export 'getApiProvider' in module '@mariozechner/pi-ai' |
| npm/pi-repoprompt-mcp | missing_npm_dependency | @modelcontextprotocol/sdk |
| npm/pi-screenshots-picker | missing_npm_dependency | glob |
| npm/pi-search-agent | missing_npm_dependency | dotenv |
| npm/pi-session-ask | runtime_error | not a function |
| npm/pi-shadow-git | missing_node_shim_export | Could not find export 'isAbsolute' in module 'node:path' |
| npm/pi-super-curl | missing_npm_dependency | uuid |
| npm/pi-telemetry-otel | missing_npm_dependency | @opentelemetry/api |
| npm/pi-wakatime | missing_node_builtin | node:stream |
| npm/pi-watch | missing_npm_dependency | chokidar |
| npm/pi-web-access | missing_npm_dependency | @mozilla/readability |
| npm/ralph-loop-pi | missing_virtual_module_export | Could not find export 'AssistantMessageComponent' in module '@mariozechner/pi-coding-agent' |
| npm/vaayne-agent-kit | missing_npm_dependency | @modelcontextprotocol/sdk/client/index.js |
| npm/vaayne-pi-mcp | missing_npm_dependency | @modelcontextprotocol/sdk/client/index.js |
| npm/vaayne-pi-web-tools | missing_npm_dependency | jsdom |
| npm/zenobius-pi-dcp | missing_npm_dependency | bunfig |
To make E2E and integration tests auditable and diffable, tests emit structured JSONL logs and a JSONL artifact index. These are intended for CI artifact capture and deterministic diffing alongside normalized fixtures.
Each log entry is one JSON object per line:
{
"schema": "pi.test.log.v1",
"type": "log",
"test": "e2e_cli_help_flag",
"seq": 1,
"ts": "2026-02-03T03:01:02.123Z",
"t_ms": 123,
"level": "info",
"category": "setup",
"message": "Created test directory",
"context": {
"path": "/tmp/pi-test-123/workspace",
"size": "42 bytes"
}
}
Field notes:
- ts is ISO-8601 UTC; t_ms is relative to harness start.
- test is optional; when present it is a single string.
- context is a flat string map (redacted for sensitive keys).
Each artifact entry is one JSON object per line:
{
"schema": "pi.test.artifact.v1",
"type": "artifact",
"test": "e2e_cli_help_flag",
"seq": 1,
"ts": "2026-02-03T03:01:05.000Z",
"t_ms": 3000,
"name": "stdout.txt",
"path": "/tmp/pi-test-123/stdout.txt",
"size_bytes": 2048,
"sha256": "sha256:deadbeef..."
}
Normalized JSONL replaces non-deterministic values so diffs are stable:
- ts → <TIMESTAMP>
- t_ms → 0
- absolute project paths → <PROJECT_ROOT>/...
- temp/test paths → <TEST_ROOT>/...
- UUIDs/run IDs → <UUID>/<RUN_ID> when present in strings
- local ports in URLs → <PORT>
Normalized outputs are written alongside raw logs with a .normalized.jsonl suffix.
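Two of the normalization rules, path roots and local ports, can be sketched with plain string handling. This is an illustrative sketch only: normalize_ports and normalize_line are hypothetical helpers, the loopback host list is an assumption, and field-level rules such as ts and t_ms (which need a JSON parser) are left out.

```rust
/// Replace the digits after a loopback host with the <PORT> placeholder.
/// Assumption: only 127.0.0.1 and localhost count as local hosts.
fn normalize_ports(input: &str) -> String {
    let hosts = ["127.0.0.1:", "localhost:"];
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    loop {
        // Earliest loopback host in the remaining text, with its length.
        let next = hosts
            .iter()
            .filter_map(|h| rest.find(*h).map(|i| (i, h.len())))
            .min();
        let (idx, len) = match next {
            Some(found) => found,
            None => {
                out.push_str(rest);
                return out;
            }
        };
        let after = idx + len;
        out.push_str(&rest[..after]);
        // ASCII digits are one byte each, so the count is also a byte length.
        let digits = rest[after..]
            .chars()
            .take_while(|c| c.is_ascii_digit())
            .count();
        if digits > 0 {
            out.push_str("<PORT>");
        }
        rest = &rest[after + digits..];
    }
}

/// Apply root and port normalization to one raw line.
/// Both roots are assumed to be non-empty absolute paths.
fn normalize_line(line: &str, project_root: &str, test_root: &str) -> String {
    // The test root goes first in case it is nested under the project root.
    let replaced = line
        .replace(test_root, "<TEST_ROOT>")
        .replace(project_root, "<PROJECT_ROOT>");
    normalize_ports(&replaced)
}
```

Applied to the artifact path from the example entry, a test root of /tmp/pi-test-123 would turn /tmp/pi-test-123/stdout.txt into <TEST_ROOT>/stdout.txt.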
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique test identifier |
| description | string | no | Human-readable description |
| setup | array | no | Steps to initialize test environment |
| input | object | yes | Tool input parameters |
| expected | object | yes | Expected results |
| expect_error | bool | no | Whether test should fail |
| error_contains | string | no | Expected error substring |
| tags | array | no | Categories for filtering |
| Type | Fields | Description |
|---|---|---|
| create_file | path, content | Create file with content |
| create_dir | path | Create directory |
| run_command | command | Execute shell command |
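The three setup step types could be modeled roughly as follows. This is a minimal sketch rather than the project's fixture runner: SetupStep and apply_step are hypothetical names, and running commands through sh -c inside the test directory is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical in-memory form of one fixture `setup` entry.
enum SetupStep {
    CreateFile { path: String, content: String },
    CreateDir { path: String },
    RunCommand { command: String },
}

/// Apply one step relative to `root`, the per-test working directory.
fn apply_step(root: &Path, step: &SetupStep) -> io::Result<()> {
    match step {
        SetupStep::CreateFile { path, content } => {
            let target = root.join(path);
            // Create parent directories so nested paths work.
            if let Some(parent) = target.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(target, content)
        }
        SetupStep::CreateDir { path } => fs::create_dir_all(root.join(path)),
        SetupStep::RunCommand { command } => {
            // Assumption: commands run through `sh -c` in the test directory.
            let status = Command::new("sh")
                .arg("-c")
                .arg(command)
                .current_dir(root)
                .status()?;
            if status.success() {
                Ok(())
            } else {
                Err(io::Error::new(
                    io::ErrorKind::Other,
                    format!("setup command failed: {command}"),
                ))
            }
        }
    }
}
```

Returning io::Result from every step lets a runner abort a case during setup and report the failing step instead of producing a misleading tool mismatch.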
| Field | Type | Description |
|---|---|---|
| content_exact | string | Content must match exactly |
| content_contains | array | Content must include all substrings |
| content_not_contains | array | Content must NOT include any substring |
| content_regex | string | Content must match regex |
| details | object | Details must contain keys (value check optional) |
| details_exact | object | Details must match exactly |
| details_none | bool | Details must be None |
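As an illustration of how the content assertions might be evaluated, the sketch below checks content_exact, content_contains, and content_not_contains against a tool's text output. Expected and check_content are hypothetical names; content_regex and the details checks are omitted because they need a regex engine and a JSON value type.

```rust
/// Hypothetical subset of the `expected` object.
#[derive(Default)]
struct Expected {
    content_exact: Option<String>,
    content_contains: Vec<String>,
    content_not_contains: Vec<String>,
}

/// Return every failed assertion so one run reports all mismatches.
fn check_content(actual: &str, expected: &Expected) -> Vec<String> {
    let mut failures = Vec::new();
    if let Some(exact) = &expected.content_exact {
        if actual != exact.as_str() {
            failures.push(format!("content_exact: expected {exact:?}, got {actual:?}"));
        }
    }
    for needle in &expected.content_contains {
        if !actual.contains(needle.as_str()) {
            failures.push(format!("content_contains: missing {needle:?}"));
        }
    }
    for needle in &expected.content_not_contains {
        if actual.contains(needle.as_str()) {
            failures.push(format!("content_not_contains: found {needle:?}"));
        }
    }
    failures
}
```

Collecting failures rather than stopping at the first one makes a fixture diff easier to triage, since a single run shows every assertion that disagrees with the reference.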
Fixtures were created by:
- Running TypeScript tools with specific inputs
- Capturing outputs and metadata
- Encoding expected behavior in JSON
Future automation with TypeScript capture harness:
# Run TypeScript reference and capture output
cd pi-mono
node capture-fixtures.js --tool read --output fixtures/read_tool.json
# Run Rust implementation against same fixtures
cd ../pi_agent_rust
cargo test --test conformance_fixtures
# Run all tests
cargo test
# Library unit tests only
cargo test --lib
# Conformance tests only
cargo test --test tools_conformance
cargo test --test conformance_fixtures
# Show test output
cargo test -- --nocapture
# Tests for a specific tool
cargo test read_tool
cargo test bash_tool
Add cases to the appropriate tests/conformance/fixtures/<tool>_tool.json:
{
"name": "new_test_case",
"description": "Test some edge case",
"setup": [...],
"input": {...},
"expected": {...}
}
To add a new tool:
- Create fixture file: tests/conformance/fixtures/<tool>_tool.json
- Add test module to tests/tools_conformance.rs
- Implement fixture runner for the tool
Before adding a fixture, verify the expected behavior:
# In pi-mono
echo '{"path": "test.txt"}' | node -e "
const tool = require('./tools/read');
process.stdin.on('data', async (d) => {
const result = await tool.execute(JSON.parse(d));
console.log(JSON.stringify(result, null, 2));
});
"
All tools return:
struct ToolResult {
content: Vec<ContentBlock>, // Primary output
details: Option<Value>, // Metadata (truncation info, etc.)
is_error: bool, // Error flag
error_type: Option<String>, // Error classification
}
| Constant | Value | Used By |
|---|---|---|
| DEFAULT_MAX_LINES | 2000 | read, bash, grep |
| DEFAULT_MAX_BYTES | 50KB | read, bash, grep, find, ls |
| GREP_MAX_LINE_LENGTH | 500 | grep |
Truncation message format:
[N more lines in file. Use offset=M to continue.]
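The line limit and the truncation message can be sketched together. This is an assumption-laden illustration, not the reference algorithm: truncate_lines is a hypothetical helper, offset is assumed to be the 1-based number of the first omitted line, the blank line before the notice is a guess, and the byte limit is ignored.

```rust
const DEFAULT_MAX_LINES: usize = 2000;

/// Keep at most `max_lines` lines and append a continuation notice
/// when lines were dropped.
fn truncate_lines(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }
    let remaining = lines.len() - max_lines;
    // Assumption: `offset` is the 1-based number of the first omitted line.
    let next_offset = max_lines + 1;
    let mut out = lines[..max_lines].join("\n");
    out.push_str(&format!(
        "\n\n[{remaining} more lines in file. Use offset={next_offset} to continue.]"
    ));
    out
}
```

A caller would pass DEFAULT_MAX_LINES for the tools listed in the table. The exact offset convention should be confirmed against the truncation.json fixtures before relying on this sketch.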
- Absolute paths used as-is
- ~ expanded to home directory
- Relative paths resolved from working directory
- Symlinks followed for reads, not for writes
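The first three path rules can be sketched with the standard library alone. resolve_path is a hypothetical helper, the home and working directories are passed in explicitly so the function stays deterministic, and the symlink rule is not modeled here.

```rust
use std::path::{Path, PathBuf};

/// Resolve a tool path argument against explicit `cwd` and `home` directories.
fn resolve_path(input: &str, cwd: &Path, home: &Path) -> PathBuf {
    // A bare `~` is the home directory itself.
    if input == "~" {
        return home.to_path_buf();
    }
    // `~/rest` expands to `<home>/rest`.
    if let Some(rest) = input.strip_prefix("~/") {
        return home.join(rest);
    }
    let path = Path::new(input);
    if path.is_absolute() {
        // Absolute paths are used as-is.
        path.to_path_buf()
    } else {
        // Relative paths resolve from the working directory.
        cwd.join(path)
    }
}
```

Taking cwd and home as parameters, instead of reading them from the process, fits the deterministic controls described earlier, where both values are patched for reproducible runs.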
Tools should return errors (not panic) for:
- File not found
- Permission denied
- Invalid path
- Timeout exceeded
- Invalid input parameters
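One way to return errors instead of panicking is to map I/O failures onto a small error enum. The sketch below is illustrative: ToolError, its variants, and read_text are hypothetical names that do not come from the codebase, and timeouts are not covered.

```rust
use std::fs;
use std::io;

/// Hypothetical error classification for tool failures.
#[derive(Debug, PartialEq)]
enum ToolError {
    NotFound(String),
    PermissionDenied(String),
    InvalidInput(String),
    Io(String),
}

impl From<io::Error> for ToolError {
    fn from(err: io::Error) -> Self {
        match err.kind() {
            io::ErrorKind::NotFound => ToolError::NotFound(err.to_string()),
            io::ErrorKind::PermissionDenied => ToolError::PermissionDenied(err.to_string()),
            io::ErrorKind::InvalidInput => ToolError::InvalidInput(err.to_string()),
            _ => ToolError::Io(err.to_string()),
        }
    }
}

/// Read a file, reporting failures as values rather than panicking.
fn read_text(path: &str) -> Result<String, ToolError> {
    if path.is_empty() {
        return Err(ToolError::InvalidInput("path must not be empty".to_string()));
    }
    // `?` converts io::Error into ToolError through the From impl above.
    Ok(fs::read_to_string(path)?)
}
```

A fixture with expect_error set could then match on the variant or on the message, in the spirit of the error_contains field.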
| Symptom | Likely Cause | Fix |
|---|---|---|
| Content mismatch | Different newline handling | Check \n vs \r\n |
| Details mismatch | Extra/missing metadata | Update fixture or code |
| Timeout | Async handling difference | Check spawn/wait logic |
| Order mismatch | Non-deterministic output | Sort before compare |
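The newline and ordering fixes in the table can be expressed as two small helpers. Both function names are hypothetical, and comparing sorted lines is only appropriate when the order of the output is not part of the contract.

```rust
/// Convert CRLF line endings to LF before comparing content.
fn normalize_newlines(s: &str) -> String {
    s.replace("\r\n", "\n")
}

/// Compare two outputs as unordered collections of lines.
fn same_lines_any_order(a: &str, b: &str) -> bool {
    let mut left: Vec<&str> = a.lines().collect();
    let mut right: Vec<&str> = b.lines().collect();
    left.sort_unstable();
    right.sort_unstable();
    left == right
}
```

For example, two directory listings that differ only in entry order compare equal under same_lines_any_order, while a missing entry still fails the comparison.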
# Run specific test with debug output
RUST_LOG=debug cargo test test_name -- --nocapture
# Compare outputs manually
cargo run -- -p 'read test.txt' > rust_output.txt
node pi-mono/cli.js -p 'read test.txt' > ts_output.txt
diff rust_output.txt ts_output.txt
| Category | Target | Current |
|---|---|---|
| Core types | 100% | ~95% |
| Tools | 100% | ~80% |
| Providers | Streaming paths | ~70% |
| Session | JSONL format | ~60% |
| CLI | Argument parsing | ~40% |
- TypeScript Reference Harness: Automated fixture generation from pi-mono
- Session Format Tests: JSONL compatibility verification
- CLI Argument Tests: Flag parsing conformance
- Streaming Tests: SSE event sequence validation
- Performance Benchmarks: Latency and throughput comparison
- FEATURE_PARITY.md: Implementation status tracker
- README.md: Project overview
- AGENTS.md: AI agent instructions