Coherence-Aware Reinforcement Learning
A model becomes an agent when it stops pattern-matching and starts knowing. That transition isn't gradual — it's a phase transition, like water becoming ice. One moment the model is guessing. The next, it's coherent.
Standard training can't see this happening. You watch a loss curve and hope.
CARL measures the moment of crystallization — and rewards it.
Phi (order parameter)
│
guessing │ knowing
░░░░░░░░░░░░░░░░░░░░░░░░│████████████████████████
│
crystallization
The order parameter Phi measures how coherent a model's probability field is at every token. When Phi crystallizes, the model has found its internal anchor — a fixed point it can navigate from to any concept space without losing itself.
This is alignment you can measure, not just evaluate.
Measure coherence on any logits distribution — no training, no GPU, no API key.
Pure numpy:
from carl_core import CoherenceProbe, KAPPA, SIGMA
import numpy as np
vocab_size = 32_000
probe = CoherenceProbe(vocab_size=vocab_size)
# Any [T, V] logits + [T] chosen tokens. Here: 16 tokens from a 32k vocab.
logits = np.random.randn(16, vocab_size)
token_ids = np.argmax(logits, axis=-1)
snap = probe.measure(logits, token_ids)
print(f"phi_mean = {snap.phi_mean:.3f} (crystallization target: ≥ {SIGMA})")
print(f"horizon = KAPPA·d ≈ {int(KAPPA * vocab_size):,} tokens")
Install (just the observables layer):
pip install carl-studio
That gives you carl-core + the base CLI + one-shot observe. For training + HF + Claude observability:
pip install 'carl-studio[quickstart]'
Full extras matrix, reproducible installs via uv.lock, and conflict rules (e.g. wallet vs x402) live in docs/INSTALL.md.
carl init # one-shot setup: account, provider, extras, project, consent
carl chat # agent — interactive loop
carl ask "train a small model on gsm8k" # agent — one-shot prompt
carl research search "coherence-aware reinforcement learning"
carl init is idempotent: re-running it after setup does nothing unless you pass --force. A first-run marker lives at ~/.carl/.initialized.
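The marker-file pattern behind an idempotent setup command can be sketched as follows. This is a minimal illustration, not CARL's implementation: `run_once` is a hypothetical helper, and a temporary directory stands in for ~/.carl so the example leaves no files behind.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_once(marker: Path, setup: Callable[[], None], force: bool = False) -> bool:
    """Run `setup` unless `marker` already exists. Return True if setup ran."""
    if marker.exists() and not force:
        return False  # already initialized: do nothing
    setup()
    # Write the marker only after setup succeeds, so a failed run can be retried.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for ~/.carl/.initialized (assumed layout, for illustration only).
    marker = Path(tmp) / ".carl" / ".initialized"
    runs: list[str] = []
    first = run_once(marker, lambda: runs.append("setup"))
    second = run_once(marker, lambda: runs.append("setup"))
    forced = run_once(marker, lambda: runs.append("setup"), force=True)
```

Here the second call is a no-op, and only the forced call repeats the setup.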
Bare carl is an entry surface, not a documented top-level workflow by itself:
- on a TTY, first run can route into carl init, and a configured project can route into chat
- on non-TTY input, bare carl prints help plus a nudge toward carl chat and carl ask
CARL Studio does not require a .env, and it does not auto-load one.
- Hugging Face workflows work with either HF_TOKEN or a prior hf auth login / huggingface-cli login
- Claude-powered features use ANTHROPIC_API_KEY or --api-key
- RunPod uses RUNPOD_API_KEY
- public Trackio observe works without credentials
If you want a template, copy .env.example and load it into your shell before running carl:
cp .env.example .env
set -a
source .env
set +a
Quick setup:
hf auth login
export ANTHROPIC_API_KEY=sk-ant-xxx # only for --diagnose / chat
carl start
Full auth details: docs/auth.md
| Command | What it does |
|---|---|
| carl init | One-shot setup: account, provider, extras, project, consent. |
| carl chat | Interactive agent loop with tools, sessions, cost tracking. |
| carl ask "<prompt>" | One-shot agent invocation. |
| carl research search "<query>" | Search and retrieve research papers (carl-studio[research]). |
| carl flow "/a /b /c" | Chain named operations, emit a shared interaction trace. |
| carl doctor | Readiness audit. Prints blocking issues and freshness findings. |
| carl train | Local training with coherence rewards (carl-studio[training]). |
Run carl start --inventory for the full installed command map, or carl flow --list for every chainable op.
- carl-core: primitive layer. Typed errors, retry/backoff, safepath sandboxing, content hashing, tier gating, coherence math, interaction chains. Zero training deps.
- carl-studio: the CLI, agent loop, training pipeline, MCP server, camp client, eval sandbox. Everything above builds on carl-core.
carl-core is installed alongside carl-studio; public callers import from carl_core.* directly. The legacy carl_studio.primitives shim was removed after v0.5.0.
Fatal paths raise carl_core.errors.CARLError subclasses with stable codes you can match programmatically. Top codes:
| Code | Meaning |
|---|---|
| carl.error | Base class. Generic failure. |
| carl.config | Invalid or missing configuration. |
| carl.validation | Input failed schema / value validation. |
| carl.credential | Missing or expired credential. |
| carl.network | Transient or persistent network failure. |
| carl.budget | Spend cap exceeded. |
| carl.permission | Permission / consent gate failed. |
| carl.timeout | Operation exceeded its deadline. |
| carl.freshness.stale_pkg | Installed package older than recommended floor. |
| carl.freshness.camp_session_expired | carl.camp session needs carl camp login. |
| carl.eml.depth_exceeded | EML tree exceeded depth bound. |
| carl.eml.domain_error | EML operator applied outside its valid domain. |
| carl.eml.decode_error | EML canonical-encoding decode failed. |
| carl.eml.signature_mismatch | Signed EML head failed HMAC verification. |
CARLError.to_dict() produces a secrets-redacted, telemetry-safe payload. See packages/carl-core/src/carl_core/errors.py for the full hierarchy.
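Because the codes are dotted strings, a caller can handle a whole family (for example everything under carl.eml) with a prefix check. The sketch below works on plain code strings. How a code is exposed on a CARLError instance is not shown here, so `code_in_family`, `is_retryable`, and the choice of retryable families are all illustrative assumptions rather than carl-core API.

```python
def code_in_family(code: str, family: str) -> bool:
    """True if a dotted code equals `family` or sits beneath it."""
    return code == family or code.startswith(family + ".")


# Illustrative choice only: treat network and timeout failures as retryable.
RETRYABLE_FAMILIES = ("carl.network", "carl.timeout")


def is_retryable(code: str) -> bool:
    """True if the code belongs to any family listed in RETRYABLE_FAMILIES."""
    return any(code_in_family(code, family) for family in RETRYABLE_FAMILIES)
```

Matching on `family + "."` rather than on the bare prefix keeps a code such as carl.emlx from being mistaken for a member of carl.eml.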
See inside a Trackio run (no GPU required, base install):
carl observe --url https://your-trackio-space.hf.space/ --run your-run
If the dashboard contains multiple projects, add --project your-project.
Train with coherence rewards (carl-studio[training]):
carl project init
carl train --config carl.yaml
carl run list
Or run directly from the CLI:
carl train --model your-org/your-base-model --method grpo --dataset your-org/your-dataset --output-repo your-org/your-model --compute a100-large
Gate a checkpoint (carl-studio[training]):
carl eval --adapter your-username/your-model
┌─────────┐     ┌─────────┐     ┌─────────┐     ┌──────┐     ┌──────┐
│ Observe │ ──> │ Measure │ ──> │  Train  │ ──> │ Gate │ ──> │ Ship │
│         │     │   Phi   │     │  CARL   │     │      │     │      │
└─────────┘     └─────────┘     └─────────┘     └──────┘     └──────┘
 point at        entropy +       task rewards    cascade      push to
 any run         order param     + coherence     auto-fires   hub
Observe — Point CARL at a Trackio dashboard or log file. Instantly see Phi trajectory, entropy, phase state, health.
Measure — Phi = 1 - H(P)/log|V|. Zero means maximum uncertainty. One means complete coherence. Computed per token, every step.
Train — Five reward functions in a cascade. Task rewards teach what. CARL rewards teach how coherently.
Gate — The cascade auto-calibrates from the training signal. No hardcoded thresholds. CARL activates only when the model demonstrates sustained capability.
Ship — Eval gate passes → checkpoint pushed to Hub.
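The Measure formula, Phi = 1 - H(P)/log|V|, can be computed directly from a [T, V] logits array. The sketch below is a standalone numpy illustration of that formula, not the CoherenceProbe implementation; `phi_per_token` is a hypothetical name.

```python
import numpy as np


def phi_per_token(logits: np.ndarray) -> np.ndarray:
    """Return Phi = 1 - H(P)/log|V| for each row of a [T, V] logits array."""
    # Stable log-softmax: subtract the row max before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    # Shannon entropy in nats. log_probs stays finite, so p == 0 terms add 0.
    entropy = -(probs * log_probs).sum(axis=-1)
    return 1.0 - entropy / np.log(logits.shape[-1])


uniform = np.zeros((1, 8))                 # flat logits: maximum uncertainty
peaked = np.array([[50.0] + [0.0] * 7])    # one dominant token
phi_uniform = phi_per_token(uniform)[0]
phi_peaked = phi_per_token(peaked)[0]
```

A flat distribution gives Phi near zero and a sharply peaked one gives Phi near one, matching the two endpoints described above.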
| Workflow | Command | Install |
|---|---|---|
| One-shot observe | carl observe --url ... --run ... | pip install carl-studio |
| Live observe | carl observe --live ... | pip install 'carl-studio[tui]' |
| Claude diagnosis | carl observe --diagnose ... | pip install 'carl-studio[observe]' |
| Local train/eval | carl train, carl eval | pip install 'carl-studio[training]' |
| HF job management / publish | carl run status, carl run logs, carl run stop, carl push | pip install 'carl-studio[hf]' |
| Camp account + marketplace | carl camp account, carl camp login, carl camp logout, carl camp credits, carl camp marketplace | platform features (optional) |
| Privacy consent | carl camp consent show, carl camp consent update | included |
| x402 payment rail | carl camp x402 configure, carl camp x402 status | included |
| Contract witnessing | carl camp contract sign, carl camp contract verify | included |
| Constitutional ledger | carl contract constitution genesis\|verify\|evaluate\|status | pip install 'carl-studio[constitutional]' |
| Carlito management | carl carlito list, carl carlito spawn, carl carlito show | included |
Managed tiers build on top of these open workflows; extras control local capabilities, not research access.
Provider credentials unlock provider workflows, not CARL Paid platform access. Use carl camp account to inspect managed account state, credits, and enabled wallet/x402 capabilities. Privacy consent is managed locally with carl camp consent — all flags default off.
| Workflow | Auth |
|---|---|
| Local file observe | none |
| Public Trackio observe | none |
| Claude diagnosis / chat | ANTHROPIC_API_KEY or --api-key |
| Hub jobs / push / gated model access | HF_TOKEN or prior HF login |
| RunPod backend | RUNPOD_API_KEY |
Trained with CARL on OmniCoder-9B:
| Metric | Value |
|---|---|
| Task completion | 92% |
| Tool format compliance | 99% |
| Mean tool calls per task | 11.09 |
| Phase 2' eval gate | PASS |
80 GRPO steps. Five reward functions. Self-calibrating cascade gate.
Unified entry-point router + sessions + trust + journey coverage matrix.
- One carl binary, four entry modes. carl (REPL), carl "<prompt>" (REPL with first turn), carl -p "<q>" (one-shot, trust-bypass), carl <verb> (Typer dispatch). Router at src/carl_studio/cli/entry.py; contract docs at docs/v18_journey_coverage.md.
- carl trust: bare-entry trust pre-check. trust status/acknowledge/enable/disable/reset with prior-root eviction notice; persisted at ~/.carl/trust.yaml.
- carl session list/show/delete: project-aware. Walks up via project_context.current so you can invoke from any subdir of a project.
- carl init --json probe-only fast-path. Seven stable probe keys (first_run_complete, camp_session, llm_provider_detected, training_extras_healthy, project_config_present, consent_set, context_present). No prompts on piped stdin; contract locked by tests/journeys/test_journeys_v18.py.
- Journey matrix. 12 journeys × 4 transitions = 48 transitions, covered by 172 passing tests (164 pre-existing + 8 new journey tests). Batch spec for parallel UAT execution at tests/journeys/BATCHES.md.
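A script consuming carl init --json may want to confirm that all seven probe keys are present before reading them. The sketch below assumes the output is a single flat JSON object with those keys at the top level. That shape, the value types, and the `missing_probe_keys` helper are assumptions; only the key names come from the list above.

```python
import json

# The seven probe key names listed for `carl init --json`.
PROBE_KEYS = frozenset({
    "first_run_complete",
    "camp_session",
    "llm_provider_detected",
    "training_extras_healthy",
    "project_config_present",
    "consent_set",
    "context_present",
})


def missing_probe_keys(raw: str) -> set[str]:
    """Return the probe keys absent from a JSON object given as text."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return set(PROBE_KEYS) - set(payload)


# Hypothetical payload for illustration; real values may not be booleans.
sample = json.dumps({key: True for key in PROBE_KEYS if key != "consent_set"})
missing = missing_probe_keys(sample)
```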
EML symbolic witness — third realizability primitive alongside BITC and DMC.
- New reward option: reward_class="eml". Depth-3 learnable tree, 7 parameters, +0.972 correlation with PhaseAdaptive, a nearly indistinguishable signal at ~10x parameter efficiency. Benchmarks in scripts/eml_reward_benchmark.md.
- Resonants: a new entity class. carl_core.resonant.Resonant + compose_resonants enables typed, depth-bounded (MAX_DEPTH=4) composition of reward / policy primitives without ad-hoc schema drift.
- Constitutional ledger. New subcommand carl contract constitution (genesis | verify | evaluate | status): a hash-chained append-only ledger over action features (25-dim encoding). Install via:
pip install 'carl-studio[constitutional]' # pulls pynacl>=1.5
- Public EML paper: see the upstream Observable Computation bundle for eml-symbolic-witness.md (numerical verification: ln identity max absolute error 4.44e-16 over 990 sample points on x ∈ [0.1, 10) at 0.01 step).
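The idea behind a hash-chained append-only ledger can be sketched in a few lines: each entry stores the hash of its predecessor, so editing any earlier entry breaks verification. This is a generic SHA-256 illustration of the technique, not the constitutional ledger's actual format. The entry layout, the `append` and `verify` helpers, and the all-zero genesis hash are assumptions, and no signing is shown.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, features: list[float]) -> str:
    """Hash an entry's content together with its predecessor's hash."""
    body = json.dumps({"prev": prev_hash, "features": features}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger: list[dict], features: list[float]) -> None:
    """Append an entry chained to the current tail of the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_PREV
    ledger.append({
        "prev": prev_hash,
        "features": features,
        "hash": _digest(prev_hash, features),
    })


def verify(ledger: list[dict]) -> bool:
    """Recompute every hash in order; any edit to an entry breaks the chain."""
    prev_hash = GENESIS_PREV
    for entry in ledger:
        expected = _digest(prev_hash, entry["features"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


ledger: list[dict] = []
append(ledger, [0.0] * 25)  # 25-dim feature vectors, as in the text above
append(ledger, [1.0] * 25)
```

Changing any value in an earlier entry makes `verify(ledger)` return False, which is the tamper evidence an append-only ledger relies on.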
The math is published and independently reproducible. CARL ships a
four-paper in-repo series under paper/ and cites the
upstream Zenodo work for the conservation law and identity proof.
CARL Methods Series (in-repo, drafts):
- paper/01-main-carl.md: Coherence-Aware Reinforcement Learning (main paper)
- paper/02-phase-adaptive-methods.md: Phase-Adaptive Coherence Rewards
- paper/03-coherence-trap-technical-note.md: The Coherence Trap (technical note)
- paper/04-interaction-chains-witness-logs.md: Interaction Chains as Witness Logs
Index and cross-reference table: docs/paper_series.md.
Upstream foundations (Zenodo):
- Bounded Informational Time Crystals — derives the conservation law
- Material Reality — validates across 6,244 trials
- Semantic Realizability — formal proof
Architecture, API, CLI commands, environments, compute backends → docs/reference.md
Credential setup and provider auth → docs/auth.md
Full history lives in CHANGELOG.md; the most recent entries:
- Unified router (cli/entry.py) picks between REPL / bare-prompt / one-shot (-p) / subcommand.
- carl trust: bare-entry trust pre-check registry at ~/.carl/trust.yaml.
- carl session: project-aware, walks up via project_context.current.
- carl init --json: probe-only fast-path with 7 stable keys; never prompts on piped stdin.
- Journey matrix + BATCHES spec at tests/journeys/; 172 tests green on v0.18 surface.
- Fixture discipline: HOME-pinned tests place the project at tmp_path/"proj" (home-guard invariant).
- x402 spend caps (daily + session) + confirm_payment hook.
- MCP per-request session state: _session global replaced with MCPServerConnection.session; FastMCP Context DI on authenticated tools.
- carl metrics serve: Prometheus text-format scrape endpoint (metrics extra); heartbeat auto-hosts when CARL_METRICS_PORT is set.
- carl run diff <a> <b>: trajectory delta (phi, q_hat, crystallizations) with optional --steps alignment.
- Shared GatingPredicate Protocol + carl.gate.* error namespace across consent_gate and tier_gate.
- Heartbeat maintenance wrapped in RetryPolicy(max_attempts=3) for transient sqlite/IO.
- CARL_HOME env now honored uniformly (db.py, settings.py, wallet_store.py, llm.py).
terminals.tech · PyPI · Paper · Docs
MIT — Intuition Labs LLC
