
Observer User Guide

This guide is for people who need to set up Observer, understand how the pieces fit together, and run real verification flows without reverse-engineering the repository.

It is not a design spec.

If you want platform rationale, read OBSERVER.md. If you want normative detail, read the files under specs. If you want to get a project working, stay here.

If you want the gentlest possible on-ramp first, open QUICKSTART.html. It asks two practical questions and gives you a copy-paste starter command.

If You Are New, Start Here

If Observer currently feels confusing, use this stripped-down mental model first.

Observer is not mainly asking you to "run a script".

Observer wants you to make the verification surface explicit.

That usually means:

  1. expose real targets through a provider
  2. derive inventory from those targets
  3. write a suite against that inventory
  4. run the suite and keep the report

If you only need one short sentence, use this one:

Observer turns a verification setup into an explicit contract pipeline.

If you want Observer to lay down a working starting tree for you, run one of these first:

observer new shell-proxy demo --proxy python --subject-family rust
observer new shell-proxy demo --proxy javascript --subject-family rust
observer new shell-proxy demo --proxy python --subject-family c --strategy single-test
observer new shell-proxy demo --proxy python --subject-family rust --strategy staged-product

The default scaffold is multi-test, because that is the preferred everyday pattern.

That command shape matters. It separates two concerns explicitly:

  1. the proxy/orchestration language such as Python
  2. the subject family under test such as Rust, C, or Python

Today the built-in shell proxies are:

  • python
  • javascript

Today the built-in subject families are:

  • rust
  • c
  • python
  • generic

Scaffold Model

The scaffold command is trying to model the real adoption shape directly.

It is not asking only one question.

It is asking two questions:

  1. what language is acting as the shell-orchestration proxy
  2. what subject family is actually being verified

That distinction matters because these are usually not the same thing.

In many real systems, including the GLC-style shape discussed later in this guide, the proxy language exists because it is better at orchestration, process control, JSON handling, or filesystem inspection than plain shell.

That does not mean the proxy language is the semantic thing being certified.

Examples:

  • --proxy python --subject-family rust: Python orchestrates checks against a Rust product surface
  • --proxy javascript --subject-family c: JavaScript orchestrates checks against a C product surface
  • --proxy python --subject-family python: Python is still just the proxy layer unless the target identities are actually about externally visible Python product behavior
  • --proxy javascript --subject-family generic: JavaScript orchestrates checks against a broader CLI, protocol, workflow, or artifact surface

If you want one sentence for the model, use this one:

Proxy language is how the checks are expressed. Subject family is what those checks are about.

Why This Is Better Than new python

observer new python sounds like Observer thinks the project is mainly about Python.

That is too weak for the real pattern.

The stronger model is:

  • external shell-proxy pattern
  • explicit proxy backend
  • explicit subject family

That is why the primary command is now observer new shell-proxy ....

The hidden observer new python ... alias still exists for compatibility, but it is not the teaching surface anymore.

Scaffold Command Reference

The main command is:

observer new shell-proxy <path> --proxy <python|javascript> --subject-family <rust|c|python|generic>

Useful flags:

  • --strategy single-test|multi-test|staged-product
  • --proxy python|javascript
  • --subject-family rust|c|python|generic
  • --provider <id>
  • --name <display-name>
  • --force
  • --dry-run

Defaults:

  • strategy defaults to multi-test
  • provider id defaults to the proxy name
  • subject family defaults to rust

What Each Strategy Means

single-test means:

  • one externally meaningful proof unit
  • only use it when there is truly one contract worth naming

multi-test means:

  • several granular proof units over one product surface
  • this is the normal default because it gives better inventory and better failure localization

staged-product means:

  • several verification concerns that should certify together
  • the generated tree uses a GLC-shaped teaching decomposition: unit, golden, dict

What The Proxy Changes

Changing --proxy changes the orchestration surface and the generated provider host implementation.

Today that means:

  • python: generated provider hosts and helper code are Python
  • javascript: generated provider hosts and helper code are JavaScript

The intent is not to make Python and JavaScript look like the subject under test.

The intent is to let teams choose the orchestration surface that best handles process glue, JSON, files, and small control logic.

What The Subject Family Changes

Changing --subject-family changes the teaching language around the generated tree.

Today it affects:

  • generated README guidance
  • subject-family wording in the scaffolded project
  • how the scaffold tells you to interpret the toy implementation

Current subject families mean:

  • rust: think crate, binary, or generated artifact surface
  • c: think library, binary, or generated artifact surface
  • python: think module, CLI, or generated artifact surface
  • generic: think product, protocol, workflow, or artifact surface

Today most subject families still mainly adjust wording.

The main exception is the Rust multi-test starter, which now generates a worked Cargo proof starter with native Rust tests plus an external shell-proxy Observer stage.

That is intentional. The command shape stays stable while subject-specialized starter content grows incrementally.

Scaffold Matrix

You can think of the built-in scaffolds as a matrix.

Rows are proxy backends.

Columns are strategies.

Subject family is an interpretation layer over that matrix.

proxy backend      single-test     multi-test      staged-product
python             yes             yes             yes
javascript         yes             yes             yes

And subject family is currently:

subject-family     affects generated wording and usage guidance
rust               yes
c                  yes
python             yes
generic            yes

What Gets Generated

All shell-proxy scaffolds generate the same broad artifact story:

  • provider host source
  • observer config
  • suite files
  • Makefile shortcuts
  • a vendored Observer host SDK for the chosen proxy language

single-test and multi-test generate a suite-first tree.

staged-product generates a product-first tree with:

  • unit
  • golden
  • dict
  • product.json

The toy code is intentionally replaceable.

The important part is the contract shape, target naming, and stage separation.

How To Choose Quickly

Use this shortcut.

Choose --proxy python when:

  • your team already uses Python for glue code
  • you want a small orchestration surface with direct filesystem and JSON handling

Choose --proxy javascript when:

  • your team already uses Node-based tooling
  • you want orchestration close to existing JS toolchains

Choose --subject-family rust when:

  • the real thing under test is mostly a Rust crate, binary, or generated Rust-oriented artifact surface

Choose --subject-family c when:

  • the real thing under test is mostly a C library, binary, or generated low-level artifact surface

Choose --subject-family python when:

  • the real thing under test is a Python product surface, while still keeping target names about externally visible behavior rather than about the proxy wrapper itself

Choose --subject-family generic when:

  • the real thing under test is broader than one implementation language, such as a CLI, wire protocol, package surface, or workflow artifact graph

Example Recipes

Start with a Python shell proxy over a Rust product surface:

observer new shell-proxy demo --proxy python --subject-family rust
cd demo
make report

Start with a JavaScript shell proxy over a C product surface:

observer new shell-proxy demo --proxy javascript --subject-family c
cd demo
make report

Start with the staged product teaching shape:

observer new shell-proxy demo --proxy python --subject-family rust --strategy staged-product
cd demo
make certify

Inspect the file plan without writing anything:

observer new shell-proxy demo --proxy javascript --subject-family generic --dry-run

The Three Rules To Remember

If you forget everything else, remember these three rules.

Rule 1: The target should name the real thing being verified

Good:

  • compiler/emits-canonical-json
  • cli/help-shows-subcommands
  • package/wheel-has-license

Bad:

  • python-wrapper-ran
  • run-all-checks
  • integration-script

Rule 2: Inventory is a contract, not a cache

tests.inv is not a throwaway implementation detail.

It is the explicit list of targets Observer believes exist.

If the target set changes, that is a real verification change.

Rule 3: A report is evidence, not just console exhaust

When Observer emits JSONL, that output is not noise.

It is the machine-readable proof that later commands use for:

  • cubes
  • compares
  • views
  • product certification rollup

Which Path Should I Choose?

Use this decision guide.

If you want Observer to generate the starting tree for one of these paths, use observer new shell-proxy <path> --proxy <name> --subject-family <name> --strategy <name>.

Choose the single-test strategy when

  • there is only one externally meaningful proof unit
  • the target still names a real product contract
  • splitting further would be fake granularity

Choose the multi-test strategy when

  • one system exposes several distinct behaviors
  • you want failure localization
  • you want better compare and analytics artifacts
  • you want inventory to describe the real surface cleanly

Choose the staged-product strategy when

  • several verification areas must pass together
  • one area is unit-like, another is corpus-like, another is consistency-like
  • release health is a product question, not one suite question

The Fastest Useful Reading Order

If you want to get to a working mental model quickly, read in this order:

  1. this guide through the proxy-language sections
  2. ../examples/python-proxy-pattern/single-test-strategy/README.md
  3. ../examples/python-proxy-pattern/multi-test-strategy/README.md
  4. ../examples/python-proxy-pattern/staged-product/README.md
  5. one runnable starter such as ../lib/python/starter/README.md or ../lib/rust/starter/README.md

What Observer Is

Observer is a deterministic verification platform built around explicit contracts and derived artifacts.

The core flow is:

  1. a provider exposes tests or workflow targets
  2. Observer derives canonical inventory from that provider
  3. a suite selects from that inventory and declares expectations
  4. Observer runs the suite and emits a structured report
  5. optional derived artifacts such as cubes, compares, compare indexes, and HTML views are produced from that report
  6. optional product certification combines multiple stages into one product verdict

If you remember nothing else, remember this pipeline:

provider -> inventory -> suite -> report -> cube/compare/view -> product

The rest of the tool is built around that shape.

When To Use Which Surface

Observer has a few distinct working layers.

  • Provider: how a language or workflow exposes executable targets to Observer.
  • Inventory: the canonical list of runnable targets.
  • Suite: the expectations you want enforced.
  • Report: the machine-readable record of one execution.
  • Analytics: derived artifacts such as cubes, compares, compare indexes, and HTML views.
  • Product: an ordered, multi-stage certification contract above suites.

Use them like this:

  • If you are onboarding a new language integration, start with a provider and inventory.
  • If you already have inventory, write a suite and run it.
  • If you need artifact history or build-to-build comparison, derive cubes and compares.
  • If release health depends on multiple verification areas, define a product and use certify.

Primary Pattern: Proxy-Language Verification

One usage pattern deserves to be made explicit because it is likely to be the main way many teams adopt Observer.

The language you use to author tests is often not the real subject under test.

That language is frequently just the most convenient control surface for expressing verification against something else.

Examples:

  • Python tests that verify a CLI's behavior
  • Python tests that verify generated files or package outputs
  • Python tests that verify a service contract or protocol exchange
  • Python tests that verify compiler output or workflow results

In that model:

  • Python is the authoring surface
  • Observer is the verification platform
  • the real subject is the product behavior being checked

This is the preferred pattern.

The scaffold command above is simply the productized version of this pattern.

It makes the two layers explicit at generation time instead of making the reader infer them later.

Beginner Translation Of The Pattern

If the phrase "proxy-language verification" sounds abstract, translate it into ordinary language like this:

  • Python is the pen
  • the product behavior is the thing you are writing about
  • Observer is the notebook that keeps the record straight

Or even more simply:

  • Python is how you say the test
  • the product behavior is what the test means

That is the whole idea.

You are not using Python because you want to prove "Python worked".

You are using Python because it is a convenient way to express checks against something else.

Anti-Pattern: Wrapper Script As The Test Subject

The wrong shape looks like this:

  1. write one Python script that orchestrates a lot of work
  2. register that script as one Observer test target
  3. treat exit 0 from that script as the proof that verification succeeded

Why this is weak:

  • Observer only sees the wrapper, not the underlying verification units
  • failure localization is poor
  • target identity becomes vague or meaningless
  • inventory becomes coarse and unhelpful
  • analytics and compare artifacts lose useful granularity
  • product certification ends up composed from blobs instead of explicit proofs

You still get execution, but you do not get a strong verification model.

Side-By-Side: Wrong Versus Right

Wrong:

Observer target -> run_release_checks.py -> many hidden checks -> one exit code

Right:

Observer target -> cli/help-shows-subcommands
Observer target -> package/wheel-has-license
Observer target -> compiler/rejects-bad-input

In the wrong shape, the real verification surface is hidden inside one program.

In the right shape, Observer can see the actual proof units.

Preferred Pattern: Granular Proxy Tests

The better shape is:

  1. use a host language such as Python to author granular tests
  2. let each test correspond to one real behavior of the underlying system
  3. expose those tests through the Observer provider boundary
  4. derive inventory from those granular targets
  5. run suites and product stages against those explicit targets

That gives Observer meaningful units such as:

  • cli/help-shows-subcommands
  • compiler/emits-canonical-json
  • package/wheel-contains-license
  • api/rejects-missing-token

Those are much better verification targets than something like:

  • python-wrapper-ran

The practical test is simple:

Ask this question for every target name:

If this target fails, will the name tell me what product contract regressed?

If the answer is no, the target is probably too coarse.
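That question can be turned into a rough automated heuristic. The sketch below is illustrative only and not part of Observer: the list of vague words and the area/behavior naming convention are assumptions of this example, and real judgment still belongs to a reviewer.

```python
# Heuristic check for coarse target names. Illustrative only: the vague-word
# list and the area/behavior naming shape are assumptions of this sketch,
# not rules enforced by Observer.
VAGUE_WORDS = {"wrapper", "smoke", "integration", "run-all", "script", "all"}

def looks_coarse(target: str) -> bool:
    """Return True when a target name probably will not explain a failure."""
    # Prefer the area/behavior shape, e.g. "compiler/emits-canonical-json".
    if "/" not in target:
        return True
    area, _, behavior = target.partition("/")
    words = set(behavior.replace("_", "-").split("-"))
    return not behavior or bool(words & VAGUE_WORDS)

for name in ["compiler/emits-canonical-json", "python-wrapper-ran", "cli/run-all"]:
    print(name, "coarse" if looks_coarse(name) else "ok")
```

A check like this fits naturally in a pre-review script, but it only catches the obvious offenders.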

How This Maps To GLC-Shaped Work

The GLC-shaped lesson is not "pick Python".

The lesson is:

  • keep the orchestration layer separate from the semantic subject
  • decompose the verification surface into maintained proof units and proof stages
  • certify the product from explicit stage contracts rather than from one orchestration blob

That is exactly why observer new shell-proxy is modeled the way it is.

The command is trying to preserve that structure at the moment the tree is created.

What To Optimize For

When you use a proxy language, optimize for these properties:

  • target identity should name the real thing being verified
  • each target should represent one meaningful proof unit
  • assertions should be about product behavior, not wrapper-script survival
  • provider output should expose a useful target set, not one orchestration blob
  • reports and compares should tell you what changed in the product surface

Naming Targets So Humans Can Understand Them

Good target names are one of the biggest usability wins in Observer.

Use names that answer this question:

What real thing did we just verify?

Prefer names like:

  • compiler/emits-canonical-json
  • compiler/rejects-malformed-input
  • package/writes-license-metadata
  • api/rejects-missing-token
  • cli/version-reports-build-stamp

Avoid names like:

  • wrapper
  • smoke
  • integration
  • python-script
  • run-all

Those names tell you almost nothing once a report, compare, or product stage fails.

Good Example

Good:

  • Python test: package/metadata-has-license
  • Python test: package/wheel-imports-cleanly
  • Python test: cli/version-reports-build-stamp

Each test uses Python as a scripting medium, but each target refers to a real product contract.

Bad Example

Bad:

  • Python script: run_release_checks.py
  • one Observer target runs it
  • exit code is treated as the only meaningful signal

That shape hides the real verification surface inside the wrapper.

Why This Pattern Matters To Observer

Observer becomes much more valuable when it sees the real verification topology.

Granular proxy-language tests improve:

  • inventory quality
  • failure localization
  • report usefulness
  • analytics fidelity
  • compare clarity
  • product certification composition

This is especially important for Python, shell, and other scripting-friendly integrations. Those languages should usually be treated as verification media, not as the semantic subject of the test, unless the scripting language itself is what you are actually trying to verify.

For copy-pasteable examples of this pattern, see ../examples/python-proxy-pattern.

The Easiest Practical Starting Point

If you want the least confusing way to begin, do this:

  1. start with ../examples/python-proxy-pattern/multi-test-strategy/README.md
  2. run make list
  3. run make inventory
  4. inspect tests.inv
  5. run make report
  6. read .observer/report.default.jsonl

That path is short, granular, and close to the real usage model this guide is recommending.
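When you reach step 6, it helps to remember the report is JSONL: one JSON object per line. The exact report schema is defined by the specs, not by this guide, so the field names "target" and "status" in the sketch below are assumptions; inspect your own .observer/report.default.jsonl and adapt them.

```python
import json

def summarize_report(lines):
    """Tally results per status from JSONL report lines.

    Assumption: each line is one JSON object carrying "target" and
    "status" fields. The real Observer report schema may differ; check
    the actual file and adjust the field names.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        status = record.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts

sample = [
    '{"target": "ledger/accepts-deposit", "status": "pass"}',
    '{"target": "ledger/rejects-overdraft", "status": "pass"}',
    '{"target": "ledger/reports-balance", "status": "fail"}',
]
print(summarize_report(sample))  # {'pass': 2, 'fail': 1}
```

The point of reading the raw file once is to internalize that each line is an independent machine-readable record, which is exactly what cube and compare consume later.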

Prerequisites

Observer itself is a Rust workspace. The main binary is published as frogfish-observer, but it installs an executable named observer.

Typical prerequisites are:

  • Rust and Cargo for the Observer CLI itself
  • whatever toolchain your provider host needs
  • a POSIX shell for the repo-owned starter Makefiles

Install the CLI from crates.io:

cargo install frogfish-observer

Or run it from the repository root during development:

cargo run -q -p frogfish-observer -- --help

The Minimal Working Set

For a basic project, you usually need four files:

observer.toml
tests.inv
tests.obs
.observer/

What they mean:

  • observer.toml: provider configuration
  • tests.inv: canonical inventory derived from a provider
  • tests.obs: suite expectations
  • .observer/: generated reports, hashes, and local derived artifacts

Only observer.toml and tests.obs are normally authored by hand.

tests.inv is usually generated.

.observer/ should generally be treated as working output, not as hand-edited source.

Recommended Repository Layout

This is a good default shape for one provider-backed verification area:

your-project/
  observer.toml
  tests.obs
  tests.inv
  .observer/
  build/
  src/

If your project has several verification areas, keep each one local to the thing it verifies:

your-project/
  unit/
    observer.toml
    tests.inv
    tests.obs
    .observer/
  workflow/
    tests.obs
    .observer/
  product.json

That second shape is what product certification is for: each stage stays local, and product.json ties them together.

First Real Setup: Rust Starter

The quickest way to understand Observer is to run a starter that already works.

Use the runnable Rust starter in this repository:

lib/rust/starter/

Its important files are:

lib/rust/starter/
  Cargo.toml
  Makefile
  observer.toml
  tests.inv
  tests.obs
  src/
  expected.default.jsonl
  expected.inventory.sha256
  expected.suite.sha256

What each file does:

  • Cargo.toml: builds the Rust provider host
  • Makefile: wraps the common Observer flows
  • observer.toml: tells Observer how to invoke the provider host
  • tests.inv: canonical inventory for the provider targets
  • tests.obs: the expectations to enforce
  • expected.*: checked-in verification artifacts used by the starter's make verify

Step 1: Build the provider host

cd lib/rust/starter
make build

This builds the provider binary that Observer will call for list and run operations.
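The provider protocol itself is defined by the specs, not by this guide, so the sketch below is only the shape to keep in mind: a host that answers a list request with machine-readable target ids and answers run requests per target. Every concrete detail here (the argv shape, the JSON field names) is hypothetical.

```python
import json
import sys

# Toy provider host sketch. Hypothetical: the real Observer provider
# protocol (argv shape, JSON field names) is defined by the specs, not
# here. The point is only the separation of "list" from "run".
TARGETS = {
    "ledger/accepts-deposit": lambda: 0,
    "ledger/rejects-overdraft": lambda: 0,
}

def handle(argv):
    if argv[:1] == ["list"]:
        return json.dumps({"targets": sorted(TARGETS)})
    if argv[:1] == ["run"] and len(argv) == 2 and argv[1] in TARGETS:
        return json.dumps({"target": argv[1], "exit": TARGETS[argv[1]]()})
    return json.dumps({"error": "unknown request"})

if __name__ == "__main__":
    print(handle(sys.argv[1:]))
```

Whatever the real wire format is, the host's job stays the same: expose granular targets on list, execute exactly one named target on run.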

Step 2: Inspect raw provider discovery

make list

This writes the raw provider output to .observer/provider-list.json.

Use this when you need to answer the question: "is the provider itself exposing the targets I think it is?"

Step 3: Derive canonical inventory

make inventory
cat tests.inv

This is the first major Observer contract.

Inventory is the explicit execution surface. Once inventory exists, suite execution no longer depends on fuzzy runtime discovery.

Step 4: Inspect provider configuration

The starter's observer.toml looks like this:

version = "0"

[providers.rust]
command = "./build/target/debug/ledger-observer-host"
cwd = "."
inherit_env = false

Important fields:

  • command: the provider host executable
  • cwd: working directory for that provider
  • inherit_env = false: makes the host less dependent on ambient machine state

If the provider cannot be found or behaves differently from machine to machine, check this file first.

Step 5: Run the suite with human output

make run

This runs:

observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui rich --report none --color never --show-output all

Use this mode when you are working interactively and want readable operator feedback.

Step 6: Emit a machine-readable report

make report

This writes:

.observer/report.default.jsonl

That JSONL report is what later commands consume.

Step 7: Verify hashes and golden report artifacts

make verify

This checks:

  • inventory hash
  • suite hash
  • report JSONL

The point is not only that the run passes. The point is that the contracts and generated evidence are stable.

Python As A Proxy Language

Python is a particularly important example of the pattern above because many teams reach for it first.

The preferred Python model is not:

  • write one Python wrapper script
  • let Observer run it
  • call the job done if the script exits zero

The preferred Python model is:

  • write Python tests with the Observer Python integration
  • use those tests to verify real product behaviors at granular scope
  • let Python act as the scripting proxy for the thing you actually care about

If you want a one-line rule for teams:

Do not ask Observer to verify that a Python wrapper script ran. Ask Observer to verify the real behaviors that the Python tests are checking.

That means a Python-based provider is often best when the real subject is:

  • a binary interface
  • a package or install surface
  • a network interaction
  • a generated artifact
  • a workflow with precise observable checkpoints

Python is just the control language. The product contract is still the center.
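To make that concrete, here is what one granular proxy check looks like in plain Python. The Observer Python integration API is not shown in this guide, so this sketch uses nothing but the standard library; the interpreter's own --version flag stands in for your product binary so the example runs anywhere.

```python
import subprocess
import sys

def cli_version_reports_interpreter() -> bool:
    """One granular proxy check: Python is only the pen here.

    The subject is external CLI behavior. The interpreter's --version
    flag is a stand-in product binary so this sketch is runnable
    anywhere; in a real setup this would invoke your product CLI, and
    the check would map to one named Observer target such as
    cli/version-reports-build-stamp.
    """
    result = subprocess.run(
        [sys.executable, "--version"], capture_output=True, text=True
    )
    banner = (result.stdout + result.stderr).strip()
    return result.returncode == 0 and banner.startswith("Python")

print(cli_version_reports_interpreter())
```

Note what the assertion is about: an externally visible banner from a separate process, not the survival of the script itself.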

Python Team Checklist

If you are reviewing a Python-based Observer setup, check these five things.

  1. Are target names about product behavior rather than about the wrapper script?
  2. Does the provider expose several meaningful targets instead of one orchestration target?
  3. Does tests.inv look like a useful public execution contract?
  4. Would a failed report line tell an operator what actually regressed?
  5. Could the same targets participate cleanly in a later product certification stage?

If most answers are no, the setup is probably still in the wrapper-script anti-pattern.

Python Pattern In One Sentence Each

Single-test strategy:

  • use when there is one real proof unit and its name is still meaningful

Multi-test strategy:

  • use when one product surface can be decomposed into several real proofs

Staged product strategy:

  • use when several verification areas must pass together as one product verdict

See the concrete examples here:

For the Python integration and runnable examples, see ../lib/python/README.md and ../lib/python/HOWTO.md.

Understanding The Main Files

Before going file by file, here is the plain-English version.

  • observer.toml tells Observer how to find the provider
  • tests.inv tells Observer what targets exist
  • tests.obs tells Observer what should be true about those targets
  • .observer/ stores the evidence generated by the run

That is the basic working set.

observer.toml

This defines providers.

Use it to answer:

  • which providers exist
  • how to invoke them
  • which working directory they run in
  • whether environment inheritance is allowed

tests.inv

This is canonical inventory. It is the explicit list of runnable targets.

In normal workflows, generate it with:

observer derive-inventory --config observer.toml --provider rust > tests.inv

If inventory changes unexpectedly, treat that as a meaningful contract change rather than just build noise.
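The starter enforces this by pinning expected.inventory.sha256. The sketch below shows the underlying idea in its plainest form: pin a digest, fail loudly on drift. Note the assumption: observer hash-inventory may canonicalize the inventory before hashing, so its digest can differ from a raw file hash like this one.

```python
import hashlib
import tempfile
from pathlib import Path

def inventory_drifted(inventory: Path, pinned: Path) -> bool:
    """Compare the raw sha256 of an inventory file against a pinned digest.

    Sketch only: observer hash-inventory may canonicalize the inventory
    before hashing, so its digest can differ from a plain file hash.
    The shape of the check (pin a digest, fail loudly on drift) is the
    point, not the exact hashing rule.
    """
    current = hashlib.sha256(inventory.read_bytes()).hexdigest()
    expected = pinned.read_text().split()[0]
    return current != expected

# Demonstrate with temporary stand-in files.
with tempfile.TemporaryDirectory() as d:
    inv = Path(d) / "tests.inv"
    pin = Path(d) / "expected.inventory.sha256"
    inv.write_text("ledger/accepts-deposit\n")
    pin.write_text(hashlib.sha256(inv.read_bytes()).hexdigest() + "\n")
    print(inventory_drifted(inv, pin))  # False: inventory matches the pin
```

In practice you would pin the output of observer hash-inventory rather than a raw file hash, but the failure mode you are guarding against is identical.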

tests.obs

This is the suite.

The Rust starter uses the simple suite surface:

test prefix: "ledger/" timeoutMs: 1000: expect exit = 0.

test "ledger/rejects-overdraft" timeoutMs: 1000: [
	expect exit = 0.
	expect out contains "denied overdraft".
].

Use the simple surface when you mainly need expectation-based test verification.

Use the full surface when you need richer workflow logic, branching, extraction, publication, or more complex verification flows.

.observer/

This is where local generated artifacts usually go.

Common contents include:

  • report JSONL
  • current hashes
  • provider discovery output
  • generated HTML views
  • derived analytics

Keep it local to the verification area you are working in.

The Commands You Will Actually Use

If you feel lost, treat these as the normal six:

  1. derive-inventory
  2. hash-inventory
  3. hash-suite
  4. run
  5. cube
  6. view

Everything else is either deeper validation, product-level composition, or operator convenience.

derive-inventory

Use when you need to convert provider output into canonical inventory.

observer derive-inventory --config observer.toml --provider rust > tests.inv

hash-inventory

Use when inventory should be treated as a stable contract.

observer hash-inventory --inventory tests.inv

hash-suite

Use when the suite itself is part of the contract you want to pin.

observer hash-suite --suite tests.obs --surface simple

run

Use for normal suite execution.

Interactive operator mode:

observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui rich --report none

Machine-readable mode:

observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui off --report jsonl > .observer/report.default.jsonl

doctor

Use before a run when you suspect setup problems.

Examples:

observer doctor --inventory tests.inv --suite tests.obs --surface simple
observer doctor --config observer.toml --provider rust

doctor is the command to reach for when you are not sure whether the problem is in the provider, config, inventory, or suite wiring.

cube

Use when one report should become a derived build artifact.

observer cube --report .observer/report.default.jsonl --out .observer/build.cube.json

compare

Use when you want one build compared against another.

observer compare --cube build-a.cube.json --cube build-b.cube.json --out compare.json

view

Use when you need a self-contained HTML artifact for local inspection or sharing.

observer view --cube .observer/build.cube.json --out .observer/build.html

Product Certification

Use product certification when one product is only healthy if several verification stages pass together.

You can author product inputs either as canonical JSON or as TOML that lowers mechanically into the same canonical product model.

The example product file in this repository looks like this:

{
  "k": "observer_product",
  "v": "0",
  "product_id": "demo",
  "product_label": "Demo Product",
  "certification_rule": "all_pass",
  "stages": [
    {
      "stage_id": "unit",
      "runner": {
        "k": "observer_suite",
        "cwd": "unit",
        "suite": "tests.obs",
        "inventory": "tests.inv",
        "surface": "simple",
        "mode": "default"
      }
    },
    {
      "stage_id": "workflow",
      "runner": {
        "k": "observer_suite",
        "cwd": "workflow",
        "suite": "tests.obs",
        "surface": "full",
        "mode": "default"
      }
    }
  ]
}

The important design rule is that each stage runs from its own declared working directory.

That lets a product pull together verification areas that would otherwise remain scattered shell glue.
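Before handing a product file to certify, a cheap smoke check of its shape can save a confusing run. The sketch below checks only the fields visible in the example above; the full normative schema lives in the specs, so treat everything beyond those field names as an assumption of this example.

```python
import json

def check_product(doc: dict) -> list:
    """Collect obvious shape problems in a canonical product document.

    Field names come from the example product file in this guide; the
    full normative schema lives in the specs, so treat this as a smoke
    check, not validation. Other certification rules may exist beyond
    all_pass.
    """
    problems = []
    if doc.get("k") != "observer_product":
        problems.append("k must be observer_product")
    if doc.get("certification_rule") != "all_pass":
        problems.append("unrecognized certification_rule")
    stages = doc.get("stages", [])
    if not stages:
        problems.append("no stages declared")
    seen = set()
    for stage in stages:
        sid = stage.get("stage_id")
        if not sid or sid in seen:
            problems.append(f"bad or duplicate stage_id: {sid!r}")
        seen.add(sid)
        if "runner" not in stage:
            problems.append(f"stage {sid!r} has no runner")
    return problems

good = {
    "k": "observer_product",
    "certification_rule": "all_pass",
    "stages": [{"stage_id": "unit", "runner": {"k": "observer_suite"}}],
}
print(check_product(good))  # []
```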

Product stages can also import a child product as one explicit proof stage through the observer_product runner. In TOML authoring, that shape is naturally expressed with a [subproduct.<id>] stanza.

Run product certification with:

observer certify --product product.json --ui off --report jsonl > .observer/product.default.jsonl

Then derive analytics from the product report:

observer cube-product --report .observer/product.default.jsonl --root . --out .observer/analytics-product

CMake Model Certification

Observer also has a first product slice for CMake-constructed products.

Use it when CMake already defines construction truth and Observer should certify that surface.

Current shape:

  1. configure and build with CMake so the File API reply exists
  2. lower the CMake model
  3. hash the lowered model if needed
  4. certify the product stage or derive analytics from the resulting report

Core commands:

observer lower-cmake-model --build out/build/debug --out .observer/cmake-model.json
observer hash-cmake-model --model .observer/cmake-model.json

The repo-owned example is under:

tests/cmake-model/observer/

If you are trying to understand the current CMake slice, start there instead of from the spec.

Recipes

If you want the easiest beginner path, start with the first two recipes only.

They cover most first-time adoption problems.

Recipe: start from zero with a provider-backed project

  1. create observer.toml with one provider definition
  2. make sure the provider host can answer list and run
  3. derive tests.inv
  4. write tests.obs
  5. run observer doctor
  6. run the suite interactively
  7. emit a JSONL report
  8. add hash checks or golden report checks if the flow is meant to stay stable

Short version:

provider -> inventory -> suite -> report

Recipe: figure out why nothing is running

  1. run the provider host directly
  2. run observer derive-inventory
  3. inspect tests.inv
  4. run observer doctor
  5. verify the suite actually selects targets present in inventory

In most cases, the issue is one of:

  • provider host path is wrong
  • provider host cwd is wrong
  • provider emits a different target than the suite expects
  • inventory was not regenerated after a provider change

This is the first debugging loop to memorize.
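Step 5 of that loop is a plain set difference, and it is worth scripting. The sketch below assumes tests.inv lists one target id per non-empty line; the real inventory format may carry more structure, so adjust the parsing after looking at your own file.

```python
def missing_targets(suite_targets, inventory_text: str) -> set:
    """Targets the suite names that inventory does not contain.

    Assumption: this treats inventory as one target id per non-empty
    line. The real tests.inv format may carry more structure; the
    set-difference debugging step is the point, not the parser.
    """
    inventory = {
        line.strip() for line in inventory_text.splitlines() if line.strip()
    }
    return set(suite_targets) - inventory

inv = "ledger/accepts-deposit\nledger/rejects-overdraft\n"
suite = ["ledger/rejects-overdraft", "ledger/reports-balance"]
print(missing_targets(suite, inv))  # {'ledger/reports-balance'}
```

If this set is non-empty, the suite will select nothing for those names no matter how correct the provider is.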

Recipe: produce a shareable artifact from one run

  1. emit report JSONL
  2. derive a cube
  3. render an HTML view

Commands:

observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui off --report jsonl > .observer/report.default.jsonl
observer cube --report .observer/report.default.jsonl --out .observer/build.cube.json
observer view --cube .observer/build.cube.json --out .observer/build.html

Recipe: compare two builds

  1. derive one cube per build
  2. compare those cubes
  3. render the compare HTML

Commands:

observer compare --cube build-a.cube.json --cube build-b.cube.json --out compare.json
observer view --compare compare.json --out compare.html

How To Think About Output Modes

Observer intentionally separates human output from machine output.

  • Human UI goes to stderr when practical.
  • Machine-readable artifacts go to stdout or explicit files.

That means a command like this is normal:

observer run --inventory tests.inv --suite tests.obs --report jsonl > report.jsonl

You still see human progress, but stdout remains clean enough to capture the structured report.

If you want only the machine artifact, use --ui off.
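The convention is easy to verify for yourself with a toy emitter. The snippet below is not Observer; it is a stand-in program that follows the same split, so you can see that redirecting stdout captures only the structured record while progress still reaches the terminal via stderr.

```python
import json
import subprocess
import sys

# A toy emitter that follows the same convention Observer uses: human
# progress on stderr, machine-readable JSONL on stdout. Redirecting
# stdout to a file therefore captures only the structured record.
EMITTER = r"""
import json, sys
print("running 1 target...", file=sys.stderr)                # human progress
print(json.dumps({"target": "demo/ok", "status": "pass"}))   # machine record
"""

result = subprocess.run(
    [sys.executable, "-c", EMITTER], capture_output=True, text=True
)
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())
```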

Troubleshooting

If you are stuck, work from left to right through the pipeline:

  1. provider
  2. inventory
  3. suite
  4. report
  5. derived artifacts

Do not jump straight to the product layer if the provider and inventory are not already trustworthy.

observer says it cannot find a provider

Check:

  • observer.toml path to the provider binary
  • cwd for that provider
  • whether the binary was built at all

The suite does not match any targets

Check:

  • whether tests.inv was regenerated after provider changes
  • whether suite target names actually match inventory entries
  • whether the suite surface is correct (simple versus full)

The run works locally but not in CI

Check:

  • whether provider config relies on inherited environment
  • whether generated paths differ by machine
  • whether you are treating derived artifacts as canonical when they are actually volatile

I am not sure whether the problem is config, provider, inventory, or suite

Run:

observer doctor --config observer.toml --inventory tests.inv --suite tests.obs --surface simple

Then work backwards from the first concrete finding.

Where To Go Next

If you are teaching a team, this is the simplest recommendation to give them:

  1. read the top of this guide through the Python sections
  2. copy ../examples/python-proxy-pattern/multi-test-strategy
  3. rename the targets so they describe your real product behaviors
  4. only add product stages after the target surface is already clean

That sequence avoids the most common failure mode: building a large wrapper-script blob and only later discovering that Observer cannot see the verification surface clearly.

The important habit is to treat Observer as a contract pipeline, not as a magical test launcher. Once you do that, the folder structure and command flow become much easier to reason about.