This guide is for people who need to set up Observer, understand how the pieces fit together, and run real verification flows without reverse-engineering the repository.
It is not a design spec.
If you want platform rationale, read OBSERVER.md. If you want normative detail, read the files under specs. If you want to get a project working, stay here.
If you want the gentlest possible on-ramp first, open QUICKSTART.html. It asks two practical questions and gives you a copy-paste starter command.
If Observer currently feels confusing, use this stripped-down mental model first.
Observer is not mainly asking you to "run a script".
Observer wants you to make the verification surface explicit.
That usually means:
- expose real targets through a provider
- derive inventory from those targets
- write a suite against that inventory
- run the suite and keep the report
If you only need one short sentence, use this one:
Observer turns a verification setup into an explicit contract pipeline.
If you want Observer to lay down a working starting tree for you, run one of these first:
observer new shell-proxy demo --proxy python --subject-family rust
observer new shell-proxy demo --proxy javascript --subject-family rust
observer new shell-proxy demo --proxy python --subject-family c --strategy single-test
observer new shell-proxy demo --proxy python --subject-family rust --strategy staged-product
The default scaffold is multi-test, because that is the preferred everyday pattern.
That command shape matters. It separates two concerns explicitly:
- the proxy/orchestration language such as Python
- the subject family under test such as Rust, C, or Python
Today the built-in shell proxies are:
- python
- javascript
Today the built-in subject families are:
- rust
- c
- python
- generic
The scaffold command is trying to model the real adoption shape directly.
It is not asking only one question.
It is asking two questions:
- what language is acting as the shell-orchestration proxy
- what subject family is actually being verified
That distinction matters because these are usually not the same thing.
In many real systems, including the GLC-style shape discussed later in this guide, the proxy language exists because it is better at orchestration, process control, JSON handling, or filesystem inspection than plain shell.
That does not mean the proxy language is the semantic thing being certified.
Examples:
- --proxy python --subject-family rust: Python orchestrates checks against a Rust product surface
- --proxy javascript --subject-family c: JavaScript orchestrates checks against a C product surface
- --proxy python --subject-family python: Python is still just the proxy layer unless the target identities are actually about externally visible Python product behavior
- --proxy javascript --subject-family generic: JavaScript orchestrates checks against a broader CLI, protocol, workflow, or artifact surface
If you want one sentence for the model, use this one:
Proxy language is how the checks are expressed. Subject family is what those checks are about.
observer new python sounds like Observer thinks the project is mainly about Python.
That is too weak for the real pattern.
The stronger model is:
- external shell-proxy pattern
- explicit proxy backend
- explicit subject family
That is why the primary command is now observer new shell-proxy ....
The hidden observer new python ... alias still exists for compatibility, but it is not the teaching surface anymore.
The main command is:
observer new shell-proxy <path> --proxy <python|javascript> --subject-family <rust|c|python|generic>
Useful flags:
- --strategy single-test|multi-test|staged-product
- --proxy python|javascript
- --subject-family rust|c|python|generic
- --provider <id>
- --name <display-name>
- --force
- --dry-run
Defaults:
- strategy defaults to multi-test
- provider id defaults to the proxy name
- subject family defaults to rust
single-test means:
- one externally meaningful proof unit
- only use it when there is truly one contract worth naming
multi-test means:
- several granular proof units over one product surface
- this is the normal default because it gives better inventory and better failure localization
staged-product means:
- several verification concerns that should certify together
- the generated tree uses a GLC-shaped teaching decomposition: unit, golden, dict
Changing --proxy changes the orchestration surface and the generated provider host implementation.
Today that means:
- python: generated provider hosts and helper code are Python
- javascript: generated provider hosts and helper code are JavaScript
The intent is not to make Python and JavaScript look like the subject under test.
The intent is to let teams choose the orchestration surface that best handles process glue, JSON, files, and small control logic.
Changing --subject-family changes the teaching language around the generated tree.
Today it affects:
- generated README guidance
- subject-family wording in the scaffolded project
- how the scaffold tells you to interpret the toy implementation
Current subject families mean:
- rust: think crate, binary, or generated artifact surface
- c: think library, binary, or generated artifact surface
- python: think module, CLI, or generated artifact surface
- generic: think product, protocol, workflow, or artifact surface
Today most subject families still mainly adjust wording.
The main exception is the Rust multi-test starter, which now generates a worked Cargo proof starter with native Rust tests plus an external shell-proxy Observer stage.
That is intentional. The command shape stays stable while subject-specialized starter content grows incrementally.
You can think of the built-in scaffolds as a matrix.
Rows are proxy backends.
Columns are strategies.
Subject family is an interpretation layer over that matrix.
proxy backend single-test multi-test staged-product
python yes yes yes
javascript yes yes yes
And subject family is currently:
subject-family affects generated wording and usage guidance
rust yes
c yes
python yes
generic yes
All shell-proxy scaffolds generate the same broad artifact story:
- provider host source
- observer config
- suite files
- Makefile shortcuts
- a vendored Observer host SDK for the chosen proxy language
single-test and multi-test generate a suite-first tree.
staged-product generates a product-first tree with:
- unit
- golden
- dict
- product.json
The toy code is intentionally replaceable.
The important part is the contract shape, target naming, and stage separation.
Use this shortcut.
Choose --proxy python when:
- your team already uses Python for glue code
- you want a small orchestration surface with direct filesystem and JSON handling
Choose --proxy javascript when:
- your team already uses Node-based tooling
- you want orchestration close to existing JS toolchains
Choose --subject-family rust when:
- the real thing under test is mostly a Rust crate, binary, or generated Rust-oriented artifact surface
Choose --subject-family c when:
- the real thing under test is mostly a C library, binary, or generated low-level artifact surface
Choose --subject-family python when:
- the real thing under test is a Python product surface, while still keeping target names about externally visible behavior rather than about the proxy wrapper itself
Choose --subject-family generic when:
- the real thing under test is broader than one implementation language, such as a CLI, wire protocol, package surface, or workflow artifact graph
Start with a Python shell proxy over a Rust product surface:
observer new shell-proxy demo --proxy python --subject-family rust
cd demo
make report
Start with a JavaScript shell proxy over a C product surface:
observer new shell-proxy demo --proxy javascript --subject-family c
cd demo
make report
Start with the staged product teaching shape:
observer new shell-proxy demo --proxy python --subject-family rust --strategy staged-product
cd demo
make certify
Inspect the file plan without writing anything:
observer new shell-proxy demo --proxy javascript --subject-family generic --dry-run
If you forget everything else, remember these three rules.
Good:
- compiler/emits-canonical-json
- cli/help-shows-subcommands
- package/wheel-has-license
Bad:
- python-wrapper-ran
- run-all-checks
- integration-script
tests.inv is not a throwaway implementation detail.
It is the explicit list of targets Observer believes exist.
If the target set changes, that is a real verification change.
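The idea that a changed target set is itself a verification change can be sketched in a few lines. The canonical mechanism is observer hash-inventory; this Python sketch only illustrates the principle, and the one-target-per-line text it digests is an illustrative assumption, not the real tests.inv format.

```python
import hashlib

# Sketch only: real inventory pinning uses `observer hash-inventory`;
# the one-target-per-line shape here is an illustrative assumption.
def inventory_digest(inventory_text: str) -> str:
    """Digest the normalized target set so any change to the set is visible."""
    targets = sorted(line.strip() for line in inventory_text.splitlines() if line.strip())
    return hashlib.sha256("\n".join(targets).encode()).hexdigest()

before = "cli/help-shows-subcommands\npackage/wheel-has-license\n"
after = before + "compiler/emits-canonical-json\n"

# A changed digest means the verification surface itself changed.
print(inventory_digest(before) != inventory_digest(after))  # → True
```

Because the target list is normalized before hashing, reordering lines does not change the digest; only a real change to the target set does.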
When Observer emits JSONL, that output is not noise.
It is the machine-readable proof that later commands use for:
- cubes
- compares
- views
- product certification rollup
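To make "machine-readable proof" concrete, here is a minimal Python sketch that consumes a report-shaped JSONL stream. The target and status field names are assumptions for the sketch; the real record schema is defined by Observer, and cube and compare exist precisely so you rarely need hand-rolled readers like this.

```python
import json

# Illustrative only: the real record schema is defined by Observer, and the
# `target` / `status` field names here are assumptions for the sketch.
sample_report = """\
{"target": "cli/help-shows-subcommands", "status": "pass"}
{"target": "package/wheel-has-license", "status": "fail"}
"""

def failed_targets(jsonl_text: str) -> list:
    """One JSON object per line; collect targets that did not pass."""
    rows = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    return [row["target"] for row in rows if row.get("status") != "pass"]

print(failed_targets(sample_report))  # → ['package/wheel-has-license']
```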
Use this decision guide.
If you want Observer to generate the starting tree for one of these paths, use observer new shell-proxy <path> --proxy python --subject-family <name> --strategy <name>.
Choose single-test when:
- there is only one externally meaningful proof unit
- the target still names a real product contract
- splitting further would be fake granularity
Choose multi-test when:
- one system exposes several distinct behaviors
- you want failure localization
- you want better compare and analytics artifacts
- you want inventory to describe the real surface cleanly
Choose staged-product when:
- several verification areas must pass together
- one area is unit-like, another is corpus-like, another is consistency-like
- release health is a product question, not one suite question
If you want to get to a working mental model quickly, read in this order:
- this guide through the proxy-language sections
- ../examples/python-proxy-pattern/single-test-strategy/README.md
- ../examples/python-proxy-pattern/multi-test-strategy/README.md
- ../examples/python-proxy-pattern/staged-product/README.md
- one runnable starter such as ../lib/python/starter/README.md or ../lib/rust/starter/README.md
Observer is a deterministic verification platform built around explicit contracts and derived artifacts.
The core flow is:
- a provider exposes tests or workflow targets
- Observer derives canonical inventory from that provider
- a suite selects from that inventory and declares expectations
- Observer runs the suite and emits a structured report
- optional derived artifacts such as cubes, compares, compare indexes, and HTML views are produced from that report
- optional product certification combines multiple stages into one product verdict
If you remember nothing else, remember this pipeline:
provider -> inventory -> suite -> report -> cube/compare/view -> product
The rest of the tool is built around that shape.
Observer has a few distinct working layers.
- Provider: how a language or workflow exposes executable targets to Observer.
- Inventory: the canonical list of runnable targets.
- Suite: the expectations you want enforced.
- Report: the machine-readable record of one execution.
- Analytics: derived artifacts such as cubes, compares, compare indexes, and HTML views.
- Product: an ordered, multi-stage certification contract above suites.
Use them like this:
- If you are onboarding a new language integration, start with a provider and inventory.
- If you already have inventory, write a suite and run it.
- If you need artifact history or build-to-build comparison, derive cubes and compares.
- If release health depends on multiple verification areas, define a product and use certify.
One usage pattern deserves to be made explicit because it is likely to be the main way many teams adopt Observer.
The language you use to author tests is often not the real subject under test.
That language is frequently just the most convenient control surface for expressing verification against something else.
Examples:
- Python tests that verify a CLI's behavior
- Python tests that verify generated files or package outputs
- Python tests that verify a service contract or protocol exchange
- Python tests that verify compiler output or workflow results
In that model:
- Python is the authoring surface
- Observer is the verification platform
- the real subject is the product behavior being checked
This is the preferred pattern.
The scaffold command above is simply the productized version of this pattern.
It makes the two layers explicit at generation time instead of making the reader infer them later.
If the phrase "proxy-language verification" sounds abstract, translate it into ordinary language like this:
- Python is the pen
- the product behavior is the thing you are writing about
- Observer is the notebook that keeps the record straight
Or even more simply:
- Python is how you say the test
- the product behavior is what the test means
That is the whole idea.
You are not using Python because you want to prove "Python worked".
You are using Python because it is a convenient way to express checks against something else.
The wrong shape looks like this:
- write one Python script that orchestrates a lot of work
- register that script as one Observer test target
- treat exit 0 from that script as the proof that verification succeeded
Why this is weak:
- Observer only sees the wrapper, not the underlying verification units
- failure localization is poor
- target identity becomes vague or meaningless
- inventory becomes coarse and unhelpful
- analytics and compare artifacts lose useful granularity
- product certification ends up composed from blobs instead of explicit proofs
You still get execution, but you do not get a strong verification model.
Wrong:
Observer target -> run_release_checks.py -> many hidden checks -> one exit code
Right:
Observer target -> cli/help-shows-subcommands
Observer target -> package/wheel-has-license
Observer target -> compiler/rejects-bad-input
In the wrong shape, the real verification surface is hidden inside one program.
In the right shape, Observer can see the actual proof units.
The better shape is:
- use a host language such as Python to author granular tests
- let each test correspond to one real behavior of the underlying system
- expose those tests through the Observer provider boundary
- derive inventory from those granular targets
- run suites and product stages against those explicit targets
That gives Observer meaningful units such as:
- cli/help-shows-subcommands
- compiler/emits-canonical-json
- package/wheel-contains-license
- api/rejects-missing-token
Those are much better verification targets than something like:
python-wrapper-ran
The practical test is simple:
Ask this question for every target name:
If this target fails, will the name tell me what product contract regressed?
If the answer is no, the target is probably too coarse.
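That question can even be approximated mechanically. The following Python sketch is a hypothetical lint, not an Observer feature; the area/behavior-phrase shape it enforces is inferred from the good and bad name examples in this guide.

```python
import re

# Hypothetical lint, not an Observer feature: a name that will localize a
# regression usually reads as "<area>/<behavior-phrase>".
NAME_SHAPE = re.compile(r"^[a-z0-9-]+/[a-z0-9][a-z0-9-]*$")

def names_a_product_contract(target: str) -> bool:
    if not NAME_SHAPE.match(target):
        return False
    behavior = target.split("/", 1)[1]
    return "-" in behavior  # a verb phrase, not a single vague word

for name in ["compiler/emits-canonical-json", "python-wrapper-ran", "smoke"]:
    print(name, "->", names_a_product_contract(name))
```

A heuristic like this cannot judge whether the name is honest, only whether it has the right shape; the real test remains the question above.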
The GLC-shaped lesson is not "pick Python".
The lesson is:
- keep the orchestration layer separate from the semantic subject
- decompose the verification surface into maintained proof units and proof stages
- certify the product from explicit stage contracts rather than from one orchestration blob
That is exactly why observer new shell-proxy is modeled the way it is.
The command is trying to preserve that structure at the moment the tree is created.
When you use a proxy language, optimize for these properties:
- target identity should name the real thing being verified
- each target should represent one meaningful proof unit
- assertions should be about product behavior, not wrapper-script survival
- provider output should expose a useful target set, not one orchestration blob
- reports and compares should tell you what changed in the product surface
Good target names are one of the biggest usability wins in Observer.
Use names that answer this question:
What real thing did we just verify?
Prefer names like:
- compiler/emits-canonical-json
- compiler/rejects-malformed-input
- package/writes-license-metadata
- api/rejects-missing-token
- cli/version-reports-build-stamp
Avoid names like:
- wrapper
- smoke
- integration
- python-script
- run-all
Those names tell you almost nothing once a report, compare, or product stage fails.
Good:
- Python test: package/metadata-has-license
- Python test: package/wheel-imports-cleanly
- Python test: cli/version-reports-build-stamp
Each test uses Python as a scripting medium, but each target refers to a real product contract.
Bad:
- Python script: run_release_checks.py
- one Observer target runs it
- exit code is treated as the only meaningful signal
That shape hides the real verification surface inside the wrapper.
Observer becomes much more valuable when it sees the real verification topology.
Granular proxy-language tests improve:
- inventory quality
- failure localization
- report usefulness
- analytics fidelity
- compare clarity
- product certification composition
This is especially important for Python, shell, and other scripting-friendly integrations. Those languages should usually be treated as verification media, not as the semantic subject of the test, unless the scripting language itself is what you are actually trying to verify.
For copy-pasteable examples of this pattern, see ../examples/python-proxy-pattern.
If you want the least confusing way to begin, do this:
- start with ../examples/python-proxy-pattern/multi-test-strategy/README.md
- run make list
- run make inventory
- inspect tests.inv
- run make report
- read .observer/report.default.jsonl
That path is short, granular, and close to the real usage model this guide is recommending.
Observer itself is a Rust workspace. The CLI is published as frogfish-observer and installs an executable named observer.
Typical prerequisites are:
- Rust and Cargo for the Observer CLI itself
- whatever toolchain your provider host needs
- a POSIX shell for the repo-owned starter Makefiles
Install the CLI from crates.io:
cargo install frogfish-observer
Or run it from the repository root during development:
cargo run -q -p frogfish-observer -- --help
For a basic project, you usually need four files:
observer.toml
tests.inv
tests.obs
.observer/
What they mean:
- observer.toml: provider configuration
- tests.inv: canonical inventory derived from a provider
- tests.obs: suite expectations
- .observer/: generated reports, hashes, and local derived artifacts
Only observer.toml and tests.obs are normally authored by hand.
tests.inv is usually generated.
.observer/ should generally be treated as working output, not as hand-edited source.
This is a good default shape for one provider-backed verification area:
your-project/
observer.toml
tests.obs
tests.inv
.observer/
build/
src/
If your project has several verification areas, keep each one local to the thing it verifies:
your-project/
unit/
observer.toml
tests.inv
tests.obs
.observer/
workflow/
tests.obs
.observer/
product.json
That second shape is what product certification is for: each stage stays local, and product.json ties them together.
The quickest way to understand Observer is to run a starter that already works.
Use the runnable Rust starter in this repository:
lib/rust/starter/
Its important files are:
lib/rust/starter/
Cargo.toml
Makefile
observer.toml
tests.inv
tests.obs
src/
expected.default.jsonl
expected.inventory.sha256
expected.suite.sha256
What each file does:
- Cargo.toml: builds the Rust provider host
- Makefile: wraps the common Observer flows
- observer.toml: tells Observer how to invoke the provider host
- tests.inv: canonical inventory for the provider targets
- tests.obs: the expectations to enforce
- expected.*: checked-in verification artifacts used by the starter's make verify
cd lib/rust/starter
make build
This builds the provider binary that Observer will call for list and run operations.
make list
This writes the raw provider output to .observer/provider-list.json.
Use this when you need to answer the question: "is the provider itself exposing the targets I think it is?"
make inventory
cat tests.inv
This is the first major Observer contract.
Inventory is the explicit execution surface. Once inventory exists, suite execution no longer depends on fuzzy runtime discovery.
The starter's observer.toml looks like this:
version = "0"
[providers.rust]
command = "./build/target/debug/ledger-observer-host"
cwd = "."
inherit_env = false
Important fields:
- command: the provider host executable
- cwd: working directory for that provider
- inherit_env = false: makes the host less dependent on ambient machine state
If the provider cannot be found or behaves differently from machine to machine, check this file first.
make run
This runs:
observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui rich --report none --color never --show-output all
Use this mode when you are working interactively and want readable operator feedback.
make report
This writes:
.observer/report.default.jsonl
That JSONL report is what later commands consume.
make verify
This checks:
- inventory hash
- suite hash
- report JSONL
The point is not only that the run passes. The point is that the contracts and generated evidence are stable.
Python is a particularly important example of the pattern above because many teams reach for it first.
The preferred Python model is not:
- write one Python wrapper script
- let Observer run it
- call the job done if the script exits zero
The preferred Python model is:
- write Python tests with the Observer Python integration
- use those tests to verify real product behaviors at granular scope
- let Python act as the scripting proxy for the thing you actually care about
If you want a one-line rule for teams:
Do not ask Observer to verify that a Python wrapper script ran. Ask Observer to verify the real behaviors that the Python tests are checking.
That means a Python-based provider is often best when the real subject is:
- a binary interface
- a package or install surface
- a network interaction
- a generated artifact
- a workflow with precise observable checkpoints
Python is just the control language. The product contract is still the center.
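As a sketch of what "Python as control language" looks like at the level of one proof unit, here is a single granular check. The provider wiring that would expose it to Observer is omitted, and the product CLI is replaced by a stand-in command so the sketch runs anywhere; in a real setup it would invoke the actual binary under test.

```python
import subprocess
import sys

# Stand-in "product CLI" so the sketch is runnable anywhere; in a real setup
# this would invoke the actual binary under test.
def run_product_cli(*args: str) -> subprocess.CompletedProcess:
    fake_cli = "print('usage: demo <build|check|report>')"
    return subprocess.run(
        [sys.executable, "-c", fake_cli, *args],
        capture_output=True, text=True,
    )

# One granular proof unit: the name says which product contract it checks,
# and the assertions are about product behavior, not wrapper survival.
def check_cli_help_shows_subcommands() -> bool:
    result = run_product_cli("--help")
    return result.returncode == 0 and "check" in result.stdout

print(check_cli_help_shows_subcommands())  # → True
```

Notice that the check is named after the product contract (cli/help-shows-subcommands in inventory terms), not after the script that runs it.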
If you are reviewing a Python-based Observer setup, check these five things.
- Are target names about product behavior rather than about the wrapper script?
- Does the provider expose several meaningful targets instead of one orchestration target?
- Does tests.inv look like a useful public execution contract?
- Would a failed report line tell an operator what actually regressed?
- Could the same targets participate cleanly in a later product certification stage?
If most answers are no, the setup is probably still in the wrapper-script anti-pattern.
Single-test strategy:
- use when there is one real proof unit and its name is still meaningful
Multi-test strategy:
- use when one product surface can be decomposed into several real proofs
Staged product strategy:
- use when several verification areas must pass together as one product verdict
See the concrete examples here:
- ../examples/python-proxy-pattern/single-test-strategy/README.md
- ../examples/python-proxy-pattern/multi-test-strategy/README.md
- ../examples/python-proxy-pattern/staged-product/README.md
For the Python integration and runnable examples, see ../lib/python/README.md and ../lib/python/HOWTO.md.
Before going file by file, here is the plain-English version.
- observer.toml tells Observer how to find the provider
- tests.inv tells Observer what targets exist
- tests.obs tells Observer what should be true about those targets
- .observer/ stores the evidence generated by the run
That is the basic working set.
This defines providers.
Use it to answer:
- which providers exist
- how to invoke them
- which working directory they run in
- whether environment inheritance is allowed
This is canonical inventory. It is the explicit list of runnable targets.
In normal workflows, generate it with:
observer derive-inventory --config observer.toml --provider rust > tests.inv
If inventory changes unexpectedly, treat that as a meaningful contract change rather than just build noise.
This is the suite.
The Rust starter uses the simple suite surface:
test prefix: "ledger/" timeoutMs: 1000: expect exit = 0.
test "ledger/rejects-overdraft" timeoutMs: 1000: [
expect exit = 0.
expect out contains "denied overdraft".
].
Use the simple surface when you mainly need expectation-based test verification.
Use the full surface when you need richer workflow logic, branching, extraction, publication, or more complex verification flows.
This is where local generated artifacts usually go.
Common contents include:
- report JSONL
- current hashes
- provider discovery output
- generated HTML views
- derived analytics
Keep it local to the verification area you are working in.
If you feel lost, treat these as the normal six:
- derive-inventory
- hash-inventory
- hash-suite
- run
- cube
- view
Everything else is either deeper validation, product-level composition, or operator convenience.
Use when you need to convert provider output into canonical inventory.
observer derive-inventory --config observer.toml --provider rust > tests.inv
Use when inventory should be treated as a stable contract.
observer hash-inventory --inventory tests.inv
Use when the suite itself is part of the contract you want to pin.
observer hash-suite --suite tests.obs --surface simple
Use for normal suite execution.
Interactive operator mode:
observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui rich --report none
Machine-readable mode:
observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui off --report jsonl > .observer/report.default.jsonl
Use before a run when you suspect setup problems.
Examples:
observer doctor --inventory tests.inv --suite tests.obs --surface simple
observer doctor --config observer.toml --provider rust
doctor is the command to reach for when you are not sure whether the problem is in the provider, config, inventory, or suite wiring.
Use when one report should become a derived build artifact.
observer cube --report .observer/report.default.jsonl --out .observer/build.cube.json
Use when you want one build compared against another.
observer compare --cube build-a.cube.json --cube build-b.cube.json --out compare.json
Use when you need a self-contained HTML artifact for local inspection or sharing.
observer view --cube .observer/build.cube.json --out .observer/build.html
Use product certification when one product is only healthy if several verification stages pass together.
You can author product inputs either as canonical JSON or as TOML that lowers mechanically into the same canonical product model.
The example product file in this repository looks like this:
{
"k": "observer_product",
"v": "0",
"product_id": "demo",
"product_label": "Demo Product",
"certification_rule": "all_pass",
"stages": [
{
"stage_id": "unit",
"runner": {
"k": "observer_suite",
"cwd": "unit",
"suite": "tests.obs",
"inventory": "tests.inv",
"surface": "simple",
"mode": "default"
}
},
{
"stage_id": "workflow",
"runner": {
"k": "observer_suite",
"cwd": "workflow",
"suite": "tests.obs",
"surface": "full",
"mode": "default"
}
}
]
}
The important design rule is that each stage runs from its own declared working directory.
That lets a product pull together verification areas that would otherwise remain scattered shell glue.
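A small Python sketch makes the stage-to-working-directory relationship concrete. It parses a trimmed copy of the example product file above; the canonical schema is normative in the specs, so treat this only as an illustration of how each stage's suite path resolves against its own cwd.

```python
import json
from pathlib import Path

# Trimmed copy of the example product file; the real schema lives in the specs.
product = json.loads("""
{
  "k": "observer_product",
  "v": "0",
  "product_id": "demo",
  "certification_rule": "all_pass",
  "stages": [
    {"stage_id": "unit",
     "runner": {"k": "observer_suite", "cwd": "unit", "suite": "tests.obs"}},
    {"stage_id": "workflow",
     "runner": {"k": "observer_suite", "cwd": "workflow", "suite": "tests.obs"}}
  ]
}
""")

# Each stage resolves its suite relative to its own declared working directory,
# so verification areas stay local instead of sharing one global cwd.
stage_suites = {
    stage["stage_id"]: str(Path(stage["runner"]["cwd"]) / stage["runner"]["suite"])
    for stage in product["stages"]
}
print(stage_suites)  # → {'unit': 'unit/tests.obs', 'workflow': 'workflow/tests.obs'}
```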
Product stages can also import a child product as one explicit proof stage through the observer_product runner. In TOML authoring, that shape is naturally expressed with a [subproduct.<id>] stanza.
Run product certification with:
observer certify --product product.json --ui off --report jsonl > .observer/product.default.jsonl
Then derive analytics from the product report:
observer cube-product --report .observer/product.default.jsonl --root . --out .observer/analytics-product
Observer also has a first product slice for CMake-constructed products.
Use it when CMake already defines construction truth and Observer should certify that surface.
Current shape:
- configure and build with CMake so the File API reply exists
- lower the CMake model
- hash the lowered model if needed
- certify the product stage or derive analytics from the resulting report
Core commands:
observer lower-cmake-model --build out/build/debug --out .observer/cmake-model.json
observer hash-cmake-model --model .observer/cmake-model.json
The repo-owned example is under:
tests/cmake-model/observer/
If you are trying to understand the current CMake slice, start there instead of from the spec.
If you want the easiest beginner path, start with the first two recipes only.
They cover most first-time adoption problems.
- create observer.toml with one provider definition
- make sure the provider host can answer list and run
- derive tests.inv
- write tests.obs
- run observer doctor
- run the suite interactively
- emit a JSONL report
- add hash checks or golden report checks if the flow is meant to stay stable
Short version:
provider -> inventory -> suite -> report
- run the provider host directly
- run observer derive-inventory
- inspect tests.inv
- run observer doctor
- verify the suite actually selects targets present in inventory
In most cases, the issue is one of:
- provider host path is wrong
- provider host cwd is wrong
- provider emits a different target than the suite expects
- inventory was not regenerated after a provider change
This is the first debugging loop to memorize.
- emit report JSONL
- derive a cube
- render an HTML view
Commands:
observer run --inventory tests.inv --suite tests.obs --config observer.toml --surface simple --ui off --report jsonl > .observer/report.default.jsonl
observer cube --report .observer/report.default.jsonl --out .observer/build.cube.json
observer view --cube .observer/build.cube.json --out .observer/build.html
- derive one cube per build
- compare those cubes
- render the compare HTML
Commands:
observer compare --cube build-a.cube.json --cube build-b.cube.json --out compare.json
observer view --compare compare.json --out compare.html
Observer intentionally separates human output from machine output.
- Human UI goes to stderr when practical.
- Machine-readable artifacts go to stdout or explicit files.
That means a command like this is normal:
observer run --inventory tests.inv --suite tests.obs --report jsonl > report.jsonl
You still see human progress, but stdout remains clean enough to capture the structured report.
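The same stream discipline is easy to demonstrate with any process. This Python sketch uses a stand-in child process that writes progress to stderr and a JSON record to stdout, mirroring what the redirection above relies on.

```python
import json
import subprocess
import sys

# Stand-in for any tool that keeps human progress on stderr and the
# machine-readable artifact on stdout.
child = (
    "import sys, json;"
    "print('running 2 targets...', file=sys.stderr);"
    "print(json.dumps({'target': 'cli/help-shows-subcommands', 'status': 'pass'}))"
)

result = subprocess.run([sys.executable, "-c", child], capture_output=True, text=True)

# stdout stays clean enough to parse; stderr carried the human feedback.
record = json.loads(result.stdout)
print(record["status"])        # → pass
print("..." in result.stderr)  # → True
```

Redirecting stdout to a file keeps the structured record intact while the progress text still reaches the terminal.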
If you want only the machine artifact, use --ui off.
If you are stuck, work from left to right through the pipeline:
- provider
- inventory
- suite
- report
- derived artifacts
Do not jump straight to the product layer if the provider and inventory are not already trustworthy.
Check:
- observer.toml path to the provider binary
- cwd for that provider
- whether the binary was built at all
Check:
- whether tests.inv was regenerated after provider changes
- whether suite target names actually match inventory entries
- whether the suite surface is correct (simple versus full)
Check:
- whether provider config relies on inherited environment
- whether generated paths differ by machine
- whether you are treating derived artifacts as canonical when they are actually volatile
Run:
observer doctor --config observer.toml --inventory tests.inv --suite tests.obs --surface simple
Then work backwards from the first concrete finding.
If you are teaching a team, this is the simplest recommendation to give them:
- read the top of this guide through the Python sections
- copy ../examples/python-proxy-pattern/multi-test-strategy
- rename the targets so they describe your real product behaviors
- only add product stages after the target surface is already clean
That sequence avoids the most common failure mode: building a large wrapper-script blob and only later discovering that Observer cannot see the verification surface clearly.
- If you want a working example, start with ../lib/rust/starter/README.md.
- If you want product certification examples, see ../examples/product-certify.
- If you want the architecture and rationale, read ../OBSERVER.md.
- If you want normative detail, use the ../specs directory as reference material.
The important habit is to treat Observer as a contract pipeline, not as a magical test launcher. Once you do that, the folder structure and command flow become much easier to reason about.