diff --git a/docs/index.md b/docs/index.md index dff41f54..1d45c148 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,19 +7,70 @@ Built on top of [NumPyro](https://num.pyro.ai/), PyRenew provides configurable c ## Renewal models -A renewal model estimates new infections from recent past infections using a generation interval distribution $w(s)$: the probability that $s$ time units separate infection in an index case and a secondary case. -The core renewal equation is: +A renewal model estimates new infections from recent past infections. +It combines two distinct discrete convolutions which describe different processes: transmission between infections and delay from infection to observation. -$$I(t) = \mathcal{R}(t) \sum_{s} I(t-s) \, w(s)$$ +- The **renewal equation** maps past infections to new infections using the generation interval distribution $w_\tau$. +- The **observation equation** maps latent infections to expected observed events using the delay distribution $\pi_d$. -where $R_t$ is the time-varying reproduction number. +#### Renewal equation -Inference is complicated by the fact that observational data require their own models ([Bhatt et al., 2023, S2](https://doi.org/10.1093/jrsssa/qnad030)). -The observation equation links infections to expected observations: +New infections arise from past infections through a generation interval distribution. -$$\mu(t) = \alpha \sum_{s} I(t-s) \, \pi(s)$$ +Let $I(t)$ denote the latent number of new infections at time $t$, and let $\mathcal{R}(t)$ denote the time-varying reproduction number. +Assume the generation interval distribution has finite support over lags $\tau = 1, \dots, G$. Let $w_\tau$ denote the probability that a secondary infection occurs $\tau$ days after infection in the primary case, with -where $\alpha$ is the ascertainment rate and $\pi(s)$ is the delay distribution from infection to observation. +$$ +\sum_{\tau=1}^{G} w_\tau = 1, \qquad w_\tau \ge 0. 
+$$ + +Then the renewal equation is + +$$ +I(t) = \mathcal{R}(t) \sum_{\tau=1}^{G} I(t - \tau)\, w_\tau. +$$ + +Here, $\tau$ indexes lags in the generation interval. + +In PyRenew, the latent process is represented on a **per-capita scale** (infection proportion) and is multiplied by a population size downstream when connecting to count observations. + +#### Observation equation + +Infections are latent and are not directly observed; instead, the data consist of events that occur some time after infection, such as hospitalizations or emergency department visits. + +Let $\mu(t)$ denote the expected number of observed events at time $t$, and let $\alpha$ denote an **ascertainment rate**, the probability an infection is observed as an event. Assume the delay from infection to observation has finite support over lags $d = 0, \dots, D$. Let $\pi_d$ denote the probability that an infection is observed $d$ days later, with + +$$ +\sum_{d=0}^{D} \pi_d = 1, \qquad \pi_d \ge 0. +$$ + +Then the observation equation is + +$$ +\mu(t) = \alpha \sum_{d=0}^{D} I(t - d)\, \pi_d. +$$ + +Here, $d$ indexes lags in the infection-to-observation delay distribution. + +#### Stochastic observation model + +The observation equation defines the expected number of observed events at time $t$, but the actual observed data are stochastic. + +Let $Y(t)$ denote the observed number of events at time $t$. We model observations as draws from a count distribution with central value (typically mean) $\mu(t)$: + +$$ +Y(t) \sim \text{Distribution}(\mu(t), \theta). +$$ + +One possible choice is the Poisson distribution, which assumes the variance equals the mean. +In practice, epidemiological count data are often overdispersed relative to the Poisson. Negative binomial distributions are a common choice for modeling these overdispersed counts.
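Taken together, the renewal convolution, the observation convolution, and the stochastic layer can be sketched in a few lines of NumPy. This is a self-contained illustration with made-up values for $w_\tau$, $\pi_d$, $\alpha$, and $\mathcal{R}(t)$; it is not the PyRenew API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (not fitted) inputs.
w = np.array([0.2, 0.4, 0.25, 0.15])   # generation interval w_tau, lags 1..G
pi = np.array([0.1, 0.3, 0.4, 0.2])    # delay distribution pi_d, lags 0..D
alpha = 0.05                            # assumed ascertainment rate
T = 30
R = np.full(T, 1.2)                     # constant R(t) for simplicity

# Renewal equation: I(t) = R(t) * sum_{tau=1}^{G} I(t - tau) * w_tau
G = len(w)
I = np.zeros(T)
I[:G] = 10.0                            # crude initialization history
for t in range(G, T):
    I[t] = R[t] * np.dot(I[t - G:t][::-1], w)

# Observation equation: mu(t) = alpha * sum_{d=0}^{D} I(t - d) * pi_d
D = len(pi) - 1
mu = np.array([alpha * np.dot(I[t - D:t + 1][::-1], pi) for t in range(D, T)])

# Stochastic layer: counts drawn around mu(t) (Poisson here for brevity)
Y = rng.poisson(mu)
```

Swapping `rng.poisson(mu)` for a negative binomial draw is the natural change when overdispersion matters.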
+ +The model thus has two layers: + +- A **mechanistic layer**, where the renewal and delay convolutions determine the predicted number of observations $\mu(t)$ from the latent infections $I(t)$. +- A **stochastic observation layer**, where observed counts $Y(t)$ vary around $\mu(t)$ according to a specified distribution. + +This separation allows the model to distinguish between systematic structure driven by transmission and reporting delays, and stochastic variability in observed data. ## Design diff --git a/docs/tutorials/building_multisignal_models.qmd b/docs/tutorials/building_multisignal_models.qmd index 16c2accb..7d67e262 100644 --- a/docs/tutorials/building_multisignal_models.qmd +++ b/docs/tutorials/building_multisignal_models.qmd @@ -184,7 +184,7 @@ Latent infection processes implement the renewal equation to generate infection ### Generation Interval -The generation interval PMF specifies the probability that a secondary infection occurs $d$ days after the primary infection. +The generation interval PMF specifies the probability that a secondary infection occurs $\tau$ days after the primary infection. ```{python} # | label: gen-interval @@ -206,6 +206,8 @@ The initial infections RV `I0_rv` specifies the **proportion of the population i I0_rv = DistributionalVariable("I0", dist.Beta(1, 100)) ``` + + ### Initial Log Rt We place a prior on the initial log(Rt), centered at 0.0 (Rt = 1.0) with moderate uncertainty: @@ -554,6 +556,15 @@ print(f" Latent process: {type(model.latent).__name__}") print(f" Observation processes: {list(model.observations.keys())}") ``` +### Model Identifiability + +The renewal equation is linear in the initial infections `I0_rv`, so scaling `I0_rv` by a factor $c$ scales the entire infection trajectory by $c$.
+In practice, `I0_rv` is weakly identified because each observation process links infections to data through a signal-specific ascertainment rate $\alpha_s$ — the probability that an infection is observed as an event in signal $s$. +Doubling `I0_rv` while halving all ascertainment rates produces identical expected observations. +Without external information to anchor either the ascertainment rates or the absolute infection level, the data cannot distinguish "more infections, lower ascertainment" from "fewer infections, higher ascertainment." +The priors on `I0_rv` and on the ascertainment rates resolve this ambiguity. + + ## Fitting the Model to Data: `model.run()` When you call `model.run()`, you supply two types of information: diff --git a/docs/tutorials/latent_hierarchical_infections.qmd b/docs/tutorials/latent_hierarchical_infections.qmd index 25d65233..8ef3092d 100644 --- a/docs/tutorials/latent_hierarchical_infections.qmd +++ b/docs/tutorials/latent_hierarchical_infections.qmd @@ -48,26 +48,119 @@ from pyrenew.deterministic import DeterministicPMF, DeterministicVariable ## Overview -`HierarchicalInfections` models a jurisdiction with $K$ subpopulations, each running its own renewal equation. Transmission dynamics are related through a hierarchical decomposition of the effective reproduction number: -$$\log \mathcal{R}_k(t) = \log \mathcal{R}_{\text{baseline}}(t) + \delta_k(t)$$ +`HierarchicalInfections` extends the renewal model to a population composed of $K$ subpopulations, each with its own latent infection trajectory. -where $\mathcal{R}_{\text{baseline}}(t)$ is the jurisdiction-level reproduction number shared across all subpopulations, and $\delta_k(t)$ is the deviation for subpopulation $k$. A sum-to-zero constraint ($\sum_k \delta_k(t) = 0$) ensures identifiability. +As in the single-population case, infections evolve according to the renewal equation. 
For each subpopulation $k = 1, \dots, K$, we define -Each subpopulation runs its own renewal equation with $\mathcal{R}_k(t)$, and results are aggregated using population fractions: $I_{\text{aggregate}}(t) = \sum_k p_k \, I_k(t)$. +$$ +I_k(t) = \mathcal{R}_k(t) \sum_{\tau=1}^{G} I_k(t - \tau)\, w_\tau, +$$ + +where: + +- $I_k(t)$ is the latent infection proportion in subpopulation $k$ at time $t$ +- $\mathcal{R}_k(t)$ is the effective reproduction number for subpopulation $k$ +- $w_\tau$ is the generation interval PMF + +The generation interval is shared across all subpopulations. + +### Hierarchical decomposition of $\mathcal{R}_k(t)$ + +The reproduction numbers are modeled through a hierarchical decomposition: + +$$ +\log \mathcal{R}_k(t) = \log \mathcal{R}_{\text{baseline}}(t) + \delta_k(t), +$$ + +where: + +- $\mathcal{R}_{\text{baseline}}(t)$ is the shared baseline trajectory +- $\delta_k(t)$ is the deviation for subpopulation $k$ + +This separates: + +- **shared dynamics** via $\mathcal{R}_{\text{baseline}}(t)$ +- **relative differences** via $\delta_k(t)$ + +The baseline controls *where the epidemic is going*, while the deviations control *which subpopulations are above or below that trajectory*. + +Because deviations are additive on the log scale, they act multiplicatively on $\mathcal{R}_k(t)$: positive $\delta_k(t)$ increases transmission relative to the baseline, while negative values decrease it. + +### Aggregation + +Each subpopulation produces its own infection trajectory $I_k(t)$. These are combined into an aggregate infection process using population fractions $p_k$, where $p_k$ is the fraction of the total population in subpopulation $k$, with + +$$ +\sum_{k=1}^{K} p_k = 1, \qquad p_k \ge 0. +$$ + +The aggregate infection trajectory is then + +$$ +I_{\text{aggregate}}(t) = \sum_{k=1}^{K} p_k \, I_k(t). 
+$$ + +The aggregate trajectory $I_{\text{aggregate}}(t)$ is then passed to observation processes, which transform it into expected observations as described in the observation model tutorials. + +### Relationship to the single-population model + +This model generalizes the single-population renewal model: + +- `SharedInfections`: one trajectory $I(t)$ with one $\mathcal{R}(t)$ +- `HierarchicalInfections`: multiple trajectories $I_k(t)$ with a shared baseline $\mathcal{R}_{\text{baseline}}(t)$ and subpopulation deviations $\delta_k(t)$ + +When all $\delta_k(t) = 0$, the model reduces to the shared (single-population) case. This tutorial assumes familiarity with the renewal equation, generation interval, initial conditions (`I0`, `initial_log_rt`), and temporal processes. See [Latent Infection Processes](latent_infections.md) for that background. +--- + ## Model Structure -`HierarchicalInfections` requires the same inputs as `SharedInfections` (generation interval, `I0`, `initial_log_rt`) plus a second temporal process for the subpopulation deviations: +`HierarchicalInfections` uses the same core inputs as `SharedInfections`, with additional structure for subpopulation variation. + +### Core inputs (shared across subpopulations) + +- **`gen_int_rv`**: Generation interval distribution $w_\tau$ +- **`I0_rv`**: Initial infection level (shared by default across subpopulations, but can be specified per subpopulation) +- **`initial_log_rt_rv`**: Initial value of $\log \mathcal{R}_{\text{baseline}}(t)$ at time $0$ -- **`baseline_rt_process`**: Temporal process for $\log \mathcal{R}_{\text{baseline}}(t)$. One trajectory, shared across all subpopulations. -- **`subpop_rt_deviation_process`**: Temporal process for $\delta_k(t)$. Produces $K$ trajectories (one per subpopulation), initialized at zero and centered via the sum-to-zero constraint. +### Temporal processes -Population structure (`subpop_fractions`) is provided at **sample time**, not at construction time. 
This allows a single model specification to be fit to different jurisdictions. +Two temporal processes define the evolution of $\log \mathcal{R}_k(t)$: + +- **`baseline_rt_process`** + Temporal process for $\log \mathcal{R}_{\text{baseline}}(t)$. + Produces a single trajectory shared across all subpopulations. + +- **`subpop_rt_deviation_process`** + Temporal process for $\delta_k(t)$. + Produces $K$ trajectories (one per subpopulation), which are centered at each time point to satisfy the sum-to-zero constraint. + +Together, these define the full set of reproduction numbers: + +$$ +\log \mathcal{R}_k(t) += \log \mathcal{R}_{\text{baseline}}(t) + \delta_k(t), +\qquad +\mathcal{R}_k(t) = \exp\big(\log \mathcal{R}_k(t)\big). +$$ + + +### Population structure + +Population fractions $p_k$ are provided at **sample time**, not at model construction. This allows a single model specification to be reused across different jurisdictions or stratifications. + +### Random variables + +As in other PyRenew models, all inputs are specified as `RandomVariable`s: + +- `DeterministicVariable` and `DeterministicPMF` for fixed values (used in this tutorial) +- `DistributionalVariable` for parameters with priors (used in inference) + +In practice, the temporal processes and initial conditions are typically given prior distributions, allowing the model to infer both shared and subpopulation-specific transmission dynamics. -As in the other latent process tutorials, these inputs are `RandomVariable`s. We use `DeterministicVariable` and `DeterministicPMF` here so the prior behavior is easy to inspect, but in an inference workflow you would usually replace at least some of them with prior distributions. 
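A quick numerical sketch of how population fractions combine subpopulation trajectories into the aggregate (hypothetical fractions and trajectories, not values used elsewhere in this tutorial):

```python
import numpy as np

# Hypothetical population fractions p_k for K = 3 subpopulations (sum to 1)
p = np.array([0.5, 0.3, 0.2])

# Hypothetical per-capita infection trajectories I_k(t), shape (K, T)
I_k = np.array([
    [0.010, 0.012, 0.015],
    [0.020, 0.018, 0.016],
    [0.005, 0.009, 0.014],
])

# I_aggregate(t) = sum_k p_k * I_k(t)
I_agg = p @ I_k
```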
```{python} # | label: model-setup @@ -96,7 +189,7 @@ model = HierarchicalInfections( gen_int_rv=gen_int_rv, I0_rv=DeterministicVariable("I0", I0_val), initial_log_rt_rv=DeterministicVariable("initial_log_rt", initial_log_rt), - baseline_rt_process=DifferencedAR1(autoreg=0.9, innovation_sd=0.01), + baseline_rt_process=DifferencedAR1(autoreg=0.5, innovation_sd=0.01), subpop_rt_deviation_process=AR1(autoreg=0.8, innovation_sd=0.05), n_initialization_points=n_init, ) @@ -104,7 +197,10 @@ model = HierarchicalInfections( ## The Sum-to-Zero Constraint -Without a constraint, the baseline and deviations are not identifiable: shifting the baseline up by some amount $c$ and all deviations down by $c$ produces the same subpopulation $\mathcal{R}_k(t)$ values. `HierarchicalInfections` enforces $\sum_k \delta_k(t) = 0$ at every time step by centering the raw deviation trajectories. This ensures $\mathcal{R}_{\text{baseline}}(t)$ is the geometric mean of the subpopulation $\mathcal{R}_k(t)$ values. +Without a constraint, the baseline and deviations are not identifiable: shifting the baseline up by some amount $c$ and all deviations down by $c$ produces the same subpopulation $\mathcal{R}_k(t)$ values. `HierarchicalInfections` enforces $\sum_k \delta_k(t) = 0$ at every time step by centering the raw deviation trajectories. +This ensures $\mathcal{R}_{\text{baseline}}(t)$ is the *unweighted* geometric mean of the subpopulation $\mathcal{R}_k(t)$ values, so the baseline represents the typical transmission level across subpopulations. + +Note that this is the unweighted mean across subpopulations, not population-weighted by $p_k$. As a result, $\mathcal{R}_{\text{baseline}}(t)$ does **not** in general equal the jurisdiction-level reproduction number implied by the aggregate infection trajectory $I_{\text{aggregate}}(t)$. When subpopulations differ in size, a small subpopulation with a large $\mathcal{R}_k(t)$ contributes equally to the baseline but only marginally to the aggregate. 
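The centering and geometric-mean claims above can be verified with a small NumPy sketch (illustrative trajectories, not the `HierarchicalInfections` internals):

```python
import numpy as np

rng = np.random.default_rng(1)

K, T = 4, 10
log_rt_baseline = rng.normal(0.0, 0.1, size=T)   # shared baseline trajectory
raw_dev = rng.normal(0.0, 0.05, size=(K, T))     # raw deviation trajectories

# Enforce the sum-to-zero constraint by centering at each time point
delta = raw_dev - raw_dev.mean(axis=0, keepdims=True)

rt_k = np.exp(log_rt_baseline + delta)           # subpopulation R_k(t)

# The baseline is the unweighted geometric mean of the R_k(t)
geo_mean = np.exp(np.log(rt_k).mean(axis=0))
```

Note that the geometric mean here weights all subpopulations equally, regardless of any $p_k$.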
```{python} # | label: verify-constraint @@ -125,7 +221,8 @@ print( ## Choosing the Baseline Temporal Process -The baseline process governs the jurisdiction-level trend in $\mathcal{R}(t)$. The same temporal process options apply as for `SharedInfections` (see [Temporal Process Choice](latent_infections.md#temporal-process-choice)), and the same considerations hold: AR(1) for mean reversion, DifferencedAR1 for persistent trends with stabilizing rate of change, RandomWalk for unconstrained drift. +The baseline process governs the jurisdiction-level trend in $\mathcal{R}(t)$. +The same temporal process options apply as in `SharedInfections` (see [Temporal Process Choice](latent_infections.md#temporal-process-choice)): AR(1) for mean reversion, DifferencedAR1 for persistent trends with stabilizing rate of change, RandomWalk for unconstrained drift. We use DifferencedAR1 with small `innovation_sd` for the baseline throughout this tutorial. The prior predictive for baseline $\mathcal{R}(t)$ behaves much like the corresponding `SharedInfections` example, so we do not repeat that full comparison here. The focus of this tutorial is the deviation process and how it changes the spread of subpopulation trajectories around the baseline. @@ -133,8 +230,8 @@ The same high-level decision rules apply here: | If you believe... 
| Consider | |-------------------|----------| -| Jurisdiction-wide Rt should stay near a long-run level | `AR1` for the baseline | -| Jurisdiction-wide Rt may drift over the modeled horizon | `DifferencedAR1` for the baseline | +| Jurisdiction-wide $\mathcal{R}(t)$ should stay near a long-run level | `AR1` for the baseline | +| Jurisdiction-wide $\mathcal{R}(t)$ may drift over the modeled horizon | `DifferencedAR1` for the baseline | | Local differences should fade back toward the baseline | `AR1` for deviations | | Local differences can persist or accumulate | `RandomWalk` for deviations | @@ -145,9 +242,11 @@ Two parameters matter most when tuning these processes: ## Choosing the Deviation Temporal Process -The deviation process controls how subpopulation $\mathcal{R}_k(t)$ values spread around the baseline. The deviations $\delta_k(t)$ are initialized at zero (all subpopulations start at the baseline) and evolve according to the chosen temporal process. +The deviation process controls how subpopulation $\mathcal{R}_k(t)$ values move relative to the baseline $\mathcal{R}_{\text{baseline}}(t)$, not the overall trend itself. +How $\delta_k(t)$ behaves at $t = 0$ depends on the process. `AR1` draws its initial value from the stationary distribution, so the prior spread of $\delta_k(0)$ already matches the stationary standard deviation and stays at that width throughout. `RandomWalk` starts at exactly $\delta_k(0) = 0$ and its spread grows over time. The key question: **are local differences transient or persistent?** +This determines whether subpopulations quickly return to the baseline or can diverge and remain different over time. ```{python} # | label: helpers @@ -189,12 +288,15 @@ def sample_hierarchical(baseline_process, deviation_process, label): } -baseline_process = DifferencedAR1(autoreg=0.9, innovation_sd=0.01) +baseline_process = DifferencedAR1(autoreg=0.5, innovation_sd=0.01) ``` + ### AR(1) Deviations -AR(1) deviations revert toward zero. 
Subpopulations that temporarily deviate from the baseline are pulled back. The `autoreg` coefficient controls how quickly: values near 1 produce slow reversion (local differences linger), values near 0 produce fast reversion (subpopulations snap back to the baseline). +AR(1) deviations have **bounded variance**. `AR1` draws its initial value from the stationary distribution of the process, so $\delta_k(0)$ is already dispersed at the stationary standard deviation $\sigma / \sqrt{1 - \phi^2}$ and the envelope stays at that width. With `autoreg = 0.8` and `innovation_sd = 0.05`, the stationary standard deviation is approximately $0.083$ on the log scale. + +The `autoreg` coefficient still matters: if a subpopulation drifts away from zero by chance, $\phi$ controls how quickly it is pulled back. Values near 1 produce slow pullback (local differences linger), values near 0 produce fast pullback (subpopulations snap back to the baseline). What this looks like in a prior predictive plot is a **constant-width band** rather than a fanning-out cloud. ```{python} # | label: ar1-deviations-sample @@ -226,17 +328,56 @@ def deviation_df(samples, label): return pd.DataFrame(data) +def deviation_summary_df(samples): + """Build a long-format median and 90% interval per (day, subpop) using all samples.""" + devs = samples["deviations"] + rows = [] + for k in range(n_subpops): + col = devs[:, :, k] + for d in range(n_days): + rows.append( + { + "day": d, + "subpop": f"subpop {k}", + "median": float(np.median(col[:, d])), + "q05": float(np.percentile(col[:, d], 5)), + "q95": float(np.percentile(col[:, d], 95)), + } + ) + return pd.DataFrame(rows) + + ar1_dev_df = deviation_df(ar1_dev_samples, "AR(1) deviations") +ar1_dev_summary = deviation_summary_df(ar1_dev_samples) ``` ```{python} # | label: fig-ar1-deviations -# | fig-cap: AR(1) deviation trajectories (50 samples, all 6 subpopulations overlaid). 
Deviations revert toward zero, meaning subpopulations track the baseline and local differences are transient. +# | fig-cap: 'AR(1) deviation trajectories (50 samples, all 6 subpopulations). Because `AR1` draws its initial value from the stationary distribution, the envelope is already at its stationary width at day 0 and stays bounded there. Compare the y-axis scale to fig-rw-deviations below.' ( - p9.ggplot(ar1_dev_df, p9.aes(x="day", y="deviation", group="sample")) - + p9.geom_line(alpha=0.1, size=0.5, color="purple") + p9.ggplot() + + p9.geom_line( + ar1_dev_df, + p9.aes(x="day", y="deviation", group="sample"), + alpha=0.08, + size=0.5, + color="purple", + ) + + p9.geom_ribbon( + ar1_dev_summary, + p9.aes(x="day", ymin="q05", ymax="q95"), + fill="purple", + alpha=0.25, + ) + + p9.geom_line( + ar1_dev_summary, + p9.aes(x="day", y="median"), + color="purple", + size=0.8, + ) + p9.geom_hline(yintercept=0, color="black", linetype="dotted") + p9.facet_wrap("~ subpop", ncol=3) + + p9.coord_cartesian(ylim=(-0.5, 0.5)) + p9.labs( x="Days", y="Deviation $\\delta_k(t)$", @@ -248,7 +389,7 @@ ar1_dev_df = deviation_df(ar1_dev_samples, "AR(1) deviations") ### RandomWalk Deviations -RandomWalk deviations have no mean reversion. Subpopulations can drift away from the baseline and stay there. Differences that emerge early in the time series persist and can grow. +RandomWalk deviations have **unbounded variance**. The spread of $\delta_k(t)$ grows linearly with time as $\sigma^2 t$, so over a 28-day horizon with `innovation_sd = 0.05` the standard deviation reaches roughly $0.265$ on the log scale — about three times the AR(1) stationary value above. There is no pullback toward zero: subpopulations that drift away from the baseline stay drifted, and differences that emerge early persist and can grow. The visual signature is a **fanning-out cloud** rather than a constant-width band. 
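The two spreads quoted in this section follow directly from the process definitions; a small check, assuming the tutorial's settings (`autoreg = 0.8`, `innovation_sd = 0.05`, a 28-day horizon):

```python
import numpy as np

phi, sigma, horizon = 0.8, 0.05, 28

# AR(1): stationary standard deviation sigma / sqrt(1 - phi^2)
ar1_sd = sigma / np.sqrt(1 - phi**2)   # ~0.083 on the log scale

# Random walk: standard deviation grows as sigma * sqrt(t)
rw_sd = sigma * np.sqrt(horizon)       # ~0.265 at day 28
```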
```{python} # | label: rw-deviations-sample @@ -262,16 +403,36 @@ rw_dev_samples = sample_hierarchical( ```{python} # | label: rw-deviations-data rw_dev_df = deviation_df(rw_dev_samples, "RandomWalk deviations") +rw_dev_summary = deviation_summary_df(rw_dev_samples) ``` ```{python} # | label: fig-rw-deviations -# | fig-cap: RandomWalk deviation trajectories (50 samples, all 6 subpopulations overlaid). Deviations drift freely with no tendency to return to zero. Subpopulations can diverge permanently from the baseline. +# | fig-cap: 'RandomWalk deviation trajectories (50 samples, 6 subpopulations). The envelope fans out continuously over the 28-day horizon and reaches roughly three times the AR(1) stationary width. The y-axis matches fig-ar1-deviations above for direct comparison.' ( - p9.ggplot(rw_dev_df, p9.aes(x="day", y="deviation", group="sample")) - + p9.geom_line(alpha=0.1, size=0.5, color="purple") + p9.ggplot() + + p9.geom_line( + rw_dev_df, + p9.aes(x="day", y="deviation", group="sample"), + alpha=0.08, + size=0.5, + color="purple", + ) + + p9.geom_ribbon( + rw_dev_summary, + p9.aes(x="day", ymin="q05", ymax="q95"), + fill="purple", + alpha=0.25, + ) + + p9.geom_line( + rw_dev_summary, + p9.aes(x="day", y="median"), + color="purple", + size=0.8, + ) + p9.geom_hline(yintercept=0, color="black", linetype="dotted") + p9.facet_wrap("~ subpop", ncol=3) + + p9.coord_cartesian(ylim=(-0.5, 0.5)) + p9.labs( x="Days", y="Deviation $\\delta_k(t)$", @@ -284,18 +445,20 @@ rw_dev_df = deviation_df(rw_dev_samples, "RandomWalk deviations") ### Comparing Deviation Processes ```{python} -# | label: deviation-summary -print(f"Deviation spread at day {n_days} (across all subpops and samples):") -print( - f"{'Process':<24} {'Median |dev|':>14} {'90th % |dev|':>14} {'Max |dev|':>12}" -) -print("-" * 66) +# | label: tbl-deviation-summary +# | tbl-cap: Deviation spread at day 28 (across all subpopulations and samples). 
+deviation_summary_rows = [] for label, s in [("AR(1)", ar1_dev_samples), ("RandomWalk", rw_dev_samples)]: devs = np.abs(s["deviations"][:, -1, :]).flatten() - print( - f"{label:<24} {np.median(devs):>14.3f} " - f"{np.percentile(devs, 90):>14.3f} {np.max(devs):>12.3f}" + deviation_summary_rows.append( + { + "Process": label, + "Median |dev|": f"{np.median(devs):.3f}", + "90th % |dev|": f"{np.percentile(devs, 90):.3f}", + "Max |dev|": f"{np.max(devs):.3f}", + } ) +pd.DataFrame(deviation_summary_rows) ``` AR(1) deviations stay close to zero because mean reversion continuously pulls them back. RandomWalk deviations accumulate over time. As in the SharedInfections tutorial, this is prior-modeling guidance rather than a uniquely determined epidemiologic rule. The choice depends on the epidemiological setting: @@ -348,7 +511,7 @@ pairs_df = pd.concat( ```{python} # | label: fig-pairs-baseline -# | fig-cap: Baseline Rt trajectories are identical for both configurations (same process, same seed). The difference is in how subpopulations spread around this baseline. +# | fig-cap: 'Baseline $\mathcal{R}(t)$ trajectories are identical for both configurations (same process, same seed). The difference is in how subpopulations spread around this baseline.' baseline_df = pairs_df[pairs_df["subpop"] == "baseline"] ( p9.ggplot(baseline_df, p9.aes(x="day", y="rt", group="sample")) @@ -358,34 +521,109 @@ baseline_df = pairs_df[pairs_df["subpop"] == "baseline"] + p9.labs( x="Days", y="Rt (baseline)", - title="Baseline Rt (DifferencedAR1, autoreg=0.9, sd=0.01)", + title=r"Baseline $\mathcal{R}(t)$ (DifferencedAR1, autoreg=0.5, sd=0.01)", ) + theme_tutorial ) ``` ```{python} -# | label: fig-pairs-subpop -# | fig-cap: Subpopulation Rt trajectories under two deviation process choices (50 samples, one subpopulation shown). With AR(1) deviations (left), subpopulation Rt tracks the baseline closely. With RandomWalk deviations (right), subpopulation Rt can diverge. 
-subpop0_df = pairs_df[pairs_df["subpop"] == "subpop 0"] +# | label: pairs-subpop-summary-data +def baseline_vs_subpop_summary(samples, subpop_idx, label): + """Summarize baseline and one subpopulation's Rt across all prior draws.""" + rt_base = samples["rt_baseline"] + rt_sub = samples["rt_subpop"][:, :, subpop_idx] + rows = [] + for d in range(n_days): + rows.append( + { + "day": d, + "series": "baseline", + "median": float(np.median(rt_base[:, d])), + "q05": float(np.percentile(rt_base[:, d], 5)), + "q95": float(np.percentile(rt_base[:, d], 95)), + "config": label, + } + ) + rows.append( + { + "day": d, + "series": "subpop 0", + "median": float(np.median(rt_sub[:, d])), + "q05": float(np.percentile(rt_sub[:, d], 5)), + "q95": float(np.percentile(rt_sub[:, d], 95)), + "config": label, + } + ) + return pd.DataFrame(rows) + + +config_order = [ + "DifferencedAR1 + AR(1) deviations", + "DifferencedAR1 + RandomWalk deviations", +] + +pairs_summary_df = pd.concat( + [ + baseline_vs_subpop_summary(ar1_dev_samples, 0, config_order[0]), + baseline_vs_subpop_summary(rw_dev_samples, 0, config_order[1]), + ] +) +pairs_summary_df["config"] = pd.Categorical( + pairs_summary_df["config"], categories=config_order, ordered=True +) + +subpop0_df = pairs_df[pairs_df["subpop"] == "subpop 0"].copy() subpop0_df["config"] = pd.Categorical( - subpop0_df["config"], - categories=[ - "DifferencedAR1 + AR(1) deviations", - "DifferencedAR1 + RandomWalk deviations", - ], - ordered=True, + subpop0_df["config"], categories=config_order, ordered=True ) +baseline_summary = pairs_summary_df[pairs_summary_df["series"] == "baseline"] +subpop_summary = pairs_summary_df[pairs_summary_df["series"] == "subpop 0"] +``` + +```{python} +# | label: fig-pairs-subpop +# | fig-cap: 'Prior predictive bands for subpopulation 0''s $\mathcal{R}(t)$ (blue) overlaid on the shared baseline $\mathcal{R}_{\text{baseline}}(t)$ (black), under two deviation process choices. 
Each panel shows 50 subpopulation 0 trajectories (light blue lines), the 5–95% prior interval for subpopulation 0 (blue ribbon) and for the baseline (grey ribbon), and their medians (solid lines). The y-axis is log-scaled so that additive log-deviations appear as constant multiplicative widths. With AR(1) deviations (left), the subpopulation band has nearly the same width as the baseline band at every day, because the deviation variance is stationary. With RandomWalk deviations (right), the subpopulation band widens relative to the baseline band as the horizon grows, because deviation variance accumulates.' ( - p9.ggplot(subpop0_df, p9.aes(x="day", y="rt", group="sample")) - + p9.geom_line(alpha=0.2, size=0.5, color="steelblue") + p9.ggplot() + + p9.geom_ribbon( + baseline_summary, + p9.aes(x="day", ymin="q05", ymax="q95"), + fill="black", + alpha=0.15, + ) + + p9.geom_line( + subpop0_df, + p9.aes(x="day", y="rt", group="sample"), + alpha=0.15, + size=0.4, + color="steelblue", + ) + + p9.geom_ribbon( + subpop_summary, + p9.aes(x="day", ymin="q05", ymax="q95"), + fill="steelblue", + alpha=0.3, + ) + + p9.geom_line( + baseline_summary, + p9.aes(x="day", y="median"), + color="black", + size=1.0, + ) + + p9.geom_line( + subpop_summary, + p9.aes(x="day", y="median"), + color="steelblue", + size=1.0, + ) + p9.geom_hline(yintercept=1.0, color="red", linetype="dashed", alpha=0.7) + p9.facet_wrap("~ config", ncol=2) - + p9.coord_cartesian(ylim=(0, rt_cap)) + + p9.scale_y_log10(limits=(0.5, rt_cap)) + p9.labs( x="Days", - y="Rt (subpopulation 0)", - title="Subpopulation Rt Under Two Deviation Process Choices", + y=r"$\mathcal{R}(t)$ (log scale)", + title=r"Subpopulation 0 $\mathcal{R}(t)$ vs Baseline Under Two Deviation Process Choices", ) + theme_tutorial ) @@ -393,7 +631,7 @@ subpop0_df["config"] = pd.Categorical( ```{python} # | label: fig-pairs-all-subpops -# | fig-cap: All 6 subpopulation Rt trajectories from a single prior draw, under both configurations. 
AR(1) deviations (left) keep subpopulations tightly bundled; RandomWalk deviations (right) allow them to spread apart. +# | fig-cap: 'All 6 subpopulation $\mathcal{R}(t)$ trajectories from a single prior draw, under both configurations. AR(1) deviations (left) keep subpopulations tightly bundled; RandomWalk deviations (right) allow them to spread apart.' single_draw_data = [] for label, s in [ ("DifferencedAR1 +\nAR(1) deviations", ar1_dev_samples), @@ -446,7 +684,7 @@ single_draw_df["config"] = pd.Categorical( + p9.labs( x="Days", y="Rt", - title="Single Prior Draw: Baseline (black) and Subpopulation Rt", + title=r"Single Prior Draw: Baseline (black) and Subpopulation $\mathcal{R}(t)$", color="", ) + theme_tutorial diff --git a/docs/tutorials/latent_infections.qmd b/docs/tutorials/latent_infections.qmd index e09c8a95..b06e62d3 100644 --- a/docs/tutorials/latent_infections.qmd +++ b/docs/tutorials/latent_infections.qmd @@ -3,6 +3,12 @@ title: Latent Infection Processes format: gfm: code-fold: true + html: + toc: true + embed-resources: true + self-contained-math: true + code-fold: true + code-tools: true engine: jupyter jupyter: jupytext: @@ -45,27 +51,24 @@ from pyrenew.randomvariable import DistributionalVariable ## Overview -In infectious disease modeling, the true number of new infections each day is not directly observed. Instead, we observe indirect signals: hospital admissions, emergency department visits, reported cases, wastewater concentrations. The latent infection process generates the unobserved infection trajectory that underlies all of these signals. - -In PyRenew, the latent process is represented on a **per-capita scale** (infection proportion) and is typically multiplied by a population size downstream when connecting to count observations. 
- -PyRenew models latent infections using the **renewal equation**: - -$$I(t) = \mathcal{R}(t) \sum_{s=1}^{S} I(t-s) \, w(s)$$ +In infectious disease modeling, the true number of new infections at each time point is not directly observed. Instead, we observe indirect signals: hospital admissions, emergency department visits, reported cases, wastewater concentrations. +The latent infection process generates the unobserved infection trajectory that underlies all of these signals. -where: +PyRenew models latent infections using the **renewal equation**, which describes how new infections arise from recent past infections. -- $I(t)$ is the number of new infections on day $t$ -- $\mathcal{R}(t)$ is the time-varying reproduction number -- $w(s)$ is the **generation interval** distribution: the probability that $s$ days separate infection of a primary case and a secondary case +Assume the generation interval has finite support over lags $\tau = 1, \dots, G$, with -Today's infections equal the reproduction number times a weighted sum of recent past infections, where the weights come from the generation interval. +$$ +\sum_{\tau=1}^{G} w_\tau = 1, \qquad w_\tau \ge 0. +$$ -Observation processes then transform the latent infections into expected observations. Each signal $k$ has its own ascertainment rate $\alpha_k$ and delay distribution $\pi_k(s)$: +Then the renewal equation is -$$\mu_k(t) = \alpha_k \sum_{s} I(t-s) \, \pi_k(s)$$ +$$ +I(t) = \mathcal{R}(t) \sum_{\tau=1}^{G} I(t - \tau)\, w_\tau. +$$ -The latent infection trajectory $I(t)$ is shared across all signals. +Here, $\tau$ indexes lags in the generation interval. PyRenew provides two latent infection classes: @@ -76,12 +79,12 @@ PyRenew provides two latent infection classes: `SharedInfections` requires four inputs: -1. A **generation interval** distribution (`gen_int_rv`) -2. **Initial infection prevalence** (`I0_rv`) -3. **Initial log reproduction number** (`initial_log_rt_rv`) -4. 
A **temporal process** for $\mathcal{R}(t)$ dynamics (`shared_rt_process`)
+1. Generation interval distribution $w_\tau$ (`gen_int_rv`)
+2. Infection prevalence at time $0$ as a proportion of the population (`I0_rv`)
+3. Value for $\log \mathcal{R}(t)$ at time $0$ (`initial_log_rt_rv`)
+4. A temporal process for $\mathcal{R}(t)$ dynamics (`shared_rt_process`)

-All inputs are **RandomVariables**, a quantity that is either known (observed, conditioned on) or unknown (to be inferred). See [PyRenew's RandomVariable abstract base class](random_variables.md). In this tutorial, we use `DeterministicVariable` and `DeterministicPMF` (fixed values) for illustration. In real inference, you would use `DistributionalVariable` with priors for quantities you want to estimate:
+All inputs are **RandomVariables**: quantities that are either known (observed, conditioned on) or unknown (to be inferred). See [PyRenew's RandomVariable abstract base class](random_variables.md). In this tutorial, we use `DeterministicVariable` and `DeterministicPMF` (fixed values) for illustration. In real inference, you would use `DistributionalVariable` with priors for quantities you want to estimate:

```python
# Fixed value (for illustration)
@@ -93,7 +96,7 @@ I0_rv = DistributionalVariable("I0", dist.Beta(1, 1000))

### Generation Interval

-The generation interval is a key epidemiological input to the renewal equation. In many renewal-model analyses, it is specified from external studies rather than estimated jointly from the focal surveillance data. Because `R_t` estimates depend on this choice, sensitivity analysis is often warranted. For COVID-19, most transmission occurs within the first few days. We define a 7-day PMF for illustration:
+The generation interval is a key epidemiological input to the renewal equation. In many renewal-model analyses, it is specified from external studies rather than estimated jointly from the focal surveillance data. 
Because `R_t` estimates depend on this choice, sensitivity analysis is often warranted. For COVID-19, most transmission occurs within the first few days. We define a 7-day distribution for illustration: ```{python} # | label: generation-interval @@ -122,23 +125,48 @@ gi_df = pd.DataFrame({"day": days, "probability": np.array(gen_int_pmf)}) ) ``` -The generation interval length determines the minimum initialization period: with an $S$-point generation interval PMF, the renewal equation at day 0 needs an infection-history vector long enough to supply the previous $S$ infection values used in the convolution. +The generation interval length determines the minimum initialization period: with a $G$-point generation interval distribution, the renewal equation at time $0$ needs an infection-history vector long enough to supply the previous $G$ infection values used in the convolution. ### Initial Conditions: `I0` and `initial_log_rt` -These two parameters jointly define the infection history before the observation period begins. Understanding their interaction requires knowing how `SharedInfections` initializes the renewal equation. - -**The backprojection mechanism.** The renewal equation at day 0 needs a vector of recent infections to convolve with the generation interval. `SharedInfections` constructs this initialization vector via exponential backprojection: - -$$I_{\text{init}}(\tau) = I_0 \cdot e^{r \cdot \tau}, \quad \tau = 0, 1, \ldots, n_{\text{init}} - 1$$ - -where $r$ is the asymptotic growth rate implied by the initial reproduction number $\mathcal{R}(0) = e^{\text{initial\\_log\\_rt}}$ and the generation interval. The function `r_approx_from_R` converts $\mathcal{R}(0)$ and the generation interval into $r$ using Newton's method. - -* **The level is set by `I0`**.
`I0` is the proportion of the population infected at the start of the initialization period. The renewal equation is linear in `I0`, so scaling `I0` by a factor $c$ scales the entire infection trajectory by $c$. In practice, `I0` is weakly identified by the data because observed counts depend on $\alpha_k \cdot I(t)$, where $\alpha_k$ is the ascertainment rate for signal $k$. Doubling `I0` while halving all ascertainment rates produces identical expected counts. Without external information to pin either the ascertainment rates or the absolute infection level, the data cannot distinguish "more infections, lower ascertainment" from "fewer infections, higher ascertainment." The prior on `I0` or on ascertainment resolves this ambiguity. - -* **The shape is set by `initial_log_rt`**.
`initial_log_rt` enters the model in two places: it is the starting point of the $\mathcal{R}(t)$ trajectory ($\mathcal{R}(0) = e^{\text{initial\\_log\\_rt}}$), and it determines the exponential growth rate $r$ used to fill the initialization vector. When `initial_log_rt = 0`, the growth rate $r = 0$ and the initialization is flat at level `I0`. When `initial_log_rt > 0`, infections are growing exponentially on day 0; when `initial_log_rt < 0`, they are declining. - -The initialization vector is what the renewal equation "sees" as recent infection history on day 0. We can compute it directly for three values of `initial_log_rt`: +These two parameters jointly define the infection history before the observation +period begins. Understanding their interaction requires knowing how the latent +infections process initializes the renewal equation. + +**The backprojection mechanism.** The renewal equation at time $0$ needs a +vector of recent infections to convolve with the generation interval. The latent +infections process constructs this initialization vector via exponential +backprojection. Let $\tau = 0, 1, \ldots, n_{\text{init}} - 1$ index positions +in the initialization vector, where $\tau = 0$ is the earliest time point and +$\tau = n_{\text{init}} - 1$ is the time point immediately before the +observation period begins. Then: + +$$I_{\text{init}}(\tau) = I_0 \cdot e^{r \cdot \tau}, \quad \tau = 0, 1, +\ldots, n_{\text{init}} - 1$$ + +where $r$ is the asymptotic growth rate implied by the reproduction number at +the start of the observation period, $\mathcal{R}(t=0) = +e^{\text{initial\_log\_rt}}$, and the generation interval. The function +`r_approx_from_R` converts $\mathcal{R}(t=0)$ and the generation interval into +$r$ using Newton's method. + +* **The level is set by `I0`**.
+`I0` is the infection prevalence at the earliest point in the initialization +period, $n_{\text{init}} - 1$ time points before $t = 0$. It sets the scale of +the entire initialization vector: $I_{\text{init}}(0) = I_0$, with subsequent +entries growing or declining exponentially toward $t = 0$. + +* **The shape is set by `initial_log_rt`**.
+
`initial_log_rt` enters the model in two places: it is the starting point of
the $\mathcal{R}(t)$ trajectory ($\mathcal{R}(t=0) =
e^{\text{initial\_log\_rt}}$), and it determines the exponential growth rate
$r$ used to construct the initialization vector. When `initial_log_rt = 0`,
$r = 0$ and the initialization vector is flat at level `I0`. When
`initial_log_rt > 0`, infections are growing exponentially at $t = 0$; when
`initial_log_rt < 0`, they are declining.
+
+
The initialization vector is what the renewal equation "sees" as recent infection history at time $0$. We can compute it directly for three values of `initial_log_rt`:

```{python}
# | label: backprojection-compute
@@ -196,7 +224,7 @@ init_df = pd.DataFrame(init_data)
)
```

-The initialization vector matters because the renewal equation is a convolution: infections on day 0 depend on infections from days $-1$ through $-(S-1)$, weighted by the generation interval. A flat initialization (stable) means the renewal equation starts with uniform recent history. A growing initialization means the most recent days have disproportionately more infections, which amplifies the effect of the generation interval's short-lag weights.
+The initialization vector matters because the renewal equation is a convolution: infections on day 0 depend on infections from days $-1$ through $-G$, weighted by the generation interval. A flat initialization (stable) means the renewal equation starts with uniform recent history. A growing initialization means the most recent days have disproportionately more infections, which amplifies the effect of the generation interval's short-lag weights.

After day 0, the temporal process takes over. 
How quickly the trajectory departs The temporal process governs how $\log \mathcal{R}(t)$ evolves day to day. To evaluate what a given process implies, we use **prior predictive checks**: drawing many samples from the model *before seeing any data* and examining the distribution of trajectories. A single sample tells you little (the trajectory depends on the random seed), but the envelope of many samples reveals the structural constraints built into the process. -We fix the initial conditions to a moderately growing epidemic (`initial_log_rt = 0.2`, so $\mathcal{R}(0) \approx 1.22$) with `I0 = 0.001`. Starting from growth rather than equilibrium makes the behavioral differences between temporal processes visible: mean-reverting processes pull $\mathcal{R}(t)$ back, while non-reverting processes do not. +We fix the initial conditions to a growing epidemic (`initial_log_rt = 0.5`, so $\mathcal{R}(0) \approx 1.65$) with `I0 = 0.001`. Starting well above equilibrium rather than near it makes the behavioral differences between temporal processes visible: the median trajectory of a mean-reverting process drifts back toward $\mathcal{R} = 1$, while a non-reverting process does not. This section is primarily **modeling guidance for prior specification in PyRenew**, not a claim that epidemiologic theory uniquely determines one temporal process choice. The appropriate process depends on the scientific setting, time horizon, and how strongly you want the prior to regularize latent transmission dynamics. 
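The logic of a prior predictive check can be sketched independently of PyRenew's classes: draw many trajectories from the temporal process prior and summarize the envelope across draws. A minimal NumPy sketch, assuming a plain Gaussian random walk on $\log \mathcal{R}(t)$ with illustrative values (not the configuration used below):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_days = 200, 28
initial_log_rt, innovation_sd = 0.5, 0.05

# Each row is one prior draw of a random-walk log-Rt trajectory.
steps = rng.normal(0.0, innovation_sd, size=(n_samples, n_days))
log_rt = initial_log_rt + np.cumsum(steps, axis=1)
rt = np.exp(log_rt)

# The envelope of many draws reveals the structure the prior imposes;
# for a random walk, the 90% interval widens steadily with time.
q05, q50, q95 = np.percentile(rt, [5, 50, 95], axis=0)
```

A single trajectory is just one random path; the quantiles across the 200 draws are what carry information about the prior.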
@@ -214,7 +242,7 @@ This section is primarily **modeling guidance for prior specification in PyRenew n_days = 28 n_init = len(gen_int_pmf) n_samples = 200 -initial_log_rt = 0.2 +initial_log_rt = 0.5 I0_val = 0.001 rt_cap = 3.0 @@ -246,15 +274,44 @@ def sample_process(rt_process, label): def rt_spaghetti_plot(rt_array, title, color="steelblue"): - """Build a spaghetti plot of Rt trajectories from a (n_samples, n_days) array.""" - data = [] - for i in range(rt_array.shape[0]): - for d in range(rt_array.shape[1]): - data.append({"day": d, "rt": float(rt_array[i, d]), "sample": i}) - df = pd.DataFrame(data) + """Build a spaghetti plot of Rt trajectories with a median + 90% ribbon overlay.""" + n_traj, n_t = rt_array.shape + spaghetti_data = [] + for i in range(n_traj): + for d in range(n_t): + spaghetti_data.append( + {"day": d, "rt": float(rt_array[i, d]), "sample": i} + ) + df = pd.DataFrame(spaghetti_data) + summary_df = pd.DataFrame( + { + "day": np.arange(n_t), + "median": np.median(rt_array, axis=0), + "q05": np.percentile(rt_array, 5, axis=0), + "q95": np.percentile(rt_array, 95, axis=0), + } + ) return ( - p9.ggplot(df, p9.aes(x="day", y="rt", group="sample")) - + p9.geom_line(alpha=0.1, size=0.5, color=color) + p9.ggplot() + + p9.geom_line( + df, + p9.aes(x="day", y="rt", group="sample"), + alpha=0.08, + size=0.5, + color=color, + ) + + p9.geom_ribbon( + summary_df, + p9.aes(x="day", ymin="q05", ymax="q95"), + fill=color, + alpha=0.25, + ) + + p9.geom_line( + summary_df, + p9.aes(x="day", y="median"), + color=color, + size=1.0, + ) + p9.geom_hline( yintercept=1.0, color="red", linetype="dashed", alpha=0.7 ) @@ -284,7 +341,7 @@ def rt_summary(rt_array, label): ### Random Walk -The simplest temporal process. Each day, $\log \mathcal{R}(t)$ equals the previous day's value plus independent noise: +The simplest temporal process. 
$$x_t = x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma)$$ @@ -299,7 +356,7 @@ rw_samples = sample_process(RandomWalk(innovation_sd=0.05), "RandomWalk") ```{python} # | label: fig-rw -# | fig-cap: Prior predictive Rt trajectories under a Random Walk (innovation_sd = 0.05). The envelope widens steadily over time because variance grows linearly. Trajectories drift in both directions from the starting value with no tendency to return. +# | fig-cap: 'Prior predictive $\mathcal{R}(t)$ trajectories under a Random Walk (innovation_sd = 0.05). The envelope widens steadily over time because variance grows linearly. Trajectories drift in both directions from the starting value with no tendency to return.' rt_spaghetti_plot(rw_samples["rt"], "Random Walk (innovation_sd = 0.05)") ``` @@ -330,7 +387,7 @@ ar1_samples = sample_process(AR1(autoreg=0.9, innovation_sd=0.05), "AR1") ```{python} # | label: fig-ar1 -# | fig-cap: Prior predictive Rt trajectories under AR(1) (autoreg = 0.9, innovation_sd = 0.05). The envelope stabilizes rather than growing over time. Trajectories starting at Rt = 1.22 are pulled back toward Rt = 1. +# | fig-cap: 'Prior predictive $\mathcal{R}(t)$ trajectories under AR(1) (autoreg = 0.9, innovation_sd = 0.05). The envelope stabilizes rather than growing over time. The median trajectory (solid line) drifts from the starting $\mathcal{R}(0)$ of 1.65 back toward 1, illustrating mean reversion of the level.' rt_spaghetti_plot( ar1_samples["rt"], "AR(1) (autoreg = 0.9, innovation_sd = 0.05)" ) @@ -341,49 +398,59 @@ rt_spaghetti_plot( rt_summary(ar1_samples["rt"], "AR(1) (autoreg = 0.9, innovation_sd = 0.05)") ``` + + ### DifferencedAR1 -An AR(1) process applied to the *first differences* (day-to-day changes) of $\log \mathcal{R}(t)$. The rate of change is mean-reverting, but the level is not: +DifferencedAR1 models *changes* in $\mathcal{R}(t)$ as an autoregressive process. 
In terms of increments, -$$\Delta x_t = \phi \, \Delta x_{t-1} + \varepsilon_t, \quad \text{where } \Delta x_t = x_t - x_{t-1}$$ +$$ +\Delta x_t = \texttt{autoreg} \cdot \Delta x_{t-1} + \varepsilon_t, \quad x_t = x_{t-1} + \Delta x_t, +$$ -Equivalently, $\log \mathcal{R}(t)$ is the cumulative sum of a mean-reverting rate-of-change process. If $\mathcal{R}(t)$ is trending upward, the trend can persist (unlike AR(1), which would pull it back). But the *acceleration* stabilizes (unlike a random walk, where the rate of change is unconstrained). +so the autoregressive structure applies to the *rate of change* rather than the level. -This is an integrated process (analogous to ARIMA(1,1,0) in time-series terminology). A critical consequence is that **variance grows faster than for a random walk**. Persistent rates of change accumulate, so the same `innovation_sd` produces a much wider envelope than a random walk with equal `innovation_sd`. To achieve comparable trajectory spread, DifferencedAR1 needs smaller `innovation_sd`. +This has an important consequence: while changes in $\mathcal{R}(t)$ are mean-reverting, the level itself is not. As a result, trajectories can exhibit sustained upward or downward trends and may drift over time, even when the innovations are small. +This process accumulates changes over time, so variability builds up in the level. Because successive changes are correlated, increases or decreases can persist for several steps in a row. These sustained runs can lead to wider trajectories than a random walk with the same `innovation_sd`, even though the individual innovations are small. **Hyperparameters:** - `autoreg` ($\phi$): controls persistence of the rate of change. Values near 1 mean trends persist longer; values near 0 mean trends dissipate quickly. - `innovation_sd` ($\sigma$): standard deviation of shocks to the rate of change (not to the level). 
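The increment recursion can be written out directly. A toy NumPy sketch of the process described above (not PyRenew's `DifferencedAR1` implementation; the starting value and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, autoreg, innovation_sd = 28, 0.9, 0.01

x = np.empty(n_days)    # x_t = log R(t)
x[0], delta = 0.5, 0.0  # start at log R = 0.5 with zero initial trend

for t in range(1, n_days):
    # AR(1) on the increment: a trend persists at rate autoreg...
    delta = autoreg * delta + rng.normal(0.0, innovation_sd)
    # ...and the level is the cumulative sum of the increments.
    x[t] = x[t - 1] + delta

rt = np.exp(x)
```

Because only `delta` is mean-reverting, `rt` can wander far from its starting value even though each individual innovation is small.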
-We use `innovation_sd = 0.01` (smaller than the 0.05 used for Random Walk and AR(1) above) to produce a comparable visual envelope: +We use `innovation_sd = 0.01` (smaller than the 0.05 used for Random Walk and AR(1) above). ```{python} # | label: dar1-sample dar1_samples = sample_process( - DifferencedAR1(autoreg=0.9, innovation_sd=0.01), "DifferencedAR1" + DifferencedAR1(autoreg=0.5, innovation_sd=0.01), "DifferencedAR1" ) ``` ```{python} # | label: fig-dar1 -# | fig-cap: Prior predictive Rt trajectories under DifferencedAR1 (autoreg = 0.9, innovation_sd = 0.01). Trajectories can trend persistently in one direction, unlike AR(1). The rate of change stabilizes, unlike a random walk. Note the smaller innovation_sd compared to the other processes. +# | fig-cap: 'DifferencedAR1 allows sustained directional trends because the increments are autocorrelated. Even with a smaller innovation scale than the other examples, the induced prior on $\mathcal{R}(t)$ can still be fairly diffuse over this time horizon.' rt_spaghetti_plot( dar1_samples["rt"], - "DifferencedAR1 (autoreg = 0.9, innovation_sd = 0.01)", + "DifferencedAR1 (autoreg = 0.5, innovation_sd = 0.01)", ) ``` ```{python} # | label: dar1-summary rt_summary( - dar1_samples["rt"], "DifferencedAR1 (autoreg = 0.9, innovation_sd = 0.01)" + dar1_samples["rt"], "DifferencedAR1 (autoreg = 0.5, innovation_sd = 0.01)" ) ``` ### Comparing the Three Processes -The plots above use different `innovation_sd` values for each process. This is intentional: `innovation_sd` controls different quantities in each process (daily step size for RandomWalk, noise around mean reversion for AR(1), shock to the rate of change for DifferencedAR1), so equal numerical values do not produce equal spread. +The plots above use different hyperparameter values for each process. 
This is intentional: the parameters `innovation_sd` and `autoreg` play different roles across processes (daily step size for RandomWalk, noise around the level for AR(1), and shocks to the *rate of change* for DifferencedAR1), so equal numerical values are not directly comparable. + +For AR(1), `autoreg` controls how strongly $\mathcal{R}(t)$ is pulled back toward $1$, directly limiting variation in the level. For DifferencedAR1, `autoreg` instead controls how persistent changes are over time. Because these changes accumulate, even small innovations can produce substantial drift in $\mathcal{R}(t)$ when persistence is high. + +As a result, DifferencedAR1 can exhibit wider trajectories than might be expected from the scale of `innovation_sd` alone. Hyperparameters are chosen to illustrate characteristic behavior rather than to match marginal variability, so the resulting spreads are not directly comparable across panels. + ```{python} # | label: comparison-data @@ -391,7 +458,7 @@ comparison_data = [] process_labels = { "Random Walk\n(sd=0.05)": rw_samples, "AR(1)\n(autoreg=0.9, sd=0.05)": ar1_samples, - "DifferencedAR1\n(autoreg=0.9, sd=0.01)": dar1_samples, + "DifferencedAR1\n(autoreg=0.5, sd=0.01)": dar1_samples, } for label, s in process_labels.items(): rt = s["rt"] @@ -415,7 +482,7 @@ comparison_df["process"] = pd.Categorical( ```{python} # | label: fig-comparison -# | fig-cap: Side-by-side comparison of all three temporal processes using median trajectories and 90% prior intervals. Hyperparameters are chosen to produce broadly comparable envelopes; the main difference is trajectory shape. +# | fig-cap: Side-by-side comparison of all three temporal processes using median trajectories and 90% prior intervals. 
( p9.ggplot( comparison_df.groupby(["process", "day"])["rt"] @@ -446,34 +513,53 @@ comparison_df["process"] = pd.Categorical( ``` ```{python} -# | label: comparison-summary -print(f"Rt at day {n_days} by temporal process:") -print( - f"{'Process':<20} {'innovation_sd':>14} {'Median':>8} {'90% interval':>20} {'Max':>8}" -) -print("-" * 74) -for label, sd, s in [ - ("Random Walk", 0.05, rw_samples), - ("AR(1)", 0.05, ar1_samples), - ("DifferencedAR1", 0.01, dar1_samples), +# | label: tbl-comparison-summary +# | tbl-cap: '$\mathcal{R}(t)$ at day 28 by temporal process.' +summary_rows = [] +for label, ar, sd, s in [ + ("Random Walk", None, 0.05, rw_samples), + ("AR(1)", 0.9, 0.05, ar1_samples), + ("DifferencedAR1", 0.5, 0.01, dar1_samples), ]: rt = s["rt"][:, -1] - print( - f"{label:<20} {sd:>14.2f} {np.median(rt):>8.2f} " - f"[{np.percentile(rt, 5):>6.2f}, {np.percentile(rt, 95):>6.2f}]" - f"{np.max(rt):>11.1f}" + summary_rows.append( + { + "Process": label, + "autoreg": "-" if ar is None else f"{ar:.2f}", + "innovation_sd": f"{sd:.2f}", + "Median": f"{np.median(rt):.2f}", + "5%": f"{np.percentile(rt, 5):.2f}", + "95%": f"{np.percentile(rt, 95):.2f}", + "Max": f"{np.max(rt):.1f}", + } ) +pd.DataFrame(summary_rows) ``` **When to use which process:** -- **AR(1)** assumes $\mathcal{R}(t)$ reverts to a baseline. Appropriate when modeling a pathogen near endemic equilibrium, or over short time horizons where large sustained departures from the current level are implausible. +- **AR(1)** assumes $\mathcal{R}(t)$ reverts to $1$. Appropriate when modeling a pathogen near endemic equilibrium, or over short time horizons where large sustained departures from the current level are implausible. - **Random Walk** assumes no preferred direction and no memory beyond the current value. Appropriate when you have little prior knowledge about whether $\mathcal{R}(t)$ will increase or decrease, and the data are dense enough to constrain the trajectory. 
- **DifferencedAR1** assumes trends can persist but the rate of change stabilizes. Appropriate for epidemic waves where $\mathcal{R}(t)$ can sustain a direction of movement (e.g., rising during a wave, declining after interventions). ### Effect of Hyperparameters -The hyperparameters `autoreg` and `innovation_sd` control how tightly the prior constrains $\mathcal{R}(t)$. We illustrate with DifferencedAR1, varying both parameters: +The hyperparameters `autoreg` and `innovation_sd` control how tightly the prior constrains $\mathcal{R}(t)$, but their roles differ in important ways for DifferencedAR1.\ + +The `innovation_sd` parameter sets the scale of shocks to the *rate of change* in $\mathcal{R}(t)$, while `autoreg` controls how persistent those changes are over time. In terms of increments, where $\phi =$ `autoreg` + + +$$\Delta x_t = \phi \cdot \Delta x_{t-1} + \varepsilon_t$$ + +`autoreg` ($\phi$) determines how strongly each change carries forward into the next. The level then accumulates these changes via $x_t = x_{t-1} + \Delta x_t$. + +When `autoreg` is large (e.g., 0.9), increments are highly persistent: a positive change in $\mathcal{R}(t)$ is likely to be followed by further positive changes. This produces sustained upward or downward trends, even when `innovation_sd` is small. Because these persistent changes accumulate in the level, the resulting prior over $\mathcal{R}(t)$ can be relatively diffuse over time. + +Lowering `autoreg` reduces this persistence. Increments decorrelate more quickly, causing positive and negative changes to cancel out, which limits long runs of same-direction movement and produces a tighter prior over $\mathcal{R}(t)$. + +Importantly, this differs from AR(1): in AR(1), `autoreg` controls mean reversion of the *level*, directly shrinking $\mathcal{R}(t)$ toward its mean. In DifferencedAR1, only the *changes* are mean-reverting, so the level itself can drift. 
As a result, controlling long-term variability in DifferencedAR1 typically requires adjusting both `innovation_sd` (step size) and `autoreg` (persistence of trends). + +We illustrate these effects below for DifferencedAR1 by varying both parameters: ```{python} # | label: hyperparam-sample @@ -508,7 +594,7 @@ hp_df["config"] = pd.Categorical( ```{python} # | label: fig-hyperparam-effect -# | fig-cap: Effect of DifferencedAR1 hyperparameters on prior Rt distribution. Lower autoreg and innovation_sd (left) produce a tighter envelope; higher values (right) allow wider drift. All start at Rt = 1.22. +# | fig-cap: 'Effect of DifferencedAR1 hyperparameters on prior $\mathcal{R}(t)$ distribution. Lower autoreg and innovation_sd (left) produce a tighter envelope; higher values (right) allow wider drift. All start at $\mathcal{R}(0)$ = 1.65.' ( p9.ggplot(hp_df, p9.aes(x="day", y="rt", group="sample")) + p9.geom_line(alpha=0.1, size=0.5, color="steelblue") @@ -518,23 +604,27 @@ hp_df["config"] = pd.Categorical( + p9.labs( x="Days", y="Rt", - title="DifferencedAR1: Effect of Hyperparameters on Prior Rt", + title=r"DifferencedAR1: Effect of Hyperparameters on Prior $\mathcal{R}(t)$", ) + theme_tutorial ) ``` ```{python} -# | label: hyperparam-summary -print(f"Rt at day {n_days} by DifferencedAR1 configuration:") -print(f"{'Config':<28} {'Median':>8} {'90% interval':>20}") -print("-" * 58) +# | label: tbl-hyperparam-summary +# | tbl-cap: '$\mathcal{R}(t)$ at day 28 by DifferencedAR1 configuration.' 
+hp_summary_rows = [] for name in hyperparam_configs: rt = hyperparam_samples[name]["rt"][:, -1] - print( - f"{name:<28} {np.median(rt):>8.2f} " - f"[{np.percentile(rt, 5):>6.2f}, {np.percentile(rt, 95):>6.2f}]" + hp_summary_rows.append( + { + "Config": name, + "Median": f"{np.median(rt):.2f}", + "5%": f"{np.percentile(rt, 5):.2f}", + "95%": f"{np.percentile(rt, 95):.2f}", + } ) +pd.DataFrame(hp_summary_rows) ``` Both hyperparameters matter, and they interact: @@ -573,7 +663,7 @@ inf_comparison_df["process"] = pd.Categorical( ```{python} # | label: fig-infection-comparison -# | fig-cap: Prior predictive infection trajectories (log scale). The nonlinear transformation from Rt to infections amplifies differences between temporal processes. The vertical spread represents orders-of-magnitude differences in infection levels. +# | fig-cap: 'Prior predictive infection trajectories (log scale). The nonlinear transformation from $\mathcal{R}(t)$ to infections amplifies differences between temporal processes. The vertical spread represents orders-of-magnitude differences in infection levels.' ( p9.ggplot( inf_comparison_df, p9.aes(x="day", y="infections", group="sample") diff --git a/docs/tutorials/observation_processes_counts.qmd b/docs/tutorials/observation_processes_counts.qmd index e48be0d0..9ad4724c 100644 --- a/docs/tutorials/observation_processes_counts.qmd +++ b/docs/tutorials/observation_processes_counts.qmd @@ -40,41 +40,79 @@ from pyrenew import datasets ## Overview -Count observation processes model the lag between infections and an observed outcome such as hospital admissions, emergency department visits, confirmed cases, or deaths. -Observed data can be aggregated or available as subpopulation-level counts, which are modeled by classes `Counts` and `CountsBySubpop`, respectively. +Count observation processes model how latent infections give rise to observed data such as hospital admissions, emergency department visits, confirmed cases, or deaths. 
-Count observation processes transform infections into predicted counts by applying an event probability and/or ascertainment rate and convolving with a delay distribution. +In the renewal modeling framework, observations are generated in two steps: -The predicted observations on day $t$ are: +1. A **deterministic transformation**, where infections are mapped to an expected number of observed events via an ascertainment rate and a delay distribution. +2. A **stochastic observation model**, where observed counts are drawn from a distribution with that predicted value. -$$\mu(t) = \alpha \sum_{s} I(t-s) \, \pi(s)$$ +Observed data can be aggregated or available as subpopulation-level counts, which are modeled by the classes `Counts` and `CountsBySubpop`, respectively. + +### Observation equation + +The deterministic transformation is given by the observation equation: + +$$ +\mu(t) = \alpha \sum_{d=0}^{D} I(t-d)\, \pi_d +$$ where: -- $I(t-s)$ is the number of incident (new) infections on day $t-s$ (i.e., $s$ days before day $t$) -- $\alpha$ is the rate of ascertained counts per infection (e.g., infection-to-hospital admission rate). This can model a mix of biological effects (e.g. some percentage of infections lead to hospital admissions, but not all) and reporting effects (e.g. some percentage of admissions that occur are reported, but not all). -- $\pi(s)$ is the delay distribution from infection to observation, conditional on an infection leading to an observation +- $I(t-d)$ is the number of incident (new) infections on day $t-d$ +- $\alpha$ is the **ascertainment rate**, the probability that an infection results in an observed event (e.g., hospitalization) +- $\pi_d$ is the delay distribution from infection to observation, conditional on an infection leading to an observed event + +The delay distribution has finite support over lags $d = 0, \dots, D$ and satisfies + +$$ +\sum_{d=0}^{D} \pi_d = 1, \qquad \pi_d \ge 0. 
+$$
+
+This equation is a discrete convolution: observations at time $t$ arise from infections that occurred $d$ days earlier, weighted by the delay distribution.
+
+### Stochastic observation model
+
+The observation equation defines the expected number of observed events, but the actual data are stochastic.
+
+Let $Y(t)$ denote the observed number of events at time $t$. Observations are modeled as draws from a count distribution with central value (typically, but not necessarily, the mean) $\mu(t)$:
+
+$$
+Y(t) \sim \text{Distribution}(\mu(t), \theta).
+$$
-Discrete observations are generated by sampling from a noise distribution—e.g. Poisson or negative binomial—to model reporting variability.
-Poisson assumes variance equals the mean; negative binomial accommodates the overdispersion common in surveillance data.
+One possible choice is the Poisson distribution, which assumes the variance equals the mean.
+In practice, epidemiological count data are often overdispersed relative to the Poisson. Negative binomial distributions are a common choice for modeling these overdispersed counts.
-**Note on terminology:** In real-world inference, incident infections are typically a *latent* (unobserved) quantity and must be estimated from observed data like hospital admissions. In this tutorial, we simulate the observation process by specifying infections directly and showing how they produce hospital admissions through convolution and sampling.
+
+This yields a two-layer model:
+
+- A **mechanistic layer**, where the delay convolution determines the predicted number of observations $\mu(t)$ from latent infections $I(t)$.
+- A **stochastic observation layer**, where observed counts $Y(t)$ vary around $\mu(t)$ according to a specified distribution.
+
+**Note on terminology:** In real-world inference, incident infections $I(t)$ are typically *latent* (unobserved) and must be inferred from observed data. 
In this tutorial, we simulate the observation process by specifying infections directly and showing how they produce observed counts through convolution and sampling. ## Hospital admissions example For hospital admissions data, we construct a `Counts` observation process. -The delay is the key mechanism: infections from $s$ days ago ($I(t-s)$) contribute to today's expected hospital admissions ($\mu(t)$) weighted by the probability ($\pi(s)$) that an infection leads to hospitalization after exactly $s$ days. The convolution sums these contributions across all past days. -The process generates hospital admissions by sampling from a negative binomial distribution: +The delay is the key mechanism: infections from $d$ days ago ($I(t-d)$) contribute to today's predicted hospital admissions ($\mu(t)$), weighted by the probability $\pi_d$ that an infection leads to hospitalization after exactly $d$ days. The convolution sums these contributions across all past days. + +Observed hospital admissions are then generated by sampling from a negative binomial distribution: + +$$ +Y(t) \sim \text{NegBinomial}(\mu(t), \phi) +$$ -$$Y_t \sim \text{NegativeBinomial}(\text{mean} = \mu(t), \text{concentration} = \phi)$$ +where $\phi$ is a concentration parameter controlling variability. -The concentration parameter $\phi$ (sometimes called $k$ or the dispersion parameter) controls overdispersion: as $\phi \to \infty$, the distribution approaches Poisson; smaller values allow greater overdispersion. +The negative binomial distribution is commonly used because real-world count data are overdispersed, meaning the variance exceeds the mean. -We use the negative binomial distribution because real-world hospital admission counts exhibit overdispersion—the variance exceeds the mean. -The Poisson distribution assumes variance equals the mean, which is too restrictive. The negative binomial adds an overdispersion term: +$$ +\mathrm{Var}[Y(t)] = \mu(t) + \frac{\mu(t)^2}{\phi}. 
+$$ -$$\text{Var}[Y_t] = \mu + \frac{\mu^2}{\phi}$$ +As $\phi \to \infty$, the negative binomial distribution approaches a Poisson distribution. In this example, we use fixed parameter values for illustration; in practice, these parameters would be estimated from data using weakly informative priors. @@ -192,7 +230,7 @@ def first_valid_observation_day(obs_process) -> int: return obs_process.lookback_days() ``` -## Simulating observed hospital admissions given a single day's worth of infections +## Simulating observations from a single-day infection pulse To demonstrate how a `Counts` observation process works, we examine how infections occurring on a single day result in observed hospital admissions. @@ -248,11 +286,13 @@ plot_infections = ( plot_infections ``` -Because all infections occur on a single day, this allows us to see how one day's worth of infections result in hospital admissions spread over subsequent days according to the delay distribution. +Because all infections occur on a single day, this example shows how a single pulse of infections produces observed events over time through the delay distribution. ## Predicted admissions without observation noise. -First, we compute the predicted admissions from the convolution alone, without observation noise. This is the mean of the distribution from which samples are drawn. +First, we compute the predicted admissions from the convolution alone, without observation noise. +This gives the predicted number of observations $\mu(t)$. 
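To make the mean-concentration parameterization concrete, here is a small SciPy check using illustrative values (this is not PyRenew code). SciPy's `nbinom` uses an $(n, p)$ parameterization, which maps onto $(\mu, \phi)$ via $n = \phi$ and $p = \phi / (\phi + \mu)$:

```python
from scipy import stats

mu, phi = 20.0, 4.0  # illustrative mean and concentration

# Map (mean, concentration) onto scipy's (n, p) parameterization:
# n = phi, p = phi / (phi + mu).
n, p = phi, phi / (phi + mu)

mean = stats.nbinom.mean(n, p)
var = stats.nbinom.var(n, p)

# mean recovers mu, and var equals mu + mu**2 / phi
print(round(mean, 6), round(var, 6))  # 20.0 120.0

# As phi grows, the variance shrinks toward the Poisson variance (= mu).
for big_phi in (1e2, 1e4):
    print(stats.nbinom.var(big_phi, big_phi / (big_phi + mu)))
```

Here $\phi = 4$ gives variance $20 + 20^2/4 = 120$, six times the Poisson variance, while $\phi = 10^4$ is already nearly Poisson.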
+ ```{python} # | label: predicted-no-noise @@ -261,6 +301,7 @@ from pyrenew.convolve import compute_delay_ascertained_incidence # Scale infections by IHR (ascertainment rate) infections_scaled = infections * float(ihr_rv.sample()) + predicted_admissions, offset = compute_delay_ascertained_incidence( p_observed_given_incident=1.0, latent_incidence=infections_scaled, @@ -269,6 +310,8 @@ predicted_admissions, offset = compute_delay_ascertained_incidence( ) ``` +*Note:* in the above implementation, the ascertainment rate is applied by scaling infections before convolution. This is equivalent to applying $\alpha$ after the convolution in the observation equation. + ```{python} # | label: plot-predicted-no-noise # Relative peak day for plotting @@ -327,12 +370,12 @@ plot_predicted = ( plot_predicted ``` -The predicted admissions mirror the delay distribution, shifted by the infection spike day and scaled by the IHR. - +The predicted admissions follow the shape of the delay distribution, shifted by the infection spike day and scaled by the ascertainment rate. ## Observation Noise (Negative Binomial) -The negative binomial distribution adds stochastic variation. Sampling multiple times from the same infections shows the range of possible observations: +The negative binomial distribution adds stochastic variation around $\mu(t)$, corresponding to the stochastic observation layer. 
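The note above about where the ascertainment rate is applied can be checked directly: convolution is linear, so scaling by $\alpha$ before or after it yields the same $\mu(t)$. A minimal NumPy sketch with made-up numbers (not the PyRenew helper):

```python
import numpy as np

infections = np.array([0.0, 5.0, 20.0, 35.0, 15.0])  # made-up latent infections I(t)
delay_pmf = np.array([0.2, 0.5, 0.3])                # made-up delay PMF pi_d
alpha = 0.01                                         # made-up ascertainment rate

# Scale by alpha before the convolution...
mu_before = np.convolve(alpha * infections, delay_pmf)[: len(infections)]
# ...or after it; by linearity the two are identical.
mu_after = alpha * np.convolve(infections, delay_pmf)[: len(infections)]

print(np.allclose(mu_before, mu_after))  # True
```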
+Sampling multiple times from the same infections shows the range of possible observations: ```{python} # | label: sample-realizations @@ -491,9 +534,11 @@ The concentration parameter $\phi$ controls overdispersion: We compare three concentration values spanning two orders of magnitude: -- **φ = 1**: high overdispersion (noisy) -- **φ = 10**: moderate overdispersion -- **φ = 100**: nearly Poisson (minimal noise) +- **$\phi$ = 1**: high overdispersion (noisy) +- **$\phi$ = 10**: moderate overdispersion +- **$\phi$ = 100**: nearly Poisson (minimal noise) + +We hold daily infections constant over time so that any variation in the observed counts comes entirely from the observation model. ```{python} # | label: concentration-comparisons @@ -529,7 +574,7 @@ for conc_val in concentration_values: { "day": i, "admissions": float(admit), - "concentration": f"φ = {int(conc_val)}", + "concentration": f"phi = {int(conc_val)}", "replicate": seed, } ) @@ -542,7 +587,7 @@ conc_df = pd.DataFrame(conc_results) # Convert to ordered categorical conc_df["concentration"] = pd.Categorical( conc_df["concentration"], - categories=["φ = 1", "φ = 10", "φ = 100"], + categories=["phi = 1", "phi = 10", "phi = 100"], ordered=True, ) diff --git a/docs/tutorials/observation_processes_measurements.qmd b/docs/tutorials/observation_processes_measurements.qmd index c19ed61b..9728bcb7 100644 --- a/docs/tutorials/observation_processes_measurements.qmd +++ b/docs/tutorials/observation_processes_measurements.qmd @@ -53,11 +53,18 @@ The `Measurements` class models continuous signals derived from infections. Unli ### The general pattern -All measurement observation processes follow the same pattern: +All measurement observation processes follow the same two-layer structure as the count observation model: -$$y_t \sim \text{Noise}(\mu(t))$$ +1. A deterministic transformation defining the expected measurement value +2. 
A stochastic observation model -where $\mu(t)$ is the predicted measurement value on day $t$, computed from infections via a domain-specific transformation, and the noise model adds stochastic variation around predictions. +Let $\mu(t)$ denote the expected measurement at time $t$. Observations are modeled as + +$$ +Y(t) \sim \text{Noise}(f(\mu(t))), +$$ + +where $f(\cdot)$ is a transformation (often the identity or log function), and the noise model adds stochastic variation around this transformed prediction. Subclasses implement `_predicted_obs()` to compute $\mu(t)$ from infections. PyRenew provides the noise model. @@ -90,7 +97,7 @@ Measurement data typically exhibits **sensor-level variability**: different inst - **Sensor SD**: Measurement precision (noise level) ``` -observed ~ Normal(predicted + sensor_mode[sensor], sensor_sd[sensor]) +observed ~ Normal(f(\mu(t)) + sensor_mode[sensor], sensor_sd[sensor]) ``` The sensor-level RVs must implement `sample(n_groups=...)`. Use `VectorizedVariable` to wrap simple distributions: @@ -173,26 +180,49 @@ based on the [PyRenew-HEW](https://github.com/cdcgov/pyrenew-multisignal) family **The wastewater signal** Wastewater treatment plants measure viral RNA concentrations in sewage. 
-The predicted concentration depends on: +The concentration is determined by: - **Infections**: People shed virus into wastewater - **Shedding kinetics**: Viral shedding peaks a few days after infection -- **Scaling factors**: Genome copies per infection, wastewater volume +- **Scaling factors**: Genome copies per infection and wastewater volume + +### Observation model + +The predicted concentration at time $t$ is given by + +$$ +\mu(t) = \frac{G}{V} \sum_{d=0}^{D} I(t-d)\, \pi_d, +$$ + +where: + +- $I(t-d)$ is the number of infections on day $t-d$ +- $\pi_d$ is the shedding kinetics PMF, giving the fraction of total shedding occurring $d$ days after infection, analogous to the delay distribution in count observation models +- $G$ is the number of genome copies shed per infection +- $V$ is the wastewater volume per person per day + +This has the same convolution structure as the observation equation for count data, with a different transformation applied to the expected value. -The predicted log-concentration on day $t$ is: +We model the observed **log-concentration** as -$$\log(\mu(t)) = \log\left(\frac{G}{V} \cdot \sum_{s} I(t-s) \, \pi(s)\right)$$ +$$ +Y(t) \sim \text{Normal}\big(\log(\mu(t)) + \text{sensor\_mode}, \ \text{sensor\_sd}\big), +$$ where: -- $I(t-s)$ is infections on day $t-s$ -- $\pi(s)$ is the shedding kinetics PMF (fraction shed on day $s$ post-infection) -- $G$ is genome copies shed per infection -- $V$ is wastewater volume per person per day +- $Y(t)$ is the observed log-concentration at time $t$ +- $\text{sensor\_mode}$ represents a sensor-specific bias (additive shift on the log scale) +- $\text{sensor\_sd}$ represents the measurement variability for that sensor -Observations are log-concentrations with normal noise: +### Interpretation -$$y_t \sim \text{Normal}(\log(\mu(t)) + \text{sensor\_mode}, \text{sensor\_sd})$$ +This model follows the same two-layer structure used for count observations: + +- A **deterministic layer**, where the 
convolution defines the predicted concentration $\mu(t)$ from latent infections +- A **stochastic observation layer**, where observed measurements $Y(t)$ vary around $\log(\mu(t))$ due to sensor noise + +The key difference from count data is that measurements are modeled on a continuous (often log-transformed) scale, and variability is represented using a normal distribution rather than a count distribution. ### Implementing the Wastewater class @@ -253,8 +283,8 @@ class Wastewater(Measurements): def _predicted_obs(self, infections: ArrayLike) -> ArrayLike: """ - Compute predicted log-concentration from infections. - + Compute the log of the expected concentration from infections. + This corresponds to log(mu(t)), where mu(t) is defined by the observation equation. Applies shedding kinetics convolution, then scales by genome copies and volume to get concentration. """ diff --git a/pyproject.toml b/pyproject.toml index e72a52b0..3f84cf70 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -26,12 +26,14 @@ dev = [ "mkdocstrings>=0.30.0", "mkdocstrings-python>=1.18.2", "nbconvert>=7.16.6", + "pandas>=3.0.2", + "plotnine>=0.14.0", + "pre-commit>=4.5.1", "pytest>=8.4.2", "pytest-cov>=6.3.0", - "plotnine>=0.14.0", "pytest-mpl>=0.17.0", - "scipy>=1.16.1", "ruff>=0.15.0", + "scipy>=1.16.1", ] [build-system]
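The wastewater observation model described in `observation_processes_measurements.qmd` above can be put into a short runnable sketch. All numbers below (infections, shedding PMF, $G$, $V$, sensor parameters) are invented for illustration and are not PyRenew defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

infections = np.array([0.0, 100.0, 400.0, 300.0, 100.0])  # invented I(t)
shedding_pmf = np.array([0.1, 0.4, 0.3, 0.2])             # invented pi_d, sums to 1
genomes_per_infection = 1e9                                # invented G
volume_per_person = 300.0                                  # invented V

# Deterministic layer: mu(t) = (G / V) * sum_d I(t - d) * pi_d
mu = (genomes_per_infection / volume_per_person) * np.convolve(
    infections, shedding_pmf
)[: len(infections)]

# Stochastic layer: log-concentrations with sensor-specific bias and noise,
# Y(t) ~ Normal(log(mu(t)) + sensor_mode, sensor_sd)
sensor_mode, sensor_sd = 0.2, 0.3          # invented sensor parameters
log_mu = np.log(np.clip(mu, 1e-12, None))  # guard log(0) before any shedding
y = rng.normal(log_mu + sensor_mode, sensor_sd)

print(log_mu.round(2))
```

This mirrors the two-layer structure of the count model: the same convolution machinery computes the expected signal, and only the noise family (normal on the log scale instead of negative binomial) differs.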