diff --git a/CHANGELOG.md b/CHANGELOG.md
index 95183eb..b39abda 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,14 @@ for versioning even while in research-stage development.
 
 ### Added
 
+- Review-driven circadian updates in NumPy and ResNet circadian cores:
+  - optional reward-modulated wake learning (`use_reward_modulated_learning`)
+  - optional adaptive sleep budget scaling (`use_adaptive_sleep_budget`)
+  - `get_last_reward_scale()` telemetry helper
+- Baseline and ResNet benchmark CLI flags for reward modulation and adaptive sleep budget controls.
+- Review follow-up docs:
+  - `docs/circadian-model-review-notes.md`
+  - `docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md`
 - Open-source community baseline files:
   - `LICENSE` (MIT)
   - `CODE_OF_CONDUCT.md`
@@ -41,6 +49,9 @@ for versioning even while in research-stage development.
 
 ### Changed
 
+- ResNet benchmark now enables adaptive sleep budget scaling by default while keeping reward-modulated learning disabled.
+- Updated circadian unit tests (NumPy + Torch) with coverage for reward scaling and adaptive budget behavior.
+- Updated README, model card, and core module docs to describe the new circadian controls.
 - Repositioned repository messaging to Circadian Predictive Coding as the primary focus.
 - Updated `README.md` with:
   - circadian-first project framing
diff --git a/README.md b/README.md
index 410bf63..163f9b0 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@ This lets model capacity adapt over time instead of staying fixed.
 - NumPy circadian predictive coding baseline for small-scale experiments
 - Torch ResNet-50 benchmark pipeline for speed and accuracy comparisons
 - Adaptive sleep triggers, adaptive split/prune thresholds, dual-timescale chemical dynamics
+- Reward-modulated wake learning and adaptive sleep budget scaling (NumPy + ResNet circadian head)
 - Function-preserving split behavior and guarded sleep rollback
 - Multi-seed benchmark runner with JSON/CSV output
 
@@ -141,6 +142,12 @@ Toy baseline:
 python predictive_coding_experiment.py
 ```
 
+Toy baseline with review-driven circadian controls:
+
+```powershell
+python predictive_coding_experiment.py --adaptive-sleep-trigger --adaptive-sleep-budget --reward-modulated-learning --reward-scale-min 0.8 --reward-scale-max 1.4
+```
+
 ResNet benchmark (all 3 models):
 
 ```powershell
@@ -183,6 +190,7 @@ pytest -q
 
 - Governance: [GOVERNANCE.md](GOVERNANCE.md)
 - Support process: [SUPPORT.md](SUPPORT.md)
 - Model Card: [docs/model-card.md](docs/model-card.md)
+- Review Notes: [docs/circadian-model-review-notes.md](docs/circadian-model-review-notes.md)
 
 ## Citation
diff --git a/docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md b/docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md
new file mode 100644
index 0000000..ffa8110
--- /dev/null
+++ b/docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md
@@ -0,0 +1,47 @@
+# ADR-0004: Add Reward-Modulated Wake Updates and Adaptive Sleep Budget Scaling
+
+## Context
+
+External model review highlighted two practical gaps in the circadian learning loop:
+
+- wake learning lacks an explicit task-relevance modulation signal
+- sleep structural budgets rely heavily on fixed schedules/hyperparameters
+
+We need incremental improvements that keep the model deterministic, lightweight, and easy to test.
+
+## Decision
+
+Add two optional mechanisms to both circadian implementations:
+
+- `CircadianPredictiveCodingNetwork` (NumPy baseline)
+- `CircadianPredictiveCodingHead` (ResNet benchmark path)
+
+1. Reward-modulated wake updates
+- Compute batch difficulty from mean absolute output error.
+- Compare difficulty against an EMA baseline.
+- Scale learning updates by a clipped reward factor.
+
+2. Adaptive sleep budget scaling
+- Compute a budget scale from:
+  - recent energy plateau severity
+  - current hidden chemical variance
+- Apply the scale before enforcing configured split/prune limits and fraction caps.
+
+Expose both controls through baseline CLI and ResNet benchmark CLI flags. Keep defaults conservative (`off`) for backward compatibility.
+
+## Alternatives Considered
+
+1. Add an RL/Bayesian meta-controller for sleep scheduling.
+- Rejected for now: larger complexity and harder reproducibility.
+
+2. Modulate split/prune ranking directly with reward first.
+- Deferred: needs clearer attribution of neuron-level contribution and a stronger evaluation harness.
+
+3. Apply reward/adaptive budget in the ResNet path first.
+- Initially rejected for sequencing reasons, then implemented after NumPy validation.
+
+## Consequences
+
+- Circadian wake updates can prioritize harder batches without changing model topology.
+- Sleep events become less dependent on manual tuning while staying deterministic.
+- New behavior is opt-in, preserving existing experiments by default.
diff --git a/docs/circadian-model-review-notes.md b/docs/circadian-model-review-notes.md
new file mode 100644
index 0000000..7bb9670
--- /dev/null
+++ b/docs/circadian-model-review-notes.md
@@ -0,0 +1,38 @@
+# Circadian Model Review Notes
+
+Review source: `C:\Users\Avery\Downloads\Circadian model review.pdf`
+
+## Scope
+
+This note maps review recommendations to concrete repository changes and follow-up work.
+
+## Implemented in this pass
+
+1. Reward-modulated wake learning
+- Added optional reward scaling in `CircadianPredictiveCodingNetwork` wake updates.
+- Batch difficulty is measured from mean absolute output error relative to an EMA baseline.
+- Scale is clipped via config to keep updates stable.
+- Why this: gives a simple task-relevance signal without changing the core predictive-coding math.
+
+2. Adaptive sleep budget scaling
+- Added optional scaling for split/prune budgets based on:
+  - energy plateau severity
+  - hidden chemical variance
+- Preserves `max_split_per_sleep` and `max_prune_per_sleep` as hard caps.
+- Why this: reduces manual schedule sensitivity while staying deterministic and lightweight.
+
+3. CLI exposure for new controls
+- Added baseline CLI flags to toggle reward modulation and adaptive sleep budget behavior.
+
+4. ResNet circadian parity
+- Added the same reward-modulated wake learning and adaptive sleep budget scaling to `CircadianPredictiveCodingHead`.
+- Exposed benchmark CLI/config knobs so ResNet benchmark runs can enable the mechanisms.
+
+5. Test coverage
+- Added unit tests for reward scaling behavior and adaptive budget expansion/contraction in both NumPy and Torch circadian paths.
+
+## Still pending (next iterations)
+
+1. Explore deeper-layer structural plasticity (current ResNet adaptation remains head-focused).
+2. Prototype faster inference variants (reduced steps or amortized inference).
+3. Evaluate reward-biased split/prune ranking directly (currently reward influences wake updates and importance EMA).
diff --git a/docs/model-card.md b/docs/model-card.md
index e25817c..aeef8ed 100644
--- a/docs/model-card.md
+++ b/docs/model-card.md
@@ -21,7 +21,9 @@ It tracks per-neuron chemical usage, modulates plasticity during wake, and applies
 - Extension: circadian dynamics:
   - chemical accumulation and decay
   - plasticity gating
+  - reward-modulated wake learning (optional)
   - adaptive sleep triggers
+  - adaptive sleep budget scaling (optional)
   - structural split/prune
   - optional rollback and homeostatic controls
 
diff --git a/docs/modules/core.md b/docs/modules/core.md
index c300e88..1070e4b 100644
--- a/docs/modules/core.md
+++ b/docs/modules/core.md
@@ -4,6 +4,7 @@
 
 - Define model behavior (`BackpropMLP`, `PredictiveCodingNetwork`, `CircadianPredictiveCodingNetwork`)
 - Define ResNet-50 benchmark variants (`BackpropResNet50Classifier`, `PredictiveCodingResNet50Classifier`, `CircadianPredictiveCodingResNet50Classifier`)
+- Implement circadian mechanisms such as chemical gating, reward-modulated wake updates, and adaptive sleep budgeting
 - Provide activation utilities
 - Define neuron adaptation interfaces and traffic summaries
 
diff --git a/src/adapters/cli.py b/src/adapters/cli.py
index 590037c..1f7dfcc 100644
--- a/src/adapters/cli.py
+++ b/src/adapters/cli.py
@@ -117,6 +117,37 @@ def build_argument_parser() -> argparse.ArgumentParser:
         type=float,
         default=circadian_defaults.sleep_chemical_variance_threshold,
     )
+    parser.add_argument(
+        "--adaptive-sleep-budget",
+        action="store_true",
+        help="Scale split/prune budgets based on plateau severity and chemical variance.",
+    )
+    parser.add_argument(
+        "--adaptive-sleep-budget-min-scale",
+        type=float,
+        default=circadian_defaults.adaptive_sleep_budget_min_scale,
+    )
+    parser.add_argument(
+        "--adaptive-sleep-budget-max-scale",
+        type=float,
+        default=circadian_defaults.adaptive_sleep_budget_max_scale,
+    )
+
+    parser.add_argument(
+        "--reward-modulated-learning",
+        action="store_true",
+        help="Scale wake learning rate by batch difficulty relative to recent error baseline.",
+    )
+    parser.add_argument(
+        "--reward-scale-min",
+        type=float,
+        default=circadian_defaults.reward_scale_min,
+    )
+    parser.add_argument(
+        "--reward-scale-max",
+        type=float,
+        default=circadian_defaults.reward_scale_max,
+    )
 
     parser.add_argument(
         "--split-weight-norm-mix",
@@ -188,6 +219,12 @@ def main() -> None:
         sleep_energy_window=arguments.sleep_energy_window,
         sleep_plateau_delta=arguments.sleep_plateau_delta,
         sleep_chemical_variance_threshold=arguments.sleep_chemical_variance_threshold,
+        use_adaptive_sleep_budget=arguments.adaptive_sleep_budget,
+        adaptive_sleep_budget_min_scale=arguments.adaptive_sleep_budget_min_scale,
+        adaptive_sleep_budget_max_scale=arguments.adaptive_sleep_budget_max_scale,
+        use_reward_modulated_learning=arguments.reward_modulated_learning,
+        reward_scale_min=arguments.reward_scale_min,
+        reward_scale_max=arguments.reward_scale_max,
         split_weight_norm_mix=arguments.split_weight_norm_mix,
         prune_weight_norm_mix=arguments.prune_weight_norm_mix,
         prune_decay_steps=arguments.prune_decay_steps,
diff --git a/src/adapters/resnet_benchmark_cli.py b/src/adapters/resnet_benchmark_cli.py
index df50e20..8807ad0 100644
--- a/src/adapters/resnet_benchmark_cli.py
+++ b/src/adapters/resnet_benchmark_cli.py
@@ -113,6 +113,23 @@ def build_argument_parser() -> argparse.ArgumentParser:
     parser.add_argument("--circ-sleep-energy-window", type=int, default=48)
     parser.add_argument("--circ-sleep-plateau-delta", type=float, default=5e-5)
     parser.add_argument("--circ-sleep-chemical-variance-threshold", type=float, default=0.02)
+    parser.add_argument(
+        "--circ-use-adaptive-sleep-budget",
+        dest="circ_use_adaptive_sleep_budget",
+        action="store_true",
+        help="Enable adaptive split/prune budget scaling by plateau and chemical variance.",
+    )
+    parser.add_argument(
+        "--circ-disable-adaptive-sleep-budget",
+        dest="circ_use_adaptive_sleep_budget",
+        action="store_false",
+        help="Disable adaptive split/prune budget scaling.",
+    )
+    parser.set_defaults(circ_use_adaptive_sleep_budget=True)
+    parser.add_argument("--circ-adaptive-sleep-budget-min-scale", type=float, default=0.25)
+    parser.add_argument("--circ-adaptive-sleep-budget-max-scale", type=float, default=1.0)
+    parser.add_argument("--circ-adaptive-sleep-budget-plateau-weight", type=float, default=0.6)
+    parser.add_argument("--circ-adaptive-sleep-budget-variance-weight", type=float, default=0.4)
     parser.add_argument(
         "--circ-force-sleep",
         dest="circ_force_sleep",
         action="store_true",
@@ -168,6 +185,15 @@ def build_argument_parser() -> argparse.ArgumentParser:
     parser.add_argument("--circ-plasticity-sensitivity-max", type=float, default=0.55)
     parser.add_argument("--circ-plasticity-importance-mix", type=float, default=0.50)
     parser.add_argument("--circ-min-plasticity", type=float, default=0.5)
+    parser.add_argument(
+        "--circ-use-reward-modulated-learning",
+        action="store_true",
+        help="Scale wake learning rate by batch difficulty relative to a moving baseline.",
+    )
+    parser.add_argument("--circ-reward-baseline-decay", type=float, default=0.95)
+    parser.add_argument("--circ-reward-difficulty-exponent", type=float, default=1.0)
+    parser.add_argument("--circ-reward-scale-min", type=float, default=0.75)
+    parser.add_argument("--circ-reward-scale-max", type=float, default=1.5)
     parser.add_argument("--circ-use-adaptive-thresholds", action="store_true", default=None)
     parser.add_argument("--circ-adaptive-split-percentile", type=float, default=92.0)
     parser.add_argument("--circ-adaptive-prune-percentile", type=float, default=8.0)
@@ -260,6 +286,15 @@ def main() -> None:
         circadian_sleep_chemical_variance_threshold=(
             args.circ_sleep_chemical_variance_threshold
         ),
+        circadian_use_adaptive_sleep_budget=args.circ_use_adaptive_sleep_budget,
+        circadian_adaptive_sleep_budget_min_scale=args.circ_adaptive_sleep_budget_min_scale,
+        circadian_adaptive_sleep_budget_max_scale=args.circ_adaptive_sleep_budget_max_scale,
+        circadian_adaptive_sleep_budget_plateau_weight=(
+            args.circ_adaptive_sleep_budget_plateau_weight
+        ),
+        circadian_adaptive_sleep_budget_variance_weight=(
+            args.circ_adaptive_sleep_budget_variance_weight
+        ),
         circadian_enable_sleep_rollback=args.circ_enable_sleep_rollback,
         circadian_sleep_rollback_tolerance=args.circ_sleep_rollback_tolerance,
         circadian_sleep_rollback_metric=args.circ_sleep_rollback_metric,
@@ -291,6 +326,11 @@ def main() -> None:
         circadian_plasticity_sensitivity_max=args.circ_plasticity_sensitivity_max,
         circadian_plasticity_importance_mix=args.circ_plasticity_importance_mix,
         circadian_min_plasticity=args.circ_min_plasticity,
+        circadian_use_reward_modulated_learning=args.circ_use_reward_modulated_learning,
+        circadian_reward_baseline_decay=args.circ_reward_baseline_decay,
+        circadian_reward_difficulty_exponent=args.circ_reward_difficulty_exponent,
+        circadian_reward_scale_min=args.circ_reward_scale_min,
+        circadian_reward_scale_max=args.circ_reward_scale_max,
         circadian_use_adaptive_thresholds=(
             True
             if args.circ_use_adaptive_thresholds is None
diff --git a/src/app/resnet50_benchmark.py b/src/app/resnet50_benchmark.py
index af88f29..f3e7c06 100644
--- a/src/app/resnet50_benchmark.py
+++ b/src/app/resnet50_benchmark.py
@@ -70,6 +70,11 @@ class ResNet50BenchmarkConfig:
     circadian_sleep_energy_window: int = 48
     circadian_sleep_plateau_delta: float = 5e-5
     circadian_sleep_chemical_variance_threshold: float = 0.02
+    circadian_use_adaptive_sleep_budget: bool = True
+    circadian_adaptive_sleep_budget_min_scale: float = 0.25
+    circadian_adaptive_sleep_budget_max_scale: float = 1.0
+    circadian_adaptive_sleep_budget_plateau_weight: float = 0.6
+    circadian_adaptive_sleep_budget_variance_weight: float = 0.4
     circadian_enable_sleep_rollback: bool = True
     circadian_sleep_rollback_tolerance: float = 0.002
     circadian_sleep_rollback_metric: str = "cross_entropy"
@@ -91,6 +96,11 @@ class ResNet50BenchmarkConfig:
     circadian_plasticity_sensitivity_max: float = 0.55
     circadian_plasticity_importance_mix: float = 0.50
     circadian_min_plasticity: float = 0.5
+    circadian_use_reward_modulated_learning: bool = False
+    circadian_reward_baseline_decay: float = 0.95
+    circadian_reward_difficulty_exponent: float = 1.0
+    circadian_reward_scale_min: float = 0.75
+    circadian_reward_scale_max: float = 1.5
     circadian_use_adaptive_thresholds: bool = True
     circadian_adaptive_split_percentile: float = 92.0
     circadian_adaptive_prune_percentile: float = 8.0
@@ -484,6 +494,11 @@ def _benchmark_circadian(
         plasticity_sensitivity_max=config.circadian_plasticity_sensitivity_max,
         plasticity_importance_mix=config.circadian_plasticity_importance_mix,
         min_plasticity=config.circadian_min_plasticity,
+        use_reward_modulated_learning=config.circadian_use_reward_modulated_learning,
+        reward_baseline_decay=config.circadian_reward_baseline_decay,
+        reward_difficulty_exponent=config.circadian_reward_difficulty_exponent,
+        reward_scale_min=config.circadian_reward_scale_min,
+        reward_scale_max=config.circadian_reward_scale_max,
         use_adaptive_thresholds=config.circadian_use_adaptive_thresholds,
         adaptive_split_percentile=config.circadian_adaptive_split_percentile,
         adaptive_prune_percentile=config.circadian_adaptive_prune_percentile,
@@ -517,6 +532,11 @@ def _benchmark_circadian(
         sleep_energy_window=config.circadian_sleep_energy_window,
         sleep_plateau_delta=config.circadian_sleep_plateau_delta,
         sleep_chemical_variance_threshold=config.circadian_sleep_chemical_variance_threshold,
+        use_adaptive_sleep_budget=config.circadian_use_adaptive_sleep_budget,
+        adaptive_sleep_budget_min_scale=config.circadian_adaptive_sleep_budget_min_scale,
+        adaptive_sleep_budget_max_scale=config.circadian_adaptive_sleep_budget_max_scale,
+        adaptive_sleep_budget_plateau_weight=config.circadian_adaptive_sleep_budget_plateau_weight,
+        adaptive_sleep_budget_variance_weight=config.circadian_adaptive_sleep_budget_variance_weight,
     )
     model = CircadianPredictiveCodingResNet50Classifier(
         num_classes=loaders.num_classes,
@@ -879,6 +899,14 @@ def _validate_benchmark_config(config: ResNet50BenchmarkConfig) -> None:
     )
     if not (0.0 <= config.circadian_plasticity_importance_mix <= 1.0):
         raise ValueError("circadian_plasticity_importance_mix must be between 0 and 1.")
+    if not (0.0 <= config.circadian_reward_baseline_decay < 1.0):
+        raise ValueError("circadian_reward_baseline_decay must be in [0, 1).")
+    if config.circadian_reward_difficulty_exponent <= 0.0:
+        raise ValueError("circadian_reward_difficulty_exponent must be positive.")
+    if config.circadian_reward_scale_min <= 0.0:
+        raise ValueError("circadian_reward_scale_min must be positive.")
+    if config.circadian_reward_scale_max < config.circadian_reward_scale_min:
+        raise ValueError("circadian_reward_scale_max must be >= circadian_reward_scale_min.")
     if config.circadian_sleep_warmup_steps < 0:
         raise ValueError("circadian_sleep_warmup_steps must be non-negative.")
     if not (0.0 <= config.circadian_sleep_split_only_until_fraction <= 1.0):
@@ -907,6 +935,22 @@ def _validate_benchmark_config(config: ResNet50BenchmarkConfig) -> None:
         raise ValueError("circadian_sleep_plateau_delta must be non-negative.")
     if config.circadian_sleep_chemical_variance_threshold < 0.0:
         raise ValueError("circadian_sleep_chemical_variance_threshold must be non-negative.")
+    if config.circadian_adaptive_sleep_budget_min_scale <= 0.0:
+        raise ValueError("circadian_adaptive_sleep_budget_min_scale must be positive.")
+    if (
+        config.circadian_adaptive_sleep_budget_max_scale
+        < config.circadian_adaptive_sleep_budget_min_scale
+    ):
+        raise ValueError(
+            "circadian_adaptive_sleep_budget_max_scale must be >= "
+            "circadian_adaptive_sleep_budget_min_scale."
+        )
+    if config.circadian_adaptive_sleep_budget_max_scale > 1.0:
+        raise ValueError("circadian_adaptive_sleep_budget_max_scale must be <= 1.0.")
+    if config.circadian_adaptive_sleep_budget_plateau_weight < 0.0:
+        raise ValueError("circadian_adaptive_sleep_budget_plateau_weight must be non-negative.")
+    if config.circadian_adaptive_sleep_budget_variance_weight < 0.0:
+        raise ValueError("circadian_adaptive_sleep_budget_variance_weight must be non-negative.")
     if config.circadian_sleep_rollback_tolerance < 0.0:
         raise ValueError("circadian_sleep_rollback_tolerance must be non-negative.")
     if config.circadian_sleep_rollback_eval_batches < 0:
diff --git a/src/core/circadian_predictive_coding.py b/src/core/circadian_predictive_coding.py
index a59ea32..e41c404 100644
--- a/src/core/circadian_predictive_coding.py
+++ b/src/core/circadian_predictive_coding.py
@@ -53,6 +53,13 @@ class CircadianConfig:
     plasticity_importance_mix: float = 0.50
     min_plasticity: float = 0.20
 
+    # Optional reward modulation: scale wake updates toward harder batches.
+    use_reward_modulated_learning: bool = False
+    reward_baseline_decay: float = 0.95
+    reward_difficulty_exponent: float = 1.0
+    reward_scale_min: float = 0.75
+    reward_scale_max: float = 1.5
+
     # Static thresholds are still available, but adaptive percentile thresholds
     # can be enabled to react to changing chemical distributions over training.
     use_adaptive_thresholds: bool = False
@@ -90,6 +97,13 @@ class CircadianConfig:
     sleep_plateau_delta: float = 1e-3
     sleep_chemical_variance_threshold: float = 0.02
 
+    # Optional adaptive budget scaling for split/prune counts during sleep.
+    use_adaptive_sleep_budget: bool = False
+    adaptive_sleep_budget_min_scale: float = 0.25
+    adaptive_sleep_budget_max_scale: float = 1.0
+    adaptive_sleep_budget_plateau_weight: float = 0.6
+    adaptive_sleep_budget_variance_weight: float = 0.4
+
     # Gradual pruning: marked neurons decay for a few epochs before removal.
     prune_decay_steps: int = 1
     prune_decay_factor: float = 0.60
@@ -174,6 +188,8 @@ def __init__(
         self._epoch_count = 0
         self._epochs_since_sleep = 0
         self._energy_history: list[float] = []
+        self._reward_error_ema: float | None = None
+        self._last_reward_scale = 1.0
         self._replay_memory: deque[ReplaySnapshot] = deque(
             maxlen=self.config.replay_memory_size
         )
@@ -251,17 +267,20 @@ def _run_training_step(
         grad_hidden_bias = np.sum(hidden_prior_gradient, axis=0, keepdims=True) / sample_count
 
         self._update_chemical_layer(hidden_state)
-        self._update_importance_ema(grad_hidden_output)
+        reward_scale = self._compute_reward_scale(output_error)
+        self._last_reward_scale = reward_scale
+        self._update_importance_ema(grad_hidden_output, reward_scale=reward_scale)
 
         plasticity = self.get_plasticity_state()
         gated_input_hidden = grad_input_hidden * plasticity[np.newaxis, :]
         gated_hidden_output = grad_hidden_output * plasticity[:, np.newaxis]
         gated_hidden_bias = grad_hidden_bias * plasticity[np.newaxis, :]
+        effective_learning_rate = learning_rate * reward_scale
 
-        self.weight_hidden_output -= learning_rate * gated_hidden_output
-        self.bias_output -= learning_rate * grad_output_bias
-        self.weight_input_hidden -= learning_rate * gated_input_hidden
-        self.bias_hidden -= learning_rate * gated_hidden_bias
+        self.weight_hidden_output -= effective_learning_rate * gated_hidden_output
+        self.bias_output -= effective_learning_rate * grad_output_bias
+        self.weight_input_hidden -= effective_learning_rate * gated_input_hidden
+        self.bias_hidden -= effective_learning_rate * gated_hidden_bias
 
         self._record_hidden_traffic(hidden_state)
         energy = self._compute_energy(
@@ -400,6 +419,10 @@ def get_plasticity_state(self) -> Array:
         plasticity = np.exp(-sensitivity * self._hidden_chemical)
         return np.clip(plasticity, self.config.min_plasticity, 1.0)
 
+    def get_last_reward_scale(self) -> float:
+        """Return most recent reward modulation scale from wake training."""
+        return float(self._last_reward_scale)
+
     def _compute_plasticity_sensitivity(self) -> Array:
         if not self.config.use_adaptive_plasticity_sensitivity:
             return np.full_like(self._hidden_chemical, self.config.plasticity_sensitivity)
@@ -439,11 +462,29 @@ def _record_hidden_traffic(self, hidden_state: Array) -> None:
         self._traffic_sum += np.mean(np.abs(hidden_state), axis=0)
         self._traffic_steps += 1
 
-    def _update_importance_ema(self, grad_hidden_output: Array) -> None:
-        importance = np.mean(np.abs(grad_hidden_output), axis=1)
+    def _update_importance_ema(self, grad_hidden_output: Array, reward_scale: float) -> None:
+        importance = np.mean(np.abs(grad_hidden_output), axis=1) * float(reward_scale)
         decay = self.config.importance_ema_decay
         self._importance_ema = decay * self._importance_ema + (1.0 - decay) * importance
 
+    def _compute_reward_scale(self, output_error: Array) -> float:
+        if not self.config.use_reward_modulated_learning:
+            return 1.0
+
+        batch_error = float(np.mean(np.abs(output_error)))
+        baseline = batch_error if self._reward_error_ema is None else self._reward_error_ema
+        difficulty_ratio = batch_error / max(float(baseline), 1e-8)
+        raw_scale = difficulty_ratio ** self.config.reward_difficulty_exponent
+        reward_scale = float(
+            np.clip(raw_scale, self.config.reward_scale_min, self.config.reward_scale_max)
+        )
+
+        # Why this: update baseline after computing ratio so scale reflects
+        # current surprise against past performance, not a blended present.
+        decay = self.config.reward_baseline_decay
+        self._reward_error_ema = decay * float(baseline) + (1.0 - decay) * batch_error
+        return reward_scale
+
     def _update_chemical_layer(self, hidden_state: Array) -> None:
         activity = np.mean(np.abs(hidden_state), axis=0)
         if not self.config.use_dual_chemical:
@@ -575,8 +616,13 @@ def _resolve_split_prune_thresholds(self) -> tuple[float, float]:
     def _resolve_sleep_budgets(
         self, current_step: int | None, total_steps: int | None
     ) -> tuple[int, int, bool]:
-        split_budget = self._resolve_structural_budget(self.config.max_split_per_sleep)
-        prune_budget = self._resolve_structural_budget(self.config.max_prune_per_sleep)
+        budget_scale = self._compute_adaptive_sleep_budget_scale()
+        split_budget = self._resolve_structural_budget(
+            self.config.max_split_per_sleep, budget_scale=budget_scale
+        )
+        prune_budget = self._resolve_structural_budget(
+            self.config.max_prune_per_sleep, budget_scale=budget_scale
+        )
         if current_step is None or total_steps is None or total_steps <= 0:
             return split_budget, prune_budget, False
@@ -590,15 +636,58 @@ def _resolve_sleep_budgets(
             split_budget = 0
         return split_budget, prune_budget, False
 
-    def _resolve_structural_budget(self, configured_limit: int) -> int:
+    def _compute_adaptive_sleep_budget_scale(self) -> float:
+        if not self.config.use_adaptive_sleep_budget:
+            return 1.0
+        if len(self._energy_history) < self.config.sleep_energy_window:
+            return 1.0
+
+        recent = self._energy_history[-self.config.sleep_energy_window :]
+        energy_improvement = max(0.0, float(recent[0] - recent[-1]))
+
+        plateau_delta = self.config.sleep_plateau_delta
+        if plateau_delta <= 1e-12:
+            plateau_score = 1.0 if energy_improvement <= 0.0 else 0.0
+        else:
+            plateau_score = float(
+                np.clip((plateau_delta - energy_improvement) / plateau_delta, 0.0, 1.0)
+            )
+
+        variance_threshold = self.config.sleep_chemical_variance_threshold
+        if variance_threshold <= 1e-12:
+            variance_score = 1.0
+        else:
+            chemical_variance = float(np.var(self._hidden_chemical))
+            variance_score = float(np.clip(chemical_variance / variance_threshold, 0.0, 1.0))
+
+        plateau_weight = max(0.0, self.config.adaptive_sleep_budget_plateau_weight)
+        variance_weight = max(0.0, self.config.adaptive_sleep_budget_variance_weight)
+        total_weight = plateau_weight + variance_weight
+        if total_weight <= 1e-12:
+            combined_signal = 1.0
+        else:
+            combined_signal = (
+                plateau_weight * plateau_score + variance_weight * variance_score
+            ) / total_weight
+
+        min_scale = self.config.adaptive_sleep_budget_min_scale
+        max_scale = self.config.adaptive_sleep_budget_max_scale
+        return float(min_scale + (max_scale - min_scale) * combined_signal)
+
+    def _resolve_structural_budget(self, configured_limit: int, budget_scale: float) -> int:
         if configured_limit <= 0:
             return 0
+        if budget_scale <= 0.0:
+            return 0
+        scaled_limit = int(np.floor(float(configured_limit) * budget_scale))
+        if scaled_limit <= 0:
+            scaled_limit = 1
         fraction = self.config.sleep_max_change_fraction
         if fraction <= 0.0:
             return 0
         by_fraction = int(np.floor(float(self.hidden_dim) * fraction))
         by_fraction = max(by_fraction, int(self.config.sleep_min_change_count))
-        return min(int(configured_limit), int(by_fraction))
+        return min(int(scaled_limit), int(by_fraction))
 
     def _compute_split_scores(self) -> Array:
         chemical_component = self._normalize_vector(self._hidden_chemical)
@@ -958,6 +1047,14 @@ def _validate_config(self, config: CircadianConfig) -> None:
             raise ValueError("plasticity_importance_mix must be between 0 and 1")
         if not (0.0 < config.min_plasticity <= 1.0):
             raise ValueError("min_plasticity must be in (0, 1]")
+        if not (0.0 <= config.reward_baseline_decay < 1.0):
+            raise ValueError("reward_baseline_decay must be in [0, 1)")
+        if config.reward_difficulty_exponent <= 0.0:
+            raise ValueError("reward_difficulty_exponent must be positive")
+        if config.reward_scale_min <= 0.0:
+            raise ValueError("reward_scale_min must be positive")
+        if config.reward_scale_max < config.reward_scale_min:
+            raise ValueError("reward_scale_max must be >= reward_scale_min")
         if not (0.0 <= config.split_weight_norm_mix <= 1.0):
             raise ValueError("split_weight_norm_mix must be between 0 and 1")
         if not (0.0 <= config.prune_weight_norm_mix <= 1.0):
@@ -984,6 +1081,16 @@ def _validate_config(self, config: CircadianConfig) -> None:
             raise ValueError("sleep_plateau_delta must be non-negative")
         if config.sleep_chemical_variance_threshold < 0.0:
             raise ValueError("sleep_chemical_variance_threshold must be non-negative")
+        if config.adaptive_sleep_budget_min_scale <= 0.0:
+            raise ValueError("adaptive_sleep_budget_min_scale must be positive")
+        if config.adaptive_sleep_budget_max_scale < config.adaptive_sleep_budget_min_scale:
+            raise ValueError("adaptive_sleep_budget_max_scale must be >= min scale")
+        if config.adaptive_sleep_budget_max_scale > 1.0:
+            raise ValueError("adaptive_sleep_budget_max_scale must be <= 1.0")
+        if config.adaptive_sleep_budget_plateau_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_plateau_weight must be non-negative")
+        if config.adaptive_sleep_budget_variance_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_variance_weight must be non-negative")
         if config.max_split_per_sleep < 0 or config.max_prune_per_sleep < 0:
             raise ValueError("max split/prune per sleep must be non-negative")
         if config.split_noise_scale < 0.0:
diff --git a/src/core/resnet50_variants.py b/src/core/resnet50_variants.py
index 7fa3476..7f89194 100644
--- a/src/core/resnet50_variants.py
+++ b/src/core/resnet50_variants.py
@@ -28,6 +28,11 @@ class CircadianHeadConfig:
     plasticity_sensitivity_max: float = 1.20
     plasticity_importance_mix: float = 0.50
     min_plasticity: float = 0.20
+    use_reward_modulated_learning: bool = False
+    reward_baseline_decay: float = 0.95
+    reward_difficulty_exponent: float = 1.0
+    reward_scale_min: float = 0.75
+    reward_scale_max: float = 1.5
     use_adaptive_thresholds: bool = False
     adaptive_split_percentile: float = 85.0
     adaptive_prune_percentile: float = 20.0
@@ -61,6 +66,11 @@ class CircadianHeadConfig:
     sleep_energy_window: int = 32
     sleep_plateau_delta: float = 1e-4
     sleep_chemical_variance_threshold: float = 0.02
+    use_adaptive_sleep_budget: bool = False
+    adaptive_sleep_budget_min_scale: float = 0.25
+    adaptive_sleep_budget_max_scale: float = 1.0
+    adaptive_sleep_budget_plateau_weight: float = 0.6
+    adaptive_sleep_budget_variance_weight: float = 0.4
 
 
 @dataclass(frozen=True)
@@ -297,6 +307,8 @@ def __init__(
         self._prune_cooldown = torch.zeros(hidden_dim, dtype=torch.int32, device=device)
         self._steps_since_sleep = 0
         self._energy_history: list[float] = []
+        self._reward_error_ema: float | None = None
+        self._last_reward_scale = 1.0
         generator_device = "cuda" if str(device).startswith("cuda") else "cpu"
         generator = torch.Generator(device=generator_device)
         generator.manual_seed(seed + 9_999)
@@ -341,17 +353,24 @@ def train_step(
         grad_hidden_bias = torch.mean(hidden_prior_grad, dim=0, keepdim=True)
 
         self._update_chemical(hidden_state)
-        self._update_importance_ema(grad_hidden_output)
+        reward_scale = self._compute_reward_scale(output_error)
+        self._last_reward_scale = reward_scale
+        self._update_importance_ema(grad_hidden_output, reward_scale=reward_scale)
 
         plasticity = self._plasticity()
         grad_hidden_output = grad_hidden_output * plasticity[:, None]
         grad_feature_hidden = grad_feature_hidden * plasticity[None, :]
         grad_hidden_bias = grad_hidden_bias * plasticity[None, :]
+        effective_learning_rate = learning_rate * reward_scale
 
-        self.weight_hidden_output = self.weight_hidden_output - learning_rate * grad_hidden_output
-        self.bias_output = self.bias_output - learning_rate * grad_output_bias
-        self.weight_feature_hidden = self.weight_feature_hidden - learning_rate * grad_feature_hidden
-        self.bias_hidden = self.bias_hidden - learning_rate * grad_hidden_bias
+        self.weight_hidden_output = (
+            self.weight_hidden_output - effective_learning_rate * grad_hidden_output
+        )
+        self.bias_output = self.bias_output - effective_learning_rate * grad_output_bias
+        self.weight_feature_hidden = (
+            self.weight_feature_hidden - effective_learning_rate * grad_feature_hidden
+        )
+        self.bias_hidden = self.bias_hidden - effective_learning_rate * grad_hidden_bias
 
         self._traffic_sum = self._traffic_sum + torch.mean(torch.abs(hidden_state), dim=0)
         self._traffic_steps += 1
@@ -444,6 +463,8 @@ def snapshot_state(self) -> dict[str, Any]:
             "prune_cooldown": self._prune_cooldown.clone(),
             "steps_since_sleep": self._steps_since_sleep,
             "energy_history": list(self._energy_history),
+            "reward_error_ema": self._reward_error_ema,
+            "last_reward_scale": self._last_reward_scale,
         }
 
     def restore_state(self, state: dict[str, Any]) -> None:
@@ -462,10 +483,16 @@ def restore_state(self, state: dict[str, Any]) -> None:
         self._prune_cooldown = cast(Any, state["prune_cooldown"]).clone()
         self._steps_since_sleep = int(state["steps_since_sleep"])
         self._energy_history = list(cast(list[float], state["energy_history"]))
+        reward_error_ema = state.get("reward_error_ema")
+        self._reward_error_ema = None if reward_error_ema is None else float(reward_error_ema)
+        self._last_reward_scale = float(state.get("last_reward_scale", 1.0))
 
     def mean_chemical(self) -> Any:
         return self._chemical.clone()
 
+    def last_reward_scale(self) -> float:
+        return float(self._last_reward_scale)
+
     def _plasticity(self) -> Any:
         torch = self._torch
         sensitivity = self._plasticity_sensitivity_vector()
@@ -485,12 +512,28 @@ def _plasticity_sensitivity_vector(self) -> Any:
         base = torch.full_like(self._chemical, self.config.plasticity_sensitivity_min)
         return base + span * stability
 
-    def _update_importance_ema(self, grad_hidden_output: Any) -> None:
+    def _update_importance_ema(self, grad_hidden_output: Any, reward_scale: float) -> None:
         torch = self._torch
-        importance = torch.mean(torch.abs(grad_hidden_output), dim=1)
+        importance = torch.mean(torch.abs(grad_hidden_output), dim=1) * reward_scale
         decay = self.config.importance_ema_decay
         self._importance_ema = decay * self._importance_ema + (1.0 - decay) * importance
 
+    def _compute_reward_scale(self, output_error: Any) -> float:
+        torch = self._torch
+        if not self.config.use_reward_modulated_learning:
+            return 1.0
+
+        batch_error = float(torch.mean(torch.abs(output_error)).item())
+        baseline = batch_error if self._reward_error_ema is None else self._reward_error_ema
+        difficulty_ratio = batch_error / max(float(baseline), 1e-8)
+        raw_scale = difficulty_ratio ** self.config.reward_difficulty_exponent
+        reward_scale = float(
+            max(self.config.reward_scale_min, min(self.config.reward_scale_max, raw_scale))
+        )
+        decay = self.config.reward_baseline_decay
+        self._reward_error_ema = decay * float(baseline) + (1.0 - decay) * batch_error
+        return reward_scale
+
     def _update_chemical(self, hidden_state: Any) -> None:
         torch = self._torch
         activity = torch.mean(torch.abs(hidden_state), dim=0)
@@ -601,8 +644,13 @@ def _resolve_split_prune_thresholds(self) -> tuple[float, float]:
     def _resolve_sleep_budgets(
         self, current_step: int | None, total_steps: int | None
     ) -> tuple[int, int, bool]:
-        split_budget = self._resolve_structural_budget(self.config.max_split_per_sleep)
-        prune_budget = self._resolve_structural_budget(self.config.max_prune_per_sleep)
+        budget_scale = self._compute_adaptive_sleep_budget_scale()
+        split_budget = self._resolve_structural_budget(
+            self.config.max_split_per_sleep, budget_scale=budget_scale
+        )
+        prune_budget = self._resolve_structural_budget(
+            self.config.max_prune_per_sleep, budget_scale=budget_scale
+        )
 
         if current_step is None or total_steps is None or total_steps <= 0:
             return split_budget, prune_budget, False
@@ -615,15 +663,56 @@ def _resolve_sleep_budgets(
             split_budget = 0
         return split_budget, prune_budget, False
 
-    def _resolve_structural_budget(self, configured_limit: int) -> int:
+    def _compute_adaptive_sleep_budget_scale(self) -> float:
+        torch = self._torch
+        if not self.config.use_adaptive_sleep_budget:
+            return 1.0
+        if len(self._energy_history) < self.config.sleep_energy_window:
+            return 1.0
+
+        recent = self._energy_history[-self.config.sleep_energy_window :]
+        energy_improvement = max(0.0, float(recent[0] - recent[-1]))
+        plateau_delta = self.config.sleep_plateau_delta
+        if plateau_delta <= 1e-12:
+            plateau_score = 1.0 if energy_improvement <= 0.0 else 0.0
+        else:
+            plateau_score = max(0.0, min(1.0, (plateau_delta - energy_improvement) / plateau_delta))
+
+        variance_threshold = self.config.sleep_chemical_variance_threshold
+        if variance_threshold <= 1e-12:
+            variance_score = 1.0
+        else:
+            chemical_variance = float(torch.var(self._chemical).item())
+            variance_score = max(0.0, min(1.0, chemical_variance / variance_threshold))
+
+        plateau_weight = max(0.0, self.config.adaptive_sleep_budget_plateau_weight)
+        variance_weight = max(0.0, self.config.adaptive_sleep_budget_variance_weight)
+        total_weight = plateau_weight + variance_weight
+        if total_weight <= 1e-12:
+            combined_signal = 1.0
+        else:
+            combined_signal = (
+                plateau_weight * plateau_score + variance_weight * variance_score
+            ) / total_weight
+
+        min_scale = self.config.adaptive_sleep_budget_min_scale
+        max_scale = self.config.adaptive_sleep_budget_max_scale
+        return float(min_scale + (max_scale - min_scale) * combined_signal)
+
+    def _resolve_structural_budget(self, configured_limit: int, budget_scale: float) -> int:
         if configured_limit <= 0:
             return 0
+        if budget_scale <= 0.0:
+            return 0
+        scaled_limit = int(float(configured_limit) * budget_scale)
+        if scaled_limit <= 0:
+            scaled_limit = 1
         fraction = self.config.sleep_max_change_fraction
         if fraction <= 0.0:
             return 0
         by_fraction = int(float(self.hidden_dim) * fraction)
         by_fraction = max(by_fraction, int(self.config.sleep_min_change_count))
-        return min(int(configured_limit), int(by_fraction))
+        return min(int(scaled_limit), int(by_fraction))
 
     def _compute_split_scores(self) -> Any:
         norm_scores = self._normalize_tensor(self._row_norm(self.weight_hidden_output))
@@ -828,6 +917,14 @@ def _validate_config(self, config: CircadianHeadConfig) -> None:
             raise ValueError("plasticity_importance_mix must be between 0 and 1.")
         if not (0.0 < config.min_plasticity <= 1.0):
             raise ValueError("min_plasticity must be in (0, 1].")
+        if not (0.0 <= config.reward_baseline_decay < 1.0):
+            raise ValueError("reward_baseline_decay must be in [0, 1).")
+        if config.reward_difficulty_exponent <= 0.0:
+            raise ValueError("reward_difficulty_exponent must be positive.")
+        if config.reward_scale_min <= 0.0:
+            raise ValueError("reward_scale_min must be positive.")
+        if config.reward_scale_max < config.reward_scale_min:
+            raise ValueError("reward_scale_max must be >= reward_scale_min.")
         if not (0.0 <= config.adaptive_split_percentile <= 100.0):
             raise ValueError("adaptive_split_percentile must be between 0 and 100.")
         if not (0.0 <= config.adaptive_prune_percentile <= 100.0):
@@ -882,6 +979,16 @@ def _validate_config(self, config: CircadianHeadConfig) -> None:
             raise ValueError("sleep_plateau_delta must be non-negative.")
         if config.sleep_chemical_variance_threshold < 0.0:
             raise ValueError("sleep_chemical_variance_threshold must be non-negative.")
+        if config.adaptive_sleep_budget_min_scale <= 0.0:
+            raise ValueError("adaptive_sleep_budget_min_scale must be positive.")
+        if config.adaptive_sleep_budget_max_scale < config.adaptive_sleep_budget_min_scale:
+            raise ValueError("adaptive_sleep_budget_max_scale must be >= min scale.")
+        if config.adaptive_sleep_budget_max_scale > 1.0:
+            raise ValueError("adaptive_sleep_budget_max_scale must be <= 1.0.")
+        if config.adaptive_sleep_budget_plateau_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_plateau_weight must be non-negative.")
+        if config.adaptive_sleep_budget_variance_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_variance_weight must be non-negative.")
 
 
 class PredictiveCodingResNet50Classifier:
diff --git a/tests/test_circadian_predictive_coding.py b/tests/test_circadian_predictive_coding.py
index 18c5b29..a03c0c3 100644
--- a/tests/test_circadian_predictive_coding.py
+++ b/tests/test_circadian_predictive_coding.py
@@ -317,3 +317,88 @@ def test_should_reduce_plasticity_for_high_importance_when_adaptive_sensitivity_
     plasticity = model.get_plasticity_state()
 
     assert float(plasticity[0]) < float(plasticity[1])
+
+
+def test_should_scale_learning_rate_by_reward_signal_for_easy_vs_hard_batches() -> None:
+    config = CircadianConfig(
+        use_reward_modulated_learning=True,
+        reward_baseline_decay=0.95,
+        reward_scale_min=0.8,
+        reward_scale_max=1.6,
+    )
+    model = CircadianPredictiveCodingNetwork(
+        input_dim=2,
+        hidden_dim=5,
+        seed=54,
+        circadian_config=config,
+    )
+    train_input = np.array(
+        [[0.2, -0.1], [0.3, 0.4], [-0.2, 0.8], [0.7, -0.5]],
+        dtype=np.float64,
+    )
+
+    model._reward_error_ema = 1.0
+    easy_target = model.predict_proba(train_input)
+    model.train_epoch(
+        input_batch=train_input,
+        target_batch=easy_target,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    easy_scale = model.get_last_reward_scale()
+
+    model._reward_error_ema = 0.05
+    hard_target = 1.0 - easy_target
+    model.train_epoch(
+        input_batch=train_input,
+        target_batch=hard_target,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    hard_scale = model.get_last_reward_scale()
+
+    assert easy_scale <= config.reward_scale_min + 1e-6
+    assert hard_scale > 1.0
+
+
+def test_should_expand_sleep_budget_when_plateau_and_chemical_variance_are_high() -> None:
+    config = CircadianConfig(
+        use_adaptive_sleep_budget=True,
+        max_split_per_sleep=4,
+        max_prune_per_sleep=4,
+        sleep_energy_window=3,
+        sleep_plateau_delta=0.1,
+        sleep_chemical_variance_threshold=0.05,
+        adaptive_sleep_budget_min_scale=0.25,
+        adaptive_sleep_budget_max_scale=1.0,
+        adaptive_sleep_budget_plateau_weight=0.5,
+        adaptive_sleep_budget_variance_weight=0.5,
+    )
+    model = CircadianPredictiveCodingNetwork(
+        input_dim=2,
+        hidden_dim=10,
+        seed=71,
+        circadian_config=config,
+        max_hidden_dim=20,
+    )
+
+    model._energy_history = [1.0, 0.7, 0.3]
+    model.set_chemical_state(np.full(10, 0.2, dtype=np.float64))
+    low_split_budget, low_prune_budget, _ = model._resolve_sleep_budgets(
+        current_step=None,
+        total_steps=None,
+    )
+
+    model._energy_history = [0.5, 0.5, 0.5]
+    model.set_chemical_state(np.array([0.0, 1.0] * 5, dtype=np.float64))
+    high_split_budget, high_prune_budget, _ = model._resolve_sleep_budgets(
+        current_step=None,
+        total_steps=None,
+    )
+
+    assert low_split_budget == 1
+    assert low_prune_budget == 1
+    assert high_split_budget == 4
+    assert high_prune_budget == 4
diff --git a/tests/test_resnet50_variants.py b/tests/test_resnet50_variants.py
index 118807d..dbdcd71 100644
--- a/tests/test_resnet50_variants.py
+++ b/tests/test_resnet50_variants.py
@@ -159,3 +159,96 @@ def test_should_restore_snapshot_after_structure_change_in_torch_head() -> None:
     assert torch.allclose(head.weight_feature_hidden, snapshot["weight_feature_hidden"])
     assert torch.allclose(head.weight_hidden_output, snapshot["weight_hidden_output"])
     assert torch.allclose(head._chemical, snapshot["chemical"])
+
+
+def test_should_scale_learning_rate_by_reward_signal_in_torch_head() -> None:
+    torch = pytest.importorskip("torch")
+    device = torch.device("cpu")
+    config = CircadianHeadConfig(
+        use_reward_modulated_learning=True,
+        reward_baseline_decay=0.95,
+        reward_scale_min=0.8,
+        reward_scale_max=1.6,
+    )
+    head = CircadianPredictiveCodingHead(
+        feature_dim=6,
+        hidden_dim=5,
+        num_classes=2,
+        device=device,
+        seed=37,
+        config=config,
+        min_hidden_dim=4,
+        max_hidden_dim=10,
+    )
+    features = torch.tensor(
+        [[0.3, -0.2, 0.5, 0.1, -0.4, 0.2], [-0.1, 0.4, 0.2, -0.6, 0.8, 0.3]],
+        dtype=torch.float32,
+    )
+    labels = torch.tensor([0, 1], dtype=torch.long)
+
+    head._reward_error_ema = 10.0
+    head.train_step(
+        features=features,
+        targets=labels,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    easy_scale = head.last_reward_scale()
+
+    head._reward_error_ema = 0.01
+    head.train_step(
+        features=features,
+        targets=labels,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    hard_scale = head.last_reward_scale()
+
+    assert easy_scale <= config.reward_scale_min + 1e-6
+    assert hard_scale > 1.0
+
+
+def test_should_expand_sleep_budget_when_plateau_and_variance_are_high_in_torch_head() -> None:
+    torch = pytest.importorskip("torch")
+    device = torch.device("cpu")
+    config = CircadianHeadConfig(
+        use_adaptive_sleep_budget=True,
+        max_split_per_sleep=4,
+        max_prune_per_sleep=4,
+        sleep_energy_window=3,
+        sleep_plateau_delta=0.1,
+        sleep_chemical_variance_threshold=0.05,
+        adaptive_sleep_budget_min_scale=0.25,
+        adaptive_sleep_budget_max_scale=1.0,
+        adaptive_sleep_budget_plateau_weight=0.5,
+        adaptive_sleep_budget_variance_weight=0.5,
+    )
+    head = CircadianPredictiveCodingHead(
+        feature_dim=6,
+        hidden_dim=10,
+        num_classes=2,
+        device=device,
+        seed=41,
+        config=config,
+        min_hidden_dim=4,
+        max_hidden_dim=20,
+    )
+
+    head._energy_history = [1.0, 0.7, 0.3]
+    head._chemical = torch.full((10,), 0.2, dtype=torch.float32)
+    low_split_budget, low_prune_budget, _ = head._resolve_sleep_budgets(
+        current_step=None, total_steps=None
+    )
+
+    head._energy_history = [0.5, 0.5, 0.5]
+    head._chemical = torch.tensor([0.0, 1.0] * 5, dtype=torch.float32)
+    high_split_budget, high_prune_budget, _ = head._resolve_sleep_budgets(
+        current_step=None, total_steps=None
+    )
+
+    assert low_split_budget == 1
+    assert low_prune_budget == 1
+    assert high_split_budget == 4
+    assert high_prune_budget == 4