diff --git a/CHANGELOG.md b/CHANGELOG.md
index 95183eb..b39abda 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,14 @@ for versioning even while in research-stage development.
 
 ### Added
 
+- Review-driven circadian updates in NumPy and ResNet circadian cores:
+  - optional reward-modulated wake learning (`use_reward_modulated_learning`)
+  - optional adaptive sleep budget scaling (`use_adaptive_sleep_budget`)
+  - `get_last_reward_scale()` telemetry helper
+- Baseline and ResNet benchmark CLI flags for reward modulation and adaptive sleep budget controls.
+- Review follow-up docs:
+  - `docs/circadian-model-review-notes.md`
+  - `docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md`
 - Open-source community baseline files:
   - `LICENSE` (MIT)
   - `CODE_OF_CONDUCT.md`
@@ -41,6 +49,9 @@ for versioning even while in research-stage development.
 
 ### Changed
 
+- ResNet benchmark now enables adaptive sleep budget scaling by default while keeping reward-modulated learning disabled.
+- Updated circadian unit tests (NumPy + Torch) with coverage for reward scaling and adaptive budget behavior.
+- Updated README, model card, and core module docs to describe the new circadian controls.
 - Repositioned repository messaging to Circadian Predictive Coding as the primary focus.
 - Updated `README.md` with:
   - circadian-first project framing
diff --git a/README.md b/README.md
index 410bf63..163f9b0 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@ This lets model capacity adapt over time instead of staying fixed.
 - NumPy circadian predictive coding baseline for small-scale experiments
 - Torch ResNet-50 benchmark pipeline for speed and accuracy comparisons
 - Adaptive sleep triggers, adaptive split/prune thresholds, dual-timescale chemical dynamics
+- Reward-modulated wake learning and adaptive sleep budget scaling (NumPy + ResNet circadian head)
 - Function-preserving split behavior and guarded sleep rollback
 - Multi-seed benchmark runner with JSON/CSV output
 
@@ -141,6 +142,12 @@ Toy baseline:
 python predictive_coding_experiment.py
 ```
 
+Toy baseline with review-driven circadian controls:
+
+```powershell
+python predictive_coding_experiment.py --adaptive-sleep-trigger --adaptive-sleep-budget --reward-modulated-learning --reward-scale-min 0.8 --reward-scale-max 1.4
+```
+
 ResNet benchmark (all 3 models):
 
 ```powershell
@@ -183,6 +190,7 @@ pytest -q
 
 - Governance: [GOVERNANCE.md](GOVERNANCE.md)
 - Support process: [SUPPORT.md](SUPPORT.md)
 - Model Card: [docs/model-card.md](docs/model-card.md)
+- Review Notes: [docs/circadian-model-review-notes.md](docs/circadian-model-review-notes.md)
 
 ## Citation
diff --git a/docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md b/docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md
new file mode 100644
index 0000000..ffa8110
--- /dev/null
+++ b/docs/adr/ADR-0004-reward-modulated-wake-and-adaptive-sleep-budget.md
@@ -0,0 +1,47 @@
+# ADR-0004: Add Reward-Modulated Wake Updates and Adaptive Sleep Budget Scaling
+
+## Context
+
+External model review highlighted two practical gaps in the circadian learning loop:
+
+- wake learning lacks an explicit task-relevance modulation signal
+- sleep structural budgets rely heavily on fixed schedules/hyperparameters
+
+We need incremental improvements that keep the model deterministic, lightweight, and easy to test.
+
+## Decision
+
+Add two optional mechanisms to both circadian implementations:
+
+- `CircadianPredictiveCodingNetwork` (NumPy baseline)
+- `CircadianPredictiveCodingHead` (ResNet benchmark path)
+
+1. Reward-modulated wake updates
+- Compute batch difficulty from mean absolute output error.
+- Compare difficulty against an EMA baseline.
+- Scale learning updates by a clipped reward factor.
+
+2. Adaptive sleep budget scaling
+- Compute a budget scale from:
+  - recent energy plateau severity
+  - current hidden chemical variance
+- Apply the scale before enforcing configured split/prune limits and fraction caps.
+
+Expose both controls through baseline CLI and ResNet benchmark CLI flags. Keep defaults conservative (`off`) for backward compatibility.
+
+## Alternatives Considered
+
+1. Add an RL/Bayesian meta-controller for sleep scheduling.
+- Rejected for now: larger complexity and harder reproducibility.
+
+2. Modulate split/prune ranking directly with reward first.
+- Deferred: needs clearer attribution of neuron-level contribution and a stronger evaluation harness.
+
+3. Apply reward/adaptive budget in the ResNet path first.
+- Initially rejected for sequencing reasons, then implemented after NumPy validation.
+
+## Consequences
+
+- Circadian wake updates can prioritize harder batches without changing model topology.
+- Sleep events become less dependent on manual tuning while staying deterministic.
+- New behavior is opt-in, preserving existing experiments by default.
diff --git a/docs/circadian-model-review-notes.md b/docs/circadian-model-review-notes.md
new file mode 100644
index 0000000..7bb9670
--- /dev/null
+++ b/docs/circadian-model-review-notes.md
@@ -0,0 +1,38 @@
+# Circadian Model Review Notes
+
+Review source: `C:\Users\Avery\Downloads\Circadian model review.pdf`
+
+## Scope
+
+This note maps review recommendations to concrete repository changes and follow-up work.
+
+## Implemented in this pass
+
+1. Reward-modulated wake learning
+- Added optional reward scaling in `CircadianPredictiveCodingNetwork` wake updates.
+- Batch difficulty is measured from mean absolute output error relative to an EMA baseline.
+- Scale is clipped via config to keep updates stable.
+- Why this: gives a simple task-relevance signal without changing the core predictive-coding math.
+
+2. Adaptive sleep budget scaling
+- Added optional scaling for split/prune budgets based on:
+  - energy plateau severity
+  - hidden chemical variance
+- Preserves `max_split_per_sleep` and `max_prune_per_sleep` as hard caps.
+- Why this: reduces manual schedule sensitivity while staying deterministic and lightweight.
+
+3. CLI exposure for new controls
+- Added baseline CLI flags to toggle reward modulation and adaptive sleep budget behavior.
+
+4. ResNet circadian parity
+- Added the same reward-modulated wake learning and adaptive sleep budget scaling to `CircadianPredictiveCodingHead`.
+- Exposed benchmark CLI/config knobs so ResNet benchmark runs can enable the mechanisms.
+
+5. Test coverage
+- Added unit tests for reward scaling behavior and adaptive budget expansion/contraction in both NumPy and Torch circadian paths.
+
+## Still pending (next iterations)
+
+1. Explore deeper-layer structural plasticity (current ResNet adaptation remains head-focused).
+2. Prototype faster inference variants (reduced steps or amortized inference).
+3. Evaluate reward-biased split/prune ranking directly (currently reward influences wake updates and importance EMA).
diff --git a/docs/model-card.md b/docs/model-card.md
index e25817c..aeef8ed 100644
--- a/docs/model-card.md
+++ b/docs/model-card.md
@@ -21,7 +21,9 @@ It tracks per-neuron chemical usage, modulates plasticity during wake, and applies
 - Extension: circadian dynamics:
   - chemical accumulation and decay
   - plasticity gating
+  - reward-modulated wake learning (optional)
   - adaptive sleep triggers
+  - adaptive sleep budget scaling (optional)
   - structural split/prune
   - optional rollback and homeostatic controls
 
diff --git a/docs/modules/core.md b/docs/modules/core.md
index c300e88..1070e4b 100644
--- a/docs/modules/core.md
+++ b/docs/modules/core.md
@@ -4,6 +4,7 @@
 
 - Define model behavior (`BackpropMLP`, `PredictiveCodingNetwork`, `CircadianPredictiveCodingNetwork`)
 - Define ResNet-50 benchmark variants (`BackpropResNet50Classifier`, `PredictiveCodingResNet50Classifier`, `CircadianPredictiveCodingResNet50Classifier`)
+- Implement circadian mechanisms such as chemical gating, reward-modulated wake updates, and adaptive sleep budgeting
 - Provide activation utilities
 - Define neuron adaptation interfaces and traffic summaries
 
diff --git a/src/adapters/cli.py b/src/adapters/cli.py
index 590037c..1f7dfcc 100644
--- a/src/adapters/cli.py
+++ b/src/adapters/cli.py
@@ -117,6 +117,37 @@ def build_argument_parser() -> argparse.ArgumentParser:
         type=float,
         default=circadian_defaults.sleep_chemical_variance_threshold,
     )
+    parser.add_argument(
+        "--adaptive-sleep-budget",
+        action="store_true",
+        help="Scale split/prune budgets based on plateau severity and chemical variance.",
+    )
+    parser.add_argument(
+        "--adaptive-sleep-budget-min-scale",
+        type=float,
+        default=circadian_defaults.adaptive_sleep_budget_min_scale,
+    )
+    parser.add_argument(
+        "--adaptive-sleep-budget-max-scale",
+        type=float,
+        default=circadian_defaults.adaptive_sleep_budget_max_scale,
+    )
+
+    parser.add_argument(
+        "--reward-modulated-learning",
+        action="store_true",
+        help="Scale wake learning rate by batch difficulty relative to recent error baseline.",
+    )
+    parser.add_argument(
+        "--reward-scale-min",
+        type=float,
+        default=circadian_defaults.reward_scale_min,
+    )
+    parser.add_argument(
+        "--reward-scale-max",
+        type=float,
+        default=circadian_defaults.reward_scale_max,
+    )
 
     parser.add_argument(
         "--split-weight-norm-mix",
@@ -188,6 +219,12 @@ def main() -> None:
         sleep_energy_window=arguments.sleep_energy_window,
         sleep_plateau_delta=arguments.sleep_plateau_delta,
         sleep_chemical_variance_threshold=arguments.sleep_chemical_variance_threshold,
+        use_adaptive_sleep_budget=arguments.adaptive_sleep_budget,
+        adaptive_sleep_budget_min_scale=arguments.adaptive_sleep_budget_min_scale,
+        adaptive_sleep_budget_max_scale=arguments.adaptive_sleep_budget_max_scale,
+        use_reward_modulated_learning=arguments.reward_modulated_learning,
+        reward_scale_min=arguments.reward_scale_min,
+        reward_scale_max=arguments.reward_scale_max,
         split_weight_norm_mix=arguments.split_weight_norm_mix,
         prune_weight_norm_mix=arguments.prune_weight_norm_mix,
         prune_decay_steps=arguments.prune_decay_steps,
diff --git a/src/adapters/resnet_benchmark_cli.py b/src/adapters/resnet_benchmark_cli.py
index df50e20..8807ad0 100644
--- a/src/adapters/resnet_benchmark_cli.py
+++ b/src/adapters/resnet_benchmark_cli.py
@@ -113,6 +113,23 @@ def build_argument_parser() -> argparse.ArgumentParser:
     parser.add_argument("--circ-sleep-energy-window", type=int, default=48)
     parser.add_argument("--circ-sleep-plateau-delta", type=float, default=5e-5)
     parser.add_argument("--circ-sleep-chemical-variance-threshold", type=float, default=0.02)
+    parser.add_argument(
+        "--circ-use-adaptive-sleep-budget",
+        dest="circ_use_adaptive_sleep_budget",
+        action="store_true",
+        help="Enable adaptive split/prune budget scaling by plateau and chemical variance.",
+    )
+    parser.add_argument(
+        "--circ-disable-adaptive-sleep-budget",
+        dest="circ_use_adaptive_sleep_budget",
+        action="store_false",
+        help="Disable adaptive split/prune budget scaling.",
+    )
+    parser.set_defaults(circ_use_adaptive_sleep_budget=True)
+    parser.add_argument("--circ-adaptive-sleep-budget-min-scale", type=float, default=0.25)
+    parser.add_argument("--circ-adaptive-sleep-budget-max-scale", type=float, default=1.0)
+    parser.add_argument("--circ-adaptive-sleep-budget-plateau-weight", type=float, default=0.6)
+    parser.add_argument("--circ-adaptive-sleep-budget-variance-weight", type=float, default=0.4)
     parser.add_argument(
         "--circ-force-sleep",
         dest="circ_force_sleep",
         action="store_true",
@@ -168,6 +185,15 @@ def build_argument_parser() -> argparse.ArgumentParser:
     parser.add_argument("--circ-plasticity-sensitivity-max", type=float, default=0.55)
     parser.add_argument("--circ-plasticity-importance-mix", type=float, default=0.50)
     parser.add_argument("--circ-min-plasticity", type=float, default=0.5)
+    parser.add_argument(
+        "--circ-use-reward-modulated-learning",
+        action="store_true",
+        help="Scale wake learning rate by batch difficulty relative to a moving baseline.",
+    )
+    parser.add_argument("--circ-reward-baseline-decay", type=float, default=0.95)
+    parser.add_argument("--circ-reward-difficulty-exponent", type=float, default=1.0)
+    parser.add_argument("--circ-reward-scale-min", type=float, default=0.75)
+    parser.add_argument("--circ-reward-scale-max", type=float, default=1.5)
     parser.add_argument("--circ-use-adaptive-thresholds", action="store_true", default=None)
     parser.add_argument("--circ-adaptive-split-percentile", type=float, default=92.0)
     parser.add_argument("--circ-adaptive-prune-percentile", type=float, default=8.0)
@@ -260,6 +286,15 @@ def main() -> None:
         circadian_sleep_chemical_variance_threshold=(
             args.circ_sleep_chemical_variance_threshold
         ),
+        circadian_use_adaptive_sleep_budget=args.circ_use_adaptive_sleep_budget,
+        circadian_adaptive_sleep_budget_min_scale=args.circ_adaptive_sleep_budget_min_scale,
+        circadian_adaptive_sleep_budget_max_scale=args.circ_adaptive_sleep_budget_max_scale,
+        circadian_adaptive_sleep_budget_plateau_weight=(
+            args.circ_adaptive_sleep_budget_plateau_weight
+        ),
+        circadian_adaptive_sleep_budget_variance_weight=(
+            args.circ_adaptive_sleep_budget_variance_weight
+        ),
         circadian_enable_sleep_rollback=args.circ_enable_sleep_rollback,
         circadian_sleep_rollback_tolerance=args.circ_sleep_rollback_tolerance,
         circadian_sleep_rollback_metric=args.circ_sleep_rollback_metric,
@@ -291,6 +326,11 @@ def main() -> None:
         circadian_plasticity_sensitivity_max=args.circ_plasticity_sensitivity_max,
         circadian_plasticity_importance_mix=args.circ_plasticity_importance_mix,
         circadian_min_plasticity=args.circ_min_plasticity,
+        circadian_use_reward_modulated_learning=args.circ_use_reward_modulated_learning,
+        circadian_reward_baseline_decay=args.circ_reward_baseline_decay,
+        circadian_reward_difficulty_exponent=args.circ_reward_difficulty_exponent,
+        circadian_reward_scale_min=args.circ_reward_scale_min,
+        circadian_reward_scale_max=args.circ_reward_scale_max,
         circadian_use_adaptive_thresholds=(
             True
             if args.circ_use_adaptive_thresholds is None
diff --git a/src/app/resnet50_benchmark.py b/src/app/resnet50_benchmark.py
index af88f29..f3e7c06 100644
--- a/src/app/resnet50_benchmark.py
+++ b/src/app/resnet50_benchmark.py
@@ -70,6 +70,11 @@ class ResNet50BenchmarkConfig:
     circadian_sleep_energy_window: int = 48
     circadian_sleep_plateau_delta: float = 5e-5
     circadian_sleep_chemical_variance_threshold: float = 0.02
+    circadian_use_adaptive_sleep_budget: bool = True
+    circadian_adaptive_sleep_budget_min_scale: float = 0.25
+    circadian_adaptive_sleep_budget_max_scale: float = 1.0
+    circadian_adaptive_sleep_budget_plateau_weight: float = 0.6
+    circadian_adaptive_sleep_budget_variance_weight: float = 0.4
     circadian_enable_sleep_rollback: bool = True
     circadian_sleep_rollback_tolerance: float = 0.002
     circadian_sleep_rollback_metric: str = "cross_entropy"
@@ -91,6 +96,11 @@ class ResNet50BenchmarkConfig:
     circadian_plasticity_sensitivity_max: float = 0.55
     circadian_plasticity_importance_mix: float = 0.50
     circadian_min_plasticity: float = 0.5
+    circadian_use_reward_modulated_learning: bool = False
+    circadian_reward_baseline_decay: float = 0.95
+    circadian_reward_difficulty_exponent: float = 1.0
+    circadian_reward_scale_min: float = 0.75
+    circadian_reward_scale_max: float = 1.5
     circadian_use_adaptive_thresholds: bool = True
     circadian_adaptive_split_percentile: float = 92.0
     circadian_adaptive_prune_percentile: float = 8.0
@@ -484,6 +494,11 @@ def _benchmark_circadian(
         plasticity_sensitivity_max=config.circadian_plasticity_sensitivity_max,
         plasticity_importance_mix=config.circadian_plasticity_importance_mix,
         min_plasticity=config.circadian_min_plasticity,
+        use_reward_modulated_learning=config.circadian_use_reward_modulated_learning,
+        reward_baseline_decay=config.circadian_reward_baseline_decay,
+        reward_difficulty_exponent=config.circadian_reward_difficulty_exponent,
+        reward_scale_min=config.circadian_reward_scale_min,
+        reward_scale_max=config.circadian_reward_scale_max,
         use_adaptive_thresholds=config.circadian_use_adaptive_thresholds,
         adaptive_split_percentile=config.circadian_adaptive_split_percentile,
         adaptive_prune_percentile=config.circadian_adaptive_prune_percentile,
@@ -517,6 +532,11 @@ def _benchmark_circadian(
         sleep_energy_window=config.circadian_sleep_energy_window,
         sleep_plateau_delta=config.circadian_sleep_plateau_delta,
         sleep_chemical_variance_threshold=config.circadian_sleep_chemical_variance_threshold,
+        use_adaptive_sleep_budget=config.circadian_use_adaptive_sleep_budget,
+        adaptive_sleep_budget_min_scale=config.circadian_adaptive_sleep_budget_min_scale,
+        adaptive_sleep_budget_max_scale=config.circadian_adaptive_sleep_budget_max_scale,
+        adaptive_sleep_budget_plateau_weight=config.circadian_adaptive_sleep_budget_plateau_weight,
+        adaptive_sleep_budget_variance_weight=config.circadian_adaptive_sleep_budget_variance_weight,
     )
     model = CircadianPredictiveCodingResNet50Classifier(
         num_classes=loaders.num_classes,
@@ -879,6 +899,14 @@ def _validate_benchmark_config(config: ResNet50BenchmarkConfig) -> None:
     )
     if not (0.0 <= config.circadian_plasticity_importance_mix <= 1.0):
         raise ValueError("circadian_plasticity_importance_mix must be between 0 and 1.")
+    if not (0.0 <= config.circadian_reward_baseline_decay < 1.0):
+        raise ValueError("circadian_reward_baseline_decay must be in [0, 1).")
+    if config.circadian_reward_difficulty_exponent <= 0.0:
+        raise ValueError("circadian_reward_difficulty_exponent must be positive.")
+    if config.circadian_reward_scale_min <= 0.0:
+        raise ValueError("circadian_reward_scale_min must be positive.")
+    if config.circadian_reward_scale_max < config.circadian_reward_scale_min:
+        raise ValueError("circadian_reward_scale_max must be >= circadian_reward_scale_min.")
     if config.circadian_sleep_warmup_steps < 0:
         raise ValueError("circadian_sleep_warmup_steps must be non-negative.")
     if not (0.0 <= config.circadian_sleep_split_only_until_fraction <= 1.0):
@@ -907,6 +935,22 @@ def _validate_benchmark_config(config: ResNet50BenchmarkConfig) -> None:
         raise ValueError("circadian_sleep_plateau_delta must be non-negative.")
     if config.circadian_sleep_chemical_variance_threshold < 0.0:
         raise ValueError("circadian_sleep_chemical_variance_threshold must be non-negative.")
+    if config.circadian_adaptive_sleep_budget_min_scale <= 0.0:
+        raise ValueError("circadian_adaptive_sleep_budget_min_scale must be positive.")
+    if (
+        config.circadian_adaptive_sleep_budget_max_scale
+        < config.circadian_adaptive_sleep_budget_min_scale
+    ):
+        raise ValueError(
+            "circadian_adaptive_sleep_budget_max_scale must be >= "
+            "circadian_adaptive_sleep_budget_min_scale."
+        )
+    if config.circadian_adaptive_sleep_budget_max_scale > 1.0:
+        raise ValueError("circadian_adaptive_sleep_budget_max_scale must be <= 1.0.")
+    if config.circadian_adaptive_sleep_budget_plateau_weight < 0.0:
+        raise ValueError("circadian_adaptive_sleep_budget_plateau_weight must be non-negative.")
+    if config.circadian_adaptive_sleep_budget_variance_weight < 0.0:
+        raise ValueError("circadian_adaptive_sleep_budget_variance_weight must be non-negative.")
     if config.circadian_sleep_rollback_tolerance < 0.0:
         raise ValueError("circadian_sleep_rollback_tolerance must be non-negative.")
     if config.circadian_sleep_rollback_eval_batches < 0:
diff --git a/src/core/circadian_predictive_coding.py b/src/core/circadian_predictive_coding.py
index a59ea32..e41c404 100644
--- a/src/core/circadian_predictive_coding.py
+++ b/src/core/circadian_predictive_coding.py
@@ -53,6 +53,13 @@ class CircadianConfig:
     plasticity_importance_mix: float = 0.50
     min_plasticity: float = 0.20
 
+    # Optional reward modulation: scale wake updates toward harder batches.
+    use_reward_modulated_learning: bool = False
+    reward_baseline_decay: float = 0.95
+    reward_difficulty_exponent: float = 1.0
+    reward_scale_min: float = 0.75
+    reward_scale_max: float = 1.5
+
     # Static thresholds are still available, but adaptive percentile thresholds
     # can be enabled to react to changing chemical distributions over training.
     use_adaptive_thresholds: bool = False
@@ -90,6 +97,13 @@ class CircadianConfig:
     sleep_plateau_delta: float = 1e-3
     sleep_chemical_variance_threshold: float = 0.02
 
+    # Optional adaptive budget scaling for split/prune counts during sleep.
+    use_adaptive_sleep_budget: bool = False
+    adaptive_sleep_budget_min_scale: float = 0.25
+    adaptive_sleep_budget_max_scale: float = 1.0
+    adaptive_sleep_budget_plateau_weight: float = 0.6
+    adaptive_sleep_budget_variance_weight: float = 0.4
+
     # Gradual pruning: marked neurons decay for a few epochs before removal.
     prune_decay_steps: int = 1
     prune_decay_factor: float = 0.60
@@ -174,6 +188,8 @@ def __init__(
         self._epoch_count = 0
         self._epochs_since_sleep = 0
         self._energy_history: list[float] = []
+        self._reward_error_ema: float | None = None
+        self._last_reward_scale = 1.0
         self._replay_memory: deque[ReplaySnapshot] = deque(
             maxlen=self.config.replay_memory_size
         )
@@ -251,17 +267,20 @@ def _run_training_step(
         grad_hidden_bias = np.sum(hidden_prior_gradient, axis=0, keepdims=True) / sample_count
 
         self._update_chemical_layer(hidden_state)
-        self._update_importance_ema(grad_hidden_output)
+        reward_scale = self._compute_reward_scale(output_error)
+        self._last_reward_scale = reward_scale
+        self._update_importance_ema(grad_hidden_output, reward_scale=reward_scale)
 
         plasticity = self.get_plasticity_state()
         gated_input_hidden = grad_input_hidden * plasticity[np.newaxis, :]
         gated_hidden_output = grad_hidden_output * plasticity[:, np.newaxis]
         gated_hidden_bias = grad_hidden_bias * plasticity[np.newaxis, :]
+        effective_learning_rate = learning_rate * reward_scale
 
-        self.weight_hidden_output -= learning_rate * gated_hidden_output
-        self.bias_output -= learning_rate * grad_output_bias
-        self.weight_input_hidden -= learning_rate * gated_input_hidden
-        self.bias_hidden -= learning_rate * gated_hidden_bias
+        self.weight_hidden_output -= effective_learning_rate * gated_hidden_output
+        self.bias_output -= effective_learning_rate * grad_output_bias
+        self.weight_input_hidden -= effective_learning_rate * gated_input_hidden
+        self.bias_hidden -= effective_learning_rate * gated_hidden_bias
 
         self._record_hidden_traffic(hidden_state)
         energy = self._compute_energy(
@@ -400,6 +419,10 @@ def get_plasticity_state(self) -> Array:
         plasticity = np.exp(-sensitivity * self._hidden_chemical)
         return np.clip(plasticity, self.config.min_plasticity, 1.0)
 
+    def get_last_reward_scale(self) -> float:
+        """Return most recent reward modulation scale from wake training."""
+        return float(self._last_reward_scale)
+
     def _compute_plasticity_sensitivity(self) -> Array:
         if not self.config.use_adaptive_plasticity_sensitivity:
             return np.full_like(self._hidden_chemical, self.config.plasticity_sensitivity)
@@ -439,11 +462,29 @@ def _record_hidden_traffic(self, hidden_state: Array) -> None:
         self._traffic_sum += np.mean(np.abs(hidden_state), axis=0)
         self._traffic_steps += 1
 
-    def _update_importance_ema(self, grad_hidden_output: Array) -> None:
-        importance = np.mean(np.abs(grad_hidden_output), axis=1)
+    def _update_importance_ema(self, grad_hidden_output: Array, reward_scale: float) -> None:
+        importance = np.mean(np.abs(grad_hidden_output), axis=1) * float(reward_scale)
         decay = self.config.importance_ema_decay
         self._importance_ema = decay * self._importance_ema + (1.0 - decay) * importance
 
+    def _compute_reward_scale(self, output_error: Array) -> float:
+        if not self.config.use_reward_modulated_learning:
+            return 1.0
+
+        batch_error = float(np.mean(np.abs(output_error)))
+        baseline = batch_error if self._reward_error_ema is None else self._reward_error_ema
+        difficulty_ratio = batch_error / max(float(baseline), 1e-8)
+        raw_scale = difficulty_ratio ** self.config.reward_difficulty_exponent
+        reward_scale = float(
+            np.clip(raw_scale, self.config.reward_scale_min, self.config.reward_scale_max)
+        )
+
+        # Why this: update baseline after computing ratio so scale reflects
+        # current surprise against past performance, not a blended present.
+        decay = self.config.reward_baseline_decay
+        self._reward_error_ema = decay * float(baseline) + (1.0 - decay) * batch_error
+        return reward_scale
+
     def _update_chemical_layer(self, hidden_state: Array) -> None:
         activity = np.mean(np.abs(hidden_state), axis=0)
         if not self.config.use_dual_chemical:
@@ -575,8 +616,13 @@ def _resolve_split_prune_thresholds(self) -> tuple[float, float]:
     def _resolve_sleep_budgets(
         self, current_step: int | None, total_steps: int | None
     ) -> tuple[int, int, bool]:
-        split_budget = self._resolve_structural_budget(self.config.max_split_per_sleep)
-        prune_budget = self._resolve_structural_budget(self.config.max_prune_per_sleep)
+        budget_scale = self._compute_adaptive_sleep_budget_scale()
+        split_budget = self._resolve_structural_budget(
+            self.config.max_split_per_sleep, budget_scale=budget_scale
+        )
+        prune_budget = self._resolve_structural_budget(
+            self.config.max_prune_per_sleep, budget_scale=budget_scale
+        )
         if current_step is None or total_steps is None or total_steps <= 0:
             return split_budget, prune_budget, False
@@ -590,15 +636,58 @@ def _resolve_sleep_budgets(
             split_budget = 0
         return split_budget, prune_budget, False
 
-    def _resolve_structural_budget(self, configured_limit: int) -> int:
+    def _compute_adaptive_sleep_budget_scale(self) -> float:
+        if not self.config.use_adaptive_sleep_budget:
+            return 1.0
+        if len(self._energy_history) < self.config.sleep_energy_window:
+            return 1.0
+
+        recent = self._energy_history[-self.config.sleep_energy_window :]
+        energy_improvement = max(0.0, float(recent[0] - recent[-1]))
+
+        plateau_delta = self.config.sleep_plateau_delta
+        if plateau_delta <= 1e-12:
+            plateau_score = 1.0 if energy_improvement <= 0.0 else 0.0
+        else:
+            plateau_score = float(
+                np.clip((plateau_delta - energy_improvement) / plateau_delta, 0.0, 1.0)
+            )
+
+        variance_threshold = self.config.sleep_chemical_variance_threshold
+        if variance_threshold <= 1e-12:
+            variance_score = 1.0
+        else:
+            chemical_variance = float(np.var(self._hidden_chemical))
+            variance_score = float(np.clip(chemical_variance / variance_threshold, 0.0, 1.0))
+
+        plateau_weight = max(0.0, self.config.adaptive_sleep_budget_plateau_weight)
+        variance_weight = max(0.0, self.config.adaptive_sleep_budget_variance_weight)
+        total_weight = plateau_weight + variance_weight
+        if total_weight <= 1e-12:
+            combined_signal = 1.0
+        else:
+            combined_signal = (
+                plateau_weight * plateau_score + variance_weight * variance_score
+            ) / total_weight
+
+        min_scale = self.config.adaptive_sleep_budget_min_scale
+        max_scale = self.config.adaptive_sleep_budget_max_scale
+        return float(min_scale + (max_scale - min_scale) * combined_signal)
+
+    def _resolve_structural_budget(self, configured_limit: int, budget_scale: float) -> int:
         if configured_limit <= 0:
             return 0
+        if budget_scale <= 0.0:
+            return 0
+        scaled_limit = int(np.floor(float(configured_limit) * budget_scale))
+        if scaled_limit <= 0:
+            scaled_limit = 1
         fraction = self.config.sleep_max_change_fraction
         if fraction <= 0.0:
             return 0
         by_fraction = int(np.floor(float(self.hidden_dim) * fraction))
         by_fraction = max(by_fraction, int(self.config.sleep_min_change_count))
-        return min(int(configured_limit), int(by_fraction))
+        return min(int(scaled_limit), int(by_fraction))
 
     def _compute_split_scores(self) -> Array:
         chemical_component = self._normalize_vector(self._hidden_chemical)
@@ -958,6 +1047,14 @@ def _validate_config(self, config: CircadianConfig) -> None:
             raise ValueError("plasticity_importance_mix must be between 0 and 1")
         if not (0.0 < config.min_plasticity <= 1.0):
             raise ValueError("min_plasticity must be in (0, 1]")
+        if not (0.0 <= config.reward_baseline_decay < 1.0):
+            raise ValueError("reward_baseline_decay must be in [0, 1)")
+        if config.reward_difficulty_exponent <= 0.0:
+            raise ValueError("reward_difficulty_exponent must be positive")
+        if config.reward_scale_min <= 0.0:
+            raise ValueError("reward_scale_min must be positive")
+        if config.reward_scale_max < config.reward_scale_min:
+            raise ValueError("reward_scale_max must be >= reward_scale_min")
         if not (0.0 <= config.split_weight_norm_mix <= 1.0):
             raise ValueError("split_weight_norm_mix must be between 0 and 1")
         if not (0.0 <= config.prune_weight_norm_mix <= 1.0):
@@ -984,6 +1081,16 @@ def _validate_config(self, config: CircadianConfig) -> None:
             raise ValueError("sleep_plateau_delta must be non-negative")
         if config.sleep_chemical_variance_threshold < 0.0:
             raise ValueError("sleep_chemical_variance_threshold must be non-negative")
+        if config.adaptive_sleep_budget_min_scale <= 0.0:
+            raise ValueError("adaptive_sleep_budget_min_scale must be positive")
+        if config.adaptive_sleep_budget_max_scale < config.adaptive_sleep_budget_min_scale:
+            raise ValueError("adaptive_sleep_budget_max_scale must be >= min scale")
+        if config.adaptive_sleep_budget_max_scale > 1.0:
+            raise ValueError("adaptive_sleep_budget_max_scale must be <= 1.0")
+        if config.adaptive_sleep_budget_plateau_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_plateau_weight must be non-negative")
+        if config.adaptive_sleep_budget_variance_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_variance_weight must be non-negative")
         if config.max_split_per_sleep < 0 or config.max_prune_per_sleep < 0:
             raise ValueError("max split/prune per sleep must be non-negative")
         if config.split_noise_scale < 0.0:
diff --git a/src/core/resnet50_variants.py b/src/core/resnet50_variants.py
index 7fa3476..7f89194 100644
--- a/src/core/resnet50_variants.py
+++ b/src/core/resnet50_variants.py
@@ -28,6 +28,11 @@ class CircadianHeadConfig:
     plasticity_sensitivity_max: float = 1.20
     plasticity_importance_mix: float = 0.50
     min_plasticity: float = 0.20
+    use_reward_modulated_learning: bool = False
+    reward_baseline_decay: float = 0.95
+    reward_difficulty_exponent: float = 1.0
+    reward_scale_min: float = 0.75
+    reward_scale_max: float = 1.5
     use_adaptive_thresholds: bool = False
     adaptive_split_percentile: float = 85.0
     adaptive_prune_percentile: float = 20.0
@@ -61,6 +66,11 @@ class CircadianHeadConfig:
     sleep_energy_window: int = 32
     sleep_plateau_delta: float = 1e-4
     sleep_chemical_variance_threshold: float = 0.02
+    use_adaptive_sleep_budget: bool = False
+    adaptive_sleep_budget_min_scale: float = 0.25
+    adaptive_sleep_budget_max_scale: float = 1.0
+    adaptive_sleep_budget_plateau_weight: float = 0.6
+    adaptive_sleep_budget_variance_weight: float = 0.4
 
 
 @dataclass(frozen=True)
@@ -297,6 +307,8 @@ def __init__(
         self._prune_cooldown = torch.zeros(hidden_dim, dtype=torch.int32, device=device)
         self._steps_since_sleep = 0
         self._energy_history: list[float] = []
+        self._reward_error_ema: float | None = None
+        self._last_reward_scale = 1.0
         generator_device = "cuda" if str(device).startswith("cuda") else "cpu"
         generator = torch.Generator(device=generator_device)
         generator.manual_seed(seed + 9_999)
@@ -341,17 +353,24 @@ def train_step(
         grad_hidden_bias = torch.mean(hidden_prior_grad, dim=0, keepdim=True)
 
         self._update_chemical(hidden_state)
-        self._update_importance_ema(grad_hidden_output)
+        reward_scale = self._compute_reward_scale(output_error)
+        self._last_reward_scale = reward_scale
+        self._update_importance_ema(grad_hidden_output, reward_scale=reward_scale)
 
         plasticity = self._plasticity()
         grad_hidden_output = grad_hidden_output * plasticity[:, None]
         grad_feature_hidden = grad_feature_hidden * plasticity[None, :]
         grad_hidden_bias = grad_hidden_bias * plasticity[None, :]
+        effective_learning_rate = learning_rate * reward_scale
 
-        self.weight_hidden_output = self.weight_hidden_output - learning_rate * grad_hidden_output
-        self.bias_output = self.bias_output - learning_rate * grad_output_bias
-        self.weight_feature_hidden = self.weight_feature_hidden - learning_rate * grad_feature_hidden
-        self.bias_hidden = self.bias_hidden - learning_rate * grad_hidden_bias
+        self.weight_hidden_output = (
+            self.weight_hidden_output - effective_learning_rate * grad_hidden_output
+        )
+        self.bias_output = self.bias_output - effective_learning_rate * grad_output_bias
+        self.weight_feature_hidden = (
+            self.weight_feature_hidden - effective_learning_rate * grad_feature_hidden
+        )
+        self.bias_hidden = self.bias_hidden - effective_learning_rate * grad_hidden_bias
 
         self._traffic_sum = self._traffic_sum + torch.mean(torch.abs(hidden_state), dim=0)
         self._traffic_steps += 1
@@ -444,6 +463,8 @@ def snapshot_state(self) -> dict[str, Any]:
             "prune_cooldown": self._prune_cooldown.clone(),
             "steps_since_sleep": self._steps_since_sleep,
             "energy_history": list(self._energy_history),
+            "reward_error_ema": self._reward_error_ema,
+            "last_reward_scale": self._last_reward_scale,
         }
 
     def restore_state(self, state: dict[str, Any]) -> None:
@@ -462,10 +483,16 @@ def restore_state(self, state: dict[str, Any]) -> None:
         self._prune_cooldown = cast(Any, state["prune_cooldown"]).clone()
         self._steps_since_sleep = int(state["steps_since_sleep"])
         self._energy_history = list(cast(list[float], state["energy_history"]))
+        reward_error_ema = state.get("reward_error_ema")
+        self._reward_error_ema = None if reward_error_ema is None else float(reward_error_ema)
+        self._last_reward_scale = float(state.get("last_reward_scale", 1.0))
 
     def mean_chemical(self) -> Any:
         return self._chemical.clone()
 
+    def last_reward_scale(self) -> float:
+        return float(self._last_reward_scale)
+
     def _plasticity(self) -> Any:
         torch = self._torch
         sensitivity = self._plasticity_sensitivity_vector()
@@ -485,12 +512,28 @@ def _plasticity_sensitivity_vector(self) -> Any:
         base = torch.full_like(self._chemical, self.config.plasticity_sensitivity_min)
         return base + span * stability
 
-    def _update_importance_ema(self, grad_hidden_output: Any) -> None:
+    def _update_importance_ema(self, grad_hidden_output: Any, reward_scale: float) -> None:
         torch = self._torch
-        importance = torch.mean(torch.abs(grad_hidden_output), dim=1)
+        importance = torch.mean(torch.abs(grad_hidden_output), dim=1) * reward_scale
         decay = self.config.importance_ema_decay
         self._importance_ema = decay * self._importance_ema + (1.0 - decay) * importance
 
+    def _compute_reward_scale(self, output_error: Any) -> float:
+        torch = self._torch
+        if not self.config.use_reward_modulated_learning:
+            return 1.0
+
+        batch_error = float(torch.mean(torch.abs(output_error)).item())
+        baseline = batch_error if self._reward_error_ema is None else self._reward_error_ema
+        difficulty_ratio = batch_error / max(float(baseline), 1e-8)
+        raw_scale = difficulty_ratio ** self.config.reward_difficulty_exponent
+        reward_scale = float(
+            max(self.config.reward_scale_min, min(self.config.reward_scale_max, raw_scale))
+        )
+        decay = self.config.reward_baseline_decay
+        self._reward_error_ema = decay * float(baseline) + (1.0 - decay) * batch_error
+        return reward_scale
+
     def _update_chemical(self, hidden_state: Any) -> None:
         torch = self._torch
         activity = torch.mean(torch.abs(hidden_state), dim=0)
@@ -601,8 +644,13 @@ def _resolve_split_prune_thresholds(self) -> tuple[float, float]:
     def _resolve_sleep_budgets(
         self, current_step: int | None, total_steps: int | None
     ) -> tuple[int, int, bool]:
-        split_budget = self._resolve_structural_budget(self.config.max_split_per_sleep)
-        prune_budget = self._resolve_structural_budget(self.config.max_prune_per_sleep)
+        budget_scale = self._compute_adaptive_sleep_budget_scale()
+        split_budget = self._resolve_structural_budget(
+            self.config.max_split_per_sleep, budget_scale=budget_scale
+        )
+        prune_budget = self._resolve_structural_budget(
+            self.config.max_prune_per_sleep, budget_scale=budget_scale
+        )
 
         if current_step is None or total_steps is None or total_steps <= 0:
             return split_budget, prune_budget, False
@@ -615,15 +663,56 @@ def _resolve_sleep_budgets(
             split_budget = 0
         return split_budget, prune_budget, False
 
-    def _resolve_structural_budget(self, configured_limit: int) -> int:
+    def _compute_adaptive_sleep_budget_scale(self) -> float:
+        torch = self._torch
+        if not self.config.use_adaptive_sleep_budget:
+            return 1.0
+        if len(self._energy_history) < self.config.sleep_energy_window:
+            return 1.0
+
+        recent = self._energy_history[-self.config.sleep_energy_window :]
+        energy_improvement = max(0.0, float(recent[0] - recent[-1]))
+        plateau_delta = self.config.sleep_plateau_delta
+        if plateau_delta <= 1e-12:
+            plateau_score = 1.0 if energy_improvement <= 0.0 else 0.0
+        else:
+            plateau_score = max(0.0, min(1.0, (plateau_delta - energy_improvement) / plateau_delta))
+
+        variance_threshold = self.config.sleep_chemical_variance_threshold
+        if variance_threshold <= 1e-12:
+            variance_score = 1.0
+        else:
+            chemical_variance = float(torch.var(self._chemical).item())
+            variance_score = max(0.0, min(1.0, chemical_variance / variance_threshold))
+
+        plateau_weight = max(0.0, self.config.adaptive_sleep_budget_plateau_weight)
+        variance_weight = max(0.0, self.config.adaptive_sleep_budget_variance_weight)
+        total_weight = plateau_weight + variance_weight
+        if total_weight <= 1e-12:
+            combined_signal = 1.0
+        else:
+            combined_signal = (
+                plateau_weight * plateau_score + variance_weight * variance_score
+            ) / total_weight
+
+        min_scale = self.config.adaptive_sleep_budget_min_scale
+        max_scale = self.config.adaptive_sleep_budget_max_scale
+        return float(min_scale + (max_scale - min_scale) * combined_signal)
+
+    def _resolve_structural_budget(self, configured_limit: int, budget_scale: float) -> int:
         if configured_limit <= 0:
             return 0
+        if budget_scale <= 0.0:
+            return 0
+        scaled_limit = int(float(configured_limit) * budget_scale)
+        if scaled_limit <= 0:
+            scaled_limit = 1
         fraction = self.config.sleep_max_change_fraction
         if fraction <= 0.0:
             return 0
         by_fraction = int(float(self.hidden_dim) * fraction)
         by_fraction = max(by_fraction, int(self.config.sleep_min_change_count))
-        return min(int(configured_limit), int(by_fraction))
+        return min(int(scaled_limit), int(by_fraction))
 
     def _compute_split_scores(self) -> Any:
         norm_scores = self._normalize_tensor(self._row_norm(self.weight_hidden_output))
@@ -828,6 +917,14 @@ def _validate_config(self, config: CircadianHeadConfig) -> None:
             raise ValueError("plasticity_importance_mix must be between 0 and 1.")
         if not (0.0 < config.min_plasticity <= 1.0):
             raise ValueError("min_plasticity must be in (0, 1].")
+        if not (0.0 <= config.reward_baseline_decay < 1.0):
+            raise ValueError("reward_baseline_decay must be in [0, 1).")
+        if config.reward_difficulty_exponent <= 0.0:
+            raise ValueError("reward_difficulty_exponent must be positive.")
+        if config.reward_scale_min <= 0.0:
+            raise ValueError("reward_scale_min must be positive.")
+        if config.reward_scale_max < config.reward_scale_min:
+            raise ValueError("reward_scale_max must be >= reward_scale_min.")
         if not (0.0 <= config.adaptive_split_percentile <= 100.0):
             raise ValueError("adaptive_split_percentile must be between 0 and 100.")
         if not (0.0 <= config.adaptive_prune_percentile <= 100.0):
@@ -882,6 +979,16 @@ def _validate_config(self, config: CircadianHeadConfig) -> None:
             raise ValueError("sleep_plateau_delta must be non-negative.")
         if config.sleep_chemical_variance_threshold < 0.0:
             raise ValueError("sleep_chemical_variance_threshold must be non-negative.")
+        if config.adaptive_sleep_budget_min_scale <= 0.0:
+            raise ValueError("adaptive_sleep_budget_min_scale must be positive.")
+        if config.adaptive_sleep_budget_max_scale < config.adaptive_sleep_budget_min_scale:
+            raise ValueError("adaptive_sleep_budget_max_scale must be >= min scale.")
+        if config.adaptive_sleep_budget_max_scale > 1.0:
+            raise ValueError("adaptive_sleep_budget_max_scale must be <= 1.0.")
+        if config.adaptive_sleep_budget_plateau_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_plateau_weight must be non-negative.")
+        if config.adaptive_sleep_budget_variance_weight < 0.0:
+            raise ValueError("adaptive_sleep_budget_variance_weight must be non-negative.")
 
 
 class PredictiveCodingResNet50Classifier:
diff --git a/tests/test_circadian_predictive_coding.py b/tests/test_circadian_predictive_coding.py
index 18c5b29..a03c0c3 100644
--- a/tests/test_circadian_predictive_coding.py
+++ b/tests/test_circadian_predictive_coding.py
@@ -317,3 +317,88 @@ def test_should_reduce_plasticity_for_high_importance_when_adaptive_sensitivity_
     plasticity = model.get_plasticity_state()
 
     assert float(plasticity[0]) < float(plasticity[1])
+
+
+def test_should_scale_learning_rate_by_reward_signal_for_easy_vs_hard_batches() -> None:
+    config = CircadianConfig(
+        use_reward_modulated_learning=True,
+        reward_baseline_decay=0.95,
+        reward_scale_min=0.8,
+        reward_scale_max=1.6,
+    )
+    model = CircadianPredictiveCodingNetwork(
+        input_dim=2,
+        hidden_dim=5,
+        seed=54,
+        circadian_config=config,
+    )
+    train_input = np.array(
+        [[0.2, -0.1], [0.3, 0.4], [-0.2, 0.8], [0.7, -0.5]],
+        dtype=np.float64,
+    )
+
+    model._reward_error_ema = 1.0
+    easy_target = model.predict_proba(train_input)
+    model.train_epoch(
+        input_batch=train_input,
+        target_batch=easy_target,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    easy_scale = model.get_last_reward_scale()
+
+    model._reward_error_ema = 0.05
+    hard_target = 1.0 - easy_target
+    model.train_epoch(
+        input_batch=train_input,
+        target_batch=hard_target,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    hard_scale = model.get_last_reward_scale()
+
+    assert easy_scale <= config.reward_scale_min + 1e-6
+    assert hard_scale > 1.0
+
+
+def test_should_expand_sleep_budget_when_plateau_and_chemical_variance_are_high() -> None:
+    config = CircadianConfig(
+        use_adaptive_sleep_budget=True,
+        max_split_per_sleep=4,
+        max_prune_per_sleep=4,
+        sleep_energy_window=3,
+        sleep_plateau_delta=0.1,
+        sleep_chemical_variance_threshold=0.05,
+        adaptive_sleep_budget_min_scale=0.25,
+        adaptive_sleep_budget_max_scale=1.0,
+        adaptive_sleep_budget_plateau_weight=0.5,
+        adaptive_sleep_budget_variance_weight=0.5,
+    )
+    model = CircadianPredictiveCodingNetwork(
+        input_dim=2,
+        hidden_dim=10,
+        seed=71,
+        circadian_config=config,
+        max_hidden_dim=20,
+    )
+
+    model._energy_history = [1.0, 0.7, 0.3]
+    model.set_chemical_state(np.full(10, 0.2, dtype=np.float64))
+    low_split_budget, low_prune_budget, _ = model._resolve_sleep_budgets(
+        current_step=None,
+        total_steps=None,
+    )
+
+    model._energy_history = [0.5, 0.5, 0.5]
+    model.set_chemical_state(np.array([0.0, 1.0] * 5, dtype=np.float64))
+    high_split_budget, high_prune_budget, _ = model._resolve_sleep_budgets(
+        current_step=None,
+        total_steps=None,
+    )
+
+    assert low_split_budget == 1
+    assert low_prune_budget == 1
+    assert high_split_budget == 4
+    assert high_prune_budget == 4
diff --git a/tests/test_resnet50_variants.py b/tests/test_resnet50_variants.py
index 118807d..dbdcd71 100644
--- a/tests/test_resnet50_variants.py
+++ b/tests/test_resnet50_variants.py
@@ -159,3 +159,96 @@ def test_should_restore_snapshot_after_structure_change_in_torch_head() -> None:
     assert torch.allclose(head.weight_feature_hidden, snapshot["weight_feature_hidden"])
     assert torch.allclose(head.weight_hidden_output, snapshot["weight_hidden_output"])
     assert torch.allclose(head._chemical, snapshot["chemical"])
+
+
+def test_should_scale_learning_rate_by_reward_signal_in_torch_head() -> None:
+    torch = pytest.importorskip("torch")
+    device = torch.device("cpu")
+    config = CircadianHeadConfig(
+        use_reward_modulated_learning=True,
+        reward_baseline_decay=0.95,
+        reward_scale_min=0.8,
+        reward_scale_max=1.6,
+    )
+    head = CircadianPredictiveCodingHead(
+        feature_dim=6,
+        hidden_dim=5,
+        num_classes=2,
+        device=device,
+        seed=37,
+        config=config,
+        min_hidden_dim=4,
+        max_hidden_dim=10,
+    )
+    features = torch.tensor(
+        [[0.3, -0.2, 0.5, 0.1, -0.4, 0.2], [-0.1, 0.4, 0.2, -0.6, 0.8, 0.3]],
+        dtype=torch.float32,
+    )
+    labels = torch.tensor([0, 1], dtype=torch.long)
+
+    head._reward_error_ema = 10.0
+    head.train_step(
+        features=features,
+        targets=labels,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    easy_scale = head.last_reward_scale()
+
+    head._reward_error_ema = 0.01
+    head.train_step(
+        features=features,
+        targets=labels,
+        learning_rate=0.03,
+        inference_steps=8,
+        inference_learning_rate=0.2,
+    )
+    hard_scale = head.last_reward_scale()
+
+    assert easy_scale <= config.reward_scale_min + 1e-6
+    assert hard_scale > 1.0
+
+
+def test_should_expand_sleep_budget_when_plateau_and_variance_are_high_in_torch_head() -> None:
+    torch = pytest.importorskip("torch")
+    device = torch.device("cpu")
+    config = CircadianHeadConfig(
+        use_adaptive_sleep_budget=True,
+        max_split_per_sleep=4,
+        max_prune_per_sleep=4,
+        sleep_energy_window=3,
+        sleep_plateau_delta=0.1,
+        sleep_chemical_variance_threshold=0.05,
+        adaptive_sleep_budget_min_scale=0.25,
+        adaptive_sleep_budget_max_scale=1.0,
+        adaptive_sleep_budget_plateau_weight=0.5,
+        adaptive_sleep_budget_variance_weight=0.5,
+    )
+    head = CircadianPredictiveCodingHead(
+        feature_dim=6,
+        hidden_dim=10,
+        num_classes=2,
+        device=device,
+        seed=41,
+        config=config,
+        min_hidden_dim=4,
+        max_hidden_dim=20,
+    )
+
+    head._energy_history = [1.0, 0.7, 0.3]
+    head._chemical = torch.full((10,), 0.2, dtype=torch.float32)
+    low_split_budget, low_prune_budget, _ = head._resolve_sleep_budgets(
+        current_step=None, total_steps=None
+    )
+
+    head._energy_history = [0.5, 0.5, 0.5]
+    head._chemical = torch.tensor([0.0, 1.0] * 5, dtype=torch.float32)
+    high_split_budget, high_prune_budget, _ = head._resolve_sleep_budgets(
+        current_step=None, total_steps=None
+    )
+
+    assert low_split_budget == 1
+    assert low_prune_budget == 1
+    assert high_split_budget == 4
+    assert high_prune_budget == 4