Add agent instruction files for notebook-to-Hyrax conversion#8
Add agent instruction files for notebook-to-Hyrax conversion#8
Conversation
Adds AGENTS.md (primary agent instruction file covering the three-file output layout, HyraxDataset/hyrax_model requirements, prepare_inputs pattern, TOML namespacing, and anti-patterns), HYRAX_CONTRACTS.md (typed interface reference), CONVERSION_PROMPT.md (fill-in-the-blanks user prompt template), and docs/notebook_to_hyrax_guide.md (four before/after worked examples). Also adds machine-targeted inline comments in galaxy10_dataset.py and vgg11.py at the three most failure-prone points: super().__init__ ordering, @hyrax_model decorator, and prepare_inputs responsibility. https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8 +/- ##
===========================================
+ Coverage 25.00% 95.91% +70.91%
===========================================
Files 2 3 +1
Lines 72 98 +26
===========================================
+ Hits 18 94 +76
+ Misses 54 4 -50 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a set of agent- and user-facing documentation files to standardize “notebook → Hyrax” conversions, plus inline “HYRAX REQUIREMENT” comments in the existing example dataset/model at common failure points.
Changes:
- Added agent instruction/spec files:
AGENTS.md,HYRAX_CONTRACTS.md, andCONVERSION_PROMPT.md. - Added a worked conversion guide with before/after examples:
docs/notebook_to_hyrax_guide.md. - Added inline requirement comments in the canonical example dataset/model to reinforce Hyrax integration constraints.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/external_hyrax_example/models/vgg11.py |
Adds inline comments emphasizing @hyrax_model and prepare_inputs responsibilities. |
src/external_hyrax_example/datasets/galaxy10_dataset.py |
Adds inline comment emphasizing super().__init__(config) ordering in dataset init. |
AGENTS.md |
New primary agent conversion instruction document (three-file layout, data flow, config rules, anti-patterns). |
HYRAX_CONTRACTS.md |
New typed contracts/spec reference for dataset/model interfaces and config namespacing. |
CONVERSION_PROMPT.md |
New fill-in conversion prompt template for users. |
docs/notebook_to_hyrax_guide.md |
New documentation with four notebook-to-Hyrax mapping examples. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| [<package_name>.<DatasetClassName>] | ||
| channels_first = false | ||
|
|
||
| # libpath to specify in runtime config when using this dataset | ||
| # name = "<package_name>.datasets.<module>.<ClassName>" | ||
| ``` | ||
|
|
||
| Key rules: | ||
| - The outer section `[<package_name>]` must exist even if empty. | ||
| - Section names use the **exact Python class name** (e.g. `VGG11`, not `vgg11`). | ||
| - The commented-out `name = "..."` lines are **not active config**. They are | ||
| documentation showing the libpath a user would paste into their runtime config | ||
| to activate this class. Do not uncomment them in the default config. | ||
| - Access config values in code as: | ||
| `config["<package_name>"]["<ClassName>"]["<key>"]` | ||
| - Do not use flat keys like `config["key"]` or `self.config["dropout"]`. |
There was a problem hiding this comment.
The config namespacing rules here require using the exact Python class name for sections/lookup (e.g. config["<package_name>"]["<ClassName>"]). However, the repository’s canonical dataset example referenced below uses config["external_hyrax_example"]["galaxy10_dataset"]["channels_first"] and the TOML section [external_hyrax_example.galaxy10_dataset] (module name, not class name). Please reconcile these so agents don’t generate configs that disagree with the in-repo example—either update the example to use the class name (e.g. Galaxy10Dataset) or relax/clarify the rule (e.g. “use a consistent section key; this repo uses class names for models and module names for datasets”).
| - `config`: The full Hyrax config dict. Access your values as | ||
| `config["<package_name>"]["<ClassName>"]["<key>"]`. | ||
| - `data_location`: Path to the directory (or file) containing the dataset. | ||
| Typically convert with `Path(data_location)` immediately. | ||
| - **`super().__init__(config)` must be the last statement** in `__init__`. | ||
| The base class introspects the subclass after all attributes are set. |
There was a problem hiding this comment.
The dataset config access pattern specified here (config["<package_name>"]["<ClassName>"]["<key>"]) conflicts with the repo’s canonical dataset example (src/external_hyrax_example/datasets/galaxy10_dataset.py) which reads config["external_hyrax_example"]["galaxy10_dataset"]["channels_first"] and has a TOML section [external_hyrax_example.galaxy10_dataset]. Please update this contract doc to match the actual conventions used in this repository, or update the example code/config so they conform to the contract.
| ## Config namespacing | ||
|
|
||
| All config keys are accessed via three-level namespacing: | ||
|
|
||
| ```python | ||
| config["<package_name>"]["<ClassName>"]["<key>"] | ||
| ``` | ||
|
|
||
| Example for a package `my_survey_cnn` with a model class `ResNet18`: | ||
|
|
||
| ```python | ||
| dropout = config["my_survey_cnn"]["ResNet18"]["dropout"] | ||
| num_classes = config["my_survey_cnn"]["ResNet18"]["num_classes"] | ||
| ``` | ||
|
|
||
| The corresponding TOML section: | ||
|
|
||
| ```toml | ||
| [my_survey_cnn.ResNet18] | ||
| dropout = 0.3 | ||
| num_classes = 5 | ||
| ``` | ||
|
|
||
| Never access config with flat keys (`config["dropout"]`). The flat access pattern | ||
| will raise a `KeyError` at runtime because Hyrax always nests config under the | ||
| package and class names. |
There was a problem hiding this comment.
The “Config namespacing” section states that all config keys are nested under the package and class name. In this repository’s canonical dataset example, the dataset config is nested under the module key galaxy10_dataset (see src/external_hyrax_example/default_config.toml and src/external_hyrax_example/datasets/galaxy10_dataset.py), not the dataset class name. Please clarify the rule (or update the canonical example) so agents don’t generate configs that won’t work with the example patterns.
AGENTS.md
Outdated
|
|
||
| Return images from `get_image` as `np.float32` normalized to `[0, 1]`. | ||
| If `channels_first` is configured, return shape `(C, H, W)`; otherwise `(H, W, C)`. | ||
| Check `config["<package_name>"]["<dataset_class_name>"]["channels_first"]`. |
There was a problem hiding this comment.
channels_first lookup guidance uses config["<package_name>"]["<dataset_class_name>"], but the in-repo dataset example uses config["external_hyrax_example"]["galaxy10_dataset"] (module key). Please align this guidance with the canonical example or explicitly document the expected section key for datasets.
| Check `config["<package_name>"]["<dataset_class_name>"]["channels_first"]`. | |
| Check `config["<module_name>"]["<dataset_name>"]["channels_first"]`, e.g. `config["external_hyrax_example"]["galaxy10_dataset"]["channels_first"]`. |
Adds test_dataset_contracts.py (12 tests) and test_model_contracts.py (17 tests) that verify the HyraxDataset and @hyrax_model interface contracts without requiring the real Galaxy10.h5 file. Dataset tests use a mocked h5py.File; model tests use synthetic random tensors. Both files are structured as templates that users can adapt when verifying their own converted classes. https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
Reverting the addition of test_dataset_contracts.py and test_model_contracts.py to reconsider the approach — tests were too specific to Galaxy10Dataset/VGG11 rather than being general enough to apply to any converted class. https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
- Add tests/external_hyrax_example/hyrax_contract_helpers/ with two abstract pytest mixin classes: HyraxDatasetContractTests and HyraxModelContractTests. These are copy-paste templates users and agents can drop into any project to verify a HyraxDataset subclass or @hyrax_model class satisfies the interface. - HyraxDatasetContractTests: configurable via getter_names (list of getter method names to verify) and n_samples. Tests inheritance, __len__, get_object_id uniqueness, and that each getter returns numpy-based data. - HyraxModelContractTests: configurable via has_validate_batch / has_test_batch. Tests prepare_inputs is a staticmethod, forward/train_batch/infer_batch return expected types, and that all scalar values in *_batch dicts are finite. - Add worked examples: test_galaxy10_contracts.py and test_vgg11_contracts.py. - Update AGENTS.md: replace image-centric "Required methods" table and "Image format" section with field-derived getter guidance and a general "Data format" section. Add a "Verifying your conversion" section with copy-paste setup instructions for the contract test helpers. - Update CONVERSION_PROMPT.md: generalize the data-fields section to encourage listing all per-sample fields, and add a verification step. https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
Summary
AGENTS.md— primary agent instruction file covering the three-file output layout,HyraxDataset/@hyrax_modelrequirements, theprepare_inputs/forward/train_batchdata-flow pattern, TOML namespacing rules, and an explicit anti-pattern tableHYRAX_CONTRACTS.md— typed interface reference (signatures, return types,data_dictshape) for agents and advanced usersCONVERSION_PROMPT.md— fill-in-the-blanks prompt template for users converting their own notebooksdocs/notebook_to_hyrax_guide.md— four before/after worked examples (data loading, training loop, hardcoded hyperparameters, Dataset base class)galaxy10_dataset.pyandvgg11.pyat the three highest-failure-risk points:super().__init__ordering,@hyrax_modeldecorator, andprepare_inputsresponsibilityTest plan
AGENTS.md+CONVERSION_PROMPT.mdhttps://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4