Skip to content

Add agent instruction files for notebook-to-Hyrax conversion#8

Open
drewoldag wants to merge 4 commits intomainfrom
claude/add-agent-instructions-0DgUK
Open

Add agent instruction files for notebook-to-Hyrax conversion#8
drewoldag wants to merge 4 commits intomainfrom
claude/add-agent-instructions-0DgUK

Conversation

@drewoldag
Copy link
Copy Markdown
Collaborator

Summary

  • Adds AGENTS.md — primary agent instruction file covering the three-file output layout, HyraxDataset/@hyrax_model requirements, the prepare_inputs/forward/train_batch data-flow pattern, TOML namespacing rules, and an explicit anti-pattern table
  • Adds HYRAX_CONTRACTS.md — typed interface reference (signatures, return types, data_dict shape) for agents and advanced users
  • Adds CONVERSION_PROMPT.md — fill-in-the-blanks prompt template for users converting their own notebooks
  • Adds docs/notebook_to_hyrax_guide.md — four before/after worked examples (data loading, training loop, hardcoded hyperparameters, Dataset base class)
  • Adds machine-targeted inline comments in galaxy10_dataset.py and vgg11.py at the three highest-failure-risk points: super().__init__ ordering, @hyrax_model decorator, and prepare_inputs responsibility

Test plan

  • Verify all four new Markdown files render correctly on GitHub
  • Check that inline comments in source files are clear and correctly placed
  • Optionally: test an agent conversion using AGENTS.md + CONVERSION_PROMPT.md

https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4

Adds AGENTS.md (primary agent instruction file covering the three-file output
layout, HyraxDataset/hyrax_model requirements, prepare_inputs pattern, TOML
namespacing, and anti-patterns), HYRAX_CONTRACTS.md (typed interface reference),
CONVERSION_PROMPT.md (fill-in-the-blanks user prompt template), and
docs/notebook_to_hyrax_guide.md (four before/after worked examples).

Also adds machine-targeted inline comments in galaxy10_dataset.py and vgg11.py
at the three most failure-prone points: super().__init__ ordering, @hyrax_model
decorator, and prepare_inputs responsibility.

https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
Copilot AI review requested due to automatic review settings April 2, 2026 00:41
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.91%. Comparing base (9ed73b5) to head (cb0c0dc).

Additional details and impacted files
@@             Coverage Diff             @@
##             main       #8       +/-   ##
===========================================
+ Coverage   25.00%   95.91%   +70.91%     
===========================================
  Files           2        3        +1     
  Lines          72       98       +26     
===========================================
+ Hits           18       94       +76     
+ Misses         54        4       -50     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a set of agent- and user-facing documentation files to standardize “notebook → Hyrax” conversions, plus inline “HYRAX REQUIREMENT” comments in the existing example dataset/model at common failure points.

Changes:

  • Added agent instruction/spec files: AGENTS.md, HYRAX_CONTRACTS.md, and CONVERSION_PROMPT.md.
  • Added a worked conversion guide with before/after examples: docs/notebook_to_hyrax_guide.md.
  • Added inline requirement comments in the canonical example dataset/model to reinforce Hyrax integration constraints.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/external_hyrax_example/models/vgg11.py Adds inline comments emphasizing @hyrax_model and prepare_inputs responsibilities.
src/external_hyrax_example/datasets/galaxy10_dataset.py Adds inline comment emphasizing super().__init__(config) ordering in dataset init.
AGENTS.md New primary agent conversion instruction document (three-file layout, data flow, config rules, anti-patterns).
HYRAX_CONTRACTS.md New typed contracts/spec reference for dataset/model interfaces and config namespacing.
CONVERSION_PROMPT.md New fill-in conversion prompt template for users.
docs/notebook_to_hyrax_guide.md New documentation with four notebook-to-Hyrax mapping examples.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +171 to +186
[<package_name>.<DatasetClassName>]
channels_first = false

# libpath to specify in runtime config when using this dataset
# name = "<package_name>.datasets.<module>.<ClassName>"
```

Key rules:
- The outer section `[<package_name>]` must exist even if empty.
- Section names use the **exact Python class name** (e.g. `VGG11`, not `vgg11`).
- The commented-out `name = "..."` lines are **not active config**. They are
documentation showing the libpath a user would paste into their runtime config
to activate this class. Do not uncomment them in the default config.
- Access config values in code as:
`config["<package_name>"]["<ClassName>"]["<key>"]`
- Do not use flat keys like `config["key"]` or `self.config["dropout"]`.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config namespacing rules here require using the exact Python class name for sections/lookup (e.g. config["<package_name>"]["<ClassName>"]). However, the repository’s canonical dataset example referenced below uses config["external_hyrax_example"]["galaxy10_dataset"]["channels_first"] and the TOML section [external_hyrax_example.galaxy10_dataset] (module name, not class name). Please reconcile these so agents don’t generate configs that disagree with the in-repo example—either update the example to use the class name (e.g. Galaxy10Dataset) or relax/clarify the rule (e.g. “use a consistent section key; this repo uses class names for models and module names for datasets”).

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +28
- `config`: The full Hyrax config dict. Access your values as
`config["<package_name>"]["<ClassName>"]["<key>"]`.
- `data_location`: Path to the directory (or file) containing the dataset.
Typically convert with `Path(data_location)` immediately.
- **`super().__init__(config)` must be the last statement** in `__init__`.
The base class introspects the subclass after all attributes are set.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset config access pattern specified here (config["<package_name>"]["<ClassName>"]["<key>"]) conflicts with the repo’s canonical dataset example (src/external_hyrax_example/datasets/galaxy10_dataset.py) which reads config["external_hyrax_example"]["galaxy10_dataset"]["channels_first"] and has a TOML section [external_hyrax_example.galaxy10_dataset]. Please update this contract doc to match the actual conventions used in this repository, or update the example code/config so they conform to the contract.

Copilot uses AI. Check for mistakes.
Comment on lines +211 to +236
## Config namespacing

All config keys are accessed via three-level namespacing:

```python
config["<package_name>"]["<ClassName>"]["<key>"]
```

Example for a package `my_survey_cnn` with a model class `ResNet18`:

```python
dropout = config["my_survey_cnn"]["ResNet18"]["dropout"]
num_classes = config["my_survey_cnn"]["ResNet18"]["num_classes"]
```

The corresponding TOML section:

```toml
[my_survey_cnn.ResNet18]
dropout = 0.3
num_classes = 5
```

Never access config with flat keys (`config["dropout"]`). The flat access pattern
will raise a `KeyError` at runtime because Hyrax always nests config under the
package and class names.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Config namespacing” section states that all config keys are nested under the package and class name. In this repository’s canonical dataset example, the dataset config is nested under the module key galaxy10_dataset (see src/external_hyrax_example/default_config.toml and src/external_hyrax_example/datasets/galaxy10_dataset.py), not the dataset class name. Please clarify the rule (or update the canonical example) so agents don’t generate configs that won’t work with the example patterns.

Copilot uses AI. Check for mistakes.
AGENTS.md Outdated

Return images from `get_image` as `np.float32` normalized to `[0, 1]`.
If `channels_first` is configured, return shape `(C, H, W)`; otherwise `(H, W, C)`.
Check `config["<package_name>"]["<dataset_class_name>"]["channels_first"]`.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

channels_first lookup guidance uses config["<package_name>"]["<dataset_class_name>"], but the in-repo dataset example uses config["external_hyrax_example"]["galaxy10_dataset"] (module key). Please align this guidance with the canonical example or explicitly document the expected section key for datasets.

Suggested change
Check `config["<package_name>"]["<dataset_class_name>"]["channels_first"]`.
Check `config["<module_name>"]["<dataset_name>"]["channels_first"]`, e.g. `config["external_hyrax_example"]["galaxy10_dataset"]["channels_first"]`.

Copilot uses AI. Check for mistakes.
claude added 3 commits April 2, 2026 15:04
Adds test_dataset_contracts.py (12 tests) and test_model_contracts.py (17 tests)
that verify the HyraxDataset and @hyrax_model interface contracts without
requiring the real Galaxy10.h5 file. Dataset tests use a mocked h5py.File;
model tests use synthetic random tensors. Both files are structured as templates
that users can adapt when verifying their own converted classes.

https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
Reverting the addition of test_dataset_contracts.py and test_model_contracts.py
to reconsider the approach — tests were too specific to Galaxy10Dataset/VGG11
rather than being general enough to apply to any converted class.

https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
- Add tests/external_hyrax_example/hyrax_contract_helpers/ with two abstract
  pytest mixin classes: HyraxDatasetContractTests and HyraxModelContractTests.
  These are copy-paste templates users and agents can drop into any project to
  verify a HyraxDataset subclass or @hyrax_model class satisfies the interface.

- HyraxDatasetContractTests: configurable via getter_names (list of getter
  method names to verify) and n_samples. Tests inheritance, __len__,
  get_object_id uniqueness, and that each getter returns numpy-based data.

- HyraxModelContractTests: configurable via has_validate_batch / has_test_batch.
  Tests prepare_inputs is a staticmethod, forward/train_batch/infer_batch
  return expected types, and that all scalar values in *_batch dicts are finite.

- Add worked examples: test_galaxy10_contracts.py and test_vgg11_contracts.py.

- Update AGENTS.md: replace image-centric "Required methods" table and
  "Image format" section with field-derived getter guidance and a general
  "Data format" section. Add a "Verifying your conversion" section with
  copy-paste setup instructions for the contract test helpers.

- Update CONVERSION_PROMPT.md: generalize the data-fields section to
  encourage listing all per-sample fields, and add a verification step.

https://claude.ai/code/session_01AheUY6fsJYKynV5zY3w3t4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants