I think this is ready for review. I added support for key-value pairs in the Parquet file metadata and updated the docs.
Pull request overview
This PR introduces Parquet support to the sdss-datamodel generation/design pipeline by adding a Parquet filetype implementation, YAML/markdown rendering support, changelog support, test fixtures/data, and documentation updates.
Changes:
- Add Parquet filetype support for stub generation/design (Polars-based) and Parquet-specific changelog computation.
- Extend tests and test data to include `.parquet` alongside existing FITS/PAR/HDF5 coverage.
- Update docs, API references, workflows, and packaging extras to document/install Parquet support.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| `python/datamodel/generate/filetypes/parquet.py` | New generator filetype for reading/designing Parquet-backed dataframe metadata. |
| `python/datamodel/models/filetypes/parquet.py` | New Pydantic models for Parquet dataframe columns/metadata and changelog fields. |
| `python/datamodel/templates/md/parquet.md` | New markdown template for rendering Parquet dataframe content. |
| `python/datamodel/generate/changelog/filetypes/parquet.py` | New YAML changelog diff support for Parquet columns. |
| `python/datamodel/models/yaml.py` | Wire Parquet change/model types into the YAML schema. |
| `python/datamodel/generate/datamodel.py` | Add `design_parquet()` API and general refactor/tidy. |
| `python/datamodel/generate/filetypes/__init__.py` | Export Parquet generator filetype. |
| `python/datamodel/models/filetypes/__init__.py` | Export Parquet model types. |
| `python/datamodel/generate/changelog/filetypes/__init__.py` | Export Parquet changelog filetype. |
| `python/datamodel/generate/changelog/yaml.py` | Minor signature default adjustment / import ordering. |
| `tests/conftest.py` | Add Parquet test file creation and include parquet in parametrized suffix list. |
| `tests/generate/test_datamodel.py` | Include Parquet in filetype suffix map used by generation/validation tests. |
| `tests/data/test_valid_parquet.yaml` | New “known-good” YAML fixture for Parquet validation tests. |
| `setup.cfg` | Add parquet extra (Polars) and adjust pytest coverage reporting options. |
| `docs/sphinx/index.rst` | Document Parquet dependency/extras in supported filetypes table. |
| `docs/sphinx/generate.rst` | Add Parquet section and example YAML/docs. |
| `docs/sphinx/examples_generate.rst` | Add a Parquet generation example. |
| `docs/sphinx/design.rst` | Add Parquet design workflow documentation. |
| `docs/sphinx/api.rst` | Include Parquet filetype module in API docs. |
| `CHANGELOG.rst` | Note Parquet support addition. |
| `.github/workflows/sphinx.yml` | Install system deps for docs build (HDF5 dev libs). |
| `.github/workflows/build.yml` | Install system deps for CI build (HDF5 dev libs). |
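
For readers unfamiliar with what a column-level changelog involves, here is a minimal sketch of the idea. It is not the code in `python/datamodel/generate/changelog/filetypes/parquet.py`; the function name `diff_columns`, the result keys, and the `{column name: dtype string}` input shape are all assumptions made for illustration.

```python
def diff_columns(old, new):
    """Compare two {column_name: dtype} mappings from two releases.

    Hypothetical helper: the name, inputs, and result keys are
    illustrative and are not taken from the sdss datamodel package.
    """
    old_names, new_names = set(old), set(new)
    return {
        # Columns present only in the newer release.
        "added": sorted(new_names - old_names),
        # Columns present only in the older release.
        "removed": sorted(old_names - new_names),
        # Columns in both releases whose dtype string changed.
        "retyped": {
            name: (old[name], new[name])
            for name in sorted(old_names & new_names)
            if old[name] != new[name]
        },
    }


# Example schemas for two hypothetical releases of the same file.
old_schema = {"id": "Int64", "flux": "Float32", "name": "String"}
new_schema = {"id": "Int64", "flux": "Float64", "ra": "Float64"}
changes = diff_columns(old_schema, new_schema)
```

With these inputs, `changes` reports `ra` as added, `name` as removed, and `flux` as retyped from `Float32` to `Float64`.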
havok2063 left a comment
This looks good to me.
As you are the first to add a new format besides myself, did you find the docs for adding new file formats, https://sdss.github.io/datamodel/adding_files.html, easy to understand? Is there anything you would change about it or the process?
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Brian Cherinka <havok2063@hotmail.com>
The docs were fairly easy to follow. I think a few things are out of date, such as the instruction to add the file type to class DataModel:
`supported_filetypes = ['.fits', '.par', '.h5']`
which is now computed automatically from imports, but those were fairly easy to track down.
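
As a rough illustration of how a supported-suffix list can be derived from imported classes instead of being hard-coded, here is a minimal sketch. How the datamodel package actually computes it is not shown in this thread, so the `BaseFile` registry, the `suffix` attribute, and the subclass names below are all hypothetical.

```python
class BaseFile:
    """Hypothetical base class; not the sdss datamodel implementation."""

    # Maps a file suffix to the class that handles it.
    registry = {}
    suffix = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself as soon as its module is imported.
        if cls.suffix:
            BaseFile.registry[cls.suffix] = cls


class FitsFile(BaseFile):
    suffix = ".fits"


class ParFile(BaseFile):
    suffix = ".par"


class HdfFile(BaseFile):
    suffix = ".h5"


class ParquetFile(BaseFile):
    suffix = ".parquet"


def supported_filetypes():
    """Return the suffixes of all file classes imported so far."""
    return sorted(BaseFile.registry)
```

Under this pattern, adding a new format only requires defining (and importing) a new subclass; no separate list needs to be edited, which is why such a docs step can go stale.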
Looks good.
Support generating models from Parquet files, including designing and changelogs.