Support Parquet files #206

Merged
albireox merged 20 commits into main from albireox-parquet-filetype on Mar 16, 2026

@albireox (Member) commented Mar 6, 2026

Support generating models from Parquet files, including design and changelog support.

@albireox albireox marked this pull request as ready for review March 6, 2026 21:14

@albireox (Member, Author) commented Mar 7, 2026

I think this is ready for review. I added support for key-value pairs in the metadata of Parquet files and updated the docs.

@albireox albireox requested a review from havok2063 March 7, 2026 03:38
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces Parquet support to the sdss-datamodel generation/design pipeline by adding a Parquet filetype implementation, YAML/markdown rendering support, changelog support, test fixtures/data, and documentation updates.

Changes:

  • Add Parquet filetype support for stub generation/design (Polars-based) and Parquet-specific changelog computation.
  • Extend tests and test data to include .parquet alongside existing FITS/PAR/HDF5 coverage.
  • Update docs, API references, workflows, and packaging extras to document/install Parquet support.
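As a rough illustration of the column-level comparison that a Parquet changelog implies, the sketch below diffs two column-to-dtype mappings. It is a minimal sketch, not the code in this PR: the function name `diff_columns`, the result keys, and the sample columns are all hypothetical, and it uses plain dictionaries rather than Polars schemas.

```python
def diff_columns(old, new):
    """Compare two {column name: dtype string} mappings (hypothetical helper)."""
    old_names, new_names = set(old), set(new)
    return {
        # Columns present only in the newer file.
        "added": sorted(new_names - old_names),
        # Columns present only in the older file.
        "removed": sorted(old_names - new_names),
        # Columns in both files whose dtype differs, as (old, new) pairs.
        "retyped": {
            name: (old[name], new[name])
            for name in sorted(old_names & new_names)
            if old[name] != new[name]
        },
    }


# Made-up schemas standing in for two releases of the same Parquet file.
old_release = {"id": "Int64", "ra": "Float64", "flux": "Float32"}
new_release = {"id": "Int64", "ra": "Float64", "flux": "Float64", "snr": "Float32"}

changes = diff_columns(old_release, new_release)
print(changes)
```

Sorting the names keeps the report deterministic, which matters if the result is written into a YAML changelog that is kept under version control.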

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| python/datamodel/generate/filetypes/parquet.py | New generator filetype for reading/designing Parquet-backed dataframe metadata. |
| python/datamodel/models/filetypes/parquet.py | New Pydantic models for Parquet dataframe columns/metadata and changelog fields. |
| python/datamodel/templates/md/parquet.md | New markdown template for rendering Parquet dataframe content. |
| python/datamodel/generate/changelog/filetypes/parquet.py | New YAML changelog diff support for Parquet columns. |
| python/datamodel/models/yaml.py | Wire Parquet change/model types into the YAML schema. |
| python/datamodel/generate/datamodel.py | Add design_parquet() API and general refactor/tidy. |
| python/datamodel/generate/filetypes/__init__.py | Export Parquet generator filetype. |
| python/datamodel/models/filetypes/__init__.py | Export Parquet model types. |
| python/datamodel/generate/changelog/filetypes/__init__.py | Export Parquet changelog filetype. |
| python/datamodel/generate/changelog/yaml.py | Minor signature default adjustment / import ordering. |
| tests/conftest.py | Add Parquet test file creation and include parquet in parametrized suffix list. |
| tests/generate/test_datamodel.py | Include Parquet in filetype suffix map used by generation/validation tests. |
| tests/data/test_valid_parquet.yaml | New “known-good” YAML fixture for Parquet validation tests. |
| setup.cfg | Add parquet extra (Polars) and adjust pytest coverage reporting options. |
| docs/sphinx/index.rst | Document Parquet dependency/extras in supported filetypes table. |
| docs/sphinx/generate.rst | Add Parquet section and example YAML/docs. |
| docs/sphinx/examples_generate.rst | Add a Parquet generation example. |
| docs/sphinx/design.rst | Add Parquet design workflow documentation. |
| docs/sphinx/api.rst | Include Parquet filetype module in API docs. |
| CHANGELOG.rst | Note Parquet support addition. |
| .github/workflows/sphinx.yml | Install system deps for docs build (HDF5 dev libs). |
| .github/workflows/build.yml | Install system deps for CI build (HDF5 dev libs). |

Comment thread python/datamodel/models/filetypes/parquet.py
Comment thread python/datamodel/generate/datamodel.py
Comment thread docs/sphinx/examples_generate.rst Outdated
Comment thread python/datamodel/generate/changelog/filetypes/parquet.py
Comment thread python/datamodel/generate/datamodel.py
Comment thread python/datamodel/generate/datamodel.py
Comment thread python/datamodel/models/yaml.py
Comment thread python/datamodel/models/yaml.py
Comment thread setup.cfg
Comment thread python/datamodel/generate/datamodel.py

@havok2063 (Contributor) left a comment

This looks good to me.

As you are the first to add a new format besides myself, did you find the docs for adding new file formats, https://sdss.github.io/datamodel/adding_files.html, easy to understand? Is there anything you would change about it or the process?

Comment thread docs/sphinx/examples_generate.rst Outdated
Comment thread python/datamodel/generate/changelog/filetypes/parquet.py
Comment thread python/datamodel/generate/filetypes/parquet.py Outdated
Comment thread python/datamodel/generate/filetypes/parquet.py
albireox and others added 6 commits March 16, 2026 10:37
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Brian Cherinka <havok2063@hotmail.com>
@albireox albireox requested a review from havok2063 March 16, 2026 17:00

@albireox (Member, Author)

The docs were fairly easy to follow. I think there are a few things out of date, like the instructions about adding the file type to

class DataModel:
    supported_filetypes = ['.fits', '.par', '.h5']

which is now computed automatically from imports, but those were fairly easy to track down.
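For readers unfamiliar with the pattern, here is one way a supported-suffix list can be derived from imported classes instead of being hard-coded. This is only a sketch under stated assumptions: `BaseFile`, `registry`, and the handler class names are hypothetical and are not taken from the datamodel codebase, which may compute the list differently.

```python
class BaseFile:
    """Hypothetical base class for filetype handlers."""

    suffix = None  # Each concrete handler sets the suffix it supports.
    registry = {}  # Maps suffix -> handler class, filled as subclasses are defined.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register the handler as soon as its module is imported.
        if cls.suffix is not None:
            BaseFile.registry[cls.suffix] = cls


class FitsFile(BaseFile):
    suffix = ".fits"


class ParFile(BaseFile):
    suffix = ".par"


class HdfFile(BaseFile):
    suffix = ".h5"


class ParquetFile(BaseFile):
    suffix = ".parquet"


class DataModel:
    @staticmethod
    def supported_filetypes():
        # Derived from whichever handler classes have been defined or imported.
        return list(BaseFile.registry)


print(DataModel.supported_filetypes())
```

With this arrangement, adding a new format only requires defining (and importing) a new handler class; no separate list has to be kept in sync.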

@havok2063 (Contributor)

Looks good.

@albireox albireox merged commit 79df7b5 into main Mar 16, 2026
10 of 12 checks passed
@albireox albireox deleted the albireox-parquet-filetype branch March 16, 2026 19:45