Add NestedFrame.split() for splitting nested columns by categorical value #473
Is this PR intended as a submission for Google Summer of Code? If so, see relevant notes from our guidelines:
I also want to make you aware that we will begin reviewing GSoC pull requests the week of March 23, 2026.
@delucchi-cmu, @Ebraam-Ashraf has already submitted the GSoC26-related PR (astronomy-commons/hats#648), which was approved. Since we have a single-PR limit for GSoC26, we should consider this PR as a normal open-source contribution.
dougbrn left a comment
Thanks @Ebraam-Ashraf, this is a really nice first implementation, and I appreciate the work that went into the docstring and the unit test suite. I have one primary suggestion, which I split into two comments.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #473      +/-   ##
==========================================
- Coverage   96.03%   95.96%   -0.08%
==========================================
  Files          20       20
  Lines        2247     2278      +31
==========================================
+ Hits         2158     2186      +28
- Misses         89       92       +3
Updated split() to handle empty frames and ensure consistent columns when
Hi @delucchi-cmu, and thanks @hombit for the clarification <3
Change Description
Closes #470
Solution Description
Adds `NestedFrame.split()` as a `NestedFrame` method rather than `NestedSeries` since `NestedFrame` already handles nested column management and leads to a cleaner workflow.
Parameters
- `nested_col`: the nested column to split
- `by`: the sub-column to split on
- `values`: controls which values to split on:
  - `None` (default): uses all unique values found in the column
  - `list`, e.g. `values=['r', 'g']`: splits on a specific subset only
  - `str`, e.g. `values='rg'`: iterated as characters, same as `['r', 'g']`
  - `[]` or empty string `''`: no new columns created
- `drop_by_col`: if `True`, removes the splitting sub-column from each new nested column
- `drop_nested`: if `True`, removes the original nested column from the result
Usage (to be documented)
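Until the usage docs land, the parameter semantics above can be modelled without nested-pandas at all. The sketch below is a plain-Python stand-in, not the implementation in this PR: a frame is a list of dicts, a nested column is a list of sub-row dicts, `split_nested` is a hypothetical helper, and the `<nested_col>_<value>` naming of the new columns is an assumption, since the description does not say how `split()` names them.

```python
from typing import Any, Iterable, Optional

Row = dict[str, Any]  # one top-level row; a nested column holds a list of sub-row dicts


def split_nested(
    rows: list[Row],
    nested_col: str,
    by: str,
    values: Optional[Iterable[Any]] = None,
    drop_by_col: bool = False,
    drop_nested: bool = False,
) -> list[Row]:
    """Toy model of the described split() semantics (hypothetical helper)."""
    # Mirror the described errors: unknown nested column or unknown sub-column.
    for row in rows:
        if nested_col not in row:
            raise ValueError(f"{nested_col!r} is not a column of the frame")
        for sub in row[nested_col]:
            if by not in sub:
                raise ValueError(f"{by!r} is not a sub-column of {nested_col!r}")

    if values is None:
        # Default: every unique value in the sub-column, in first-seen order.
        values = list(dict.fromkeys(sub[by] for row in rows for sub in row[nested_col]))
    else:
        # A str iterates as characters, so 'rg' behaves like ['r', 'g'];
        # [] and '' both yield no values, hence no new columns.
        values = list(values)

    result = []
    for row in rows:
        # Build a new row so the input frame is left untouched.
        new_row = {k: v for k, v in row.items() if not (drop_nested and k == nested_col)}
        for value in values:
            # Assumed naming scheme for the new nested columns.
            new_row[f"{nested_col}_{value}"] = [
                {k: v for k, v in sub.items() if not (drop_by_col and k == by)}
                for sub in row[nested_col]
                if sub[by] == value
            ]
        result.append(new_row)
    return result


frame = [
    {"id": 0, "lc": [{"band": "r", "flux": 1.0}, {"band": "g", "flux": 2.0}]},
    {"id": 1, "lc": [{"band": "g", "flux": 3.0}]},
]
out = split_nested(frame, "lc", "band", values="rg", drop_by_col=True)
# out[1]["lc_r"] is [] because row 1 has no "r" observations,
# so every row ends up with the same set of columns.
```

In this model a value with no matching sub-rows still produces an (empty) nested entry, which keeps the column set consistent across rows; whether the real method behaves identically should be checked against the PR's tests.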
Tests
Three test functions added to `test_nestedframe.py`:
- `test_split`: covers all behavior: basic split, filtering correctness, all `values=` cases (`None`, list subset, string as chars, empty list, empty string, missing/mixed-type values), `drop_by_col=True`, `drop_nested=True`, both drop options together, and immutability of the original frame
- `test_split_errors`: covers `ValueError` for invalid `nested_col` and invalid `by` sub-column
- `test_split_empty_frame`: ensures an empty `NestedFrame` is handled correctly and returns an empty `NestedFrame`
Code Quality