Use flox for median in groupby/resample operations#11239

Open
sdiebolt wants to merge 2 commits into pydata:main from sdiebolt:median-flox

@sdiebolt

Add flox support to median() methods in:

  • DataArrayGroupByAggregations
  • DatasetGroupByAggregations
  • DataArrayResampleAggregations
  • DatasetResampleAggregations

This aligns the implementation with the documentation, which already claimed that flox was used when available. The change can speed up these reductions substantially whenever flox can process the data. It falls back to the non-flox implementation when flox's median aggregation requires blockwise processing but the data's chunking does not support it.
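For illustration only, the fallback described above can be sketched in plain Python. This is not the code in this PR: `NeedsBlockwise`, `fast_grouped_median`, `slow_grouped_median`, and the `chunk_size` model of chunking are hypothetical stand-ins, and the "fast" path here merely imitates a reducer that insists on each group living in a single chunk.

```python
from collections import defaultdict
from statistics import median


class NeedsBlockwise(Exception):
    """Hypothetical error: the fast path needs every group inside one chunk."""


def _group(values, labels):
    # Collect the values belonging to each label.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return groups


def fast_grouped_median(values, labels, chunk_size):
    # Stand-in for the accelerated path: refuse to run if any group
    # is split across more than one chunk.
    chunk_of = {}
    for i, label in enumerate(labels):
        chunk = i // chunk_size
        if chunk_of.setdefault(label, chunk) != chunk:
            raise NeedsBlockwise(f"group {label!r} spans several chunks")
    return {label: median(vals) for label, vals in _group(values, labels).items()}


def slow_grouped_median(values, labels):
    # Stand-in for the existing implementation, which ignores chunking.
    return {label: median(vals) for label, vals in _group(values, labels).items()}


def grouped_median(values, labels, chunk_size):
    # Try the fast path first; fall back only when chunking rules it out.
    try:
        return fast_grouped_median(values, labels, chunk_size), "fast"
    except NeedsBlockwise:
        return slow_grouped_median(values, labels), "fallback"
```

With `values = [1, 2, 3, 10, 20, 30]` and `labels = ["a", "a", "a", "b", "b", "b"]`, a `chunk_size` of 3 keeps each group in one chunk and takes the fast path, while a `chunk_size` of 2 splits group `"a"` and takes the fallback. Both return the same medians.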

Closes #11238

  • User visible changes (including notable bug fixes) are documented in whats-new.rst


@dcherian dcherian left a comment

Thanks for taking this on!

Let's raise a FutureWarning here asking the user to call shuffle_to_chunks or rechunk appropriately, and stating that this behaviour will be deprecated in 6 months (please open an issue for that). Now that I think about it, we can do this in flox automatically: xarray-contrib/flox#501

The reason I hadn't done this so far is that dask will do some auto-rechunking to make it work, and so some workloads will break.

Also, this file is generated by generate_aggregations.py. Please make the edit there instead.
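As a rough sketch of the FutureWarning suggested above, something along these lines might work. The helper name `warn_median_fallback` and the message wording are assumptions for illustration, not what xarray or flox emit; only the standard-library `warnings` API is relied on.

```python
import warnings


def warn_median_fallback(stacklevel=2):
    # Hypothetical helper: tell the user that the non-flox fallback was taken
    # and how to avoid it. The wording is illustrative only.
    warnings.warn(
        "median could not use flox because the data is not chunked so that "
        "each group sits in a single chunk. Call shuffle_to_chunks() on the "
        "groupby object or rechunk the data; this fallback is planned for "
        "deprecation.",
        FutureWarning,
        stacklevel=stacklevel,
    )
```

The `stacklevel` argument is there so the warning points at the caller's line rather than at the helper itself; the right value depends on how deep in the call stack the helper ends up.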


Development: merging this pull request may close the issue "Use flox for median-aggregation in groupbys".