Applying GroupBy.map along chunks instead of dimensions #8076
sadsimulation asked this question in Q&A · Unanswered

I'm processing larger-than-memory datasets using dask-backed xarrays. However, I often need to perform indexing using `dataset1.where(array2, drop=True)`. The intermediate computations that produce `dataset1` and `array2` are fairly computationally intensive and make use of xarray's indexing and dimension-name features, so I feel like `xr.apply_ufunc` with `dask='allowed'` wouldn't be a good fit here. Unfortunately, the masking (peak detection) causes the dimension sizes of the output `xr.Dataset` to change based on the data, so `xr.map_blocks` is difficult to apply because I don't know the shape of the output template.

A workaround that I have used to get things working at all is to call `dataset.groupby('mydim').map(myfunc)` on a `dataset` that is not backed by dask arrays. This is not great: the groups along `mydim` vary in size, so they don't fit a simple uniform chunking along that dimension, and it predictably utilizes only a single core because the groups are processed sequentially.

Is there an easy way to do something like `dataset.chunk(mydim=100).groupby('mydim').map(myfunc)` that would utilize my machine's CPU better?
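For concreteness, here is a minimal runnable sketch of the pattern described above. `mydim` and `myfunc` are the names from the post; the data, the group sizes, and the thresholding stand-in for peak detection are invented for illustration.

```python
import numpy as np
import xarray as xr

# Invented stand-in for the real data: the actual dataset is
# dask-backed and larger than memory; this one is small and eager.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"signal": ("mydim", rng.random(1000))},
    coords={"mydim": np.repeat(np.arange(100), 10)},
)

# Placeholder for the real per-group peak detection: the number of
# retained points depends on the data, so there is no fixed output
# template to hand to xr.map_blocks.
def myfunc(group):
    return group.where(group.signal > 0.5, drop=True)

# The workaround from the post: correct, but the groups are
# processed sequentially on a single core.
result = ds.groupby("mydim").map(myfunc)
```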
Replies: 1 comment

Can you create a small example to show how your groups are patterned? It seems like each group is sequential?
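Following up on the reply's question about group patterning, and assuming each group fits comfortably in memory, here is one possible direction for the closing question: iterate over the groups yourself and wrap each per-group call in `dask.delayed`, so the groups run concurrently instead of one after another. This is a hand-rolled workaround sketch reusing `ds` and `myfunc` from the example above, not an xarray feature for chunked groupby.

```python
import dask
import xarray as xr

# Turn each per-group myfunc call into a delayed task.
tasks = [dask.delayed(myfunc)(group) for _, group in ds.groupby("mydim")]

# Run the tasks concurrently. The default threaded scheduler can be
# limited by the GIL for pure-Python work; scheduler="processes" is
# an alternative to try.
pieces = dask.compute(*tasks)

# Concatenate the variably sized per-group results back together.
result = xr.concat(pieces, dim="mydim")
```

The trade-off is that splitting eagerly materializes every group up front, so this only helps when the per-group computation, not the data loading, is the bottleneck.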