Simultaneously read multiple Datasets into an Xarray-Beam pipeline #68
Comments
I'm looking into this now. I have a design question, though: What is the best PCollection interface? WDYT?
A small update: the former does seem to offer a better typing story (python/typing#180), so I am now leaning that way.
Here is an initial implementation of google#68.
It is relatively common to need to load multiple xarray.Dataset objects, e.g., to compare two different models.
This can currently be done by loading the data with separate calls to `xbeam.DatasetToChunks` and joining the results with `beam.CoGroupByKey`. This works but is rather inefficient, involving an extra write of the data to disk. Ideally we could load the data in a single beam transform instead, e.g., `xbeam.DatasetToChunks([ds1, ds2], chunks)` would return a PCollection with elements of type `tuple[xbeam.Key, tuple[xarray.Dataset, xarray.Dataset]]`.

CC @alxmrs
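To make the proposed element type concrete, here is a minimal pure-Python sketch of the pairing step. It does not use Beam or Xarray-Beam: `Key` is a hypothetical stand-in for `xbeam.Key`, plain lists stand in for `xarray.Dataset` objects, and `iter_paired_chunks` is an illustrative name, not an existing API. It also assumes all inputs have the same length.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for xbeam.Key: the offset of a chunk along 'x'."""
    x_offset: int


def iter_paired_chunks(
    datasets: Sequence[Sequence[int]], chunk_size: int
) -> Iterator[tuple[Key, tuple[list[int], ...]]]:
    """Yield (key, (chunk_of_ds1, chunk_of_ds2, ...)) in a single pass.

    Assumes all datasets have the same length, so that one key
    identifies the same region in each of them.
    """
    lengths = {len(ds) for ds in datasets}
    if len(lengths) != 1:
        raise ValueError("all datasets must have the same length")
    (length,) = lengths
    for start in range(0, length, chunk_size):
        stop = start + chunk_size
        # One element carries the matching chunk from every dataset.
        yield Key(start), tuple(list(ds[start:stop]) for ds in datasets)


# Plain lists stand in for two datasets that share the 'x' dimension.
ds1 = [0, 1, 2, 3, 4, 5]
ds2 = [0, 10, 20, 30, 40, 50]
paired = list(iter_paired_chunks([ds1, ds2], chunk_size=3))
```

Because each element already holds the matching chunks from every input, a downstream comparison would not need a separate join step in this sketch.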