
Simultaneously read multiple Datasets into an Xarray-Beam pipeline #68

Open
shoyer opened this issue Dec 7, 2022 · 2 comments

Comments

@shoyer
Member

shoyer commented Dec 7, 2022

It is relatively common to need to load multiple xarray.Dataset objects, e.g., to compare two different models.

This can currently be done by loading data with separate calls to xbeam.DatasetToChunks and joining the results with beam.CoGroupByKey. This works but is rather inefficient, because it involves an extra write of the data to disk. Ideally we could load the data in a single Beam transform instead, e.g., xbeam.DatasetToChunks([ds1, ds2], chunks) would return a PCollection with elements of type tuple[xbeam.Key, tuple[xarray.Dataset, xarray.Dataset]].
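To make the element shapes concrete, here is a minimal, stdlib-only sketch of the key-based join that the DatasetToChunks + CoGroupByKey workaround performs. It is not Beam code: plain lists stand in for PCollections, tuples of offsets stand in for xbeam.Key, and strings stand in for xarray.Dataset chunks. The helper name join_keyed is hypothetical.

```python
from collections import defaultdict
from typing import Any, Hashable, Iterable


def join_keyed(*inputs: Iterable[tuple[Hashable, Any]]) -> list[tuple[Hashable, tuple[Any, ...]]]:
    """Hypothetical stand-in for separate DatasetToChunks calls + CoGroupByKey.

    Each input is an iterable of (key, chunk) pairs. Chunks sharing a key are
    gathered into one tuple, ordered like the inputs, mirroring the proposed
    element type tuple[Key, tuple[Dataset, Dataset]].
    """
    # For every key, keep one list of chunks per input (like CoGroupByKey).
    grouped: dict = defaultdict(lambda: [[] for _ in inputs])
    for i, pairs in enumerate(inputs):
        for key, chunk in pairs:
            grouped[key][i].append(chunk)

    result = []
    for key, per_input in grouped.items():
        # Assumption: every key must appear exactly once in every input.
        if any(len(values) != 1 for values in per_input):
            raise ValueError(f"key {key!r} is not present exactly once in every input")
        result.append((key, tuple(values[0] for values in per_input)))
    return result


model_a = [((0,), "a[0:10]"), ((10,), "a[10:20]")]
model_b = [((10,), "b[10:20]"), ((0,), "b[0:10]")]
print(join_keyed(model_a, model_b))
# [((0,), ('a[0:10]', 'b[0:10]')), ((10,), ('a[10:20]', 'b[10:20]'))]
```

In a real pipeline the grouping step is a shuffle across workers, which is where the extra cost of the workaround comes from; a single multi-dataset DatasetToChunks would emit the already-paired tuples directly.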

CC @alxmrs

@alxmrs
Contributor

alxmrs commented Dec 15, 2022

I'm looking into this now.

I have a design question, though: what is the best PCollection interface, tuple[xbeam.Key, tuple[xarray.Dataset, xarray.Dataset]] or tuple[xbeam.Key, xarray.Dataset, xarray.Dataset]? I have a slight preference for the latter (and its implementation would not be so bad). It also seems fairly natural for operations like beam.MapTuple() (which can handle n-ary tuples) and beam.GroupByKey() (which will just use the first value in the tuple). It feels more zen to me, too. :)
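A stdlib-only sketch of how a per-element function would see each candidate shape. beam.MapTuple(fn) calls fn(*element), so the code below emulates it with a plain star-call; strings stand in for the key and datasets, and the function names diff_flat and diff_nested are hypothetical.

```python
def diff_flat(key, ds_a, ds_b):
    # Flat shape (key, ds_a, ds_b): every dataset becomes its own argument,
    # so the signature is tied to the number of inputs.
    return key, f"{ds_a} - {ds_b}"


def diff_nested(key, datasets):
    # Nested shape (key, (ds_a, ds_b)): datasets arrive as one tuple,
    # so the same function works for any number of inputs.
    return key, " - ".join(datasets)


flat_element = ("k0", "ds1", "ds2")
nested_element = ("k0", ("ds1", "ds2"))

# beam.MapTuple(fn) invokes fn(*element); emulate that directly.
print(diff_flat(*flat_element))      # ('k0', 'ds1 - ds2')
print(diff_nested(*nested_element))  # ('k0', 'ds1 - ds2')
```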

WDYT?

@alxmrs
Contributor

alxmrs commented Dec 15, 2022

A small update: the former does seem to offer a better typing story (python/typing#180), so I am now leaning that way.
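One way to see the typing difference, as a sketch: the nested form can be annotated for any number of datasets with ordinary tuple[...] syntax, while a flat form that starts with a key and continues with a variable number of datasets needs PEP 646 unpacking, which is only valid syntax on Python 3.11+. Key and Dataset below are hypothetical stand-ins for xbeam.Key and xarray.Dataset, and the helper names is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for xbeam.Key."""
    offsets: tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for xarray.Dataset."""
    name: str


# Nested form: one alias covers any number of datasets.
NestedElement = tuple[Key, tuple[Dataset, ...]]

# Flat form: a fixed arity has to be spelled out case by case...
FlatPair = tuple[Key, Dataset, Dataset]
# ...and "a Key followed by any number of Datasets" would need
# tuple[Key, *tuple[Dataset, ...]] (PEP 646, Python 3.11+).


def names(element: NestedElement) -> list[str]:
    # The nested shape always unpacks into exactly two parts.
    key, datasets = element
    return [ds.name for ds in datasets]


print(names((Key((0,)), (Dataset("a"), Dataset("b")))))  # ['a', 'b']
```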

alxmrs added a commit to alxmrs/xarray-beam that referenced this issue Dec 15, 2022
alxmrs added a commit to alxmrs/xarray-beam that referenced this issue Dec 22, 2022
alxmrs added a commit to alxmrs/xarray-beam that referenced this issue Jan 25, 2023
alxmrs added a commit to alxmrs/xarray-beam that referenced this issue Jan 25, 2023

2 participants