open_mfdataset, function provided to combine_attrs has confusing behavior: multiple calls? separate last group of attributes? #6679
Comments
Thanks for the report, @gg2
The context object will contain more information about the call, for example the method that was used to combine the datasets (…). I did plan to write a dedicated user guide that explains all of this, but never found the time to actually complete it.
Not sure why it's missing from …
As far as I can tell, … Could you maybe try a bare …?
OK. It makes sense that there could be an error, especially because even the wrapping act-atmos falls through to "nested" under certain parameter conditions. I did see the dataset's "global" attributes coming through in the first pass; I didn't pay attention to whether they also came through in the second pass.
Lastly, I did get my "simpler" combine_attrs callable working, even with the 2 passes. Between everything (mainly me being confused, trying to get things working, and making some bad interpretations), the second pass didn't necessarily throw it off. Either I had some incorrect logic, or I used an operator that didn't behave as expected (originally += to combine the previous and new values, versus switching to f"{previous}, {new}"). I'm not sure. But I'll try a direct call to open_mfdataset and see what that does. This, for reference, is the function I'm now passing in. It successfully does what I wanted:
(I am happy that attribute values aren't forced to be the particular data type they might be declared as in netCDF. :D It's nice that I can turn a float attribute into a string attribute.)
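The += versus f-string difference mentioned above has a plausible explanation, though this is an assumption since the exact attribute values aren't shown: += dispatches on the operand types, so two numeric attribute values are summed, two strings are concatenated with no separator, and a string plus a float raises TypeError. An f-string always yields separated text, which is also how a float attribute ends up as a string attribute. The helper names below are hypothetical.

```python
def combine_with_plus(previous, new):
    # Mirrors `previous += new`: numbers are added, strings are concatenated
    # without a separator, and mixed str/float operands raise TypeError.
    result = previous
    result += new
    return result


def combine_with_fstring(previous, new):
    # Always converts both values to text and inserts ", " between them.
    return f"{previous}, {new}"


print(combine_with_plus(1.5, 2.0))     # 3.5 (a numeric sum, not a list of values)
print(combine_with_plus("a", "b"))     # ab
print(combine_with_fstring(1.5, 2.0))  # 1.5, 2.0
try:
    combine_with_plus("1.5", 2.0)
except TypeError as err:
    print("TypeError:", err)
```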
I ran the following:
I still see 2 sets coming through for len(files_list) == 3:
I didn't see any exceptions.
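One way to see how many times the callable runs, and how many attrs dicts each call receives, is to pass a recording callable straight to open_mfdataset, bypassing act-atmos. This is a sketch: CallRecorder and open_with_recorder are hypothetical names, files_list is whatever list of paths you already have, and combine="by_coords" / use_cftime=True mirror what act-atmos is described as setting. The recorder keeps the first attrs dict (roughly like "override"). Because xarray may also apply the same callable to each variable's attrs, the recorded list can contain more entries than the dataset-level passes alone.

```python
from typing import Any, Dict, List, Sequence


class CallRecorder:
    """Hypothetical combine_attrs callable that logs how it is invoked."""

    def __init__(self) -> None:
        self.sizes: List[int] = []     # number of attrs dicts received per call
        self.contexts: List[Any] = []  # context object received per call

    def __call__(
        self, attrs_list: Sequence[Dict[str, Any]], context: Any = None
    ) -> Dict[str, Any]:
        self.sizes.append(len(attrs_list))
        self.contexts.append(context)
        # Keep the first dict, similar in spirit to combine_attrs="override".
        return dict(attrs_list[0]) if attrs_list else {}


def open_with_recorder(files_list):
    """Open files with a bare open_mfdataset call; return (dataset, recorder)."""
    import xarray as xr  # local import: the recorder itself needs only the stdlib

    recorder = CallRecorder()
    ds = xr.open_mfdataset(
        files_list,
        combine="by_coords",
        use_cftime=True,
        combine_attrs=recorder,
    )
    return ds, recorder
```

After ds, rec = open_with_recorder(files_list), printing rec.sizes shows the length of every attrs list the callable saw, and rec.contexts shows whether a context object was ever supplied.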
The same issue comes up when passing a function for … My workaround was to add an attribute to the datasets as a flag, and have the function I passed behave differently depending on whether that flag was present (dataset attrs) or not (data variable attrs). Is this what the 'context' argument will do? I couldn't tell what it was supposed to do by looking at the languishing PR #5668. Being able to pass separate arguments for …
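The flag workaround described above can be sketched as follows. Everything here is an assumption for illustration: the marker key _is_global_attrs, the helper names, and the two policies (dataset attrs keep the first file's values; variable attrs keep only values that agree everywhere, similar in spirit to "drop_conflicts"). add_flag is meant for open_mfdataset's preprocess argument, which is applied to each dataset before combining; only Dataset.attrs receives the flag, so per-variable attrs never carry it.

```python
from typing import Any, Dict, Sequence

# Hypothetical marker key; not an xarray convention.
GLOBAL_FLAG = "_is_global_attrs"


def add_flag(ds):
    # Intended as open_mfdataset(..., preprocess=add_flag): marks dataset-level attrs only.
    ds.attrs[GLOBAL_FLAG] = 1
    return ds


def _combine_dataset_attrs(attrs_list: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    # Dataset-level attrs: keep the first file's attrs (like "override").
    return dict(attrs_list[0])


def _combine_variable_attrs(attrs_list: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    # Variable-level attrs: keep a key only if every dict that has it agrees.
    # Assumes plain scalar/string values (numpy arrays would make != ambiguous).
    combined: Dict[str, Any] = {}
    dropped = set()
    for attrs in attrs_list:
        for key, value in attrs.items():
            if key in dropped:
                continue
            if key not in combined:
                combined[key] = value
            elif combined[key] != value:
                del combined[key]
                dropped.add(key)
    return combined


def flag_dispatch(
    attrs_list: Sequence[Dict[str, Any]], context: Any = None
) -> Dict[str, Any]:
    # Route on the marker: all dicts flagged -> dataset attrs, otherwise variable attrs.
    if attrs_list and all(GLOBAL_FLAG in attrs for attrs in attrs_list):
        return _combine_dataset_attrs(attrs_list)
    return _combine_variable_attrs(attrs_list)
```

The flag survives into the combined dataset, so it would need to be removed afterwards (del ds.attrs[GLOBAL_FLAG]).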
What is your issue?
I'm attempting to use the combine_attrs parameter on open_mfdataset with a function to generate and preserve a list of values from specific attributes on specific variables. I use a library (act-atmos) that uses xarray as a dependency.
I use act-atmos' act.io.armfiles.read_netcdf method to read in the data from a list of files.
( https://github.com/ARM-DOE/ACT/blob/main/act/io/armfiles.py )
I pass read_netcdf the function to use for the combine_attrs parameter.
I am fairly confident that act-atmos does nothing unexpected with the list of files or parameters.
It sets combine='by_coords' and use_cftime=True, and leaves combine_attrs untouched.
These parameters are passed to open_mfdataset via **kwargs, along with the untouched list of filenames.
I see unexpected behavior in the combine_attrs function.
To test, I read in 3 netCDF files, each with records starting at 6:00 am on one day and ending at 6:00 am the next day.
The resulting combined data spans 4 days. I really expect 3 days of data, but since each file runs into the next day from midnight to 6:00 am, it makes sense that I end up with 4 days. However, the 4th day will always have no data, because the data starts during daylight hours.
4320
['2009-01-01T06:00:00.000000000' '2009-01-01T06:01:00.000000000'
'2009-01-01T06:02:00.000000000' ... '2009-01-04T05:57:00.000000000'
'2009-01-04T05:58:00.000000000' '2009-01-04T05:59:00.000000000']
Coordinates: time (time) datetime64[ns] 2009-01-01T06:00:00 ... 2009-01-04T05:59:00
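The printed length lines up with one-minute sampling (inferred from the timestamps above) over three 24-hour files, and the "4th day" is the calendar date of the last file's 00:00 to 05:59 samples. A quick stdlib check, assuming the record runs from 2009-01-01T06:00 up to but excluding 2009-01-04T06:00:

```python
from datetime import datetime, timedelta

start = datetime(2009, 1, 1, 6, 0)
end = datetime(2009, 1, 4, 6, 0)  # exclusive: the last printed sample is 05:59

# Build the one-minute time axis.
times = []
t = start
while t < end:
    times.append(t)
    t += timedelta(minutes=1)

calendar_days = sorted({ts.date() for ts in times})
print(len(times))          # 4320 == 3 days * 24 h * 60 min
print(times[-1])           # 2009-01-04 05:59:00
print(len(calendar_days))  # 4 (Jan 1 through Jan 4)
```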
So, I expect combine_attrs to receive a list of 3-4 sets of attributes to iterate through all at once. Instead, it apparently gets called twice: once with a list of 3 sets of attributes, and a second time with a list of 1 set of attributes.
The last lone set does contain the combined attributes from the first call.
But because I was expecting to iterate through a single list of all attributes, I have to implement special logic to watch for that last set. If I don't, the lone set in the second call overwrites the results from the first call's 3 sets, and the result ends up looking similar to combine_attrs="override".
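A minimal sketch of a callable that does not need special logic for the extra call, assuming the goal stated at the top of this issue (collecting the values of specific attributes). make_joining_combiner and the attribute name input_source are hypothetical. xarray documents the callable as receiving a sequence of attrs dicts and a context object; naming the second parameter context works whether it is passed positionally or by keyword. A single-dict call, such as the second pass carrying the already-combined attrs, returns an unchanged copy.

```python
from typing import Any, Callable, Dict, Iterable, Sequence

AttrsDict = Dict[str, Any]


def make_joining_combiner(
    keys_to_join: Iterable[str], sep: str = ", "
) -> Callable[..., AttrsDict]:
    """Build a hypothetical combine_attrs callable that joins selected attrs.

    Values of keys in keys_to_join are joined into one string; every other key
    comes only from the first dict (like "override"). A call with a single dict
    returns an unchanged copy, so the callable tolerates repeated invocation.
    """
    keys = set(keys_to_join)

    def combine(attrs_list: Sequence[AttrsDict], context: Any = None) -> AttrsDict:
        if not attrs_list:
            return {}
        combined = dict(attrs_list[0])
        for attrs in attrs_list[1:]:
            for key, value in attrs.items():
                if key not in keys:
                    continue
                if key in combined:
                    # f-string, not +=, so numeric values are joined as text.
                    combined[key] = f"{combined[key]}{sep}{value}"
                else:
                    combined[key] = value
        return combined

    return combine
```

It would be passed as combine_attrs=make_joining_combiner(["input_source"]), with input_source standing in for the real attribute name. It does not de-duplicate values; if repeats matter, filter them before joining.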
Without further experimentation, I don't know yet whether this behavior stays consistent for larger sets of files; I will experiment soon to find out.
Is this behavior of combine_attrs expected? Why would it be set up to make 2 (or more) separate calls like that? Is it a side effect of the files having times that overlap days? Is it because the last dataset is essentially empty? If I end up with one combined dataset, I would expect one combined list of attributes to iterate through.
If this behavior is unexpected, I can provide more specifics about the data on request; I've left them out to keep this initial post short.
Also, as a side note, it would be nice to improve the documentation for combine_attrs. The details about the parameter don't even show up here:
https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html
And the description of the callable signature in the source code could be clearer:
… combine_attrs?
With experimentation I can find some of these things out (e.g., context is None in my case), but it would be nice if they were clearer up front.