Description
What is your issue?
I'm attempting to use the combine_attrs parameter on open_mfdataset with a function to generate and preserve a list of values from specific attributes on specific variables.
I use a library (act-atmos) that uses xarray as a dependency.
I use act-atmos' act.io.armfiles.read_netcdf method to read in the data from a list of files.
( https://github.com/ARM-DOE/ACT/blob/main/act/io/armfiles.py )
I pass my function to read_netcdf as the combine_attrs argument.
I am fairly confident that act-atmos does nothing unexpected with the list of files or parameters.
It sets combine='by_coords' and use_cftime=True, and leaves combine_attrs untouched.
These parameters are passed to open_mfdataset via **kwargs, and the list of filenames is passed through unmodified.
I see unexpected behavior in the combine_attrs function.
To test, I read in 3 netCDF files, each with records starting at 6:00am on one day and ending at 6:00am on the next.
The resulting combined data spans 4 calendar days. I really expect 3 days of data, but since each file runs past midnight to 6:00am the next day, it makes sense that I end up with 4 days. The last (4th) day will always have no data, because the data starts during daylight hours.
# Using the combined xarray data returned by open_mfdataset:
print(len(xarray_data['time'].values))  # 1440 * 3
print(xarray_data['time'].values)
print(xarray_data['time'].coords)
4320
['2009-01-01T06:00:00.000000000' '2009-01-01T06:01:00.000000000'
'2009-01-01T06:02:00.000000000' ... '2009-01-04T05:57:00.000000000'
'2009-01-04T05:58:00.000000000' '2009-01-04T05:59:00.000000000']
Coordinates: time (time) datetime64[ns] 2009-01-01T06:00:00 ... 2009-01-04T05:59:00
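The arithmetic behind the four calendar days can be checked with the standard library alone. This is only a sketch that assumes one record per minute starting at 06:00 on 2009-01-01, as the output above suggests; it does not read the actual files.

```python
from datetime import datetime, timedelta

# Assumption: one sample per minute, three files of 1440 samples each,
# with the first sample at 06:00 on 2009-01-01.
start = datetime(2009, 1, 1, 6, 0)
n_samples = 1440 * 3

times = [start + timedelta(minutes=i) for i in range(n_samples)]

# Distinct calendar dates touched by the combined time axis.
calendar_days = sorted({t.date() for t in times})

print(len(times))          # 4320
print(times[-1])           # 2009-01-04 05:59:00
print(len(calendar_days))  # 4
```

Three 24-hour files offset by six hours therefore touch four calendar dates, even though they hold only three days' worth of samples.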
So, I expect combine_attrs to receive a list of 3-4 sets of attributes to iterate through all at once.
Instead, it apparently gets called twice: once with a list of 3 sets of attributes, and a 2nd time with a list of 1 set of the attributes.
The last lone set does contain the combined attributes from the first call.
But given I was expecting to iterate through a single list of all attributes, I have to implement special logic to watch for that last set. If I don't, then the last set in the 2nd call obliterates the results of the 1st set of 3, and the result ends up looking similar to combine_attrs="override".
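A minimal sketch of the kind of special logic described above: a callable written so that feeding its own output back in as a single-element list gives the same result. The attribute name `serial_number` and the function name are hypothetical, and the two-stage calling pattern is simulated in plain Python rather than produced by xarray. It also assumes the original per-file attribute values are not themselves lists.

```python
# Hypothetical attribute whose per-file values should be preserved as a list.
TRACKED_KEY = "serial_number"


def collect_attr_values(attrs_list, context=None):
    """Merge attrs dicts, gathering every TRACKED_KEY value into one list.

    Safe to call again on its own output: a value that is already a list
    is extended into the result instead of being nested or overwritten.
    """
    attrs_list = list(attrs_list)
    if not attrs_list:
        return {}

    # Start from a copy of the first dict so the inputs are not mutated.
    merged = dict(attrs_list[0])
    collected = []
    for attrs in attrs_list:
        if TRACKED_KEY not in attrs:
            continue
        value = attrs[TRACKED_KEY]
        if isinstance(value, list):
            collected.extend(value)  # result of an earlier call
        else:
            collected.append(value)  # raw per-file value
    if collected:
        merged[TRACKED_KEY] = collected
    return merged


# Simulate the observed pattern: one call with 3 dicts, then one call
# with the single combined dict from the first call.
per_file = [
    {"units": "K", "serial_number": "A1"},
    {"units": "K", "serial_number": "B2"},
    {"units": "K", "serial_number": "C3"},
]
first_pass = collect_attr_values(per_file)
second_pass = collect_attr_values([first_pass])

print(first_pass["serial_number"])  # ['A1', 'B2', 'C3']
print(second_pass == first_pass)    # True
```

Written this way, the function does not need to know how many times it is called or how the inputs are grouped, which sidesteps having to watch for the final lone set.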
Without further experimentation, I don't know yet if this behavior will remain consistent for larger sets of files. I will experiment soon to see if it does remain consistent.
Is this behavior of combine_attrs expected? Why would it be set up to make 2 (or multiple) separate calls like that?
Is it somehow a side-effect of the files having times that overlap days? Is it somehow because the last dataset is essentially empty? If I end up with one combined set of data, I would expect one combined list of attributes to iterate through.
If this behavior is unexpected, I will provide more specifics about the data as requested, to keep this initial post shorter.
Also, as a side note, it would be nice to improve the documentation for combine_attrs.
The details about the parameter don't even show up here:
https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html
And the description of the callable signature in the source code could be clearer:
- What is a "sequence of" attrs dicts? Is "sequence of" the reason I'm seeing multiple calls to combine_attrs?
- What might the context object possibly contain? When or why might it vary?
With experimentation I can find some of these things out (e.g. context is None, in my case); but it would be nice if these were clearer up front.
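For anyone wanting to reproduce this kind of experimentation, here is a rough sketch of a tracing wrapper that records how many attrs dicts each call receives and what the context looks like. The names `traced` and `keep_first` are hypothetical helpers, and the calls below are simulated by hand; in real use the wrapped function would be passed as combine_attrs to open_mfdataset.

```python
def traced(combine_fn, call_log):
    """Wrap a combine_attrs callable and record each invocation."""
    def wrapper(attrs_list, context=None):
        attrs_list = list(attrs_list)
        call_log.append(
            {"n_dicts": len(attrs_list), "context": repr(context)}
        )
        return combine_fn(attrs_list, context)
    return wrapper


def keep_first(attrs_list, context=None):
    """Trivial placeholder combiner: keep the first dict's attributes."""
    return dict(attrs_list[0]) if attrs_list else {}


call_log = []
combiner = traced(keep_first, call_log)

# Hand-simulated calls mirroring the pattern reported above. With real
# data this would instead be something like:
#   xr.open_mfdataset(files, combine="by_coords", combine_attrs=combiner)
stage_one = combiner([{"a": 1}, {"a": 2}, {"a": 3}])
stage_two = combiner([stage_one])

for entry in call_log:
    print(entry)
```

Inspecting `call_log` after a real open_mfdataset run would show the number of calls and the size of each sequence for a given set of files.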