Taking the diagonal of each 2-D array along a third dimension? Current approach uses deprecated groupby-squeeze #9173
-
I have a DataArray of three dimensions. The approach I came up with involves doing:
>>> fits
<xarray.Dataset> Size: 488B
Dimensions: (location_id: 4, parameter: 3, cov_i: 3, cov_j: 3)
Coordinates:
* parameter (parameter) <U2 24B 'b0' 'b1' 'b2'
* cov_i (cov_i) <U2 24B 'b0' 'b1' 'b2'
* cov_j (cov_j) <U2 24B 'b0' 'b1' 'b2'
* location_id (location_id) int64 32B 3068918 3071502 ... 3114697
Data variables:
curvefit_coefficients (location_id, parameter) float64 96B -7.477 ... -0...
curvefit_covariance (location_id, cov_i, cov_j) float64 288B 3.793e-05...
>>> fits.curvefit_covariance
<xarray.DataArray 'curvefit_covariance' (location_id: 4, cov_i: 3, cov_j: 3)> Size: 288B
array([[[ 3.79298999e-05, -1.87955006e-06, 6.07566477e-10],
[-1.87955006e-06, 9.85906372e-07, -4.07836737e-08],
[ 6.07566477e-10, -4.07836737e-08, 2.00518116e-09]],
[[ 3.82335861e-05, -1.81900026e-06, -3.32938668e-09],
[-1.81900026e-06, 9.83693989e-07, -4.09497824e-08],
[-3.32938668e-09, -4.09497824e-08, 2.03417609e-09]],
[[ 3.56544048e-05, -2.02137272e-06, 1.32904069e-08],
[-2.02137272e-06, 9.45842659e-07, -3.82653117e-08],
[ 1.32904069e-08, -3.82653117e-08, 1.82177935e-09]],
[[ 2.89862030e-05, -1.30506244e-06, -6.21364362e-09],
[-1.30506244e-06, 7.39687867e-07, -3.09647204e-08],
[-6.21364362e-09, -3.09647204e-08, 1.55403796e-09]]])
Coordinates:
* cov_i (cov_i) <U2 24B 'b0' 'b1' 'b2'
* cov_j (cov_j) <U2 24B 'b0' 'b1' 'b2'
* location_id (location_id) int64 32B 3068918 3071502 3113100 3114697
>>> cov_to_std = lambda cov_arr: np.sqrt(np.diag(cov_arr))
>>> fits["curvefit_std_dev"] = xr.apply_ufunc(cov_to_std, fits.curvefit_covariance.groupby("location_id"), input_core_dims=[["cov_i", "cov_j"]], output_core_dims=[["parameter"]])
<stdin>:1: UserWarning: The `squeeze` kwarg to GroupBy is being removed. Pass .groupby(..., squeeze=False) to disable squeezing, which is the new default, and to silence this warning.
>>> fits.curvefit_std_dev
<xarray.DataArray 'curvefit_std_dev' (location_id: 4, parameter: 3)> Size: 96B
array([[6.15872551e-03, 9.92928181e-04, 4.47792492e-05],
[6.18333131e-03, 9.91813485e-04, 4.51018413e-05],
[5.97113095e-03, 9.72544425e-04, 4.26823073e-05],
[5.38388364e-03, 8.60051084e-04, 3.94212881e-05]])
Coordinates:
* parameter (parameter) <U2 24B 'b0' 'b1' 'b2'
* location_id (location_id) int64 32B 3068918 3071502 3113100 3114697

This approach works great, except for the squeeze warning. From the looks of things, squeezing in GroupBy is on its way out, so I tried passing squeeze=False and squeezing manually inside the function instead:
>>> cov_to_std = lambda cov_arr: np.sqrt(np.diag(cov_arr.squeeze()))
>>> fits["curvefit_std_dev"] = xr.apply_ufunc(cov_to_std, fits.curvefit_covariance.groupby("location_id", squeeze=False), input_core_dims=[["cov_i", "cov_j"]], output_core_dims=[["parameter"]])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/computation.py", line 1254, in apply_ufunc
return apply_groupby_func(this_apply, *args)
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/computation.py", line 616, in apply_groupby_func
applied_example, applied = peek_at(applied)
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/utils.py", line 205, in peek_at
peek = next(gen)
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/computation.py", line 615, in <genexpr>
applied: Iterator = (func(*zipped_args) for zipped_args in zip(*iterators))
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/computation.py", line 1270, in apply_ufunc
return apply_dataarray_vfunc(
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/computation.py", line 316, in apply_dataarray_vfunc
result_var = func(*data_vars)
File "/home/charriso/micromamba/envs/ascat_env/lib/python3.9/site-packages/xarray/core/computation.py", line 850, in apply_variable_ufunc
raise ValueError(
ValueError: applied function returned data with an unexpected number of dimensions. Received 1 dimension(s) but expected 2 dimensions with names ('location_id', 'parameter'), from:
array([6.158726e-03, 9.929282e-04, 4.477925e-05])

I have one way around this that works, but I don't like it as much because I have to build another DataArray and re-add the coordinates for the new dimension:
>>> def cov_to_std(cov_arr):
... stdevs = np.sqrt(np.diag(cov_arr.squeeze().data))
... return xr.DataArray(
... stdevs,
... dims=["parameter"],
... coords={"parameter": ["b0", "b1", "b2"]},
... name="curvefit_std_dev",
... )
...
>>> fits["curvefit_std_dev"] = fits.curvefit_covariance.groupby("location_id", squeeze=False).apply(cov_to_std)
>>> fits.curvefit_std_dev
<xarray.DataArray 'curvefit_std_dev' (location_id: 4, parameter: 3)> Size: 96B
array([[6.15872551e-03, 9.92928181e-04, 4.47792492e-05],
[6.18333131e-03, 9.91813485e-04, 4.51018413e-05],
[5.97113095e-03, 9.72544425e-04, 4.26823073e-05],
[5.38388364e-03, 8.60051084e-04, 3.94212881e-05]])
Coordinates:
* parameter (parameter) <U2 24B 'b0' 'b1' 'b2'
* location_id (location_id) int64 32B 3068918 3071502 3113100 3114697

It would be great if there were some other relatively "automatic" approach to this problem that avoids the need for squeezing. I feel like it should be relatively simple, but I've been at it for some hours now. I'm probably missing something.
-
If I understand correctly, the groupby is supposed to work around the fact that diag doesn't take an axis parameter. If so, try this:
def extract_variances(arr, dims=None, new_dim="cov"):
    if dims is None:
        dims = ["cov_i", "cov_j"]
    cov_size = min(size for name, size in arr.sizes.items() if name in dims)
    return xr.apply_ufunc(
        np.diag,
        arr,
        input_core_dims=[dims],
        output_core_dims=[[new_dim]],
        vectorize=True,
        dask="parallelized",
        dask_gufunc_kwargs={"output_sizes": {new_dim: cov_size}},
    )
arr = xr.DataArray([[[1, 0], [0, 1]], [[4, 0], [0, 4]]], dims=["dim0", "cov_i", "cov_j"])
np.sqrt(extract_variances(arr))
np.sqrt(extract_variances(arr.chunk({"dim0": 1}))).compute()

The magic here is vectorize=True, which loops np.diag over every 2-D (cov_i, cov_j) slice while the remaining dimensions are broadcast.
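For what it's worth, here is a minimal self-contained sketch of the same vectorized-diagonal idea (the array values, the `location_id` dim, and the `parameter` output dim are just illustrative, not taken from the dataset above):

```python
import numpy as np
import xarray as xr

# Two 2x2 "covariance" matrices stacked along location_id.
arr = xr.DataArray(
    [[[1.0, 0.0], [0.0, 4.0]],
     [[9.0, 0.0], [0.0, 16.0]]],
    dims=["location_id", "cov_i", "cov_j"],
)

variances = xr.apply_ufunc(
    np.diag,                              # extracts the diagonal of one 2-D slice
    arr,
    input_core_dims=[["cov_i", "cov_j"]],
    output_core_dims=[["parameter"]],
    vectorize=True,                       # loop np.diag over the remaining dims
)
std_devs = np.sqrt(variances)
print(std_devs.values)  # [[1. 2.]
                        #  [3. 4.]]
```

No groupby and no squeezing needed: `vectorize=True` wraps `np.diag` in `np.vectorize` with the core-dim signature, so it is applied once per `location_id` automatically.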