Equality between datetime64[s] and datetime64[ns] is not consistent for coordinates vs data variables #10045

Open
5 tasks done
mjwillson opened this issue Feb 12, 2025 · 2 comments

mjwillson commented Feb 12, 2025

What happened?

When the datetime resolution of coordinates differs, this breaks equality via .equals:

import numpy as np
import xarray

datetime_ns = np.array(['2025-01-01'], dtype='datetime64[ns]')
datetime_s = np.array(['2025-01-01'], dtype='datetime64[s]')
da_ns_coords = xarray.DataArray(dims=('x',), data=[0], coords={'x': datetime_ns})
da_s_coords = xarray.DataArray(dims=('x',), data=[0], coords={'x': datetime_s})
da_ns_coords.equals(da_s_coords)
=> False

But not via .broadcast_equals:

da_ns_coords.broadcast_equals(da_s_coords)
=> True

For data variables, equality via .equals holds even when the datetime resolution differs:

da_ns = xarray.DataArray(dims=('x',), data=datetime_ns)
da_s = xarray.DataArray(dims=('x',), data=datetime_s)
da_ns.equals(da_s)
=> True

What did you expect to happen?

I expected the two values to be equal even when the dtypes are datetime64[ns] vs datetime64[s].
At a minimum, I expected the treatment of dtypes in equality to be consistent for coordinates vs data variables, and for .equals vs .broadcast_equals.

Minimal Complete Verifiable Example

See above.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Anything else we need to know?

I suspect the cause is that pandas DatetimeIndex.equals returns False when the dtypes differ:

index_ns = da_ns_coords.indexes['x']
index_s = da_s_coords.indexes['x']
index_ns
=> DatetimeIndex(['2025-01-01'], dtype='datetime64[ns]', name='x', freq=None)
index_s
=> DatetimeIndex(['2025-01-01'], dtype='datetime64[s]', name='x', freq=None)
index_ns.equals(index_s)
=> False
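
One possible workaround, sketched here under assumptions rather than taken from the report, is to cast both indexes to a common resolution before comparing them. This relies on `DatetimeIndex.as_unit`, which is available in pandas 2.0 and later; the helper name `equals_ignoring_unit` is hypothetical.

```python
import numpy as np
import pandas as pd


def equals_ignoring_unit(a: pd.DatetimeIndex, b: pd.DatetimeIndex, unit: str = "ns") -> bool:
    """Hypothetical helper: compare two DatetimeIndex objects after casting both to `unit`."""
    # After the cast both indexes share one dtype, so only the values matter.
    return a.as_unit(unit).equals(b.as_unit(unit))


index_ns = pd.DatetimeIndex(np.array(['2025-01-01'], dtype='datetime64[ns]'), name='x')
index_s = pd.DatetimeIndex(np.array(['2025-01-01'], dtype='datetime64[s]'), name='x')
print(equals_ignoring_unit(index_ns, index_s))
```

Note that casting to a finer unit can overflow for dates outside the range that unit can represent, and casting to a coarser unit discards sub-unit precision, so the choice of `unit` matters.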

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.8 (stable, redacted, redacted) [Clang 9999.0.0 (4018317407006b2c632fbb75729de624a2426439)]
python-bits: 64
OS: Linux
OS-release: 6.10.11-1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.3
libnetcdf: 4.6.1

xarray: 2025.01.2
pandas: 2.2.3
numpy: 2.2.1
scipy: 1.13.1
netCDF4: 1.4.1
pydap: None
h5netcdf: 999
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 0.dev0+unknown
pip: None
conda: None
pytest: None
mypy: None
IPython: 7.34.0
sphinx: None

mjwillson added the "bug" and "needs triage" labels Feb 12, 2025
dcherian removed the "needs triage" label Feb 13, 2025
@shoyer
Member

shoyer commented Feb 13, 2025

Indeed, the underlying issue here seems to be that pandas.DatetimeIndex.equals() requires matching dtypes, whereas .equals() on xarray.Variable (like NumPy) does not.

This is arguably a bug upstream in pandas, because the handling of other Index types is not sensitive to dtype:

pd.Index(np.array([1, 2], dtype='float32')).equals(pd.Index(np.array([1, 2], dtype='float64')))
=> True
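
For comparison, here is a small sketch of the NumPy behavior referred to above, using the arrays from the original report. NumPy converts both operands to a common datetime unit before comparing, so the resolution does not affect the result.

```python
import numpy as np

datetime_ns = np.array(['2025-01-01'], dtype='datetime64[ns]')
datetime_s = np.array(['2025-01-01'], dtype='datetime64[s]')

# The comparison is done on a common unit, so the differing
# resolutions do not make the arrays unequal.
same_values = bool(np.array_equal(datetime_ns, datetime_s))
print(same_values)

# The common dtype NumPy picks for the two arrays.
common_dtype = np.result_type(datetime_ns, datetime_s)
print(common_dtype)
```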

> When the datetime resolution of coordinates differs, this breaks equality via .equals:
>
> datetime_ns = np.array(['2025-01-01'], dtype='datetime64[ns]')
> datetime_s = np.array(['2025-01-01'], dtype='datetime64[s]')
> da_ns_coords = xarray.DataArray(dims=('x',), data=[0], coords={'x': datetime_ns})
> da_s_coords = xarray.DataArray(dims=('x',), data=[0], coords={'x': datetime_s})
> da_ns_coords.equals(da_s_coords)
> => False

If this were only coming from inequality of the associated indexes, then perhaps this would be correct behavior for DataArray.

But even the associated xarray.Variable objects are not equal, which is troubling indeed:

da_ns_coords.coords['x'].variable.equals(da_s_coords.coords['x'].variable)
=> False

I think this is because they use xarray.IndexVariable, which delegates equality checks to pandas.Index.equals.

> But not via .broadcast_equals:
>
> da_ns_coords.broadcast_equals(da_s_coords)
> => True

This definitely looks wrong to me. .equals() should always be true when .broadcast_equals() is true.
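
Until the inconsistency is resolved, a user-side workaround might look like the sketch below: normalize the datetime coordinate to one resolution before calling .equals. This is an illustration under assumptions, not a proposed fix in xarray itself; the helper `with_datetime_unit` is hypothetical, and it assumes pandas 2.0 or later for `DatetimeIndex.as_unit`.

```python
import numpy as np
import xarray


def with_datetime_unit(da: xarray.DataArray, dim: str, unit: str = "ns") -> xarray.DataArray:
    """Hypothetical helper: return `da` with the datetime index of `dim` cast to `unit`."""
    # da.indexes[dim] is a pandas DatetimeIndex; as_unit changes only its resolution.
    return da.assign_coords({dim: da.indexes[dim].as_unit(unit)})


datetime_ns = np.array(['2025-01-01'], dtype='datetime64[ns]')
datetime_s = np.array(['2025-01-01'], dtype='datetime64[s]')
da_ns_coords = xarray.DataArray(dims=('x',), data=[0], coords={'x': datetime_ns})
da_s_coords = xarray.DataArray(dims=('x',), data=[0], coords={'x': datetime_s})

# With both coordinates at the same resolution, the index dtypes match.
normalized_equal = with_datetime_unit(da_ns_coords, 'x').equals(
    with_datetime_unit(da_s_coords, 'x')
)
print(normalized_equal)
```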

@spencerkclark
Member

I agree, it is awkward that the DatetimeIndex behavior in this context does not follow the behavior of other types of indexes. These are some relevant upstream issues: pandas-dev/pandas#55694, pandas-dev/pandas#33940. Maybe we can spur some more discussion there.
