Extend `rolling_exp` to support `pd.Timedelta` objects with window `halflife` #10237

abiasiol · 2025-04-20T00:38:45Z

Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

Description

Extended rolling_exp to support pd.Timedelta objects for the window size when using window_type="halflife" along datetime dimensions, similar to pandas' ewm. This allows expressions like da.rolling_exp(time=pd.Timedelta(days=1), window_type="halflife").mean().

Implementation

- Matches the pandas implementation, allowing the operation only when:
  - window is a pd.Timedelta object
  - window_type is "halflife"
  - the dimension has a datetime index
  - the operation is mean
- Takes advantage of numbagg's implementation of nanmean, which allows alpha to be an array
- Ports over the _calculate_deltas function rather than relying on pandas' private implementation
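The four conditions above can be sketched as a single guard. This is a hypothetical illustration, not the code in this PR: the function name, argument order and exception types are assumptions. It accepts any datetime.timedelta, which pd.Timedelta subclasses, so the sketch runs without pandas.

```python
import datetime

import numpy as np


def validate_timedelta_window(window, window_type, coord_dtype, operation):
    """Hypothetical guard mirroring the four conditions above.

    Count-based windows pass through; a timedelta window must satisfy all
    of: window_type == "halflife", a datetime64 coordinate, and mean().
    """
    # pd.Timedelta is a subclass of datetime.timedelta
    if not isinstance(window, datetime.timedelta):
        return None
    if window_type != "halflife":
        raise ValueError("a timedelta window requires window_type='halflife'")
    if not np.issubdtype(coord_dtype, np.datetime64):
        raise TypeError("a timedelta window requires a datetime coordinate")
    if operation != "mean":
        raise NotImplementedError("timedelta windows only support mean()")
    return None
```

Checking everything up front keeps the numeric code free of special cases: once the guard passes, the only new input is a datetime coordinate plus a halflife.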

Behavior Note

One difference from pandas' behavior: with NaN values and a very short timedelta, this implementation returns NaN, while pandas appears to carry forward the previous value. This behavior seems more appropriate to me (users can fill the gaps later if they need to).

Example demonstrating the difference:

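```python
import numpy as np
import pandas as pd
from xarray import DataArray

times = pd.date_range("2000-01-01", freq="1D", periods=21)
da = DataArray(
    np.random.random((21, 4)),
    dims=("time", "x"),
    coords=dict(time=times),
)
da = da.where(da > 0.2)  # introduce NaNs
# pandas: NaN positions appear to carry forward the previous mean
print(da.to_pandas().ewm(halflife=pd.Timedelta(minutes=1), times=da.time.values).mean())
# this PR: NaN positions stay NaN
print(da.rolling_exp(time=pd.Timedelta(minutes=1), window_type="halflife").mean().to_pandas())
```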
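A plausible mechanism for the difference, assuming the kernel keeps a running, decayed numerator and denominator (typical for exponential moving averages; not checked against numbagg's source): with daily samples and a one-minute halflife, the per-step decay 0.5 ** 1440 underflows to exactly zero in float64, so at a NaN sample both running sums are zero and their ratio is 0/0.

```python
import numpy as np

# Example above: samples one day apart, halflife of one minute
step = np.timedelta64(1, "D")
halflife = np.timedelta64(1, "m")

# Share of the old weight that survives one step: 0.5 ** 1440
decay = 0.5 ** (step / halflife)
print(decay)  # 0.0 (underflows in float64)

# Running state before a NaN sample: last mean 0.7 with total weight 1.0.
# The NaN adds nothing, so only the decay is applied.
numer, denom = 0.7 * 1.0, 1.0
numer, denom = numer * decay, denom * decay
with np.errstate(invalid="ignore"):
    ratio = np.float64(numer) / np.float64(denom)
print(ratio)  # nan (0/0); a "keep the last mean" update would give 0.7
```

A recursion that only overwrites its stored mean when a new observation arrives never forms this ratio, which would be consistent with pandas carrying the previous value forward.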

Added validation and calculation functions for halflife operations, and updated docstrings and type hints accordingly. Copied _calculate_deltas verbatim from pandas/core/window/ewm.py so as not to rely on an internal pandas function.
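A minimal sketch of what the ported helper computes, and how the per-step deltas turn into an array of alphas for an EWM mean. Assumptions: calculate_deltas, halflife_alphas and ewm_mean_halflife are hypothetical names; NumPy's unit-aware timedelta division replaces pandas' int64-nanosecond view, which also makes the sketch independent of the datetime resolution (s, ms, ns, ...); the Python loop stands in for numbagg's compiled kernel and is not its source.

```python
import numpy as np


def calculate_deltas(times, halflife):
    """Elapsed time between consecutive samples, in units of `halflife`.

    `times` must be a datetime64 array (any resolution). Same idea as pandas'
    private _calculate_deltas: np.diff(times) / halflife.
    """
    times = np.asarray(times)
    # NumPy converts units on division, so no int64-nanosecond view is needed
    return np.diff(times) / np.timedelta64(halflife)


def halflife_alphas(times, halflife):
    """Per-step smoothing factors: alpha[k] covers the step from sample k to k + 1."""
    return 1.0 - 0.5 ** calculate_deltas(times, halflife)


def ewm_mean_halflife(values, times, halflife):
    """Time-aware EWM mean with one alpha per step (reference loop, not numbagg)."""
    values = np.asarray(values, dtype=np.float64)
    decay = 1.0 - halflife_alphas(times, halflife)  # equals 0.5 ** deltas
    out = np.empty_like(values)
    numer = denom = 0.0
    for i, x in enumerate(values):
        if i > 0:
            # older observations lose weight according to elapsed time,
            # including across NaN samples
            numer *= decay[i - 1]
            denom *= decay[i - 1]
        if not np.isnan(x):
            numer += x
            denom += 1.0
        # NaN before the first observation, or once all weight has decayed to 0
        out[i] = numer / denom if denom > 0 else np.nan
    return out
```

With NaNs still counted as elapsed time, this is intended to reproduce pandas' default ewm(...).mean() with times and halflife (adjust=True, ignore_na=False), except where a decay factor underflows to zero: there it returns NaN at NaN samples, as in the behavior note.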

Introduced new test cases to validate the behavior of rolling_exp with Timedelta windows, specifically for the halflife window type. The tests check compatibility between window type, window, index, and operation, and check that results match pandas.


welcome · 2025-04-20T00:38:49Z

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.


max-sixty · 2025-04-21T06:36:21Z

thanks @abiasiol !

couple of quick questions:

why limit to halflife?
does it raise / handle indexes with uneven spacing?
why limit to mean?

abiasiol · 2025-04-21T17:08:39Z

> thanks @abiasiol !
>
> couple of quick questions:
> * does it raise / handle indexes with uneven spacing?

Hi @max-sixty !

It works with uneven spacing (the way that Pandas does):

```python
import numpy as np
import pandas as pd
from xarray import DataArray

# daily timestamps shifted by a random 0-11 hours -> uneven spacing
times = pd.date_range("2000-01-01", freq="1D", periods=21)
times_delta = pd.to_timedelta(np.random.randint(0, 12, size=len(times)), unit="h")
times = times + times_delta

da = DataArray(
    np.random.random((21, 4)),
    dims=("time", "x"),
    coords=dict(time=times, x=["a", "b", "c", "d"]),
)

np.allclose(
    da.rolling_exp(time=pd.Timedelta(hours=2), window_type="halflife").mean().values,
    da.to_pandas()
    .ewm(halflife=pd.Timedelta(hours=2), times=da.time.values)
    .mean()
    .values,
)  # True
```
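Why uneven spacing works: under the adjusted weighting pandas uses by default (adjust=True), the mean at the last timestamp is a weighted average in which each observation's weight is 0.5 ** (age / halflife), with age measured in elapsed time rather than in samples. A small worked sketch with made-up timestamps and values, using NumPy only:

```python
import numpy as np

# Irregular timestamps (made-up data) and a 2-hour halflife
times = np.array(
    ["2000-01-01T00:00", "2000-01-01T02:00", "2000-01-01T03:00", "2000-01-01T07:00"],
    dtype="datetime64[m]",
)
values = np.array([4.0, 8.0, 2.0, 6.0])
halflife = np.timedelta64(2, "h")

# Age of each observation seen from the last timestamp, in halflives:
# the weight halves per elapsed halflife, not per sample.
age = (times[-1] - times) / halflife  # [3.5, 2.5, 2.0, 0.0]
weights = 0.5 ** age
ewm_last = np.sum(weights * values) / np.sum(weights)
```

The gap between 03:00 and 07:00 costs the earlier points two halvings, while the 02:00 to 03:00 gap costs only half of one; an index-based window would treat both gaps as one step.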

abiasiol · 2025-04-21T17:35:18Z

> thanks @abiasiol !
>
> couple of quick questions:
> * why limit to halflife?
> * why limit to mean?

According to the pandas ewm docstring, mean() should be the only operation that "supports" a time-based halflife, so I kept it simple and followed that.

> If times is provided, halflife and one of com, span or alpha may be provided.
>
> halflife: If times is specified, a timedelta convertible unit over which an observation decays to half its value. Only applicable to mean(), and halflife value will not apply to the other functions.

But let me take another look, and I'll get back to you.

max-sixty · 2025-04-21T23:16:25Z

ah, great, it uses the numbagg feature which takes an array of alphas — happy to see that being used! I wrote it for myself but hadn't really integrated it into xarray

I don't fully understand why we're limited to halflife — all the window types are freely convertible to one another; though possibly I'm misunderstanding something. (and same thing with mean vs other ops, though am even less confident) — does pandas have a reason for this specificity?

I haven't looked in enough detail at the calcs, but assuming we're well-tested against the pandas implementation, that's sufficient

abiasiol · 2025-05-04T22:09:18Z

I saw that the type hints for alphas indicated it could be an array, which was very helpful for this PR!

Regarding the window parameters (span, com, halflife): while they are related, applying timedeltas directly to span or com feels less intuitive to me, as these parameters are more count-based. pandas' API also behaves a bit differently: when times is specified, it requires halflife (to scale the time deltas) and optionally accepts one of com/span/alpha. As far as I can tell, our implementation in xarray only permits a single window_type parameter. If only halflife is provided together with times, pandas effectively uses com=1 (equivalent to alpha=0.5), which is also what this PR does.
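For reference, the standard count-based conversions to alpha, as documented for pandas' ewm. A sketch only: alpha_from is a hypothetical helper, not xarray's API. With times, pandas instead decays weights by 0.5 per elapsed halflife, which is the alpha=0.5 (com=1) recursion with the step count replaced by elapsed time / halflife.

```python
import math


def alpha_from(window, window_type):
    """Convert a count-based EWM parameter to a smoothing factor alpha."""
    if window_type == "alpha":
        return float(window)
    if window_type == "com":
        return 1.0 / (1.0 + window)
    if window_type == "span":
        return 2.0 / (window + 1.0)
    if window_type == "halflife":
        # weight halves after `window` steps: (1 - alpha) ** window == 0.5
        return 1.0 - math.exp(-math.log(2.0) / window)
    raise ValueError(f"unknown window_type: {window_type!r}")


# com=1, span=3 and a one-step halflife all give alpha = 0.5 (up to rounding)
print(alpha_from(1, "com"), alpha_from(3, "span"), alpha_from(1, "halflife"))
```

Because each count-based form collapses to a single alpha, the freedom in the time-aware case lives entirely in how many "steps" a timedelta represents, which is what halflife pins down.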

Concerning the EWM operations: Pandas has some inconsistent behavior. For operations other than mean, when times and halflife are provided, Pandas seems to ignore those and defaults to using a fixed com=1 for the calculation (as in the example below). Our current implementation does apply the time-scaling correctly to these other operations as well, but this leads to results that are different from Pandas' output for those specific operations (e.g., std, var).

Simple example (with sum, or std, ...) on equally spaced data:

```python
import numpy as np
import pandas as pd
import xarray as xr

n = 20
times = pd.date_range("2020-01-01", periods=n, freq="1D")
da = xr.DataArray(np.arange(n), dims="time", coords={"time": times})
df = da.to_pandas()

# pandas appears to ignore halflife for sum(): all three results coincide
pandas_1 = df.ewm(halflife=pd.Timedelta(days=1), times=df.index).sum()
pandas_2 = df.ewm(halflife=pd.Timedelta(days=2), times=df.index).sum()
pandas_com = df.ewm(com=1).sum()

print(np.allclose(pandas_1.values, pandas_com.values))  # True
print(np.allclose(pandas_2.values, pandas_com.values))  # True
print(np.allclose(pandas_1.values, pandas_2.values))  # True

# to do this, you need to comment out the operation kill-switch in this PR
xr_1 = da.rolling_exp(time=pd.Timedelta(days=1), window_type="halflife").sum().to_pandas()
xr_2 = da.rolling_exp(time=pd.Timedelta(days=2), window_type="halflife").sum().to_pandas()

print(np.allclose(xr_1.values, pandas_1.values))  # True
print(np.allclose(xr_2.values, pandas_2.values))  # False
```
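The True/False pattern above can be reproduced with NumPy alone, assuming pandas' sum() ignores halflife and uses the com=1 decay of 0.5 per step, while the time-aware version decays by 0.5 ** (1 day / halflife) per daily step. ew_sum is a hypothetical helper written for this illustration.

```python
import numpy as np


def ew_sum(values, decay):
    """Unnormalized exponentially weighted sum: s_i = decay * s_{i-1} + x_i."""
    out = np.empty(len(values))
    s = 0.0
    for i, x in enumerate(values):
        s = decay * s + x
        out[i] = s
    return out


x = np.arange(20, dtype=float)

decay_com1 = 1 - 1 / (1 + 1)  # com=1 -> alpha=0.5 -> decay 0.5
decay_1d = 0.5 ** (1 / 1)     # halflife = 1 day, daily samples
decay_2d = 0.5 ** (1 / 2)     # halflife = 2 days, about 0.707

print(np.allclose(ew_sum(x, decay_1d), ew_sum(x, decay_com1)))  # True
print(np.allclose(ew_sum(x, decay_2d), ew_sum(x, decay_com1)))  # False
```

On daily data a one-day halflife happens to coincide with com=1, which is why xr_1 matches; any other halflife exposes that the pandas result did not use it.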

I find the pandas.ewm API and its behavior in these cases somewhat confusing. My goal in this PR was to avoid that ambiguity by initially enabling the time-aware calculations (using halflife with time axes) only for the mean operation where the behavior is well-defined and consistent with Pandas. I would like to avoid the potentially confusing behavior for the other operations where parameters might be ignored.

We could enable other operations with time-aware calculations later. However, we would need to validate their results from scratch, and highlight that results will be different from current Pandas versions for those operations.

What are your thoughts on proceeding with this more limited implementation for now? We can expand the functionality later based on user requests for specific, currently missing operations. This approach feels safer for a first contribution by limiting the initial scope.

abiasiol and others added 4 commits April 19, 2025 16:34

doc: update computation guide to include rolling_exp support for Timedelta windows

e45eee4

doc: update what's new

8ec2b3c

github-actions bot added topic-documentation topic-rolling labels Apr 20, 2025

abiasiol and others added 3 commits April 20, 2025 11:20

Merge branch 'main' into timedelta_rolling_exp

f7deb3c

test: adjust rolling_exp test to avoid re-assigning pandas_array for compatibility with pandas < 2.2.0

pandas ewm can work with non-ns resolution from >= 2.2.0. Here we just test that this PR's rolling_exp can work with non-ns resolution.

e2c41e0

Merge remote changes into local branch

5ae7e9b
