validation_df + n_lags in fit #1684

Open
ekpog200 opened this issue Mar 19, 2025 · 0 comments

Hello, when using `n_lags`, how does it interact with `validation_df` during `fit`?

Example:
```python
import numpy as np
import pandas as pd
from neuralprophet import NeuralProphet

# Monthly toy series, 2017-01 .. 2024-12 (96 rows)
dates = pd.date_range(start="2017-01-01", end="2024-12-01", freq="MS")
values = np.random.rand(len(dates)) * 100
data = pd.DataFrame({"ds": dates, "y": values})

train_data = data.iloc[:-12]
val_data = data.iloc[-12:]  # !!! only 12 rows, fewer than n_lags + n_forecasts

model = NeuralProphet(
    n_forecasts=12,
    n_lags=12,
    yearly_seasonality=True,
    weekly_seasonality=False,
    daily_seasonality=False,
)

model.fit(train_data, freq="MS", validation_df=val_data)
```

With `n_lags=12`, `validation_df` must contain at least `n_lags + n_forecasts` rows (here 12 + 12 = 24), so the 12-row `val_data` above is too short.
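To see why 12 rows are not enough, here is a minimal NumPy sketch, assuming each validation sample is one contiguous window of `n_lags` inputs followed by `n_forecasts` targets. This is a simplification of how NeuralProphet builds its samples, and `make_windows` is a hypothetical helper, not part of the library.

```python
import numpy as np


def make_windows(y, n_lags, n_forecasts):
    """Hypothetical helper: split a 1-D series into (inputs, targets) windows.

    Each window is n_lags input values followed by n_forecasts target values.
    """
    y = np.asarray(y, dtype=float)
    window = n_lags + n_forecasts
    if len(y) < window:
        # Not enough rows for even one sample
        return np.empty((0, n_lags)), np.empty((0, n_forecasts))
    windows = np.lib.stride_tricks.sliding_window_view(y, window)
    return windows[:, :n_lags], windows[:, n_lags:]


y = np.arange(24, dtype=float)

# 12 validation rows (as in the example above): no complete window
x12, t12 = make_windows(y[-12:], n_lags=12, n_forecasts=12)
# 24 validation rows (n_lags + n_forecasts): exactly one window
x24, t24 = make_windows(y, n_lags=12, n_forecasts=12)
print(len(x12), len(x24))
```

Under this assumption, each extra row beyond 24 adds one more window, while exactly 24 rows means the validation score comes from a single 12-step forecast.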

Which of these splits is the correct way to avoid data leakage?

  1. ```python
     train_data = data.iloc[:-24]  # hold out n_lags + n_forecasts rows for val_data
     val_data = data.iloc[-24:]
     ```

  2. ```python
     train_data = data.iloc[:-12]
     val_data = data.iloc[-24:]
     ```
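A quick pandas check of the two candidate splits on the same monthly index as the example (random values, since only the timestamps matter here), counting the timestamps that appear in both `train_data` and `val_data`. The `splits` dict and `shared_rows` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Same monthly index as in the example: 2017-01 .. 2024-12 (96 rows)
dates = pd.date_range(start="2017-01-01", end="2024-12-01", freq="MS")
data = pd.DataFrame({"ds": dates, "y": np.random.rand(len(dates)) * 100})

splits = {
    "option 1": (data.iloc[:-24], data.iloc[-24:]),
    "option 2": (data.iloc[:-12], data.iloc[-24:]),
}

shared_rows = {}
for name, (train, val) in splits.items():
    # Number of training rows whose timestamp also appears in the validation frame
    shared_rows[name] = int(train["ds"].isin(val["ds"]).sum())
    print(
        f"{name}: train {train['ds'].min():%Y-%m}..{train['ds'].max():%Y-%m}, "
        f"val {val['ds'].min():%Y-%m}..{val['ds'].max():%Y-%m}, "
        f"shared rows: {shared_rows[name]}"
    )
```

Option 1 has no shared rows but drops 2023 from training entirely; option 2 shares the 12 months of 2023.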

In the second case, for example, the whole of 2023 ends up in both `train_data` and `val_data`. In `val_data`, however, 2023 only serves as the `n_lags` input window: it should be used only to predict the `n_forecasts` horizon (2024) and should not itself be used as a validation target during `fit`.
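For the second split, the overlap can be inspected row by row. This sketch assumes that, with exactly `n_lags + n_forecasts` validation rows, the first `n_lags` rows are only model inputs and the last `n_forecasts` rows are the only scored targets; that assumption about NeuralProphet's validation windowing is not confirmed in this issue.

```python
import numpy as np
import pandas as pd

n_lags, n_forecasts = 12, 12
dates = pd.date_range(start="2017-01-01", end="2024-12-01", freq="MS")
data = pd.DataFrame({"ds": dates, "y": np.random.rand(len(dates)) * 100})

# Option 2 from the issue
train_data = data.iloc[:-12]
val_data = data.iloc[-(n_lags + n_forecasts):]

# Assumed roles: first n_lags rows are inputs only, last n_forecasts rows are targets
val_inputs = val_data.iloc[:n_lags]    # 2023-01 .. 2023-12
val_targets = val_data.iloc[n_lags:]   # 2024-01 .. 2024-12

inputs_in_train = int(val_inputs["ds"].isin(train_data["ds"]).sum())
targets_in_train = int(val_targets["ds"].isin(train_data["ds"]).sum())
print(inputs_in_train, targets_in_train)
```

Under this assumption, all 12 shared rows are validation inputs and none of the scored 2024 targets appear in `train_data`.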
