Hello, when using `n_lags`, how does it interact with `validation_df` in `fit()`?
Example:
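```python
import numpy as np
import pandas as pd
from neuralprophet import NeuralProphet

# Monthly series from 2017-01 to 2024-12 (96 rows) with random values
dates = pd.date_range(start="2017-01-01", end="2024-12-01", freq="MS")
values = np.random.rand(len(dates)) * 100
data = pd.DataFrame({"ds": dates, "y": values})

train_data = data.iloc[:-12]
val_data = data.iloc[-12:]  # only 12 rows, fewer than n_lags + n_forecasts

model = NeuralProphet(
    n_forecasts=12,
    n_lags=12,
    yearly_seasonality=True,
    weekly_seasonality=False,
    daily_seasonality=False,
)
model.fit(train_data, freq="MS", validation_df=val_data)
```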
With `n_lags=12`, `validation_df` must contain at least `n_lags + n_forecasts` rows (here 12 + 12 = 24); with fewer rows there is not a single complete window of lag inputs followed by forecast targets.
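A minimal sketch of where the 24-row minimum comes from. It does not call NeuralProphet; it assumes each autoregressive sample is a contiguous window of `n_lags` input rows followed by `n_forecasts` target rows with no missing dates. `count_windows` is a hypothetical helper.

```python
# Assumption: one sample = n_lags consecutive inputs + n_forecasts consecutive
# targets, sliding one row at a time over a gap-free series.
def count_windows(n_rows: int, n_lags: int, n_forecasts: int) -> int:
    """Number of complete (inputs, targets) windows that fit in n_rows rows."""
    window = n_lags + n_forecasts
    return max(0, n_rows - window + 1)


# validation_df with 12 monthly rows: too short for even one window
print(count_windows(12, n_lags=12, n_forecasts=12))  # 0
# validation_df with 24 monthly rows: exactly one window
print(count_windows(24, n_lags=12, n_forecasts=12))  # 1
```

Each extra row beyond 24 adds one more window under this assumption.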
Which of these two splits is correct for avoiding data leakage?

Option 1:

```python
train_data = data.iloc[:-24]  # hold out n_lags + n_forecasts = 24 rows for val_data
val_data = data.iloc[-24:]
```

Option 2:

```python
train_data = data.iloc[:-12]
val_data = data.iloc[-24:]  # first 12 rows (2023) are also in train_data
```
In the second case, the entire year 2023 ends up in both `train_data` and `val_data`. In `val_data`, however, 2023 only serves as the `n_lags` window: those rows should only be used as inputs to predict the `n_forecasts` targets (2024), not as validation targets during `fit()`.
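To make the overlap concrete, here is a hypothetical stdlib-only check that compares both options at month granularity. It does not inspect NeuralProphet; it assumes that only the last `n_forecasts` rows of each validation window are scored and the first `n_lags` rows are inputs only. `val_windows` and `overlap_report` are illustrative names.

```python
# Assumption: in each validation window, the first n_lags rows are model inputs
# and only the following n_forecasts rows are scored as targets.
N_LAGS = 12
N_FORECASTS = 12

# Monthly timestamps 2017-01 .. 2024-12 as (year, month) tuples, like freq="MS"
months = [(y, m) for y in range(2017, 2025) for m in range(1, 13)]


def val_windows(val_rows, n_lags, n_forecasts):
    """Yield (inputs, targets) for every complete window in val_rows."""
    window = n_lags + n_forecasts
    for start in range(len(val_rows) - window + 1):
        chunk = val_rows[start:start + window]
        yield chunk[:n_lags], chunk[n_lags:]


def overlap_report(train_rows, val_rows):
    """Count how many validation inputs/targets also appear in the training rows."""
    train = set(train_rows)
    inputs, targets = set(), set()
    for x, y in val_windows(val_rows, N_LAGS, N_FORECASTS):
        inputs.update(x)
        targets.update(y)
    return {
        "scored_targets": sorted(targets),
        "input_rows_seen_in_train": len(inputs & train),
        "target_rows_seen_in_train": len(targets & train),
    }


option1 = overlap_report(months[:-24], months[-24:])
option2 = overlap_report(months[:-12], months[-24:])
for name, rep in [("option 1", option1), ("option 2", option2)]:
    print(name, rep["input_rows_seen_in_train"], rep["target_rows_seen_in_train"])
```

Under this assumption the sketch prints `option 1 0 0` and `option 2 12 0`. In both options the scored targets are the 2024 months. In option 2 the only shared rows are the 2023 lag inputs. Option 1 avoids that overlap but never uses 2023 as training targets.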