Description
If you pass a non-existent file via parameter `forcedsplits_filename`, `lightgbm` appears to silently ignore it.
It should raise an informative error if reading that file fails, or at least log a warning.
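Until LightGBM reports this itself, a caller can check the file before training. Below is a minimal sketch of such a pre-flight check. `check_forced_splits_file` is a hypothetical helper, not part of `lightgbm`. It assumes the file holds a single top-level JSON object with the `feature` and `threshold` keys used in the reproducible example.

```python
import json
import os


def check_forced_splits_file(path):
    """Hypothetical pre-flight check for a forced-splits JSON file.

    Raises instead of letting a missing or unreadable file pass silently.
    Assumes a top-level JSON object with "feature" and "threshold" keys,
    as in the example in this issue.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"forcedsplits_filename='{path}' does not exist or is not a file"
        )
    with open(path) as f:
        try:
            spec = json.load(f)
        except json.JSONDecodeError as err:
            raise ValueError(
                f"forcedsplits_filename='{path}' is not valid JSON: {err}"
            ) from err
    if not isinstance(spec, dict) or "feature" not in spec or "threshold" not in spec:
        raise ValueError(
            f"forcedsplits_filename='{path}' must contain a JSON object "
            "with 'feature' and 'threshold' keys"
        )
    return spec
```

Calling `check_forced_splits_file("does-not-exist.json")` before constructing the estimator raises `FileNotFoundError` instead of training without the forced split.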
Reproducible Example
Using `lightgbm==4.6.0` installed from PyPI.
```python
import json

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(
    n_samples=10_000,
    n_features=5,
    n_informative=5,
    random_state=42,
)

# add a noise feature
noise_feature = np.random.random(size=(X.shape[0], 1))
X = np.concatenate((X, noise_feature), axis=1)

# force the use of that noise feature in every tree
forced_split = {
    "feature": 5,
    "threshold": np.mean(noise_feature),
}
with open("forced_splits.json", "w") as f:
    f.write(json.dumps(forced_split))

# train a model, forcing it to use those splits
model = lgb.LGBMRegressor(
    random_state=708,
    n_estimators=10,
    verbose=1,
    forcedsplits_filename="forced_splits.json",
)
model.fit(X, y)

# noise feature was used exactly once in every tree
# (because we forced LightGBM to use it)
model.feature_importances_
# array([ 0, 109, 132, 0, 49, 10], dtype=int32)

# passing a non-existent file... no warning, no error
model2 = lgb.LGBMRegressor(
    random_state=708,
    n_estimators=10,
    verbose=1,
    forcedsplits_filename="does-not-exist.json",
)
model2.fit(X, y)
```
Logs from that second `.fit()`:
```text
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000568 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1530
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 6
[LightGBM] [Info] Start training from score -0.889445
LGBMRegressor(forcedsplits_filename='does-not-exist.json', n_estimators=10,
              random_state=708, verbose=1)
```
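Since the only symptom is the absence of the forced split, one way to detect the silent fallback is to count how many trees actually begin with it. Below is a minimal sketch. It assumes the dictionary layout returned by `Booster.dump_model()`: a `tree_info` list whose entries hold a `tree_structure` root node, with `split_feature` set on split nodes. The function name `trees_with_forced_root` is hypothetical.

```python
def trees_with_forced_root(model_dump, feature_index):
    """Count trees whose root node splits on `feature_index`.

    `model_dump` is assumed to follow the layout of Booster.dump_model():
    model_dump["tree_info"][i]["tree_structure"] is the root node, and
    split (internal) nodes carry an integer "split_feature".
    """
    count = 0
    for tree in model_dump["tree_info"]:
        root = tree["tree_structure"]
        # A single-leaf tree has no "split_feature" at its root,
        # so .get() returns None and it is not counted.
        if root.get("split_feature") == feature_index:
            count += 1
    return count
```

For the models above, `trees_with_forced_root(model.booster_.dump_model(), 5)` would be expected to return 10 (one forced root split per tree). A smaller count for `model2` would be consistent with the file having been ignored.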
Notes
Noticed this while working on https://stackoverflow.com/a/79435055/3986677.
I strongly suspect it is not specific to the Python package, and that changes need to be made in the C++ code.