
Commit 8d5beca

Release
1 parent e07c2d4 commit 8d5beca

8 files changed, +313 -214 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Like in tree-based algorithms, the data are split according to simple decision r
 
 **Linear Forests** generalize the well known Random Forests by combining Linear Models with the same Random Forests. The key idea is to use the strength of Linear Models to improve the nonparametric learning ability of tree-based algorithms. Firstly, a Linear Model is fitted on the whole dataset, then a Random Forest is trained on the same dataset but using the residuals of the previous steps as target. The final predictions are the sum of the raw linear predictions and the residuals modeled by the Random Forest.
 
-**Linear Boosting** is a two stage learning process. Firstly, a linear model is trained on the initial dataset to obtains predictions. Secondly, the residuals of the previous step are modeled with a decision tree using all the available features. The tree identifies the path leading to highest error (i.e. the worst leaf). The leaf contributing to the error the most is used to generate a new binary feature to be used in the first stage. The iterations continue until a certain stopping criterion is met.
+**Linear Boosting** is a two stage learning process. Firstly, a linear model is trained on the initial dataset to obtain predictions. Secondly, the residuals of the previous step are modeled with a decision tree using all the available features. The tree identifies the path leading to highest error (i.e. the worst leaf). The leaf contributing to the error the most is used to generate a new binary feature to be used in the first stage. The iterations continue until a certain stopping criterion is met.
 
 **linear-tree is developed to be fully integrable with scikit-learn**. ```LinearTreeRegressor``` and ```LinearTreeClassifier``` are provided as scikit-learn _BaseEstimator_ to build a decision tree using linear estimators. ```LinearForestRegressor``` and ```LinearForestClassifier``` use the _RandomForest_ from sklearn to model residuals. ```LinearBoostRegressor``` and ```LinearBoostClassifier``` are available also as _TransformerMixin_ in order to be integrated, in any pipeline, also for automated features engineering. All the models available in [sklearn.linear_model](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) can be used as base learner.
 
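As a reading aid for the README paragraphs in this diff, here is a minimal sketch of the Linear Forest idea using plain scikit-learn estimators. It only illustrates the two-stage fit described above; it is not the library's `LinearForestRegressor` implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Toy data to illustrate the two-stage fit described in the README.
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# Stage 1: fit a linear model on the whole dataset.
linear = LinearRegression().fit(X, y)
residuals = y - linear.predict(X)

# Stage 2: fit a random forest on the residuals of the linear model.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, residuals)

# Final prediction: raw linear prediction plus the forest-modeled residuals.
y_pred = linear.predict(X) + forest.predict(X)
print(np.mean((y - y_pred) ** 2))  # training MSE of the combined model
```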

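Likewise, one iteration of the Linear Boosting loop described above can be sketched as follows. The "worst leaf" rule used here (largest mean squared residual) is an assumption for illustration, not necessarily the library's exact criterion.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# Stage 1: linear model on the current feature matrix.
linear = LinearRegression().fit(X, y)
residuals = y - linear.predict(X)

# Stage 2: a decision tree models the residuals using all features.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)

# Identify the worst leaf, i.e. the leaf contributing the most to the error.
leaf_ids = tree.apply(X)
worst_leaf = max(np.unique(leaf_ids),
                 key=lambda leaf: np.mean(residuals[leaf_ids == leaf] ** 2))

# The worst leaf becomes a new binary feature for the next linear fit.
new_feature = (leaf_ids == worst_leaf).astype(float).reshape(-1, 1)
X = np.hstack([X, new_feature])

# The next iteration refits the linear model on the augmented matrix, and the
# loop repeats until a stopping criterion (e.g. a fixed number of estimators) is met.
linear = LinearRegression().fit(X, y)
```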
lineartree/_classes.py

Lines changed: 31 additions & 18 deletions
@@ -11,8 +11,6 @@
 
 from sklearn.base import is_regressor
 from sklearn.base import BaseEstimator, TransformerMixin
-
-from sklearn.utils import check_array
 from sklearn.utils.validation import has_fit_parameter, check_is_fitted
 
 from ._criterion import SCORING
@@ -123,7 +121,6 @@ def _parallel_binning_fit(split_feat, _self, X, y,
                 model_right = DummyClassifier(strategy="most_frequent")
 
             if weights is None:
-
                 model_left.fit(X[left_mesh], y[~mask])
                 loss_left = feval(model_left, X[left_mesh], y[~mask],
                                   **largs_left)
@@ -135,17 +132,14 @@ def _parallel_binning_fit(split_feat, _self, X, y,
                 wloss_right = loss_right * (n_right / n_sample)
 
             else:
-
                 if support_sample_weight:
-
                     model_left.fit(X[left_mesh], y[~mask],
                                    sample_weight=weights[~mask])
 
                     model_right.fit(X[right_mesh], y[mask],
                                     sample_weight=weights[mask])
 
                 else:
-
                     model_left.fit(X[left_mesh], y[~mask])
 
                     model_right.fit(X[right_mesh], y[mask])
@@ -400,9 +394,7 @@ def _grow(self, X, y, weights=None):
                 self._leaves[queue[-1]] = self._nodes[queue[-1]]
                 del self._nodes[queue[-1]]
                 queue.pop()
-
             else:
-
                 model_left, loss_left, wloss_left, n_left, class_left = \
                     left_node
                 model_right, loss_right, wloss_right, n_right, class_right = \
@@ -700,10 +692,16 @@ def apply(self, X):
         """
         check_is_fitted(self, attributes='_nodes')
 
-        X = check_array(
-            X, accept_sparse=False, dtype=None,
-            force_all_finite=False)
-        self._check_n_features(X, reset=False)
+        X = self._validate_data(
+            X,
+            reset=False,
+            accept_sparse=False,
+            dtype='float32',
+            force_all_finite=True,
+            ensure_2d=True,
+            allow_nd=False,
+            ensure_min_features=self.n_features_in_
+        )
 
         X_leaves = np.zeros(X.shape[0], dtype='int64')
 
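The `_validate_data` call adopted above is scikit-learn's own `BaseEstimator` helper: with `reset=False` it re-validates the input and checks it against `n_features_in_`, forwarding the remaining keyword arguments to `check_array`. A minimal toy estimator showing the pattern follows; it assumes a scikit-learn version in which the private `_validate_data` method is still available on `BaseEstimator`.

```python
import numpy as np
from sklearn.base import BaseEstimator


class ToyEstimator(BaseEstimator):
    """Minimal illustration of the validation pattern used in this commit."""

    def fit(self, X, y=None):
        # reset=True stores X.shape[1] as self.n_features_in_.
        X = self._validate_data(X, reset=True, dtype='float32')
        return self

    def predict(self, X):
        # reset=False re-runs check_array-style validation and raises a
        # ValueError if the column count differs from self.n_features_in_.
        X = self._validate_data(
            X,
            reset=False,
            accept_sparse=False,
            dtype='float32',
            force_all_finite=True,
            ensure_2d=True,
            allow_nd=False,
            ensure_min_features=self.n_features_in_
        )
        return np.zeros(X.shape[0])


est = ToyEstimator().fit(np.random.rand(5, 3))
print(est.predict(np.random.rand(2, 3)))   # OK: 3 features, as seen during fit
# est.predict(np.random.rand(2, 4))        # raises: unexpected number of features
```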

@@ -733,10 +731,16 @@ def decision_path(self, X):
         """
         check_is_fitted(self, attributes='_nodes')
 
-        X = check_array(
-            X, accept_sparse=False, dtype=None,
-            force_all_finite=False)
-        self._check_n_features(X, reset=False)
+        X = self._validate_data(
+            X,
+            reset=False,
+            accept_sparse=False,
+            dtype='float32',
+            force_all_finite=True,
+            ensure_2d=True,
+            allow_nd=False,
+            ensure_min_features=self.n_features_in_
+        )
 
         indicator = np.zeros((X.shape[0], self.node_count), dtype='int64')
 
@@ -976,8 +980,17 @@ def transform(self, X):
         `n_out` is equal to `n_features` + `n_estimators`
         """
         check_is_fitted(self, attributes='base_estimator_')
-        X = check_array(X, dtype=np.float32, accept_sparse=False)
-        self._check_n_features(X, reset=False)
+
+        X = self._validate_data(
+            X,
+            reset=False,
+            accept_sparse=False,
+            dtype='float32',
+            force_all_finite=True,
+            ensure_2d=True,
+            allow_nd=False,
+            ensure_min_features=self.n_features_in_
+        )
 
         for tree, leaf in zip(self._trees, self._leaves):
             pred_tree = np.abs(tree.predict(X, check_input=False))
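The `transform` hunk above is what lets the boosting estimators act as transformers, as the docstring context notes (`n_out` equals `n_features` + `n_estimators`). A hedged usage sketch: the `lineartree` import path comes from this commit's file paths, while the `base_estimator` and `n_estimators` constructor arguments are assumptions about the public API, not taken from this diff.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Hypothetical usage; constructor parameter names are assumptions.
from lineartree import LinearBoostRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=2.0, random_state=0)

booster = LinearBoostRegressor(base_estimator=LinearRegression(),
                               n_estimators=10)
booster.fit(X, y)

# transform() appends one binary feature per boosting iteration,
# so the output should have n_features + n_estimators columns.
X_out = booster.transform(X)
print(X_out.shape)  # expected: (200, 5 + 10)
```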
