[enhancement] Enable Array API in ensemble algos #2201
base: main
/intelci: run
Co-authored-by: david-cortes-intel <[email protected]>
/intelci: run
/intelci: run
I'm not sure if this is a bug in DPNP or an issue with this PR where not all inputs are converted as necessary, but I see this error if I try to make predictions on NumPy arrays with a model that was fitted on dpnp arrays:
Reproducer:
import os, sys
os.environ["SCIPY_ARRAY_API"] = "1"
import numpy as np
import dpnp
from sklearnex import config_context, set_config
from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(random_state=123)
Xd = dpnp.array(X, dtype=np.float32, device="cpu")
yd = dpnp.array(y, dtype=np.float32, device="cpu")
set_config(array_api_dispatch=True)
model = RandomForestClassifier(n_estimators=1, max_depth=5).fit(Xd, yd)
model.predict(X[:5])
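Until mixed inputs are handled, a caller-side guard can at least make the mismatch explicit. The following is only a sketch: `array_library` and `check_same_library` are hypothetical names, not part of sklearnex, and the library is inferred from the top-level module of the input's type, which is an assumption that holds for typical array classes but is not a general rule.

```python
def array_library(x):
    """Return the top-level module that defines type(x), e.g. 'numpy' or 'dpnp'."""
    return type(x).__module__.split(".")[0]


def check_same_library(fitted_on, new_input):
    """Raise if new_input comes from a different array library than fitted_on.

    Hypothetical helper for illustration only; it does not convert anything.
    """
    fit_lib = array_library(fitted_on)
    new_lib = array_library(new_input)
    if fit_lib != new_lib:
        raise TypeError(
            f"model was fitted on {fit_lib} data but received {new_lib} input; "
            "convert the input to the training array type first"
        )
```

With the reproducer above, `check_same_library(Xd, X[:5])` would raise before `predict` is reached, and the caller could then convert explicitly, for example with `dpnp.asarray(X[:5])`.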
@icfaust Since the equivalent classes in sklearn do not have array API support, how is this meant to work for attributes? I see that the arrays in the ... I guess this might be intended, but if that is the case, please add this kind of behavior to the documentation.
    ),
    (
        not sp.issparse(data[2]),
        "sample_weight is sparse. " "Sparse input is not supported.",
Was this perhaps meant to have an additional check on 'X'?
@david-cortes-intel #2747 This is an issue beyond just this estimator, and should be addressed as a follow-up with additional testing added in a central way.
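For reference, a sketch of what an additional check on 'X' could look like, assuming the conditions are (bool, message) pairs as in the snippet under review. The helper names and the messages for X and y are hypothetical and not taken from the PR; the fallback `issparse` is a stand-in so the sketch runs without SciPy.

```python
try:
    from scipy.sparse import issparse
except ImportError:
    # Stand-in for environments without SciPy (assumption for illustration).
    def issparse(obj):
        return hasattr(obj, "tocsr")


def sparse_input_conditions(X, y, sample_weight=None):
    """Build (condition, message) pairs; a False condition means 'not supported'."""
    return [
        (not issparse(X), "X is sparse. Sparse input is not supported."),
        (not issparse(y), "y is sparse. Sparse input is not supported."),
        (
            not issparse(sample_weight),
            "sample_weight is sparse. Sparse input is not supported.",
        ),
    ]


def failed_checks(conditions):
    """Return the messages of all pairs whose condition is False."""
    return [message for ok, message in conditions if not ok]
```

Dense inputs produce an empty list from `failed_checks(sparse_input_conditions(X, y, sample_weight))`, while a sparse `X` would be reported alongside a sparse `sample_weight`.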
/intelci: run
Description
This PR refactors the Ensemble algorithms (RandomForestRegressor, RandomForestClassifier, ExtraTreesRegressor and ExtraTreesClassifier) to follow repository standards and add array API support. This reduced the code by 500+ lines and required the following changes:
- BaseEstimator inheritance from onedal ensemble estimators
- __init__ signatures to remove sklearn conformant kwargs in onedal ensemble estimators
- random_state use from onedal estimators
- class_count kwarg to fit as calculating it in python is scikit-learn conformance (oneDAL expects it a priori)
- oneDAL for use by Classifiers and Regressors
- _create_model function
- predict method
- ForestRegressor and ForestClassifier objects to minimize maintenance
- max_samples to observations_per_tree_fraction to follow oneDAL values
- _save_attributes method to be specific to Classifiers vs Regressors
- _onedal_fit_ready, _onedal_cpu_supported and _onedal_gpu_supported to reduce code duplication via inheritance and make array API enabled
- enable_array_api decorators to public-facing estimators
- _check_parameters function behind sklearn_check_version for future removal
- min_impurity_split which was removed in sklearn 0.25
- _validate_y_class_weight method designed specifically for sklearnex estimators (missing some functionality which is irrelevant to the sklearnex estimator)
- check_n_features from sklearnex.utils.validation as it is no longer necessary

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a PR with a docs update doesn't require checkboxes for performance, while a PR with any change in actual code should have checkboxes and justify how the change is expected to affect performance (or the justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance
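To illustrate the max_samples to observations_per_tree_fraction item in the Description: scikit-learn accepts max_samples as None (use all rows), an int (row count) or a float in (0.0, 1.0] (fraction of rows), whereas a per-tree fraction is a single float. The sketch below shows one plausible conversion; `to_observations_fraction` is a hypothetical name and the PR's actual logic may differ.

```python
import numbers


def to_observations_fraction(max_samples, n_samples):
    """Convert a scikit-learn style max_samples into a fraction of rows per tree.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    if max_samples is None:
        # scikit-learn draws n_samples rows per tree in this case.
        return 1.0
    if isinstance(max_samples, numbers.Integral):
        if not 1 <= max_samples <= n_samples:
            raise ValueError(
                f"max_samples must be in [1, {n_samples}], got {max_samples}"
            )
        return max_samples / n_samples
    if isinstance(max_samples, numbers.Real):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError(
                f"max_samples must be in (0.0, 1.0], got {max_samples}"
            )
        return float(max_samples)
    raise TypeError(f"unsupported max_samples type: {type(max_samples).__name__}")
```

For example, `to_observations_fraction(25, 100)` gives 0.25, and out-of-range values are rejected before anything is passed to the backend.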