
Conversation

@icfaust (Contributor) commented Dec 2, 2024

Description

This PR refactors the ensemble algorithms (RandomForestRegressor, RandomForestClassifier, ExtraTreesRegressor and ExtraTreesClassifier) to follow repository standards and adds array API support. This removes 500+ lines of code and requires the following changes (a short usage sketch follows the list):

  • Remove BaseEstimator inheritance from onedal ensemble estimators
  • Change estimator __init__ signatures to remove sklearn conformant kwargs in onedal ensemble estimators
  • Add inline code comments explaining the function of various aspects for future maintenance
  • Remove random_state use from onedal estimators
  • Add a class_count kwarg to fit, since calculating it in Python is part of scikit-learn conformance (oneDAL expects it a priori)
  • Remove input parameter checks from the onedal estimators
  • Generalize the return of out-of-bag values from oneDAL for use by both classifiers and regressors
  • Remove unused _create_model function
  • Centralize the predict method
  • Create ForestRegressor and ForestClassifier classes to minimize maintenance
  • Swap from max_samples to observations_per_tree_fraction to follow oneDAL conventions
  • Modify onedal tests to use numpy arrays (which oneDAL can consume, unlike lists)
  • Reorder warnings and errors based on type (e.g. parameter checks vs input checks etc.)
  • Refactor _save_attributes method to be specific to Classifiers vs Regressors
  • Refactor _onedal_fit_ready, _onedal_cpu_supported and _onedal_gpu_supported to reduce code duplication via inheritance and to enable array API support
  • Add enable_array_api decorators to public-facing estimators
  • Place _check_parameters function behind sklearn_check_version for future removal
  • Remove check for min_impurity_split which was removed in sklearn 0.25
  • Add array API-enabled _validate_y_class_weight method designed specifically for sklearnex estimators (missing some functionality which is irrelevant to the sklearnex estimator)
  • Remove check_n_features from sklearnex.utils.validation as it is no longer necessary
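
For context, a minimal usage sketch of the intended end state (not part of the PR diff; it assumes dpnp is installed and mirrors the reproducer quoted later in this thread):

import numpy as np
import dpnp
from sklearnex import set_config
from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(random_state=0)
Xd = dpnp.array(X, dtype=np.float32, device="cpu")
yd = dpnp.array(y, dtype=np.float32, device="cpu")

# With array API dispatch enabled, fit and predict consume dpnp arrays directly
set_config(array_api_dispatch=True)
model = RandomForestClassifier(n_estimators=10, max_depth=5).fit(Xd, yd)
pred = model.predict(Xd)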

The PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for standard requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a PR with a docs update doesn't require performance checkboxes, while a PR with any change to actual code should have them and justify how the change is expected to affect performance (or the justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with the update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the appropriate label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if a performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided the corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust (Contributor, Author) commented Oct 22, 2025

/intelci: run

@icfaust (Contributor, Author) commented Oct 22, 2025

/intelci: run

@icfaust changed the title from "[enhancement] WIP: move finite check to sklearnex in ensemble algos" to "[enhancement] Enable Array API in ensemble algos" on Oct 22, 2025

@icfaust (Contributor, Author) commented Oct 23, 2025

/intelci: run

@david-cortes-intel (Contributor) commented Oct 23, 2025

I'm not sure if this is a bug in DPNP or an issue with this PR where not all inputs are converted as necessary, but I see this error if I try to make predictions on numpy arrays from a model that was fitted to dpnp arrays:

File /export/users/dcortes/repos/sklex-dpnp/sklearnex/ensemble/_forest.py:867, in ForestClassifier._onedal_predict(self, X, queue)
    864 res = self._onedal_estimator.predict(X, queue=queue)
    866 if is_array_api_compliant:
--> 867     return xp.take(self.classes_, xp.astype(xp.reshape(res, (-1,)), xp.int64))
    868 else:
    869     return xp.take(self.classes_, res.ravel().astype(xp.int64, casting="unsafe"))

Reproducer:

import os, sys
os.environ["SCIPY_ARRAY_API"] = "1"
import numpy as np
import dpnp
from sklearnex import config_context, set_config
from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(random_state=123)
Xd = dpnp.array(X, dtype=np.float32, device="cpu")
yd = dpnp.array(y, dtype=np.float32, device="cpu")

set_config(array_api_dispatch=True)
model = RandomForestClassifier(n_estimators=1, max_depth=5).fit(Xd, yd)
model.predict(X[:5])

@david-cortes-intel (Contributor) commented:

@icfaust Since the equivalent classes in sklearn do not have array API support: how is this meant to work for attributes? I see that the arrays in the Tree objects, for example, are always numpy regardless of what was passed as input, while other internal attributes appear to use the array API class that was passed as input, with the data on the corresponding device.

I guess this might be intended, but if that is the case, please add this kind of behavior to the documentation.
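
A hedged illustration of that observation, reusing the dpnp-fitted model from the reproducer above (the attribute names below are assumptions based on the description, not verified against this branch):

print(type(model.estimators_[0].tree_.threshold))  # numpy.ndarray - sklearn Tree internals stay numpy
print(type(model.classes_))                        # dpnp.ndarray (assumed) - follows the namespace of the fit input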

Inline review comment on the following snippet from the diff:

),
(
not sp.issparse(data[2]),
"sample_weight is sparse. " "Sparse input is not supported.",

A Contributor commented:

Was this perhaps meant to have an additional check on 'X'?
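
One hypothetical reading of that suggestion, extending the tuple to also check X (the data[0] index for X is a guess, not code from this branch):

(
    not sp.issparse(data[0]) and not sp.issparse(data[2]),
    "X or sample_weight is sparse. Sparse input is not supported.",
),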

@icfaust (Contributor, Author) commented Oct 23, 2025

> I'm not sure if this is a bug in DPNP or an issue with this PR where not all inputs are converted as necessary, but I see this error if I try to make predictions on numpy arrays from a model that was fitted to dpnp arrays: [traceback and reproducer quoted from the comment above]

@david-cortes-intel #2747 This is an issue beyond just this estimator, and should be addressed as a follow-up with additional testing added in a central way.
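
Until that follow-up lands, a possible workaround for the reproducer above is to keep predict inputs in the same namespace the model was fitted with, e.g.:

model.predict(dpnp.array(X[:5], dtype=np.float32, device="cpu"))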

@icfaust (Contributor, Author) commented Oct 23, 2025

/intelci: run
