[enhancement] Introduce DummyRegressor Estimator (prototype estimator for sklearnex design) #2534

icfaust · 2025-06-11T10:56:49Z

Description

This PR starts by adding code that represents the basic requirements of a sklearnex and onedal estimator.

It serves two purposes: to ease understanding of the codebase for external development, and to standardize the development occurring for array API support.

The next step is to add links from the documentation pages to the various aspects covered here, so that they act as a guide for array API development and help external contributors.

My goal is to see whether an LLM given this information can generate StandardScaler using BasicStatistics. If it can, an LLM with this starting prompt should be able to help guide a user in more difficult scenarios.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

codecov · 2025-06-11T11:48:09Z

Codecov Report

❌ Patch coverage is 86.77686% with 16 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
sklearnex/dummy/_dummy.py	82.75%	11 Missing and 4 partials ⚠️
onedal/dummy/dummy.py	95.65%	0 Missing and 1 partial ⚠️

Flag	Coverage Δ
azure	`80.44% <84.29%> (+0.08%)`	⬆️
github	`81.98% <100.00%> (+8.96%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
onedal/__init__.py	`86.66% <ø> (ø)`
onedal/dummy/__init__.py	`100.00% <100.00%> (ø)`
sklearnex/__init__.py	`92.85% <ø> (ø)`
sklearnex/dispatcher.py	`91.13% <100.00%> (+0.31%)`	⬆️
sklearnex/dummy/__init__.py	`100.00% <100.00%> (ø)`
onedal/dummy/dummy.py	`95.65% <95.65%> (ø)`
sklearnex/dummy/_dummy.py	`82.75% <82.75%> (ø)`

... and 40 files with indirect coverage changes

david-cortes-intel · 2025-06-11T11:05:37Z

sklearnex/tests/prototypes.py

+#
+# 1) All sklearnex estimators must inherit oneDALestimator and the sklearn
+# estimator that it is replicating (i.e. before in the mro).  If there is
+# not an equivalent sklearn estimator, then sklearn's BaseEstimator must be


Should also inherit from the corresponding type for what the estimator does, like RegressorMixin.

Added a snippet referring to the mixins. In most cases they should be handled by the underlying sklearn estimator, but we need to be careful with sklearnex-only estimators, so good call.

@icfaust It appears to have been missed after the latest commits.
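
For readers following this thread, here is a minimal sketch of the inheritance order under discussion for an estimator with no sklearn counterpart. `OneDALEstimatorStandIn` and `ConstantRegressor` are hypothetical names invented for illustration; the real sklearnex base class is not imported, and placing it first in the bases is an assumption drawn from the quoted comment.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, is_regressor


class OneDALEstimatorStandIn:
    """Hypothetical placeholder for the sklearnex oneDAL base class."""


class ConstantRegressor(OneDALEstimatorStandIn, RegressorMixin, BaseEstimator):
    """Hypothetical sklearnex-only estimator (no equivalent sklearn class)."""

    def __init__(self, constant=0.0):
        self.constant = constant

    def fit(self, X, y=None):
        # Trailing-underscore attributes are set only during fit.
        self.constant_ = float(self.constant)
        self.n_features_in_ = np.asarray(X).shape[1]
        return self

    def predict(self, X):
        # Predict the same constant for every row of X.
        return np.full(np.asarray(X).shape[0], self.constant_)


est = ConstantRegressor(constant=2.0).fit(np.zeros((4, 3)))
mro_names = [cls.__name__ for cls in ConstantRegressor.__mro__]
```

Because `RegressorMixin` is listed explicitly, `is_regressor(est)` returns `True` and `score` is inherited even though no sklearn estimator sits underneath.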

david-cortes-intel · 2025-06-11T11:07:33Z

sklearnex/tests/prototypes.py

+# inherited.
+#
+# 2) ``check_is_fitted`` is required for any method in an estimator which
+# requires first calling ``fit`` or ``partial_fit``. This is a sklearn


It's actually not a requirement to call this specific sklearn function within .fit, only to make the estimator work correctly when that function is called on it:
https://scikit-learn.org/stable/developers/develop.html#developer-api-for-check-is-fitted

Fair point, though such a use should be considered way out of the norm.

david-cortes-intel · 2025-06-11T11:10:33Z

sklearnex/tests/prototypes.py

+# examples are ``fit`` and ``predict``. They use a direct equivalent oneDAL
+# function for evaluation. These methods are of highest priority and have


Suggested change

-# examples are ``fit`` and ``predict``. They use a direct equivalent oneDAL
-# function for evaluation. These methods are of highest priority and have
+# examples are ``fit`` and ``predict``. They use a direct equivalent function
+# from oneDAL. These methods are of highest priority and have

Vika-F

Thank you for adding this example! It makes many aspects of sklearnex implementation much clearer.

It would also be good to place a link to this file somewhere here:
https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/doc/sources/contribute.rst

Another thing that bothers me a bit (not super important, but I have to mention it): how do we keep the recommendations here valid? For example, this get_namespace functionality was implemented several months ago. How would the developer of a new product-wide decorator or method know that this file also needs to be updated?

Vika-F · 2025-06-20T17:37:14Z

sklearnex/tests/prototypes.py

+    # Sklearnex estimators follow a Matryoshka doll pattern with respect to
+    # the underlying oneDAL library. The sklearnex estimator is a
+    # public-facing API which mimics sklearn. Sklearnex estimators will
+    # create another estimator, defined in the ``onedal`` module, for
+    # having a python interface with oneDAL. Finally, this python object
+    # will use pybind11 to call oneDAL directly via pybind11-generated
+    # objects and functions This is known as the ``backend``. These are
+    # separate entities and do not inherit from one another. The clear
+    # separation has utility so long that the following rules are followed:


Can you bring a bit more structure to this part?
I think it is very important for understanding the overall sklearnex implementation, but the idea is rather hard to grasp when it is written as a single text block. I do like the Matryoshka doll association, though =)

It can be something like:

The sklearnex estimator is a public facing API ...

The onedal estimator ...

The pybind11 backend ...
These are separate entities...
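
As a rough illustration of the three layers listed above, the following sketch uses plain Python stand-ins. `_BackendStandIn`, `OnedalDummy`, and `SklearnexDummy` are hypothetical names, and the real backend is a pybind11-generated module rather than a Python class.

```python
class _BackendStandIn:
    """Hypothetical stand-in for the pybind11-generated backend."""

    @staticmethod
    def train(values):
        # Compute the constant to predict (here: the mean of the targets).
        return sum(values) / len(values)


class OnedalDummy:
    """Hypothetical onedal-layer estimator: a Python interface to the backend."""

    def fit(self, y):
        self.constant_ = _BackendStandIn.train(list(y))
        return self

    def predict(self, n_samples):
        return [self.constant_] * n_samples


class SklearnexDummy:
    """Hypothetical public-facing estimator mimicking the sklearn API."""

    def fit(self, X, y):
        # Composition, not inheritance: the outer doll holds the inner one.
        self._onedal_estimator = OnedalDummy().fit(y)
        self.constant_ = self._onedal_estimator.constant_
        return self

    def predict(self, X):
        return self._onedal_estimator.predict(len(X))


model = SklearnexDummy().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
predictions = model.predict([[5], [6]])
```

Each layer only calls the one directly beneath it, and none of the classes inherits from another, which is the separation the quoted comment describes.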

icfaust · 2025-10-02T19:39:09Z

/intelci: run

icfaust · 2025-10-13T12:59:28Z

/intelci: run

icfaust · 2025-10-13T14:51:43Z

The private CI failure is due to infrastructure issues.

Vika-F

Thank you for presenting all this semi-hidden knowledge in a well structured and understandable way!

icfaust · 2025-10-14T13:46:14Z

/intelci: run

ethanglaser

Great work! I assume the code coverage is not a concern.

ethanglaser · 2025-10-14T15:29:13Z

onedal/dummy/dummy.cpp

+    // policy_list is defined elsewhere which is dependent on the backend
+    // which is being built. Placed within a macro-check in order to prevent
+    // use with an spmd policy.
+#ifndef ONEDAL_DATA_PARALLEL_SPMD


What about the else case (if SPMD is to be instantiated)?

Added a comment.

ethanglaser · 2025-10-14T15:32:35Z

onedal/dummy/dummy.cpp

+
+#include "onedal/common.hpp"
+#include "onedal/version.hpp"
+#include "onedal/dummy/dummy_onedal.hpp"


A comment here specifying that in practice this would instead look like #include oneapi/dal/algo/... would be useful

icfaust · 2025-10-15T04:57:33Z

/intelci: run

icfaust added 3 commits June 11, 2025 12:48

initial information

3829036

more changes

1a4627c

missing space

90df3e3

david-cortes-intel reviewed Jun 11, 2025

icfaust added 9 commits June 11, 2025 14:22

interim changes

c4a6fdc

interim changes

b54fe71

interim update

befe555

step before adding tags and explanation

8f08da0

this will probably require splitting into a separate PR

7c06762

forgot return

2ab1084

fixes for CI

b34e741

move to follow proper development

7a4df2c

add docs

23a15d3

Vika-F reviewed Jun 20, 2025

icfaust added 6 commits June 24, 2025 07:17

add file

15bf789

added documentation

dbfc635

finish sentence

eb48444

finish sentence

be72e87

updates

b987736

updates

e3fa31b

icfaust mentioned this pull request Jun 30, 2025

Add MinMaxScaler Estimator #2309

Open

icfaust and others added 8 commits July 18, 2025 00:42

Update prototypes.py

e8ceac5

Update prototypes.py

0cbe504

Update prototypes.py

c4e2283

move folders to re-orient the prototype\

abba51d

add many lines

a09e238

add many lines

60a8df1

add many lines

85bdf39

stopping point

cd54ace

icfaust commented Sep 25, 2025

icfaust and others added 12 commits October 1, 2025 14:09

add basic tests to dummy sklearnex estimator

9ad56ee

fix more tests

8af7f77

Update test_dummy.py

2ce7f89

Update test_dummy.py

081296a

Update test_dummy.py

4414f9f

Update _dummy.py

cf225b8

try to fix new test

6175373

disable tests for non array API support

1214f0a

Update test_dummy.py

a29712b

Update test_svr.py

2beafbf

Update test_svc.py

a332d9b

Update test_nusvc.py

e11da6d

icfaust and others added 2 commits October 13, 2025 14:18

Merge branch 'main' into dev/estimator_design_docs

ccd55c3

make changes to reflect new design

e7fcb1a

icfaust requested a review from ahuber21 October 13, 2025 13:41

Vika-F approved these changes Oct 14, 2025

icfaust added 4 commits October 14, 2025 15:43

address last comments

73358ec

address last comments

d57ae03

grammar

617bec1

grammar

8af27d1

ethanglaser approved these changes Oct 14, 2025

make recommended changes

c81571e

icfaust merged commit 1531c63 into uxlfoundation:main Oct 15, 2025
31 checks passed

icfaust deleted the dev/estimator_design_docs branch October 15, 2025 07:50

5 participants