-
Notifications
You must be signed in to change notification settings - Fork 184
[enhancement] Introduce DummyRegressor Estimator (prototype estimator for sklearnex design) #2534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[enhancement] Introduce DummyRegressor Estimator (prototype estimator for sklearnex design) #2534
Conversation
Codecov Report❌ Patch coverage is
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 40 files with indirect coverage changes 🚀 New features to boost your workflow:
|
sklearnex/tests/prototypes.py
Outdated
| # | ||
| # 1) All sklearnex estimators must inherit oneDALestimator and the sklearn | ||
| # estimator that it is replicating (i.e. before in the mro). If there is | ||
| # not an equivalent sklearn estimator, then sklearn's BaseEstimator must be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also inherit from the corresponding type for what the estimator does, like RegressorMixin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added snippet to refer to the Mixins, though in most cases that should be handled by the underlying sklearn estimator, we need to be careful with sklearnex-only versions (and therefore good call).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@icfaust It appears to have been missed after the latest commits.
sklearnex/tests/prototypes.py
Outdated
| # inherited. | ||
| # | ||
| # 2) ``check_is_fitted`` is required for any method in an estimator which | ||
| # requires first calling ``fit`` or ``partial_fit``. This is a sklearn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually not a requirement to call this specific sklearn function within .fit, only to make the estimator work correctly when that function is called on it:
https://scikit-learn.org/stable/developers/develop.html#developer-api-for-check-is-fitted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, though such a use should be considered way out of the norm.
sklearnex/tests/prototypes.py
Outdated
| # examples are ``fit`` and ``predict``. They use a direct equivalent oneDAL | ||
| # function for evaluation. These methods are of highest priority and have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # examples are ``fit`` and ``predict``. They use a direct equivalent oneDAL | |
| # function for evaluation. These methods are of highest priority and have | |
| # examples are ``fit`` and ``predict``. They use a direct equivalent function | |
| # from oneDAL. These methods are of highest priority and have |
Vika-F
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this example! It makes many aspects of sklearnex implementation much clearer.
It would be also good to place a link to this file somewhere here:
https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/doc/sources/contribute.rst
Another [not super-important, but I have to say about it] thing that bothers me a bit is: how to maintain the validity of the recommendations here? For example, this get_namespace functionality was implemented several months ago. How the developer of a new product-wide decorator or method would know that this file also needs to be updated?
| # Sklearnex estimators follow a Matryoshka doll pattern with respect to | ||
| # the underlying oneDAL library. The sklearnex estimator is a | ||
| # public-facing API which mimics sklearn. Sklearnex estimators will | ||
| # create another estimator, defined in the ``onedal`` module, for | ||
| # having a python interface with oneDAL. Finally, this python object | ||
| # will use pybind11 to call oneDAL directly via pybind11-generated | ||
| # objects and functions This is known as the ``backend``. These are | ||
| # separate entities and do not inherit from one another. The clear | ||
| # separation has utility so long that the following rules are followed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you bring a bit more structure to this part?
Because I think it is very important for the understanding of overall sklearnex implementation. But it is rather hard to grasp the idea when it is written as a single text block. Though I like the Matryoshka doll association =)
It can be something like:
- The sklearnex estimator is a public facing API ...
- The
onedalestimator ... - The pybind11 backend ...
These are separate entities...
|
/intelci: run |
|
/intelci: run |
|
private CI failure due to infrastructure issues. |
Vika-F
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for presenting all this semi-hidden knowledge in a well structured and understandable way!
|
/intelci: run |
ethanglaser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I assume the codecoverage is not a concern
| // policy_list is defined elsewhere which is dependent on the backend | ||
| // which is being built. Placed within a macro-check in order to prevent | ||
| // use with an spmd policy. | ||
| #ifndef ONEDAL_DATA_PARALLEL_SPMD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about else (if spmd to be instantiated)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment
|
|
||
| #include "onedal/common.hpp" | ||
| #include "onedal/version.hpp" | ||
| #include "onedal/dummy/dummy_onedal.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here specifying that in practice this would instead look like #include oneapi/dal/algo/... would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
|
/intelci: run |
Description
First start by generating code which represents the basic requirements of a sklearnex and onedal estimator.
This PR serves two purposes: to ease understanding of the codebase for external development and standardize development
occurring for array API support.
Next will be to make the necessary doc page links to various aspects to act as a guide for array API development. Which will help in external user contribution.
My goal will be to see if I can get an LLM with this information to generate StandardScaler using BasicStatistics. If it can, that means an LLM can help guide a user with this starting prompt in more difficult scenarios.
PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing