Commit aaeee6d

add developer API post
1 parent b534b86 commit aaeee6d

1 file changed

+142
-0
lines changed

_posts/2024-12-05-dev-api.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
---
#### Blog Post Template ####

#### Post Information ####
title: "Changes and development of scikit-learn's developer API"
date: December 12, 2024

#### Post Category and Tags ####
# Format in titlecase without dashes (Ex. "Open Source" instead of "open-source")
categories:
  - Updates
tags:
  - Open Source
  - Machine Learning
  - License

#### Featured Image ####
featured-image: BSD_watermark.svg

#### Author Info ####
# Can accommodate multiple authors
# Add SQUARE Author Image to /assets/images/author_images/ folder
postauthors:
  - name: Adrin Jalali
    website: https://adrin.info/
    image: adrin-jalali.jpeg
---
<div>
  <img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
  {% include postauthor.html %}
</div>

Historically, scikit-learn's API has been divided into a public and a private part. The
public API is intended for users, while the private API is used internally to develop
new features and estimators. However, many private utilities have become essential to
third parties who develop scikit-learn estimators outside the scikit-learn codebase.

We hold our public API to strict backward compatibility standards. The rule of thumb is
that no change should force users to change their code unless we have warned about it
for two release cycles, which gives users about a year to update their code.

We give no such guarantees for our private API, however. This is a problem for third
party developers who would like to use the same tools scikit-learn developers use to
build estimators: a private API that changes without prior warning keeps breaking their
code, which is not ideal.

As a result, we've been working on a developer API which sits somewhere between our
public and private API in terms of backward compatibility. That means we intend to keep
that API stable and, if changes are needed, to introduce them with a warning period of
one release cycle.

In the past few releases, we've slowly introduced more functionality under this
umbrella. `__sklearn_clone__` and `__sklearn_is_fitted__` are two examples.

In the latest release at the time of writing, we focused on the testing infrastructure
and the estimator tag system. Estimator tags used to be private, and we were not sure
about their design. The 1.6 release introduces new tags, and using them looks like the
following:

```python
from sklearn.base import BaseEstimator, ClassifierMixin

class MyEstimator(ClassifierMixin, BaseEstimator):

    ...

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        # modify tags here
        tags.non_deterministic = True
        return tags
```

The new tags mostly follow the same structure as the old tags, with some changes. The
main change is that the old `_xfail_checks` tag is no longer present. That tag told the
common testing tools which checks are known to fail and should be skipped. That
information is now passed directly to the testing functions. The old way of skipping a
check was the following:

```python
from sklearn.base import BaseEstimator, ClassifierMixin

class MyEstimator(ClassifierMixin, BaseEstimator):

    ...

    def _more_tags(self):
        return {
            "_xfail_checks": {
                "check_to_skip_name": "this check is known to fail",
                ...
            }
        }
```

Calling `check_estimator`, or using `parametrize_with_checks` with `pytest`, would then
automatically skip those checks for the estimator.

Instead, in this release, you pass that information directly to those functions:

```python
from sklearn.utils.estimator_checks import check_estimator, parametrize_with_checks

CHECKS_EXPECTED_TO_FAIL = {
    "check_to_skip_name": "this check is known to fail",
    ...
}

# Using check_estimator
def test_with_check_estimator():
    check_estimator(MyEstimator(), expected_failed_checks=CHECKS_EXPECTED_TO_FAIL)

# Using parametrize_with_checks
@parametrize_with_checks(
    [MyEstimator()],
    expected_failed_checks=lambda est: CHECKS_EXPECTED_TO_FAIL
)
def test_with_parametrize_with_checks(estimator, check):
    check(estimator)
```

While working on the testing infrastructure, we have also been improving the checks
themselves, which means this release has an unusually high number of changes in their
names and in what they do. These changes should make it easier for developers to fix
issues with their estimators. Note that you can now pass `legacy=False` to both
`check_estimator` and `parametrize_with_checks` to include only strictly API related
checks.

These changes mean developers need to update their estimators and, depending on what
they use, write scikit-learn version specific code to support multiple scikit-learn
versions. To make that process easier, we've worked on a package called
[`sklearn_compat`](https://github.com/sklearn-compat/sklearn-compat/). You can either
depend on it as a package dependency, or vendor a single file inside your project. At
the moment this project is in its infancy and might change in the future, but hopefully
it helps developers out there.

If you think there is functionality missing from the developer API, please let us know
and give us feedback on our
[issue tracker](https://github.com/scikit-learn/scikit-learn/issues).
