MAINT: Update TSNE for sklearn1.8 #2793

david-cortes-intel · 2025-11-24T15:29:33Z

Description

This PR modifies the logic in the TSNE class to offload to stock scikit-learn as early as possible when it receives unsupported inputs. With these changes, the stock implementation can now be used for cases that scikit-learn supports but oneDAL doesn't, such as PCA initialization with sparse inputs.

Along the way, it also makes a couple of necessary changes that appear to have been scheduled for sklearn1.2 but were never applied here, and it updates the documentation on what is and isn't supported for this algorithm.

Note that much of the code added here mirrors scikit-learn, since this class is, for the most part, a copy of scikit-learn's implementation that has been drifting out of sync:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py

Checklist:

Completeness and readability

I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.

david-cortes-intel · 2025-11-24T15:29:53Z

/azp run Nightly

azure-pipelines · 2025-11-24T15:30:04Z

Azure Pipelines successfully started running 1 pipeline(s).

david-cortes-intel · 2025-11-25T08:53:41Z

/azp run Nightly

azure-pipelines · 2025-11-25T08:53:51Z

Azure Pipelines successfully started running 1 pipeline(s).

david-cortes-intel · 2025-11-25T09:22:01Z

/azp run Nightly

azure-pipelines · 2025-11-25T09:22:10Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2025-11-25T09:51:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag	Coverage Δ
azure	`80.49% <ø> (ø)`
github	`82.10% <ø> (ø)`

Flags with carried forward coverage won't be shown.


Vika-F · 2025-11-25T12:47:09Z

daal4py/sklearn/manifold/_t_sne.py

+                    "from 200.0 to 'auto' in 1.2.",
+                    FutureWarning,
+                )
+                self._learning_rate = 200.0


Looks like more version checks are needed here, as it fails on older versions of scikit-learn:
AttributeError: 'TSNE' object has no attribute '_learning_rate'
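One way to avoid this kind of `AttributeError` is to resolve the learning rate through a single function that returns a value on every branch, so `_learning_rate` is always set regardless of the installed scikit-learn version. A minimal sketch, assuming a hypothetical helper named `resolve_learning_rate`: the `'warn'` branch mirrors the FutureWarning in the diff above, and the `'auto'` branch uses scikit-learn's heuristic `max(n_samples / early_exaggeration / 4, 50)`.

```python
import warnings


def resolve_learning_rate(learning_rate, n_samples, early_exaggeration=12.0):
    """Hypothetical helper: map the `learning_rate` parameter to a float.

    Every branch returns a value, so assigning the result to
    `self._learning_rate` leaves the attribute set on every sklearn version.
    """
    if learning_rate == "warn":
        # Pre-1.2 sentinel default: behave like 200.0 and warn about the change.
        warnings.warn(
            "The default learning rate in TSNE will change "
            "from 200.0 to 'auto' in 1.2.",
            FutureWarning,
        )
        return 200.0
    if learning_rate == "auto":
        # scikit-learn's heuristic: N / early_exaggeration / 4, floored at 50.
        return max(n_samples / early_exaggeration / 4, 50.0)
    # A user-supplied number is used as-is.
    return float(learning_rate)
```

Hypothetical usage inside `fit`: `self._learning_rate = resolve_learning_rate(self.learning_rate, X.shape[0], self.early_exaggeration)`.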

david-cortes-intel · 2025-11-25T16:29:16Z

/azp run Nightly

azure-pipelines · 2025-11-25T16:29:26Z

Azure Pipelines successfully started running 1 pipeline(s).

avolkov-intel · 2025-11-26T08:28:32Z

daal4py/sklearn/manifold/_t_sne.py

+                (
+                    (
+                        isinstance(self.init, str)
+                        and self.init in ["random", "pca", "warn"]


What is the meaning of the "warn" initialization method? Is it documented somewhere? It is not present in stock sklearn.

It was the default parameter in sklearn1.0:
https://github.com/scikit-learn/scikit-learn/blob/baf828ca126bcb2c0ad813226963621cafe38adb/sklearn/manifold/_t_sne.py#L750

It means that a FutureWarning is issued if the parameter is left at that default.

avolkov-intel · 2025-11-26T08:29:21Z

daal4py/sklearn/manifold/_t_sne.py

+                        and self.init in ["random", "pca", "warn"]
+                    )
+                    or isinstance(self.init, np.ndarray),
+                    "'init' must be 'exact', 'pca', or a numpy array.",


I think there's a mistake in the error message.

'warn' is no longer allowed in newer sklearn versions, so it's not referenced here.

But init can't be 'exact', can it? Also, according to this condition, it can't be a numpy array.

No, there's no init 'exact'; and there's an 'or' condition where it allows numpy arrays.
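For reference, a sketch of this check with an error message that lists exactly the values the condition accepts, combined with the sklearn 1.0 handling of the `'warn'` default described above (warn, then behave like `'random'`). `resolve_init` is a hypothetical helper, not the daal4py code; the warning text is modeled on scikit-learn 1.0's.

```python
import warnings

import numpy as np


def resolve_init(init):
    """Hypothetical helper: validate `init` and return the method to use."""
    if isinstance(init, np.ndarray):
        # A user-provided initial embedding is accepted as-is.
        return init
    if not (isinstance(init, str) and init in ("random", "pca", "warn")):
        # 'warn' is accepted for older sklearn but not advertised,
        # since newer sklearn versions reject it.
        raise ValueError("'init' must be 'random', 'pca', or a numpy array.")
    if init == "warn":
        # sklearn 1.0 sentinel default: behave like 'random' and
        # tell the user that the default is about to change.
        warnings.warn(
            "The default initialization in TSNE will change "
            "from 'random' to 'pca' in 1.2.",
            FutureWarning,
        )
        return "random"
    return init
```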

avolkov-intel · 2025-11-26T09:15:40Z

doc/sources/guide/acceleration.rst

-  - ``method`` = `'exact'`
-  - ``verbose`` != `0`
+  - ``n_components`` > ``2``
+  - ``method`` = ``'exact'``


But with the exact method we don't fall back to sklearn; should we list it as supported?

It falls back on the first condition in the patching chain.
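To illustrate why `method='exact'` is listed as unsupported even though it never reaches the accelerated code: the conditions are evaluated as an ordered chain, and the first one that fails decides the fallback to stock sklearn. A minimal sketch with hypothetical names (`first_unmet_condition`, `tsne_fallback_reason`); the conditions are taken from the documentation change and the PR description, and the actual daal4py chain may contain more checks.

```python
def first_unmet_condition(conditions):
    """Hypothetical patching chain: return the message of the first
    condition that does not hold, or None if all hold (accelerated path)."""
    for holds, message in conditions:
        if not holds:
            return message
    return None


def tsne_fallback_reason(n_components, method, init, X_is_sparse):
    # Hypothetical conditions, ordered so that 'exact' is reported first.
    return first_unmet_condition(
        [
            (method == "barnes_hut", "'exact' method is not supported."),
            (n_components <= 2, "'n_components' > 2 is not supported."),
            (
                not (X_is_sparse and init == "pca"),
                "PCA initialization with sparse input is not supported.",
            ),
        ]
    )
```

With `method='exact'` the first condition already fails, so the estimator is handed to sklearn before any later check matters.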

avolkov-intel · 2025-11-26T09:16:06Z

daal4py/sklearn/manifold/_t_sne.py

-            skip_num_points=skip_num_points,
-        )
+        X_embedded = check_array(X_embedded, dtype=[np.float32, np.float64])
+        return self._daal_tsne(P, n_samples, X_embedded=X_embedded)


Is it correct that when method == 'exact' we would still call this function?

This code would not be reached with method 'exact'.
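On the `check_array(X_embedded, dtype=[np.float32, np.float64])` line in the diff above: when `dtype` is a list, scikit-learn keeps the input dtype if it is already in the list and otherwise converts to the first entry, so the embedding is float32 or float64 before it is passed on. A NumPy-only sketch of just that dtype rule for dense arrays; `as_supported_float` is a hypothetical name, and the real `check_array` additionally validates shape and finiteness.

```python
import numpy as np


def as_supported_float(X):
    """Sketch of the dtype rule of check_array(X, dtype=[np.float32, np.float64])
    for dense input: keep float32/float64 unchanged, else cast to float32."""
    X = np.asarray(X)
    if X.dtype in (np.float32, np.float64):
        # Already a supported dtype: no copy is made.
        return X
    # Not in the list: convert to the first listed dtype.
    return X.astype(np.float32)
```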

avolkov-intel · 2025-11-26T09:18:47Z

sklearnex/manifold/tests/test_tsne.py

        assert np.any(embedding != 0)


+# Note: since sklearn1.2, the PCA initialization divides by standard deviations of components.


Do we need to add another test case in the future to replace the removed one?

No, what other case would you add?
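Background on the note in the test file: since scikit-learn 1.2, the PCA initialization is rescaled so that the first principal component has a standard deviation of 1e-4, the same scale as the default random initialization (as far as I can tell, scikit-learn divides by the standard deviation of the first component only, not of each component). A NumPy sketch of that rescaling, assuming an exact SVD in place of the randomized PCA solver scikit-learn uses; `pca_init` is a hypothetical name.

```python
import numpy as np


def pca_init(X, n_components=2):
    """Sketch of the rescaled PCA initialization (sklearn >= 1.2 style).

    Uses an exact SVD instead of sklearn's randomized PCA solver."""
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes; project onto the leading ones.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    embedding = (Xc @ Vt[:n_components].T).astype(np.float32)
    # Rescale so the first component has standard deviation 1e-4.
    return embedding / np.std(embedding[:, 0]) * 1e-4
```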

Vika-F · 2025-11-26T13:25:29Z

/intelci: run

Vika-F

Let's wait for a (semi)green Pre-Commit and LGTM.

update tsne for sklearn1.8

db38d4c

david-cortes-intel requested review from avolkov-intel, ethanglaser and yuejiaointel November 24, 2025 15:29

david-cortes-intel added the sklearn-patch sklearn patching label Nov 24, 2025

david-cortes-intel added 2 commits November 25, 2025 09:52

fix test

9c01a8c

more corrections

b3f852f

more fixes for older sklearn

55a8529

Vika-F reviewed Nov 25, 2025

david-cortes-intel added 4 commits November 25, 2025 16:31

missing else

d960a14

more fixes for older sklearn

8ebab02

correction

42d8d50

remove redundant check

13ce364

david-cortes-intel marked this pull request as ready for review November 25, 2025 17:02

david-cortes-intel requested review from ahuber21, icfaust, maria-Petrova and syakov-intel as code owners November 25, 2025 17:02

avolkov-intel reviewed Nov 26, 2025

more clear conditions

2cbeeb9

avolkov-intel reviewed Nov 26, 2025

Vika-F approved these changes Nov 26, 2025

david-cortes-intel merged commit be23e7b into uxlfoundation:main Nov 26, 2025
30 of 31 checks passed