Conversation

@david-cortes-intel
Contributor

Description

This PR changes the TSNE class to fall back to stock scikit-learn as early as possible when it receives unsupported inputs. As a result, the stock implementation is now used for cases that scikit-learn supports but oneDAL doesn't, such as PCA initialization with sparse inputs.

Along the way, it also makes a couple of necessary changes that appear to have been scheduled for scikit-learn 1.2 but were never applied here, and it updates the documentation on what is and isn't supported for this algorithm.

Note that much of the code added here mirrors scikit-learn, since this class is for the most part a copy-paste of the scikit-learn one that has been drifting out of sync:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

@david-cortes-intel
Contributor Author

/azp run Nightly

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 80.49% <ø> (ø)
github 82.10% <ø> (ø)

"from 200.0 to 'auto' in 1.2.",
FutureWarning,
)
self._learning_rate = 200.0
Contributor

It looks like more version gating is needed here, since this fails on older versions of scikit-learn:
AttributeError: 'TSNE' object has no attribute '_learning_rate'

Contributor Author

Fixed.
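
The AttributeError reported above suggests that the private `_learning_rate` attribute is not set on every supported scikit-learn version. One way to avoid depending on it is to resolve the numeric value in a local helper. The following is only a minimal sketch under that assumption: `resolve_learning_rate` is a hypothetical name, not the fix applied in this PR, and the "auto" formula reflects my reading of scikit-learn 1.0/1.1 rather than anything stated in this thread.

```python
import warnings


def resolve_learning_rate(learning_rate, n_samples, early_exaggeration=12.0):
    """Return the numeric learning rate to use (hypothetical helper).

    Assumed behaviour, modelled on scikit-learn 1.0/1.1:
    - the legacy default "warn" maps to 200.0 and emits a FutureWarning;
    - "auto" maps to max(n_samples / early_exaggeration / 4, 50);
    - any other value is taken as an explicit number.
    """
    if learning_rate == "warn":
        warnings.warn(
            "The default learning rate in TSNE will change "
            "from 200.0 to 'auto' in 1.2.",
            FutureWarning,
        )
        return 200.0
    if learning_rate == "auto":
        return max(n_samples / early_exaggeration / 4, 50.0)
    return float(learning_rate)
```

Because the helper returns a value instead of reading `self._learning_rate`, the calling code does not care whether the installed scikit-learn version defines that attribute.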

@david-cortes-intel
Contributor Author

/azp run Nightly

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

(
(
isinstance(self.init, str)
and self.init in ["random", "pca", "warn"]
Contributor

What does the "warn" initialization method mean? Is it documented somewhere? It is not present in stock sklearn.

Contributor Author

It was the default value of this parameter in scikit-learn 1.0:
https://github.com/scikit-learn/scikit-learn/blob/baf828ca126bcb2c0ad813226963621cafe38adb/sklearn/manifold/_t_sne.py#L750

It means that a FutureWarning is issued if the parameter is left at that default.

and self.init in ["random", "pca", "warn"]
)
or isinstance(self.init, np.ndarray),
"'init' must be 'exact', 'pca', or a numpy array.",
Contributor

I think there's a mistake in the error message.

Contributor Author

'warn' is no longer allowed in newer sklearn versions, so it's not referenced here.

Contributor

But init can't be 'exact', can it? Also, according to this condition, it can't be a numpy array.

Contributor Author

No, there's no init 'exact'; and there's an 'or' condition where it allows numpy arrays.
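
The condition discussed in this thread can be read as "either an allowed string or a numpy array". A minimal standalone sketch of that logic follows; `is_supported_init` and `ALLOWED_INIT_STRINGS` are hypothetical names used for illustration, not identifiers from the PR.

```python
import numpy as np

# "warn" is kept only because it was the default in scikit-learn 1.0.
ALLOWED_INIT_STRINGS = ("random", "pca", "warn")


def is_supported_init(init):
    """Hypothetical helper: True if `init` is an allowed string or a numpy array."""
    if isinstance(init, str):
        # Membership is tested only for strings, so an array is never
        # compared element-wise against the list of names.
        return init in ALLOWED_INIT_STRINGS
    # The 'or' branch of the original condition: arrays are accepted as-is.
    return isinstance(init, np.ndarray)
```

With this helper, `is_supported_init("pca")` and `is_supported_init(np.zeros((3, 2)))` both pass, while `is_supported_init("exact")` does not, since 'exact' is a value of `method` rather than of `init`.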

- ``method`` = `'exact'`
- ``verbose`` != `0`
- ``n_components`` > ``2``
- ``method`` = ``'exact'``
Contributor

But with the exact method we don't fall back to sklearn. Should we list it as supported?

Contributor Author

It falls back on the first condition in the patching chain.
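
To illustrate what "falls back on the first condition" means, here is a simplified stand-in for an ordered chain of checks. It is a sketch only: the function names, the messages, and the exact order of the later conditions are assumptions, and it does not use the project's real patching utilities.

```python
def first_failed_condition(conditions):
    """Return the message of the first unmet condition, or None if all hold.

    `conditions` is an ordered list of (is_met, message) pairs.
    """
    for is_met, message in conditions:
        if not is_met:
            return message
    return None


def choose_backend(method, n_components, init_is_supported):
    # Hypothetical ordering: the `method` check is listed first, so
    # method="exact" is always reported as the reason for falling back,
    # whatever the later conditions say.
    reason = first_failed_condition(
        [
            (method == "barnes_hut", "method='exact' is not accelerated"),
            (n_components <= 2, "n_components > 2 is not accelerated"),
            (init_is_supported, "unsupported 'init' value"),
        ]
    )
    return ("sklearn", reason) if reason else ("onedal", None)
```

For example, `choose_backend("exact", 3, False)` returns the stock backend with the `method` message, even though the other two conditions also fail.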

skip_num_points=skip_num_points,
)
X_embedded = check_array(X_embedded, dtype=[np.float32, np.float64])
return self._daal_tsne(P, n_samples, X_embedded=X_embedded)
Contributor

Is it correct that this function would still be called when method == 'exact'?

Contributor Author

This code would not be reached with method 'exact'.

assert np.any(embedding != 0)


# Note: since sklearn1.2, the PCA initialization divides by standard deviations of components.
Contributor

Do we need to add another test case in the future to replace the removed one?

Contributor Author

No, what other case would you add?
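
For context on the code comment about PCA initialization, the sketch below shows one way such a rescaling can look. It assumes, from my reading of scikit-learn rather than from this PR, that the PCA embedding is divided by the standard deviation of its first component and scaled to 1e-4. The function name and the plain SVD-based PCA are illustrative choices.

```python
import numpy as np


def pca_init_rescaled(X, n_components=2, target_std=1e-4):
    """Project X onto its top principal components, then rescale the result
    so that the first component has standard deviation `target_std`."""
    X_centered = X - X.mean(axis=0)
    # SVD-based PCA: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    X_embedded = X_centered @ Vt[:n_components].T
    # Assumed rescaling step: normalize by the first component's spread.
    return X_embedded / np.std(X_embedded[:, 0]) * target_std
```

A rescaling like this changes the absolute values of the initial embedding, which is why a test comparing against the unscaled PCA output would no longer hold.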

@Vika-F
Contributor

Vika-F commented Nov 26, 2025

/intelci: run

Contributor

@Vika-F Vika-F left a comment

Let's wait for a (semi)green Pre-Commit, and LGTM.

@david-cortes-intel david-cortes-intel merged commit be23e7b into uxlfoundation:main Nov 26, 2025
30 of 31 checks passed
Labels

sklearn-patch sklearn patching

3 participants