-
Notifications
You must be signed in to change notification settings - Fork 184
MAINT: Update TSNE for sklearn1.8 #2793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Update TSNE for sklearn1.8 #2793
Conversation
|
/azp run Nightly |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Nightly |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Nightly |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov Report✅ All modified and coverable lines are covered by tests.
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
| "from 200.0 to 'auto' in 1.2.", | ||
| FutureWarning, | ||
| ) | ||
| self._learning_rate = 200.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like more versioning is needed here, as it fails on older versions of scikit:
AttributeError: 'TSNE' object has no attribute '_learning_rate'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
/azp run Nightly |
|
Azure Pipelines successfully started running 1 pipeline(s). |
daal4py/sklearn/manifold/_t_sne.py
Outdated
| ( | ||
| ( | ||
| isinstance(self.init, str) | ||
| and self.init in ["random", "pca", "warn"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of "warn" initialization method? Is it documented somewhere, because this is not present in stock sklearn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the default parameter in sklearn1.0:
https://github.com/scikit-learn/scikit-learn/blob/baf828ca126bcb2c0ad813226963621cafe38adb/sklearn/manifold/_t_sne.py#L750
It means that it will issue a future warning if not changed from that default.
| and self.init in ["random", "pca", "warn"] | ||
| ) | ||
| or isinstance(self.init, np.ndarray), | ||
| "'init' must be 'exact', 'pca', or a numpy array.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a mistake in error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'warn' is no longer allowed in newer sklearn versions, so it's not referenced here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but init can't be exact? Also it can't be numpy array according to this condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there's no init 'exact'; and there's an 'or' condition where it allows numpy arrays.
| - ``method`` = `'exact'` | ||
| - ``verbose`` != `0` | ||
| - ``n_components`` > ``2`` | ||
| - ``method`` = ``'exact'`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in case of exact method we don't fallback to sklearn, should we add it as supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It falls back on the first condition in the patching chain.
| skip_num_points=skip_num_points, | ||
| ) | ||
| X_embedded = check_array(X_embedded, dtype=[np.float32, np.float64]) | ||
| return self._daal_tsne(P, n_samples, X_embedded=X_embedded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that in case method == exact we would still call this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code would not be reached with method 'exact'.
| assert np.any(embedding != 0) | ||
|
|
||
|
|
||
| # Note: since sklearn1.2, the PCA initialization divides by standard deviations of components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add another test case in the future instead of removed one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what other case would you add?
|
/intelci: run |
Vika-F
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets wait for a (semi)green Pre-Commit and LGTM.
be23e7b
into
uxlfoundation:main
Description
This PR modifies the logic in the TSNE class to offload to base sklearn as early as possible when it receives unsupported inputs. With these changes, it now allows using the stock version for cases that they now support but oneDAL doesn't, such as PCA initialization with sparse inputs.
Along the way, it also makes a couple necessary changes that appear to have been scheduled for sklearn1.2 but were not updated here, and it updates the documentation about what is and isn't supported for this algorithm.
Note that a lot of the code added here mirrors scikit-learn, since for the most part this class is a copy-paste of it that's been getting out of synch:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py
Checklist:
Completeness and readability
Testing