Dataset similarity #122

Draft: wants to merge 7 commits into base: main


@JochenSiegWork (Collaborator) commented Feb 10, 2025

  • Add a new class that performs k-nearest-neighbor searches using
    Tanimoto similarity. The implementation uses a sparse dot product,
    which makes it 2-3x faster than RDKit's BulkTanimotoSimilarity.
  • Add a notebook illustrating NearestNeighborsRetrieverTanimoto for
    dataset similarity analysis, such as train/test set comparison.
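
As a rough illustration of the sparse dot product idea described above (not the code from this PR), the sketch below computes a full Tanimoto similarity matrix for binary fingerprints with SciPy. The function name `tanimoto_matrix` and the example fingerprints are made up for illustration. It relies on the identity T(a, b) = a·b / (|a|² + |b|² - a·b).

```python
import numpy as np
from scipy import sparse


def tanimoto_matrix(query, target):
    """Pairwise Tanimoto similarity between rows of query and rows of target.

    Hypothetical helper for illustration; not part of the PR.
    """
    q = sparse.csr_matrix(query, dtype=np.float64)
    t = sparse.csr_matrix(target, dtype=np.float64)
    # The sparse dot product gives the number of shared on-bits per pair.
    intersection = (q @ t.T).toarray()
    # Squared norms equal the on-bit counts for binary fingerprints.
    q_norm = np.asarray(q.multiply(q).sum(axis=1)).ravel()
    t_norm = np.asarray(t.multiply(t).sum(axis=1)).ravel()
    union = q_norm[:, None] + t_norm[None, :] - intersection
    # Pairs of all-zero fingerprints have union 0; define their similarity as 0.
    return np.divide(
        intersection, union, out=np.zeros_like(intersection), where=union > 0
    )


# Toy fingerprints (assumed data, 4 bits each).
query_fps = np.array([[1, 1, 0, 0], [0, 1, 1, 1]])
target_fps = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]])
similarities = tanimoto_matrix(query_fps, target_fps)
```

The result has one row per query fingerprint and one column per target fingerprint, so a k-nearest-neighbor lookup reduces to picking the k largest entries in each row.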

Also addresses #117

@c-w-feldmann (Collaborator) left a comment

In addition, as discussed: make it an estimator.
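
One way the estimator request could look, assuming "estimator" refers to the scikit-learn convention (constructor parameters stored unchanged, `fit` returning `self`). This is only a sketch: the class name `TanimotoNeighborsSketch` and the `kneighbors` method are hypothetical, and only `k` and `batch_size` are taken from the reviewed snippet.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator


class TanimotoNeighborsSketch(BaseEstimator):
    """Hypothetical scikit-learn style wrapper; not the class from this PR."""

    def __init__(self, k=1, batch_size=1000):
        # Store parameters unchanged so get_params/set_params and clone work.
        self.k = k
        self.batch_size = batch_size

    def fit(self, X, y=None):
        # Keep the reference fingerprints and their on-bit counts.
        self.target_ = sparse.csr_matrix(X, dtype=np.float64)
        self.target_norm_ = np.asarray(
            self.target_.multiply(self.target_).sum(axis=1)
        ).ravel()
        return self

    def kneighbors(self, X):
        """Return (indices, similarities) of the k most similar fitted rows.

        Assumes X contains at least one row.
        """
        query = sparse.csr_matrix(X, dtype=np.float64)
        all_idx, all_sim = [], []
        # Process queries in batches to bound the size of the dense matrix.
        for start in range(0, query.shape[0], self.batch_size):
            batch = query[start:start + self.batch_size]
            inter = (batch @ self.target_.T).toarray()
            norm = np.asarray(batch.multiply(batch).sum(axis=1)).ravel()
            union = norm[:, None] + self.target_norm_[None, :] - inter
            sim = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
            # Sort each row by descending similarity and keep the first k.
            idx = np.argsort(-sim, axis=1, kind="stable")[:, :self.k]
            all_idx.append(idx)
            all_sim.append(np.take_along_axis(sim, idx, axis=1))
        return np.vstack(all_idx), np.vstack(all_sim)
```

With this shape, a train/test comparison would fit on the training fingerprints and call `kneighbors` on the test fingerprints.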

else:
self.k = k
self.batch_size = batch_size
if n_jobs == -1:
@JochenSiegWork (Collaborator, Author) commented

In addition to the dot-product Tanimoto, we could also check whether it's possible to add an implementation of iSim: https://github.com/mqcomplab/bitbirch/blob/main/bitbirch.py
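
For context, a minimal sketch of the iSim idea as I understand it: the set-level Tanimoto index is estimated from per-bit column sums, so the cost is linear in the number of fingerprints instead of quadratic. The function name `isim_tanimoto` is hypothetical, and the formula should be checked against the linked bitbirch implementation before relying on it.

```python
import numpy as np


def isim_tanimoto(fingerprints):
    """Set-level Tanimoto index from column sums (sketch, assumed formula)."""
    fps = np.asarray(fingerprints, dtype=np.int64)
    n_objects = fps.shape[0]
    if n_objects < 2:
        raise ValueError("need at least two fingerprints")
    column_sums = fps.sum(axis=0)
    sum_c = column_sums.sum()
    sum_c_sq = np.dot(column_sums, column_sums)
    # Pairs sharing an on-bit, summed over bits: sum_j c_j * (c_j - 1) / 2.
    both_on = (sum_c_sq - sum_c) / 2
    # Pairs differing at a bit, summed over bits: sum_j c_j * (N - c_j).
    mismatches = n_objects * sum_c - sum_c_sq
    denominator = both_on + mismatches
    return float(both_on / denominator) if denominator > 0 else 0.0
```

Note that this is a ratio of sums over all pairs, so for more than two fingerprints it is generally not identical to the mean of the pairwise Tanimoto values.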

@c-w-feldmann c-w-feldmann marked this pull request as draft April 25, 2025 14:04