speed of JaroWinkler #22

Closed
reza1615 opened this issue Sep 19, 2020 · 4 comments

@reza1615

JaroWinkler is slower than jellyfish's implementation. Also, the results are different.

%%timeit
a = 'book egwrhgr rherh'
b = 'fvdaabavvvvvadvdvavavadfsfsdafvvav book teee'

import jellyfish
jellyfish.jaro_winkler(a,b) 
# 3.97 µs ± 169 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# result > 0.35942760942760943

%%timeit
a = 'book egwrhgr rherh'
b = 'fvdaabavvvvvadvdvavavadfsfsdafvvav book teee'
from strsimpy.jaro_winkler import JaroWinkler
jarowinkler = JaroWinkler()
jarowinkler.distance(a,b)
# 69.8 µs ± 706 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# result > 0.6405723905723906

@luozhouyang
Owner

I am not sure why the results are different, but strsimpy produces the same results as the tests in java-string-similarity/JaroWinklerTest.java:

>>> jarowinkler.similarity('My string', 'My tsring')
0.9740740740740741
>>> jarowinkler.similarity('My string', 'My ntrisg')
0.8962962962962963

jellyfish has both a Python and a C implementation of JaroWinkler. Which implementation did you use for the comparison?

@reza1615
Author

I used the Python implementation.

@luozhouyang
Owner

I know why the two results are different. jellyfish.jaro_winkler(a, b) calculates the similarity between a and b, but jarowinkler.distance(a,b) calculates the distance between a and b. The two values you reported sum to 1, which is consistent with distance being 1 - similarity. If you use jarowinkler.similarity(a,b), you get the same result as jellyfish.
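
As a quick check of that explanation, the arithmetic can be sketched with the two numbers reported earlier in this thread. This assumes distance is defined as 1 - similarity; the variable names are only for illustration, and no library is called.

```python
import math

# Values copied from the two timing runs reported earlier in this thread.
jellyfish_similarity = 0.35942760942760943
strsimpy_distance = 0.6405723905723906

# Assumption: distance = 1 - similarity, so similarity = 1 - distance.
recovered_similarity = 1.0 - strsimpy_distance

# Compare with a tolerance, since both values are floating-point results.
same_result = math.isclose(recovered_similarity, jellyfish_similarity)
print(same_result)
```

If the check prints True, the two libraries agree on this pair of strings and differ only in which quantity they return.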

@luozhouyang
Owner

luozhouyang commented Sep 23, 2020

jellyfish uses the C implementation of JaroWinkler by default. Here is the code in jellyfish/__init__.py:

import warnings

try:
    from .cjellyfish import *  # noqa

    library = "C"
except ImportError:
    from ._jellyfish import *  # noqa

    library = "Python"
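
To see which implementation was actually compared, a minimal sketch along these lines may help. It reads the `library` name set in the `__init__.py` above, with a fallback because other jellyfish versions may not define it. The helper names `jellyfish_backend` and `microseconds_per_call` are hypothetical and not part of either library.

```python
import timeit


def jellyfish_backend():
    """Return the backend name jellyfish reports, or a placeholder string."""
    try:
        import jellyfish
    except ImportError:
        return "not installed"
    # `library` is the module-level name assigned in the __init__.py quoted
    # above; the fallback covers versions that do not define it.
    return getattr(jellyfish, "library", "unknown")


def microseconds_per_call(func, *args, number=10_000):
    """Time only func(*args); imports and object construction stay outside."""
    total_seconds = timeit.timeit(lambda: func(*args), number=number)
    return total_seconds / number * 1e6
```

For a like-for-like comparison, one option is to import both libraries and construct `JaroWinkler()` once, then pass `jellyfish.jaro_winkler` and `jarowinkler.similarity` to `microseconds_per_call` with the same `a` and `b`. If `jellyfish_backend()` returns "C", the gap in the timings above is between a C extension and pure Python rather than between two Python implementations.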
