This repository has been archived by the owner on Aug 26, 2024. It is now read-only.
process.extractOne and fuzz.ratio give different results in this case:
```python
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

stringToMatch = 'Florinia-SP'
possibleResults = ['São Bernado do Campo-SP', 'Florínea-SP']

print(fuzz.ratio(stringToMatch, possibleResults[0]))
print(fuzz.ratio(stringToMatch, possibleResults[1]))
print(process.extract(stringToMatch, possibleResults))
```
While the individual fuzz.ratio calls give the expected results (41 for the lower score and 82 for the higher one), process.extract gives 86 for both of them.
```
Select the best match in a list or dictionary of choices.

Find best matches in a list or dictionary of choices, return a
list of tuples containing the match and its score. If a dictionary
is used, also returns the key for each match.

Arguments:
    query: An object representing the thing we want to find.
    choices: An iterable or dictionary-like object containing choices
        to be matched against the query. Dictionary arguments of
        {key: value} pairs will attempt to match the query against
        each value.
    processor: Optional function of the form f(a) -> b, where a is the query or
        individual choice and b is the choice to be used in matching.

        This can be used to match against, say, the first element of
        a list:

        lambda x: x[0]

        Defaults to fuzzywuzzy.utils.full_process().
    scorer: Optional function for scoring matches between the query and
        an individual processed choice. This should be a function
        of the form f(query, choice) -> int.
        By default, fuzz.WRatio() is used and expects both query and
        choice to be strings.
    limit: Optional maximum for the number of elements returned. Defaults
        to 5.

Returns:
    List of tuples containing the match and its score.

    If a list is used for choices, then the result will be 2-tuples.
    If a dictionary is used, then the result will be 3-tuples containing
    the key for each match.

    For example, searching for 'bird' in the dictionary

    {'bard': 'train', 'dog': 'man'}

    may return

    [('train', 22, 'bard'), ('man', 0, 'dog')]
```
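To make the documented contract concrete, here is a minimal stdlib-only sketch. It is not fuzzywuzzy's code: extract_sketch, simple_ratio and identity are hypothetical helpers, and simple_ratio assumes fuzz.ratio behaves like its pure-Python difflib fallback (python-Levenshtein may produce slightly different numbers). Unlike the real defaults (full_process() and WRatio()), this sketch defaults to no preprocessing and a plain ratio scorer, which reproduces both the 41/82 from the question and the 'bird' example from the docstring.

```python
from collections.abc import Mapping
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    # 0-100 similarity via difflib (assumed to match fuzz.ratio's pure-Python fallback)
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def identity(x):
    # "No preprocessing" processor
    return x


def extract_sketch(query, choices, processor=identity, scorer=simple_ratio, limit=5):
    """Hypothetical stand-in for the documented process.extract contract.

    NOTE: the defaults here are no preprocessing and a plain ratio scorer,
    not fuzzywuzzy's full_process() and WRatio().
    """
    processed_query = processor(query)  # the processor is applied to the query too
    if isinstance(choices, Mapping):
        # Dictionary: score each value, report (value, score, key)
        results = [(value, scorer(processed_query, processor(value)), key)
                   for key, value in choices.items()]
    else:
        # Any other iterable: report (choice, score)
        results = [(choice, scorer(processed_query, processor(choice)))
                   for choice in choices]
    # Best scores first; Python's sort is stable, so ties keep input order
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:limit]


print(extract_sketch('Florinia-SP', ['São Bernado do Campo-SP', 'Florínea-SP']))
# [('Florínea-SP', 82), ('São Bernado do Campo-SP', 41)]
print(extract_sketch('bird', {'bard': 'train', 'dog': 'man'}))
# [('train', 22, 'bard'), ('man', 0, 'dog')]
```

With fuzzywuzzy itself, the configuration this sketch defaults to corresponds to passing scorer=fuzz.ratio and an identity processor such as processor=lambda s: s to process.extract.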
The docstring states that the default scorer for process.extract is fuzz.WRatio, which gives different results than fuzz.ratio; to use fuzz.ratio instead, pass it through the scorer argument. In addition, fuzz.ratio does not preprocess its inputs, while process.extract preprocesses them by default with fuzzywuzzy.utils.full_process(). To get results matching fuzz.ratio, disable that preprocessing through the processor argument as well.
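Why do both choices land on exactly 86? The sketch below reproduces the two effects with the standard library only. It assumes fuzzywuzzy's pure-Python behaviour; ratio and full_process_sketch are hypothetical helpers approximating fuzz.ratio and fuzzywuzzy.utils.full_process. WRatio preprocesses with force_ascii=True, which drops characters such as 'í' and 'ã', so 'Florínea-SP' is compared as 'flornea sp', and its plain ratio against 'florinia sp' rises from 82 to 86. The other choice is at least 1.5 times as long as the query, so WRatio switches to its partial branch: the shared token 'sp' is a perfect partial match (100), which WRatio scales by 0.95 and 0.9, giving 85.5 and, after rounding, 86. That last step is shown only as arithmetic, not as a reimplementation of partial_token_set_ratio.

```python
import re
from difflib import SequenceMatcher


def ratio(s1, s2):
    # difflib ratio scaled to 0-100 (assumed fuzz.ratio pure-Python fallback)
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def full_process_sketch(s, force_ascii=False):
    # Approximation of fuzzywuzzy.utils.full_process
    if force_ascii:
        # Drop code points 128-255 (e.g. 'í', 'ã'); WRatio uses force_ascii=True
        s = "".join(ch for ch in s if not 128 <= ord(ch) <= 255)
    s = re.sub(r"\W", " ", s)  # non-letters/non-digits become spaces
    return s.lower().strip()


query = "Florinia-SP"
choices = ["São Bernado do Campo-SP", "Florínea-SP"]

# What fuzz.ratio sees: the raw strings
raw_scores = [ratio(query, c) for c in choices]

# What WRatio sees inside process.extract: ASCII-forced, processed strings
p_query = full_process_sketch(query, force_ascii=True)
p_choices = [full_process_sketch(c, force_ascii=True) for c in choices]

# Similar lengths: the plain ratio of the processed strings is the best candidate
close_score = ratio(p_query, p_choices[1])

# Length ratio 22 / 11 = 2 triggers the partial branch: a perfect partial
# token match (100) scaled by 0.95 and 0.9
long_score = int(round(100 * 0.95 * 0.9))

print(raw_scores)   # [41, 82]
print(p_choices)    # ['so bernado do campo sp', 'flornea sp']
print(close_score)  # 86
print(long_score)   # 86
```

So the two 86s have different causes: one comes from silently removing the accented character, the other from WRatio's scaled partial matching on the common 'sp' token.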