On the difference between typos and spellos #716

Open
hippietrail opened this issue Feb 19, 2025 · 1 comment

Comments

@hippietrail
Contributor

Permission to speak freely (-:

I see the term "mild error" used but not defined in the codebase.

In fact it seems to miss the difference between spelling mistakes and typos.
Typos are when we know how to spell a word but omit/double/flip/change a letter or two.
Spelling mistakes are when we think a word or phrase is spelled a different way than what is actually standard.
(Sometimes each word in a phrase is a legit dictionary spelling, but the overall phrase is misspelled.)
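
To make the typo side of that distinction concrete, here is a minimal sketch that names the single slip turning an intended word into a typed one. The function name `typo_kind` and the four labels are hypothetical, chosen to mirror "omit/double/flip/change" above; it only handles one slip per word and is not taken from the codebase.

```python
from typing import Optional


def typo_kind(intended: str, typed: str) -> Optional[str]:
    """Name the single slip that turns `intended` into `typed`, else None."""
    n, m = len(intended), len(typed)
    if intended == typed:
        return None
    if m == n - 1:
        # One letter left out.
        for i in range(n):
            if intended[:i] + intended[i + 1:] == typed:
                return "omit"
        return None
    if m == n + 1:
        # One letter typed twice in a row.
        for i in range(n):
            if intended[:i + 1] + intended[i] + intended[i + 1:] == typed:
                return "double"
        return None
    if m == n:
        diffs = [i for i in range(n) if intended[i] != typed[i]]
        if len(diffs) == 1:
            # One letter replaced by another.
            return "change"
        if (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and intended[diffs[0]] == typed[diffs[1]]
            and intended[diffs[1]] == typed[diffs[0]]
        ):
            # Two adjacent letters swapped.
            return "flip"
    return None


print(typo_kind("the", "teh"))    # a flip
print(typo_kind("pang", "pain"))  # not a single slip
```

Under these assumptions, "pain" for "pang" is not explained by any single slip, which fits the idea that "hunger pain" is a spello (the writer believes that is the phrase) rather than a typo.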

I see a recurring pattern of false positives due to treating spelling mistakes as typos.
That is, we're looking for arbitrary edit-distance changes between a specific set of standard phrases and what's in the document, but in reality the mistakes people make with these phrases are restricted to very specific patterns, like "hunger pain" for "hunger pang".

Going by edit distance causes legit phrases from the document, that just happen to be one or two changes away from a phrase in this set, to be flagged as mistakes, or at least to trigger a "did you mean".
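
A minimal sketch of the false-positive pattern, assuming a hypothetical phrase list and a plain Levenshtein distance. The names `CANONICAL`, `KNOWN_SPELLOS`, `fuzzy_flag` and `exact_flag` are made up for illustration, as is the "in vain" entry; none of this is the project's actual rule code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical data: the standard phrases, and the specific ways people get them wrong.
CANONICAL = ["hunger pang", "in vain"]
KNOWN_SPELLOS = {"hunger pain": "hunger pang", "in vein": "in vain"}


def fuzzy_flag(phrase: str, max_distance: int = 2):
    """Flag anything within `max_distance` edits of a standard phrase."""
    for target in CANONICAL:
        if phrase != target and levenshtein(phrase, target) <= max_distance:
            return target
    return None


def exact_flag(phrase: str):
    """Flag only the misspellings that are known to occur."""
    return KNOWN_SPELLOS.get(phrase)


for phrase in ["hunger pain", "in pain", "hunger pangs"]:
    print(phrase, "| fuzzy:", fuzzy_flag(phrase), "| exact:", exact_flag(phrase))
```

In this sketch the fuzzy matcher suggests "in vain" for "in pain" and "hunger pang" for "hunger pangs", both of which are fine as written, while the exact table only fires on "hunger pain".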

False positives reflect poorly on the quality of the grammar checking.
Worse, they will confuse people with poor English and lead them to accept suggestions that are wrong, resulting in worse English. (Spell checkers already do this.)

I would suggest instead having separate clear concepts of spellos vs typos.

Apologies if this sounded "ranty".

@elijah-potter
Collaborator

> Apologies if this sounded "ranty".

Not at all. Those rules really need another look. I believe the fundamental problem is that the edit distance check operates on a per-word basis, rather than for the whole phrase. I'm going to spend some time tinkering and get back to you with a more scalable solution.
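
A rough sketch of why a per-word budget is looser than a whole-phrase one, assuming each word is allowed one edit. The functions `per_word_match` and `whole_phrase_match`, the thresholds, and the example phrases are hypothetical and only illustrate the idea, not the current implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def per_word_match(phrase: str, target: str, max_per_word: int = 1) -> bool:
    """Every word may differ by up to `max_per_word` edits, so edits add up across words."""
    words, target_words = phrase.split(), target.split()
    if len(words) != len(target_words):
        return False
    return all(
        levenshtein(w, t) <= max_per_word for w, t in zip(words, target_words)
    )


def whole_phrase_match(phrase: str, target: str, max_total: int = 1) -> bool:
    """The phrase as a whole may differ by at most `max_total` edits."""
    return levenshtein(phrase, target) <= max_total


target = "on the ball"
for phrase in ["in the mall", "on the bal"]:
    print(phrase, per_word_match(phrase, target), whole_phrase_match(phrase, target))
```

Here "in the mall" passes the per-word check against "on the ball" because each word is at most one edit off, but it fails the whole-phrase check because the total is two edits. A genuine slip like "on the bal" passes both.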
