The typo dictionary words.csv previously contained
a bunch of problematic entries such as:
abouta,about
algorithmi,algorithm
attachen,attach
shouldbe,should
This resulted in wrong corrections whenever one of the following
spaces (indicated by ␣) was accidentally omitted:
about␣a
algorithm␣i developed
attach␣en masse
should␣be
Many of these entries were introduced by taking entries from the
codespell-dict and removing corrections containing spaces (since typos
currently doesn't support them), e.g. the codespell dictionary contains:
abouta->about a, about,
shouldbe->should, should be,
This commit updates `tests/verify.rs` to automatically remove
entries in the form of `{correction}{common_word},{correction}`,
where `{common_word}` is one of the 1000 most frequent English words.
The top-1000-most-frequent-words.csv file was generated by running:
curl https://norvig.com/ngrams/count_1w.txt | head -n1024 | awk '{print $1;}' | grep -vE '^([^ia]|al|re)$' > top-1000-most-frequent-words.csv
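The removal rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `tests/verify.rs`: the function names `is_missed_space_entry` and `remove_missed_space_entries` are hypothetical, and the word set is assumed to have been loaded already from top-1000-most-frequent-words.csv.

```rust
use std::collections::HashSet;

/// Returns true when `typo` is exactly `correction` followed by a common word,
/// e.g. ("abouta", "about") when "a" is in `common_words`.
/// Hypothetical helper, not taken from the real test file.
fn is_missed_space_entry(typo: &str, correction: &str, common_words: &HashSet<&str>) -> bool {
    typo.strip_prefix(correction)
        .map_or(false, |rest| common_words.contains(rest))
}

/// Keeps only the `(typo, correction)` pairs that do not match the
/// `{correction}{common_word},{correction}` pattern.
fn remove_missed_space_entries<'a>(
    entries: &[(&'a str, &'a str)],
    common_words: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|(typo, correction)| !is_missed_space_entry(typo, correction, common_words))
        .collect()
}
```

With a word set containing "a" and "be", the entries `abouta,about` and `shouldbe,should` would be dropped, while an ordinary typo such as `teh,the` would be kept.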