Commit ec32cf5
fix(dict): Remove unsure corrections
The typo dictionary words.csv previously contained
a bunch of problematic entries such as:
abouta,about
algorithmi,algorithm
attachen,attach
shouldbe,should
These entries produced wrong corrections whenever the following
spaces (indicated by ␣) were accidentally omitted:
about␣a
algorithm␣i developed
attach␣en masse
should␣be
Many of these entries were introduced by taking entries from the
codespell-dict and removing corrections containing spaces (since typos
currently doesn't support them), e.g. the codespell dictionary contains:
abouta->about a, about,
shouldbe->should, should be,
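To illustrate how such entries come about, here is a minimal sketch of dropping corrections that contain spaces from a codespell-style line. The function name `corrections_without_spaces` is hypothetical and this is not the actual import code; it only assumes the `typo->correction, correction,` layout shown above.

```rust
/// Splits a codespell-style line such as "abouta->about a, about," into the
/// typo and those corrections that contain no spaces.
/// Hypothetical helper for illustration only.
fn corrections_without_spaces(line: &str) -> Option<(&str, Vec<&str>)> {
    // Everything before "->" is the typo, everything after is the correction list.
    let (typo, corrections) = line.split_once("->")?;
    let kept = corrections
        .split(',')
        .map(str::trim)
        // Skip the empty piece after the trailing comma and any multi-word correction.
        .filter(|c| !c.is_empty() && !c.contains(' '))
        .collect();
    Some((typo, kept))
}
```

With this filter, `abouta->about a, about,` keeps only `about`, which is exactly the problematic `abouta,about` entry.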
This commit updates `tests/verify.rs` to automatically remove
entries in the form of `{correction}{common_word},{correction}`,
where `{common_word}` is one of the 1000 most frequent English words.
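The removal rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `tests/verify.rs`: the function names, the in-memory `(typo, correction)` pairs, and the word set passed in are all hypothetical.

```rust
use std::collections::HashSet;

/// Returns true when `typo` is exactly `correction` followed by a common word,
/// e.g. typo "shouldbe" with correction "should" and common word "be".
/// Hypothetical helper for illustration only.
fn is_correction_plus_common_word(
    typo: &str,
    correction: &str,
    common_words: &HashSet<&str>,
) -> bool {
    typo.strip_prefix(correction)
        .map_or(false, |rest| common_words.contains(rest))
}

/// Keeps only the `(typo, correction)` pairs that do not match the
/// `{correction}{common_word},{correction}` pattern.
fn remove_unsure_corrections<'a>(
    entries: &[(&'a str, &'a str)],
    common_words: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|(typo, correction)| {
            !is_correction_plus_common_word(typo, correction, common_words)
        })
        .collect()
}
```

Given a word set containing `a`, `i`, `en`, and `be`, the four entries listed at the top of this message would all be dropped, while an unrelated entry such as a hypothetical `teh,the` would be kept.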
The top-1000-most-frequent-words.csv file was generated by running:
curl https://norvig.com/ngrams/count_1w.txt \
| head -n1024 \
| awk '{print $1;}' \
| grep -vE '^([^ia]|al|re)$' \
> top-1000-most-frequent-words.csv
1 parent 41ce6be commit ec32cf5
4 files changed (+1122, -369):
- crates/typos-dict
  - assets
  - src
  - tests