bt2901 (Contributor) commented Aug 13, 2023

(see also: pytries/DAWG-Python#4)

The Russian (е≈ё) and Ukrainian/Belarusian/Rusyn (г≈ґ) orthographies make some letters optional, but this is generally limited to a single "marked" variant of an "unmarked" letter.

However, the orthographies of other Slavic languages require slightly more sophisticated logic for diacritic restoration, because one unmarked letter can correspond to several marked ones. Czech has u/ú/ů and e/é/ě; Slovak has o/ó/ô, a/á/ä, and l/ĺ/ľ; Polish has z/ź/ż. The Interslavic constructed language even has three "marked" variants of a single letter: e/ę/ě/ė. And that is not to mention the tonal transcription systems used for South Slavic languages.

My fix is relatively straightforward: allow each element of replace_chars to be a list of characters instead of a single character. The performance penalty should be negligible, and backward compatibility should be unaffected.
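To illustrate the idea, here is a minimal standalone sketch. It is not the code from this PR or the DAWG API: the function names `normalize_substitutes` and `candidate_spellings` are hypothetical, and the mapping format (an unmarked character mapped to either one marked character or a list of them) is an assumption chosen to mirror the description above.

```python
from itertools import product


def normalize_substitutes(substitutes):
    """Accept the old form (char -> char) and the new form (char -> list of chars).

    Hypothetical helper: a plain string value is treated as a single variant,
    which is what keeps old-style mappings working unchanged.
    """
    return {
        plain: [marked] if isinstance(marked, str) else list(marked)
        for plain, marked in substitutes.items()
    }


def candidate_spellings(word, substitutes):
    """Return every spelling obtained by optionally marking each character."""
    table = normalize_substitutes(substitutes)
    # For each position: the character itself, followed by its marked variants.
    options = [[ch] + table.get(ch, []) for ch in word]
    return ["".join(combo) for combo in product(*options)]


# Old-style mapping (single marked variant), as for Russian е/ё.
print(candidate_spellings("ел", {"е": "ё"}))
# ['ел', 'ёл']

# New-style mapping (several marked variants), as for Czech u/ú/ů.
print(candidate_spellings("kun", {"u": ["ú", "ů"], "n": ["ň"]}))
# ['kun', 'kuň', 'kún', 'kúň', 'kůn', 'kůň']
```

This brute-force version enumerates every combination, so the number of candidates grows exponentially with the number of substitutable characters. A dictionary-backed implementation would presumably prune candidates while walking the automaton, keeping only the spellings that actually exist as keys.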

(EDIT: I had hoped that the tests would validate everything via CI, but apparently CI isn't working at the moment.)

insolor (Member) commented Aug 13, 2023

Also, it would be nice if you added tests for this functionality.

bt2901 (Contributor, Author) commented Aug 13, 2023

Added tests. Please note, however, that I haven't yet tested my changes locally (I don't have the tools needed to build C++ Python extensions), so someone needs to run the tests before merging (ideally via GitHub Actions; I'll look into it later).

insolor (Member) commented Aug 14, 2023

CI works; it just needed approval to run for a first-time contributor (I've now changed the settings a bit).

Ignore the 2.7 tests for now; I'll probably just turn off testing on 2.7.

The other tests really do fail after your changes (I didn't check whether those are the new tests or whether some old tests are broken). You can use GitHub Codespaces to run the tests without a local setup.

insolor (Member) commented Aug 17, 2023

OK, I'll merge it today.

insolor changed the title from "allow for more flexible char substitutes" to "Allow more flexible char substitutes" on Aug 17, 2023

insolor merged commit 227c958 into pymorphy2-fork:master on Aug 17, 2023