Clicking the "Color" field in the card editor causes the "Pinyin" field's coloring to incorrectly change #55
Comments
There is an interesting unit test in `anki-chinese-support-3/tests/test_behavior.py` (lines 37 to 48 at commit 42af602).
It's not clear to me that this was ever "correctly" fixed. Running `git blame` on the test shows that it was added in jdlorimer/chinese-support-redux@7a88d51, which seems to be a hack to work around the issue.
This issue can be reproduced by adding the following new unit test:

```python
def test_issue_55(self):
    note = {'Hanzi': '可能', 'Pinyin': 'kěnéng'}
    expected = (
        '<span class="tone3">kě</span>'
        '<span class="tone2">néng</span> '
        '<!-- ke neng -->'
    )
    # The first pass should split "kěnéng" into kě + néng, not kěn + éng.
    reformat_transcript(note, 'pinyin', 'pinyin')
    self.assertEqual(note['Pinyin'], expected)
    # A second pass (e.g. triggered by focusing another field) must not change it.
    reformat_transcript(note, 'pinyin', 'pinyin')
    self.assertEqual(note['Pinyin'], expected)
```

When tested using …
Based on the above comment, look at `anki-chinese-support-3/chinese/transcribe.py` (lines 200 to 242 at commit 42af602).
It seems like what this function does is attempt to use regular expressions to split the pinyin word. Perhaps we can avoid splitting the pinyin with a regular expression and instead always rely on the …
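To illustrate why a regular-expression split is ambiguous, here is a toy splitter. It is not the add-on's implementation: the syllable list is a hypothetical, deliberately tiny subset of pinyin, and `split_pinyin` / `all_splits` are illustrative names. Because Python's `re` tries alternatives left to right, listing `kěn` before `kě` makes the splitter consume `kěn` and still find a valid syllable (`éng`) for the rest, even though `kě néng` is also a complete parse.

```python
import re

# Hypothetical, tiny syllable inventory (tone-marked). Longer syllables are
# listed first to mimic a longest-match-first strategy.
SYLLABLES = ["néng", "kěn", "éng", "kě"]

# Alternatives are tried left to right; Pattern.match anchors at `pos`.
PATTERN = re.compile("|".join(map(re.escape, SYLLABLES)))


def split_pinyin(word: str) -> list[str]:
    """Split a pinyin word by repeatedly matching a syllable from the left."""
    syllables = []
    pos = 0
    while pos < len(word):
        match = PATTERN.match(word, pos)
        if match is None:
            raise ValueError(f"cannot split {word!r} at position {pos}")
        syllables.append(match.group(0))
        pos = match.end()
    return syllables


def all_splits(word: str) -> list[list[str]]:
    """Return every way to segment `word` into syllables from SYLLABLES."""
    if not word:
        return [[]]
    results = []
    for syllable in SYLLABLES:
        if word.startswith(syllable):
            for rest in all_splits(word[len(syllable):]):
                results.append([syllable] + rest)
    return results


print(split_pinyin("kěnéng"))  # ['kěn', 'éng']
print(all_splits("kěnéng"))    # [['kěn', 'éng'], ['kě', 'néng']]
```

Both segmentations are valid strings of syllables, so a splitter that only sees `kěnéng` has no way to know which one was meant.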
…e the transcription. The original transcription uses the hanzi as a source of truth, which will typically be more accurate than the `reformat_transcript()` function, which uses a regular-expression approach to splitting the pinyin. Certain edge cases, such as 可能, will have the original pinyin correctly split into "kě néng", but the `reformat_transcript()` function will incorrectly change it to "kěn éng", as that is also a valid pinyin string. We attempt to avoid this issue by never changing the pinyin field if it matches the original transcription. If it does not match the original transcription, the user probably changed it intentionally, so running `reformat_transcript()` is desired. See Gustaf-C#55 for more information.
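The hanzi make a better source of truth because each character contributes exactly one syllable, so the boundaries come from the characters rather than from guessing. A minimal sketch, assuming a hypothetical two-entry dictionary (`READINGS` and `transcribe_hanzi` are illustrative names, not the add-on's API, and characters with several readings are ignored):

```python
# Hypothetical per-character readings; a real transcriber needs a full
# dictionary and must handle characters with more than one reading.
READINGS = {"可": "kě", "能": "néng"}


def transcribe_hanzi(hanzi: str) -> list[str]:
    """Return one syllable per character, so boundaries follow the characters."""
    return [READINGS[char] for char in hanzi]


syllables = transcribe_hanzi("可能")
print(syllables)            # ['kě', 'néng']
print(" ".join(syllables))  # kě néng
```

Once the syllables are concatenated into `kěnéng`, that boundary information is gone, which is exactly the input a regex splitter has to guess at.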
Before this change, there are two different ways that the code interacts with `pinyin` in a note:

1. If the `pinyin` field is empty, it uses the `hanzi` field as a source of truth to generate the `pinyin` field.
2. If the `pinyin` field is non-empty, it takes the contents of `pinyin` and runs `reformat_transcript()` on it. The idea of this function is that it will update (split, colorize, etc.) the `pinyin` field with new information that the user provides. This function does the splitting using a regular expression.

We have observed a bug in this logic for some words, such as 可能, which initially get the correct pinyin ("kě néng") but have it incorrectly changed to "kěn éng" as a result of running `reformat_transcript()` on them. This bug can occur for any pinyin that has more than one acceptable regular-expression split.

Bug report: Gustaf-C#55

This commit attempts to put a band-aid over the issue by not repopulating the pinyin field when the user has not changed the original transcription generated from the hanzi. Unit tests and documentation are also included in the commit.
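The two code paths plus the band-aid can be summarised as a guard in front of the reformatting step. This is a sketch of the idea only, not the code from #56: `update_pinyin`, `fake_transcribe` and `buggy_reformat` are hypothetical names, the transcriber and reformatter are passed in as stand-ins for the add-on's hanzi transcription and `reformat_transcript()`, and a plain `dict` stands in for the Anki note.

```python
from typing import Callable, Dict

Note = Dict[str, str]


def update_pinyin(
    note: Note,
    transcribe: Callable[[str], str],
    reformat: Callable[[str], str],
) -> None:
    """Fill or reformat note["Pinyin"] (hypothetical helper, not the add-on's API).

    The regex-based reformat is skipped when the field still equals the
    transcription generated from note["Hanzi"].
    """
    original = transcribe(note["Hanzi"])
    current = note.get("Pinyin", "")
    if not current:
        # Path 1: empty field, so generate the pinyin from the hanzi.
        note["Pinyin"] = original
    elif current != original:
        # Path 2: the user edited the field, so reformat what they typed.
        note["Pinyin"] = reformat(current)
    # Otherwise the field is unchanged since generation: leave it alone so it
    # cannot be re-split into a different (wrong) set of syllables.


# Stand-ins: a one-word "dictionary" and a reformatter that reproduces the
# bug by always re-splitting 可能 the wrong way.
def fake_transcribe(hanzi: str) -> str:
    return {"可能": "kě néng"}[hanzi]


def buggy_reformat(pinyin: str) -> str:
    return "kěn éng"


note = {"Hanzi": "可能", "Pinyin": ""}
update_pinyin(note, fake_transcribe, buggy_reformat)  # generates "kě néng"
update_pinyin(note, fake_transcribe, buggy_reformat)  # e.g. focusing another field
print(note["Pinyin"])  # kě néng
```

The trade-off noted in the commit message still applies: if the user edits the field, the regex-based reformat runs again and can still mis-split ambiguous words.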
This is actually kind of tricky to work around, since we cannot always rely on the … I think that the simplest way to work around the issue for now is to treat the … Please find my implementation of this fix in #56.
* Add comment backlinking `test_issue_78` to original issue
* Avoid `reformat_transcript()` on original pinyin

Co-authored-by: Jacob Budzis
Describe the bug
For certain words, such as 宾馆 and 可能, the extension generates a correctly colored set of pinyin initially, but if the user clicks on the "Color" field in the card editor (maybe other fields reproduce this too?), the pinyin coloring incorrectly changes. Please find two demonstration videos attached below.
Screen.Recording.2024-01-15.at.2.47.53.PM.mov
Screen.Recording.2024-01-15.at.2.45.30.PM.mov
Reproduction
1. Install the `Chinese Support 3` add-on and disable all other add-ons.
2. Perform the standard setup procedures from the README document in this GitHub repo to add a `Chinese (Basic)` card type to your Anki.
3. Add a new `Chinese (Basic)` card, enable Chinese Support V3, then copy and paste `可能` into the `Hanzi` field. It should auto-populate the other card fields.
4. Expand the `Pinyin` field's HTML editor and verify that it is `<span class="tone3">kě</span><span class="tone2">néng</span> <!-- ke neng -->`.
5. With the `Pinyin` HTML expanded, click on the `Color` field and observe that the `Pinyin` HTML incorrectly changes to `<span class="tone3">kěn</span><span class="tone2">éng</span> <!-- ken eng -->`.
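Given a list of syllables, the colored HTML follows mechanically from the tone marks, so a wrong split alone is enough to produce the incorrect markup in the last step. A minimal sketch, assuming the format shown in these steps (one `<span class="toneN">` per syllable, then a toneless HTML comment); `tone_of`, `strip_tone` and `colorize` are hypothetical names, and using tone 5 for unmarked syllables is an assumption, not necessarily what the add-on does.

```python
import unicodedata

# Combining tone marks (after NFD decomposition) mapped to pinyin tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def tone_of(syllable: str) -> int:
    """Return the tone number of a tone-marked syllable (5 if unmarked)."""
    for char in unicodedata.normalize("NFD", syllable):
        if char in TONE_MARKS:
            return TONE_MARKS[char]
    return 5


def strip_tone(syllable: str) -> str:
    """Remove tone marks but keep other diacritics, such as the umlaut in ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(char for char in decomposed if char not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def colorize(syllables: list[str]) -> str:
    """Build one colored span per syllable, followed by a toneless comment."""
    spans = "".join(f'<span class="tone{tone_of(s)}">{s}</span>' for s in syllables)
    plain = " ".join(strip_tone(s) for s in syllables)
    return f"{spans} <!-- {plain} -->"


print(colorize(["kě", "néng"]))  # correct split
print(colorize(["kěn", "éng"]))  # the split produced by the bug
```

The first call yields the HTML from step 4 and the second the HTML from step 5; the tone colors are identical, only the syllable boundaries differ.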
Expected behavior
The `Pinyin` field should not change into an incorrect version of itself.
Specs