Add support for combining characters #4

claui · 2017-09-08T16:47:53Z

In Unicode, a combining character is a character which can be stacked on top of the character preceding it. For example:

Example

The character LATIN SMALL LETTER U (u) has the codepoint U+0075 assigned.
The character COMBINING DIAERESIS, which looks similar to the symbol ¨ but is actually a combining character, has the codepoint U+0308 assigned.
Writing both characters in sequence yields the letter ü, which looks just like the precomposed ü (U+00FC) but is actually two characters.
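The difference between the two forms can be checked outside AppleScript. The following is a minimal Python sketch, not code from applescript-json; the variable names and the `codepoints` helper are made up for illustration.

```python
import unicodedata

# Decomposed form: LATIN SMALL LETTER U followed by COMBINING DIAERESIS.
decomposed = "u\u0308"
# Precomposed form: LATIN SMALL LETTER U WITH DIAERESIS.
precomposed = "\u00fc"


def codepoints(text):
    """Return the codepoints of text in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(codepoints(decomposed))     # ['U+0075', 'U+0308']
print(codepoints(precomposed))    # ['U+00FC']
print(decomposed == precomposed)  # False: same appearance, different codepoints

# NFC normalization composes the pair into the single precomposed character.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both strings render identically, but only Unicode normalization makes them compare equal.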

Impact

On a Mac, such combinations are very common, especially in filenames due to an oddity in the HFS+ filesystem.

The issue

In the applescript-json library, however, the encodeString function always fails when the input contains a combining character.
In detail, the code assumes that inside the repeat with ch loop, ch will always be a single character, and that its id property will always return an integer. In reality, ch contains more than one character whenever a combining character is involved. In that case, the id property returns a list instead of an integer. The code is not prepared to handle the list, which triggers the error.

The fix

This PR fixes the issue by fetching id before doing the iteration. This yields a simple list of all codepoints in the entire input string, which can be iterated safely. I have also added a simple test case for the “u followed by ̈ ” scenario described above (including the expected JSON output, which would be u\u0308).
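The idea of iterating over codepoints rather than over visible characters can be sketched in Python. This is only an analogy under stated assumptions, not the library's AppleScript code: the function name `encode_string` and the escaping rules (escape everything outside printable ASCII) are hypothetical choices made for this illustration.

```python
def encode_string(text):
    """Encode text as a JSON string literal, one codepoint at a time.

    Hypothetical illustration: collect all codepoints up front, then
    iterate over that flat list of integers.
    """
    codepoints = [ord(ch) for ch in text]  # flat list of integers
    parts = []
    for cp in codepoints:
        if cp == 0x22:                 # double quote
            parts.append('\\"')
        elif cp == 0x5C:               # backslash
            parts.append("\\\\")
        elif 0x20 <= cp < 0x7F:        # printable ASCII is emitted as-is
            parts.append(chr(cp))
        elif cp <= 0xFFFF:             # everything else in the BMP is escaped
            parts.append(f"\\u{cp:04x}")
        else:                          # beyond the BMP: UTF-16 surrogate pair
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            parts.append(f"\\u{high:04x}\\u{low:04x}")
    return '"' + "".join(parts) + '"'


print(encode_string("u\u0308"))  # "u\u0308"
```

Because the loop only ever sees single integers, a combining character is just one more element in the list and needs no special handling.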

In Unicode, a combining character is a character which can be stacked on top of the character preceding it. For example:

- The character LATIN SMALL LETTER U (`u`) has the codepoint U+0075 assigned.
- The character COMBINING DIAERESIS, which looks similar to the symbol `¨` but is actually a combining character, has the codepoint U+0308 assigned.
- Writing both characters in sequence yields the letter `ü`, which looks just like `ü` but is actually *two* characters.

On a Mac, such combinations are very common, especially in filenames due to an oddity in the HFS+ filesystem.

In the `applescript-json` library however, the `encodeString` function always fails when the input contains a combining character.

In detail, the code assumes that inside the `repeat with ch` loop, `ch` will be always one single character, and its `id` property will always return an integer. However, in reality `ch` will contain more than one character if a combining character is involved. Because of that, the `id` property will return a list instead of an integer. The code is not prepared to handle the list, which triggers the error.

This commit adds a simple test case for the “u followed by ̈” scenario described above. It also includes the expected JSON output, which would be `u\u0308`.

This commit fixes the bug described in the previous commit. The trick is to fetch the `id` property on the entire input string _before_ we iterate over it. That way, `id` returns a simple list of integer codepoints for the entire string, which can be iterated safely.

claui · 2017-09-08T16:56:15Z

Oh, and: apologies for overexplaining.
I just came across your slides, which indicate you probably know a lot more about denormalization and combining characters than I do. 😉

mgax · 2017-09-08T17:01:09Z

Wow. Thanks!

mgax · 2017-09-08T17:05:00Z

> Oh, and: apologies for overexplaining.

No need! I had no idea how applescript deals with multiple codepoints for a character. And I'm actually surprised anybody is using this library, let alone is willing to send a PR :)

claui · 2017-09-08T17:34:38Z

> And I'm actually surprised anybody is using this library, let alone is willing to send a PR :)

Not only do I use it, I feel it’s the best solution out there to make AppleScript write machine-readable things to standard output.

Right now, I’m finishing up an Alfred 3 workflow, which inexplicably failed the other night despite months of testing. Turns out one of my Terminal.app tabs was showing a directory name in decomposed form just as my AppleScript inspected that tab. That one letter then went on to crash your library.

Thank you for making this!

mgax · 2017-09-08T18:33:31Z

Ah, cool, glad to hear that! I made it a while ago to export playlists from iTunes. The idea was to get the data out of applescript-land as quickly as possible and use a sane language to work with it. :)

xilopaint · 2018-09-27T03:15:18Z

Hi @claui! I'm also trying to use applescript-json in an Alfred workflow to generate feedback in a Script Filter. Could you give me a hand in #5?

claui · 2018-09-28T07:49:32Z

@xilopaint Thanks but no, I can’t implement this right now.

xilopaint · 2018-09-28T12:10:36Z

No problem! In fact, I didn't ask you to implement the feature; I just thought you might know a workaround. Now I see that someone seems to have worked on the feature in #1.

claui added 2 commits September 8, 2017 18:08

mgax merged commit ad86475 into mgax:master Sep 8, 2017

claui deleted the combining-characters branch September 8, 2017 17:03
