Handling automatic spaces between CJK and Latin characters in LuaLaTeX #711

Open
BenjaminGalliot opened this issue Apr 23, 2024 · 2 comments


@BenjaminGalliot

(Sorry for using English!)

Hello,

I'm currently working on an automatically generated document using LuaLaTeX. I've encountered an issue where there isn't an automatic space between CJK and Latin script characters, which affects the readability of mixed-language texts. For example:

中文
…français

The ellipsis (…) directly follows the Chinese characters without any space. I would like to have an automatic space inserted between CJK and Latin characters whenever a line break occurs between them.

Is there a known method or a recommended practice within LuaLaTeX environment to handle this spacing automatically? Any guidance or workaround to ensure proper spacing between these character sets would be greatly appreciated.

Thank you in advance for your help!

MWE:

\documentclass{article}
\RequirePackage[french]{babel}
\RequirePackage{ctex}
\babelprovide[import=zh-Hans]{cmn}
\setCJKfamilyfont{cmn}{AR PL UKai CN}
\setmainfont{EB Garamond}
\RenewDocumentCommand \CJKrmdefault {} {cmn}
\babelfont[french]{rm}{EB Garamond}
\NewDocumentCommand \scriptcjk {} {\ltjsetparameter{jacharrange={-1, +2, +3, -4, -5, +6, +7, -8, +9}}}
\NewDocumentCommand \scriptlatin {} {\ltjsetparameter{jacharrange={-1, -2, -3, -4, -5, +6, +7, -8, -9}}}
\NewDocumentCommand \tfra { m } {\foreignlanguage{french}{\scriptlatin#1}}
\NewDocumentCommand \tcmn { m } {\foreignlanguage{cmn}{\scriptcjk#1}}
\frenchsetup{og=«, fg=», AutoSpacePunctuation=true} % Can be turned off if necessary.
            
\begin{document}
\scriptlatin
\selectlanguage{french}

中文 …français  % Reference.

中文…français  % Expected.

中文
…français  % Not wanted.

中文\ 
…français  % Workaround, but I am looking for something better.

---------
% It should also work when the text is wrapped in commands.

\tcmn{中文} \tfra{…français}

\tcmn{中文}\tfra{…français}

\tcmn{中文}
\tfra{…français}

\tcmn{中文}\
\tfra{…français}

---------
% Without punctuation.

\tcmn{中文} \tfra{français}

\tcmn{中文}\tfra{français}

\tcmn{中文}
\tfra{français}

\tcmn{中文}\
\tfra{français}

---------
% Various behaviours depending on punctuation?

中文 
\tfra{«français}

中文\ 
\tfra{«français}

中文 
\tfra{"français}

中文\ 
\tfra{"français}

中文 
\tfra{(français}

中文\ 
\tfra{(français}

中文 
\tfra{[français}

中文\ 
\tfra{[français}

中文 
\tfra{-français}

中文\ 
\tfra{-français}

中文 
\tfra{–français}

中文\ 
\tfra{–français}

中文 
\tfra{—français}

中文\ 
\tfra{—français}

\end{document}

Screenshot:
Screenshot_20240423_202707

My workaround of manually inserting spaces (\ ) is not ideal. I am looking for a more elegant solution that would automatically handle these spaces, particularly after a line break.

In addition to the main issue of spacing after line breaks, I've also noticed that the behaviour changes depending on the punctuation used. Is this the intended behaviour? Is it possible to customize it?

Thank you very much.

@muzimuzhi
Contributor

The reported behavior may be inherited from luatexja. I haven't checked.

@wangweixuan

The behavior of line breaks is explained in §15.2 of the LuaTeX-ja manual:

Considering these situations, handling of an end-of-line in LuaTeX-ja are as follows:

A character whose character code is \ltjlineendcomment is appended to an input line, before LuaTeX actually process it, if and only if the following three conditions are satisfied:

  1. The category code of \endlinechar is 5 (end-of-line).
  2. The category code of \ltjlineendcomment itself is 14 (comment).
  3. The input line matches the following “regular expression”: [...]

To keep line breaks from being turned into comments and ignored, you can make the second condition fail by changing that category code:

\catcode\ltjlineendcomment=0
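
As a rough illustration of where that line could go, here is a reduced version of the test document from the first post. This is an untested sketch: it assumes that ctex loads LuaTeX-ja under LuaLaTeX, so that \ltjlineendcomment is already defined in the preamble, and that a preamble-level assignment is enough. Going by condition 2 above, any category code other than 14 should have the same effect.

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage{ctex} % assumption: this pulls in LuaTeX-ja under LuaLaTeX

% Make condition 2 fail: the end-of-line character is no longer
% appended as a comment, so the line break below is read normally.
\catcode\ltjlineendcomment=0

\begin{document}
中文
…français
\end{document}
```

Category code assignments are local to the current TeX group, so the same line could also be placed inside a group if only part of the document should be affected.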

The different handling of punctuation marks is explained in §4.3:

It is not desirable that xkanjiskip is inserted into every boundary between JAchars and ALchars. For example, xkanjiskip should not be inserted after opening parenthesis (e.g., compare “(あ” and “( あ”). LuaTeX-ja can control whether xkanjiskip can be inserted before/after a character, by changing jaxspmode for JAchars and alxspmode parameters ALchars respectively.

[...]

The second argument preonly means that the insertion of xkanjiskip is allowed before this character, but not after. The other possible values are postonly, allow, and inhibit.

The default settings include

\ltjsetparameter{jaxspmode={`“,preonly}}
\ltjsetparameter{jaxspmode={`”,postonly}}
\ltjsetparameter{jaxspmode={`—,inhibit}}% U+2014 EM DASH
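
If one of these defaults does not suit a document, the same keys can presumably be set again in the preamble. The lines below are only a sketch, not tested against the MWE: they assume that the em dash is treated as a JAchar and the opening parenthesis as an ALchar under the active jacharrange, which depends on the \scriptcjk / \scriptlatin settings in use.

```latex
% Assumption: U+2014 EM DASH is a JAchar under the current jacharrange.
% Allow xkanjiskip on both sides of it instead of the default "inhibit".
\ltjsetparameter{jaxspmode={`—,allow}}

% Assumption: "(" is an ALchar. Restate the rule quoted above:
% xkanjiskip may be inserted before "(", but not after it.
\ltjsetparameter{alxspmode={`(,preonly}}
```

Note that these parameters only control xkanjiskip; they do not affect whether a line break in the source is kept as a space.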
