fix: preserve multi-byte UTF-8 in text preprocessing (fixes #1705)#1820

Merged
krystophny merged 1 commit into main from fix/issue-1705-utf8-text-preprocessing
Apr 25, 2026
@krystophny

Summary

  • Make preprocess_math_text and process_latex_in_text UTF-8 aware so multi-byte characters (superscripts, Greek letters, etc.) are copied as intact byte sequences rather than being processed byte-by-byte
  • Fixes spurious glyph appearing in suptitle when adjacent labels contain Unicode superscripts (e.g. x², x³)

Changes

  • src/text/fortplot_text_layout.f90: preprocess_math_text uses utf8_char_length to detect and copy multi-byte sequences intact
  • src/utilities/text/fortplot_latex_parser.f90: process_latex_in_text uses utf8_char_length to detect and copy multi-byte sequences intact
  • test/test_utf8_text_preprocessing.f90: new test covering UTF-8 preservation through the full preprocessing pipeline plus the exact issue scenario

Verification

Requirement: suptitle renders without spurious glyph

Command: pdftotext output/example/fortran/subplot_demo/subplot_1x3_demo.pdf -
Output excerpt: Polynomial Growth Comparison — clean text, no extra characters between "Growth" and "Comparison"

Requirement: Unicode superscripts in labels preserved

Command: fpm test --target test_utf8_text_preprocessing
Output:

 PASS: preprocess_math_text preserves UTF-8 characters
 PASS: process_latex_in_text preserves UTF-8 characters
 PASS: prepare_text_for_raster preserves UTF-8 characters
 PASS: 1x3 subplot with Unicode ylabel and ASCII suptitle renders without error
 All UTF-8 text preprocessing tests PASSED!

Requirement: artifact verification passes

Command: make verify-artifacts
Output: Artifact verification passed. (all checks green, including unicode_demo α and ω assertions)

Artifact paths

  • output/example/fortran/subplot_demo/subplot_1x3_demo.pdf — suptitle clean
  • output/example/fortran/subplot_demo/subplot_1x3_demo.png — suptitle clean
  • build/test/output/test_utf8_suptitle.png — test-generated artifact with Unicode ylabel + ASCII suptitle

The preprocess_math_text and process_latex_in_text functions processed
text byte-by-byte, breaking multi-byte UTF-8 sequences (e.g. superscript
² U+00B2 encoded as bytes C2 B2). Orphaned continuation bytes were
misinterpreted as new character starts, producing spurious glyphs in
rendered output.
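
To make the failure mode concrete, here is a minimal standalone sketch (not code from this PR) that builds U+00B2 from its two raw bytes and classifies each byte the way a UTF-8 decoder would. It assumes `ichar` returns values in 0..255 for default-kind characters, as gfortran does; the program name is illustrative only.

```fortran
program utf8_bytes_demo
    implicit none
    ! U+00B2 (superscript two) encoded as UTF-8: bytes C2 B2 (194, 178)
    character(len=*), parameter :: sup2 = char(194) // char(178)
    integer :: i, b

    ! One visible character, but two bytes in a default-kind string
    if (len(sup2) /= 2) error stop 'expected a two-byte sequence'

    do i = 1, len(sup2)
        b = ichar(sup2(i:i))  ! assumption: 0..255 range (gfortran behaviour)
        if (iand(b, 192) == 128) then
            ! Bit pattern 10xxxxxx: only valid after a lead byte
            print '(a,i0,a,z2.2,a)', 'byte ', i, ' = 0x', b, ' (continuation byte)'
        else if (b >= 128) then
            ! Bit pattern 11xxxxxx: starts a multi-byte sequence
            print '(a,i0,a,z2.2,a)', 'byte ', i, ' = 0x', b, ' (lead byte)'
        else
            print '(a,i0,a,z2.2,a)', 'byte ', i, ' = 0x', b, ' (ASCII)'
        end if
    end do
end program utf8_bytes_demo
```

A loop that hands `sup2(2:2)` to later stages as if it were a character on its own passes along the bare continuation byte B2, which is the orphaned byte described above.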

Fix makes both functions UTF-8 aware: multi-byte characters are now
detected via utf8_char_length and copied as intact sequences rather
than processed byte-by-byte.
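
The shape of such a UTF-8 aware loop can be sketched as follows. This is an illustration under stated assumptions, not the code merged here: `lead_byte_length` is a hypothetical stand-in for the project's `utf8_char_length` (whose actual signature is not shown in this PR), and the ASCII branch only copies where the real functions would apply their math or LaTeX handling.

```fortran
program utf8_copy_demo
    implicit none
    character(len=:), allocatable :: input, output
    integer :: nchars

    ! 'x', U+00B2 (C2 B2), space, U+03B1 (CE B1), built from raw bytes
    input = 'x' // char(194) // char(178) // ' ' // char(206) // char(177)
    call copy_utf8_aware(input, output, nchars)

    if (output /= input) error stop 'UTF-8 sequence was altered'
    if (nchars /= 4) error stop 'unexpected character count'
    print '(a,i0,a,i0)', 'bytes: ', len(input), ', characters: ', nchars

contains

    ! Hypothetical helper: sequence length implied by a UTF-8 lead byte.
    pure integer function lead_byte_length(c) result(n)
        character(len=1), intent(in) :: c
        integer :: b
        b = ichar(c)  ! assumption: 0..255 range (gfortran behaviour)
        if (b < 128) then
            n = 1                          ! 0xxxxxxx: ASCII
        else if (iand(b, 224) == 192) then
            n = 2                          ! 110xxxxx
        else if (iand(b, 240) == 224) then
            n = 3                          ! 1110xxxx
        else if (iand(b, 248) == 240) then
            n = 4                          ! 11110xxx
        else
            n = 1                          ! stray or invalid byte: step one byte
        end if
    end function lead_byte_length

    ! Walk the string one character at a time instead of one byte at a time.
    subroutine copy_utf8_aware(text, res, nchars)
        character(len=*), intent(in) :: text
        character(len=:), allocatable, intent(out) :: res
        integer, intent(out) :: nchars
        integer :: i, n

        allocate(character(len=len(text)) :: res)
        nchars = 0
        i = 1
        do while (i <= len(text))
            ! Clamp so a truncated sequence at the end cannot overrun the string
            n = min(lead_byte_length(text(i:i)), len(text) - i + 1)
            ! n > 1: copy the whole sequence intact.
            ! n == 1: an ASCII byte; per-byte processing would go here.
            res(i:i+n-1) = text(i:i+n-1)
            nchars = nchars + 1
            i = i + n
        end do
    end subroutine copy_utf8_aware

end program utf8_copy_demo
```

The key design point is that the loop index advances by the sequence length, so a continuation byte is never examined as the start of a character.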

Adds test_utf8_text_preprocessing with coverage for:
- preprocess_math_text with superscripts, Greek letters, mixed ASCII/UTF-8
- process_latex_in_text with Unicode pass-through
- prepare_text_for_raster full pipeline
- 1x3 subplot rendering with Unicode ylabel + ASCII suptitle
@krystophny krystophny merged commit 5508ae1 into main Apr 25, 2026
5 checks passed
@krystophny krystophny deleted the fix/issue-1705-utf8-text-preprocessing branch April 25, 2026 23:53