fix: preserve multi-byte UTF-8 in text preprocessing (fixes #1705) by krystophny · Pull Request #1820 · lazy-fortran/fortplot

krystophny · 2026-04-25T23:49:34Z

Summary

Make preprocess_math_text and process_latex_in_text UTF-8 aware, so multi-byte characters (superscripts, Greek letters, etc.) are copied as intact byte sequences instead of being processed byte-by-byte.
Fixes the spurious glyph that appeared in the suptitle when adjacent labels contain Unicode superscripts (e.g. x², x³).

Changes

src/text/fortplot_text_layout.f90: preprocess_math_text uses utf8_char_length to detect and copy multi-byte sequences intact
src/utilities/text/fortplot_latex_parser.f90: process_latex_in_text uses utf8_char_length to detect and copy multi-byte sequences intact
test/test_utf8_text_preprocessing.f90: new test covering UTF-8 preservation through the full preprocessing pipeline plus the exact issue scenario
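
For readers who want the idea without opening the diff, here is a minimal Python sketch of a UTF-8-aware copy loop. It is not the Fortran code from this PR. The `utf8_char_length` function below is a stand-in named after the helper mentioned above; its signature, the return value of 0 for invalid lead bytes, and the `per_byte_step` callback are all assumptions made for illustration.

```python
def utf8_char_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (assumed: 0 if not a valid lead)."""
    if lead < 0x80:
        return 1                      # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2                      # e.g. C2 B2 (superscript two)
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0                          # continuation byte or invalid lead


def copy_utf8_aware(data: bytes, per_byte_step) -> bytes:
    """Apply per_byte_step to single bytes, but copy multi-byte sequences intact.

    per_byte_step is a hypothetical placeholder for whatever the real
    preprocessing does to ASCII input. Continuation bytes are not validated here.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        n = utf8_char_length(data[i])
        if n > 1 and i + n <= len(data):
            out += data[i:i + n]      # whole character, untouched
            i += n
        else:
            out += per_byte_step(data[i:i + 1])
            i += 1
    return bytes(out)


result = copy_utf8_aware("x² α".encode("utf-8"), bytes.upper)
print(result.decode("utf-8"))
```

With `bytes.upper` as the placeholder step, only the ASCII `x` is transformed; the two-byte sequences for ² and α pass through unchanged.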

Verification

Requirement: suptitle renders without spurious glyph

Command: pdftotext output/example/fortran/subplot_demo/subplot_1x3_demo.pdf -
Output excerpt: Polynomial Growth Comparison — clean text, no extra characters between "Growth" and "Comparison"

Requirement: Unicode superscripts in labels preserved

Command: fpm test --target test_utf8_text_preprocessing
Output:

 PASS: preprocess_math_text preserves UTF-8 characters
 PASS: process_latex_in_text preserves UTF-8 characters
 PASS: prepare_text_for_raster preserves UTF-8 characters
 PASS: 1x3 subplot with Unicode ylabel and ASCII suptitle renders without error
 All UTF-8 text preprocessing tests PASSED!

Requirement: artifact verification passes

Command: make verify-artifacts
Output: Artifact verification passed. (all checks green, including unicode_demo α and ω assertions)

Artifact paths

output/example/fortran/subplot_demo/subplot_1x3_demo.pdf — suptitle clean
output/example/fortran/subplot_demo/subplot_1x3_demo.png — suptitle clean
build/test/output/test_utf8_suptitle.png — test-generated artifact with Unicode ylabel + ASCII suptitle

The preprocess_math_text and process_latex_in_text functions processed text byte-by-byte, breaking multi-byte UTF-8 sequences (e.g. superscript ² U+00B2 encoded as bytes C2 B2). Orphaned continuation bytes were misinterpreted as new character starts, producing spurious glyphs in rendered output.

Fix makes both functions UTF-8 aware: multi-byte characters are now detected via utf8_char_length and copied as intact sequences rather than processed byte-by-byte.

Adds test_utf8_text_preprocessing with coverage for:
- preprocess_math_text with superscripts, Greek letters, mixed ASCII/UTF-8
- process_latex_in_text with Unicode pass-through
- prepare_text_for_raster full pipeline
- 1x3 subplot rendering with Unicode ylabel + ASCII suptitle
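
The failure mode in this commit message can be reproduced outside fortplot. The Python sketch below only shows what happens to the bytes of `x²` when they are split one at a time; it does not model the renderer. The helper names `split_bytewise` and `is_valid_utf8` are hypothetical.

```python
def split_bytewise(data: bytes) -> list:
    """Treat every byte as its own 'character', as a byte-by-byte loop would."""
    return [data[i:i + 1] for i in range(len(data))]


def is_valid_utf8(chunk: bytes) -> bool:
    """True if the chunk decodes as UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = "x²".encode("utf-8")            # b'x\xc2\xb2': ² is the two bytes C2 B2
pieces = split_bytewise(raw)          # [b'x', b'\xc2', b'\xb2']
validity = [is_valid_utf8(p) for p in pieces]

# The second byte has the 10xxxxxx pattern of a continuation byte, so it
# cannot start a character by itself.
is_continuation = (raw[2] & 0xC0) == 0x80

print(pieces, validity, is_continuation)
```

Only the ASCII `x` survives the split as a valid character. The lead byte C2 and the continuation byte B2 are each invalid on their own, which is the situation the fix avoids by copying both bytes together.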

krystophny merged commit 5508ae1 into main Apr 25, 2026
5 checks passed

krystophny deleted the fix/issue-1705-utf8-text-preprocessing branch April 25, 2026 23:53

krystophny merged 1 commit into main from fix/issue-1705-utf8-text-preprocessing
