perf: replace df_translations with hashed environment lookup #2387

Merged: ddsjoberg merged 8 commits into main from ds-optimize-translate-string on Apr 28, 2026

@ddsjoberg
Owner

Description

Replace the df_translations data frame with lst_translations, a list of hashed environments keyed by language. This changes translation lookups from O(n) row scans to O(1) hashed environment access.
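
As a rough illustration of the lookup described above, here is a minimal sketch. The names `lst_translations_demo` and `translate_one_demo()` are hypothetical stand-ins, and the fall-back to the English input when no translation exists is an assumption rather than confirmed package behavior.

```r
# Hypothetical stand-in for lst_translations: one hashed environment per
# language, mapping English strings to their translations.
es <- new.env(hash = TRUE, parent = emptyenv())
assign("Characteristic", "Caracter\u00edstica", envir = es)
assign("Overall", "Global", envir = es)
lst_translations_demo <- list(es = es)

# Hypothetical lookup helper. Assumption: an unknown language or an
# unknown string returns the English input unchanged.
translate_one_demo <- function(x, language = "en") {
  env <- lst_translations_demo[[language]]
  if (is.null(env)) return(x)
  # Hashed lookup by name; inherits = FALSE keeps the search inside `env`.
  get0(x, envir = env, inherits = FALSE, ifnotfound = x)
}

translate_one_demo("Overall", "es")       # "Global"
translate_one_demo("Overall", "en")       # "Overall"
translate_one_demo("Not in table", "es")  # "Not in table"
```

The cost of `get0()` on a hashed environment does not grow with the number of stored strings, whereas filtering a data frame compares against every row on each call.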

Also bypass get_theme_element() in the translation functions by checking the theme environment directly via .get_language(), avoiding the eval_tidy() overhead on every call. Since the default language is English (no theme element set), this is the hot path.

Benchmark (per-call cost)

| Path | Old | New | Speedup |
|---|---|---|---|
| English default (100k calls) | 0.777s | 0.320s | 2.4x |
| Non-English (10k calls) | 0.208s | 0.049s | 4.2x |

Changes

  • data-raw/internal_data.R: Build lst_translations (hashed environments) from the Excel source, replacing df_translations in sysdata.rda
  • R/utils-translations.R: Rewrite .translate_grab_one() to use environment lookup; add .get_language() to bypass get_theme_element() overhead
  • R/sysdata.rda: Regenerated
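
The build step in data-raw/internal_data.R could look roughly like the sketch below. The layout of the Excel source is not shown in this PR, so `df_demo` (one English column plus one column per language) and its contents are made up for illustration, and `lst_demo` is a hypothetical name.

```r
# Hypothetical stand-in for the Excel sheet after import.
df_demo <- data.frame(
  en = c("Characteristic", "Overall"),
  es = c("Caracter\u00edstica", "Global"),
  fr = c("Caract\u00e9ristique", "Total"),
  stringsAsFactors = FALSE
)

# Build one hashed environment per non-English language column.
languages <- setdiff(names(df_demo), "en")
lst_demo <- lapply(languages, function(lang) {
  keep <- !is.na(df_demo[[lang]])        # skip missing translations
  values <- as.list(df_demo[[lang]][keep])
  names(values) <- df_demo$en[keep]      # keys are the English strings
  list2env(values, envir = new.env(hash = TRUE, parent = emptyenv()))
})
names(lst_demo) <- languages

lst_demo$fr[["Overall"]]  # "Total"
```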

How to test

NOT_CRAN=true testthat::test_file('tests/testthat/test-theme_gtsummary.R')

Replace the df_translations data frame (O(n) row scan per lookup)
with lst_translations, a list of hashed environments keyed by
language. Each environment maps English strings to translations
with O(1) lookup. ~3x faster for translate_string().

Co-authored-by: Ona <no-reply@ona.com>
ddsjoberg force-pushed the ds-optimize-translate-string branch from 976d591 to b706582 on March 16, 2026 04:04
Co-authored-by: Ona <no-reply@ona.com>
@github-actions

github-actions Bot commented Mar 16, 2026

Performance Benchmark

Comparing main (2.5.0.9005) vs PR (2.5.0.9005)

Each benchmark runs 5 independent rounds. The change column shows the mean % difference (negative = faster).
The 95% CI column shows the confidence interval on the change. If the CI excludes 0%, the result is flagged as a real improvement (✅) or regression (❌).
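
The exact statistics behind the change and ci columns are not shown in this comment. One plausible reading is a per-round percent difference with a t-based interval across the five rounds, sketched below with made-up timings.

```r
# Made-up timings (ms) for five rounds; not taken from the results in this PR.
main_ms <- c(10.0, 10.2, 9.9, 10.1, 10.0)
pr_ms   <- c(9.0, 9.1, 8.9, 9.2, 9.0)

# Percent change per round relative to main (negative = faster).
pct_change <- 100 * (pr_ms - main_ms) / main_ms

# Mean change and a t-based 95% confidence interval across rounds.
# Assumption: rounds are paired and the interval is t-based.
mean_change <- mean(pct_change)
ci <- unname(t.test(pct_change, conf.level = 0.95)$conf.int[1:2])

# Flag the result only when the interval excludes 0%.
flag <- if (ci[2] < 0) {
  "improvement"
} else if (ci[1] > 0) {
  "regression"
} else {
  "no clear change"
}
flag
```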

Style functions (10k elements)

| expression | main | pr | change | ci |
|---|---|---|---|---|
| style_number | 2.3ms | 2.3ms | ➖ -0.8% | [-2.9%, 1.2%] |
| style_number varying digits | 2.4ms | 2.4ms | ➖ -0.6% | [-2%, 0.8%] |
| style_sigfig | 3.2ms | 3.2ms | ❌ +1.4% | [0.2%, 2.7%] |

Translation (10 strings per iteration)

| expression | main | pr | change | ci |
|---|---|---|---|---|
| translate_string en | 0ms | 0ms | ❌ +22.7% | [21.7%, 23.7%] |
| translate_string es | 0ms | 0ms | ❌ +33.8% | [12.1%, 55.4%] |

Pipeline benchmarks

| expression | main | pr | change | ci |
|---|---|---|---|---|
| tbl_summary | 1677.5ms | 1667.9ms | ➖ -0.5% | [-2.9%, 1.8%] |
| tbl_hierarchical | 19190.6ms | 19460.8ms | ❌ +1.4% | [0%, 2.8%] |

ddsjoberg and others added 5 commits March 16, 2026 04:35
Use Unicode escape (\u00B7) for the interpunct character and
enc2utf8() for translation strings so they are properly marked
as UTF-8 in sysdata.rda. Fixes snapshot mismatch on Windows where
the native encoding is not UTF-8.

Co-authored-by: Ona <no-reply@ona.com>
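
A small sketch of the encoding fix described in the commit above. The `label` string is hypothetical; only the `\u00B7` escape and `enc2utf8()` come from the commit message.

```r
# Write the interpunct as an ASCII-only escape so the source file does not
# depend on the platform's native encoding.
interpunct <- "\u00B7"

# Hypothetical label containing the interpunct.
label <- paste0("N ", interpunct, " %")

# enc2utf8() converts to UTF-8 and marks non-ASCII strings as such, so the
# encoding mark is stored with the string when it is saved.
label_utf8 <- enc2utf8(label)

Encoding(label_utf8)              # declared encoding of the string
utf8ToInt(interpunct)             # Unicode code point of the interpunct
charToRaw(enc2utf8(interpunct))   # its UTF-8 bytes
```
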
Add .get_language() that reads the theme environment directly instead
of going through get_theme_element() which calls eval_tidy() on every
invocation. The language element is always a plain string, so the
eval_tidy() overhead is unnecessary.

This speeds up the English (default) path by ~9% and reduces memory
allocation by ~89%.

Co-authored-by: Ona <no-reply@ona.com>
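
The direct read described in the commit above might be sketched as follows. The environment `theme_env_demo`, the key `"language"`, and the helper `get_language_demo()` are hypothetical; the real theme environment and element name are not shown in this PR, and the `"en"` default is inferred from the description.

```r
# Hypothetical stand-in for the package's theme environment.
theme_env_demo <- new.env(parent = emptyenv())

# Hypothetical helper: read the language straight from the environment.
# `[[` on an environment returns NULL for a missing name, so the unset
# (English) case needs no evaluation step at all.
get_language_demo <- function(env = theme_env_demo) {
  lang <- env[["language"]]
  if (is.null(lang)) "en" else lang
}

get_language_demo()                                # "en" (no theme set)
assign("language", "es", envir = theme_env_demo)
get_language_demo()                                # "es"
```

This shortcut is safe only because the language element is always a plain string; an element that can hold an expression would still need to be evaluated.
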
ddsjoberg marked this pull request as ready for review April 28, 2026 18:48
ddsjoberg merged commit 3ce19fd into main Apr 28, 2026
8 checks passed
ddsjoberg deleted the ds-optimize-translate-string branch April 28, 2026 19:06