Fix prediction output readability and simplify inference installation by khazic · Pull Request #18 · google-research/metricx

khazic · 2026-03-25T09:46:45Z

Summary

This PR fixes a few practical issues around inference and installation:

- Keep requirements.txt inference-only and document mt-metrics-eval as an extra dependency for meta-evaluation
- Make prediction output JSONL human-readable for non-ASCII text
- Add a padding collator and safer per-device batch size handling in metricx23.predict
- Handle output_file paths without a parent directory in both predict scripts
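A likely cause of the last item is that os.path.dirname() returns an empty string for a bare filename such as scores.jsonl, and os.makedirs("") raises FileNotFoundError. The sketch below assumes the predict scripts create the output directory with os.makedirs; that is an assumption, since the diff itself is not shown here. ensure_parent_dir is a hypothetical helper name, not a function from the repository.

```python
import os


def ensure_parent_dir(output_file: str) -> None:
    """Create the parent directory of output_file, if it has one.

    Hypothetical helper: os.path.dirname() returns "" for a bare filename,
    and os.makedirs("") raises FileNotFoundError, so the empty case is skipped.
    """
    parent = os.path.dirname(output_file)
    if parent:
        # exist_ok=True makes repeated runs safe when the directory already exists.
        os.makedirs(parent, exist_ok=True)
```

With this guard, ensure_parent_dir("scores.jsonl") is a no-op, while ensure_parent_dir("runs/wmt23/scores.jsonl") creates runs/wmt23 before the file is opened.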

Why

Installing requirements.txt directly currently pulls in a VCS dependency (mt-metrics-eval) that is only needed for meta-evaluation, which makes the inference setup more fragile than necessary.

Also, both predict scripts currently write JSON using the default ensure_ascii=True, so multilingual content is emitted as Unicode escape sequences. In practice this makes output files harder to inspect.
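The effect of ensure_ascii can be reproduced with Python's standard json module. This is a minimal sketch assuming one JSON object per output line; to_jsonl_line and write_jsonl are hypothetical helper names, not functions from the repository, and the record fields used below are illustrative.

```python
import json
from typing import Any, Iterable, Mapping


def to_jsonl_line(record: Mapping[str, Any]) -> str:
    """Serialize one record as a JSONL line, keeping non-ASCII text readable."""
    # ensure_ascii=False emits characters such as "é" or "日本語" as-is
    # instead of \uXXXX escape sequences.
    return json.dumps(record, ensure_ascii=False) + "\n"


def write_jsonl(records: Iterable[Mapping[str, Any]], path: str) -> None:
    """Write records to path, one JSON object per line."""
    # The output may now contain non-ASCII characters, so the file encoding
    # is set explicitly rather than relying on the platform default.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(to_jsonl_line(record))
```

For example, json.dumps({"hypothesis": "café"}) produces {"hypothesis": "caf\u00e9"}, whereas to_jsonl_line produces {"hypothesis": "café"} followed by a newline. Setting encoding="utf-8" on the file matters once escaping is turned off.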

Finally, metricx23.predict was missing a padding collator, which can break batched inference on variable-length inputs. In addition, its per-device batch size could become zero when the global batch size is smaller than the GPU count.
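Both fixes can be sketched in plain Python. In a Hugging Face transformers setup the padding step is usually delegated to transformers.DataCollatorWithPadding; the framework-free version below only illustrates what such a collator does and is not the PR's actual code. pad_collate and per_device_batch_size are hypothetical names, and the default pad_token_id=0 is an assumption (the real value should come from the tokenizer).

```python
from typing import Dict, List, Sequence


def pad_collate(
    features: Sequence[Dict[str, List[int]]], pad_token_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Right-pad each example's input_ids to the longest example in the batch.

    Also builds an attention_mask (1 = real token, 0 = padding) so the model
    can ignore the padded positions.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def per_device_batch_size(global_batch_size: int, num_devices: int) -> int:
    """Split a global batch across devices, never returning zero.

    num_devices == 0 (CPU-only) is treated as a single device.
    """
    if global_batch_size < 1:
        raise ValueError("global_batch_size must be at least 1")
    return max(1, global_batch_size // max(1, num_devices))
```

With a global batch size of 2 on 8 GPUs, plain integer division gives 2 // 8 == 0; clamping to 1 keeps inference running, at the cost of an effective global batch of 8 rather than the requested 2.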

Validation

python3 -m py_compile metricx23/predict.py metricx24/predict.py

google-cla · 2026-03-25T09:47:02Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

fix: make prediction output readable and simplify inference install

48914c2

khazic force-pushed the fix/predict-install-and-output branch from f311c93 to 48914c2 on March 25, 2026 09:51

khazic added 12 commits March 26, 2026 09:54

feat: add batch MetricX scoring for chat outputs

b0976be

chore: add one-click MetricX scoring runner

ed7578a

chore: run MetricX scorer in foreground

9c5b70c

fix: pad variable-length batches during prediction

61c2fe0

feat: shard MetricX scoring across GPUs

3e0374a

perf: improve multi-gpu MetricX throughput

818fc14

fix: reorder scorer arguments for Python 3.10

8d5dbff

chore: tune MetricX runner defaults

039f118

feat: support resumable MetricX scoring

4461bb9

fix: restore full prompt in metricx source output

8e3887d

feat: switch MetricX runner to XXL model

32b5b5a

fix: reduce default metricx batch size

fc8496d

khazic wants to merge 13 commits into google-research:main from khazic:fix/predict-install-and-output
