Handle empty or invalid training_parameters in HF trainer by Ayush-kathil · Pull Request #2656 · kubeflow/katib

Ayush-kathil · 2026-04-17T07:13:40Z

Problem

Kubeflow/Katib jobs could pass empty or malformed training_parameters, causing json.decoder.JSONDecodeError during HuggingFace trainer startup.

Root cause

The example initialized TrainingArguments directly from json.loads(args.training_parameters) without validating the input shape or handling empty payloads.
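To make the failure mode concrete, here is a minimal sketch of the unvalidated pattern described above. The function name `load_unvalidated` and the sample payloads are illustrative assumptions, not code from the Katib example; only `json.loads` and `json.JSONDecodeError` are real standard-library names.

```python
import json


def load_unvalidated(raw: str) -> dict:
    # Hypothetical stand-in for the old behavior: the raw string goes
    # straight into json.loads with no empty-check or shape validation.
    return json.loads(raw)


# Empty, whitespace-only, and malformed payloads all fail inside json.loads.
for payload in ["", "   ", "{lr: 1e-4}"]:
    try:
        load_unvalidated(payload)
    except json.JSONDecodeError as exc:
        print(f"{payload!r} -> JSONDecodeError: {exc.msg}")
```

An empty string is not valid JSON, so a job that omits the parameter entirely hits the same exception as one that sends a broken payload.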

Fix

Added a dedicated parse_training_args(raw: str) -> dict helper and changed the CLI argument's default to "{}". Empty, None, or whitespace-only input now yields default TrainingArguments, the raw and parsed config are logged safely, and invalid JSON or malformed keys raise clear errors.
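The helper described in the fix might look roughly like the sketch below. This is not the PR's actual code. It makes three assumptions: a "malformed" key is one that is not a valid Python identifier, "safe" logging means logging only the key names, and the parameter is typed `Optional[str]` rather than `str` so that `None` is accepted. The flag name `--training_parameters` is inferred from `args.training_parameters`.

```python
import argparse
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_training_args(raw: Optional[str]) -> dict:
    """Turn the raw training_parameters string into keyword arguments."""
    # Empty, None, or whitespace-only input means "use the defaults".
    if raw is None or not raw.strip():
        logger.info("training_parameters is empty; using defaults")
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"training_parameters is not valid JSON ({exc.msg} at position {exc.pos})"
        ) from exc
    if not isinstance(parsed, dict):
        raise ValueError(
            f"training_parameters must be a JSON object, got {type(parsed).__name__}"
        )
    # Assumption: a malformed key is one that cannot be a Python keyword argument.
    bad_keys = [key for key in parsed if not key.isidentifier()]
    if bad_keys:
        raise ValueError(f"training_parameters has malformed keys: {bad_keys}")
    # Log key names only, so values never end up in the pod logs.
    logger.info("parsed training_parameters keys: %s", sorted(parsed))
    return parsed


parser = argparse.ArgumentParser()
parser.add_argument("--training_parameters", type=str, default="{}")
args = parser.parse_args([])  # empty argv for the demo, so the default "{}" applies
training_kwargs = parse_training_args(args.training_parameters)
```

In the trainer, the resulting dict would presumably be unpacked into `TrainingArguments(**training_kwargs)`. That call is left out here so the sketch runs without `transformers` installed.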

Impact

Existing pipelines remain backward compatible, empty payloads now fall back cleanly, and bad configs fail fast with actionable messages instead of an opaque JSON decode crash.

Testing

Added pytest coverage for empty string, None, whitespace, invalid JSON, valid JSON, and malformed keys. Verified with python -m pytest test_hf_llm_training.py -q.
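The six cases listed above could be exercised along these lines. The sketch uses plain `assert` statements in `test_`-prefixed functions, so pytest can collect them but importing pytest is not required. The `parse_training_args` defined here is a condensed, hypothetical stand-in so the block is self-contained. The real tests in test_hf_llm_training.py may be structured differently.

```python
import json
from typing import Optional


def parse_training_args(raw: Optional[str]) -> dict:
    # Condensed stand-in for the helper under test (hypothetical, not the PR's code).
    if raw is None or not raw.strip():
        return {}
    parsed = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(parsed, dict) or not all(k.isidentifier() for k in parsed):
        raise ValueError("training_parameters must be a JSON object with valid keys")
    return parsed


def raises_value_error(raw: Optional[str]) -> bool:
    # Small helper so the tests do not depend on pytest.raises.
    try:
        parse_training_args(raw)
    except ValueError:
        return True
    return False


def test_empty_inputs_fall_back_to_defaults():
    for raw in ("", None, "   \t\n"):
        assert parse_training_args(raw) == {}


def test_valid_json_is_returned_as_dict():
    assert parse_training_args('{"num_train_epochs": 3}') == {"num_train_epochs": 3}


def test_invalid_json_and_malformed_keys_are_rejected():
    for raw in ('{"lr": ', '{"bad-key": 1}'):
        assert raises_value_error(raw)
```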

Closes: #2587

github-actions · 2026-04-17T07:13:50Z

🎉 Welcome to the Kubeflow Katib repo! 🎉

Thanks for opening your first PR! We're excited to have you onboard 🚀

Next steps:

Our team will review your PR soon! cc @kubeflow/wg-automl-leads
Check out the Contributing Guide and the Kubeflow Contributor Guide
Join the Kubeflow Slack channels: https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels
Join the AutoML & Training WG meetings: https://bit.ly/2PWVCkV

Feel free to ask questions in the comments. Thanks again for contributing! 🙏

google-oss-prow · 2026-04-17T07:13:59Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

harden HuggingFace training_parameters parsing Signed-off-by: Ayush-kathil <kathilshiva@gmail.com>

google-oss-prow bot added the size/L label Apr 17, 2026

google-oss-prow bot requested review from Electronic-Waste, anencore94 and johnugeorge April 17, 2026 07:13

Ayush-kathil force-pushed the master branch from 5e5e1bd to 2f23b0f Compare April 17, 2026 07:16

Ayush-kathil mentioned this pull request Apr 17, 2026

LLMs Fine-Tuning Errors in llm worker pod with pytorch conatiner #2587

Open

fix(training-operator)

c1463df

harden HuggingFace training_parameters parsing Signed-off-by: Ayush-kathil <kathilshiva@gmail.com>

Ayush-kathil force-pushed the master branch from 2f23b0f to c1463df Compare April 17, 2026 19:08

Ayush-kathil wants to merge 1 commit into kubeflow:master from Ayush-kathil:master
