
Handle empty or invalid training_parameters in HF trainer #2656

Open
Ayush-kathil wants to merge 1 commit into kubeflow:master from Ayush-kathil:master



Ayush-kathil commented Apr 17, 2026

Problem

Kubeflow Katib jobs could pass an empty or malformed training_parameters value, causing a json.decoder.JSONDecodeError at HuggingFace trainer startup.

Root cause

The example initialized TrainingArguments directly from json.loads(args.training_parameters) without validating the input shape or handling empty payloads.
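A minimal reproduction of this failure mode, assuming the original code passed the raw CLI string straight to json.loads. The function name `load_training_parameters_unguarded` is hypothetical, and the TrainingArguments call is left out so the snippet runs with the standard library alone:

```python
import json


def load_training_parameters_unguarded(raw: str) -> dict:
    # Hypothetical stand-in for the original pattern: the CLI string goes
    # straight into json.loads with no empty-input or shape checks.
    return json.loads(raw)


# An empty flag value, a whitespace-only value, and unquoted keys all fail
# the same way, before TrainingArguments is ever constructed.
for payload in ["", "   ", "{lr: 1e-4}"]:
    try:
        load_training_parameters_unguarded(payload)
    except json.JSONDecodeError as exc:
        print(f"{payload!r} -> JSONDecodeError: {exc.msg}")
```

In every case the decoder reports only a position and a parse message, which is why the resulting crash is hard to connect back to the training_parameters flag.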

Fix

Added a dedicated parse_training_args(raw: str) -> dict helper. The CLI argument now defaults to "{}", and an empty, None, or whitespace-only value falls back to default TrainingArguments. The raw and parsed configurations are logged safely, and invalid JSON or malformed keys raise clear errors.
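A helper along these lines could look roughly like the sketch below. It is not the code from this PR: the `allowed_keys` parameter, the `build_parser` function, and the exact error messages are illustrative assumptions, and the sketch returns a plain dict so it runs without transformers installed.

```python
import argparse
import json
import logging
from typing import AbstractSet, Optional

logger = logging.getLogger(__name__)


def parse_training_args(
    raw: Optional[str], allowed_keys: Optional[AbstractSet[str]] = None
) -> dict:
    """Turn the --training_parameters CLI string into TrainingArguments kwargs.

    None, "" and whitespace-only input return {} so the caller can build a
    default TrainingArguments instead of crashing inside json.loads.
    """
    # %r keeps quotes and escapes visible, so an empty or whitespace payload
    # is easy to spot in the pod logs.
    logger.info("raw training_parameters: %r", raw)
    if raw is None or not raw.strip():
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"training_parameters is not valid JSON ({exc.msg} at char {exc.pos}): {raw!r}"
        ) from exc
    # A JSON list, number, or string cannot be unpacked into keyword arguments.
    if not isinstance(parsed, dict):
        raise ValueError(
            f"training_parameters must be a JSON object, got {type(parsed).__name__}"
        )
    # Hypothetical key check: reject names the caller says are not accepted.
    if allowed_keys is not None:
        unknown = sorted(set(parsed) - set(allowed_keys))
        if unknown:
            raise ValueError(f"unknown training_parameters keys: {unknown}")
    logger.info("parsed training_parameters: %s", parsed)
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Default to an empty JSON object so omitting the flag is not an error.
    parser.add_argument("--training_parameters", type=str, default="{}")
    return parser
```

In the trainer script the result would then be unpacked as TrainingArguments(**parsed). Because TrainingArguments is a dataclass, one possible source for `allowed_keys` is `{f.name for f in dataclasses.fields(TrainingArguments)}`; whether the PR validates keys this way is an assumption.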

Impact

Existing pipelines remain backward compatible, empty payloads now fall back cleanly, and bad configs fail fast with actionable messages instead of an opaque JSON decode crash.

Testing

Added pytest coverage for empty-string, None, whitespace-only, invalid-JSON, valid-JSON, and malformed-key inputs. Verified with python -m pytest test_hf_llm_training.py -q.

Closes: #2587

@github-actions

🎉 Welcome to the Kubeflow Katib repo! 🎉

Thanks for opening your first PR! We're excited to have you onboard 🚀

Next steps:

Feel free to ask questions in the comments. Thanks again for contributing! 🙏

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

harden HuggingFace training_parameters parsing

Signed-off-by: Ayush-kathil <kathilshiva@gmail.com>


Development

Successfully merging this pull request may close these issues.

LLMs Fine-Tuning Errors in llm worker pod with pytorch conatiner
