Replies: 1 comment
Update: I was able to resolve the issue and run the benchmark successfully. The patch failed because it no longer matched the current main branch of the Primus repository. Pinning Primus to a compatible commit (instead of using main) allowed the patch to apply cleanly and the build to proceed. Everything is now working as expected on the NVIDIA H200 setup. Thanks!
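For anyone hitting the same failure, here is a minimal sketch of the pinning step. It is not the exact change made to Dockerfile.nvidia; the function name `pin_and_check` and the `<compatible-commit-sha>` placeholder are hypothetical, and the actual commit hash is not given in this thread. It checks out a chosen commit and dry-runs the patch with `git apply --check` before the real build.

```shell
#!/bin/sh
# Sketch only: pin a cloned repository to one commit and dry-run a patch.
# pin_and_check is a hypothetical helper, not part of the MLCommons repo.
# Pass patch_file as an absolute path, because git -C changes the
# directory that relative paths are resolved against.
pin_and_check() {
    repo_dir="$1"    # path to the cloned Primus checkout
    commit="$2"      # commit (or tag) the patch is expected to match
    patch_file="$3"  # absolute path to the patch

    # Move the working tree to the pinned commit (detached HEAD).
    git -C "$repo_dir" checkout --quiet "$commit" || return 1

    # --check reports whether the patch applies without modifying files.
    git -C "$repo_dir" apply --check "$patch_file"
}

# Example call (placeholder values, adjust to your layout):
#   pin_and_check /workspace/Primus "<compatible-commit-sha>" \
#       /workspace/patches/primus_evaluator.patch && echo "patch applies"
```

A non-zero exit status means the patch does not match that commit, which is the same condition that broke the Docker build on main.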
I’m trying to build and run the small_llm_moe_pretraining/primus benchmark from the MLCommons training repository on an NVIDIA H200 setup.
While building the Docker image using Dockerfile.nvidia, I’m encountering an issue where the following patch fails to apply:
patches/primus_evaluator.patch
Error:
patch failed: primus/backends/megatron/training/evaluator.py
patch does not apply
The patch references:
index f7df2870..24d59cc7
Observations:
The Dockerfile clones https://github.com/AMD-AIG-AIMA/Primus.git and checks out main.
It appears that the Primus main branch has changed since the patch was written, so the patch is no longer compatible with the current codebase.
Multiple hunks in evaluator.py fail to apply.
Questions:
Is there a specific commit or tag of Primus that this patch was designed for?
Has the patch been updated for newer versions of Primus?
Should we pin to a specific commit for reproducibility?
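One way to approach the first question: in a git-format patch, the first hash on the `index` line (`f7df2870` here) is the abbreviated blob ID of the file before the patch. The sketch below searches the history of a local Primus clone for commits where evaluator.py has that blob. The helper name `find_commits_with_blob` is hypothetical, and a match only shows that this one file lines up, not that the whole patch set was written against that commit.

```shell
#!/bin/sh
# Sketch only: list commits in which a file's blob ID starts with a prefix.
# find_commits_with_blob is a hypothetical helper name.
find_commits_with_blob() {
    repo_dir="$1"  # path to a local clone
    path="$2"      # file path inside the repository
    prefix="$3"    # abbreviated blob ID from the patch's "index" line

    # Walk only the commits on HEAD's history that touched the file.
    git -C "$repo_dir" rev-list HEAD -- "$path" | while read -r commit; do
        # Resolve the blob ID of the file at that commit; skip the commit
        # if the file does not exist there (for example, it was deleted).
        blob=$(git -C "$repo_dir" rev-parse --quiet --verify "$commit:$path" 2>/dev/null) || continue
        case "$blob" in
            "$prefix"*) echo "$commit" ;;
        esac
    done
}

# Example call with the values from this thread (clone path is a placeholder):
#   find_commits_with_blob /workspace/Primus \
#       primus/backends/megatron/training/evaluator.py f7df2870
```

Any commit printed is a candidate to pin to; it is still worth dry-running the full patch against it with `git apply --check` before relying on it.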