Description
System Info
Environment
• GPU: 8×H200
• Docker image: verlai/verl:vllm011.latest
----------Python Info----------
Version : 3.12.11
Compiler : GCC 11.4.0
Build : ('main', 'Jun 4 2025 08:56:18')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 25.2
Directory : /usr/local/lib/python3.12/dist-packages/pip
vllm : 0.11.0
sglang : not found.
ray : 2.49.2
torch : 2.8.0+cu128
----------verl Info-----------
Version : 0.8.0.dev
Directory : /root/verl/verl
Commit Hash : b53f0f1
----------Platform Info----------
Platform : Linux-5.10.134-013.5.kangaroo.al8.x86_64-x86_64-with-glibc2.35
system : Linux
node : dsw-318478-6bdc478975-5r8xm
release : 5.10.134-013.5.kangaroo.al8.x86_64
version : #1 SMP Thu Nov 20 02:46:27 UTC 2025
----------Environment----------
CUDA Runtime : 12.8
CUDA compiler : Not found: [Errno 2] No such file or directory: 'nvcc'
----------System Info----------
CPU Memory : 1800.00 GB
GPU Count : 8
GPU 1 Type : NVIDIA L20X
GPU 1 Memory : 140.40 GB
GPU 2 Type : NVIDIA L20X
GPU 2 Memory : 140.40 GB
GPU 3 Type : NVIDIA L20X
GPU 3 Memory : 140.40 GB
GPU 4 Type : NVIDIA L20X
GPU 4 Memory : 140.40 GB
GPU 5 Type : NVIDIA L20X
GPU 5 Memory : 140.40 GB
GPU 6 Type : NVIDIA L20X
GPU 6 Memory : 140.40 GB
GPU 7 Type : NVIDIA L20X
GPU 7 Memory : 140.40 GB
GPU 8 Type : NVIDIA L20X
GPU 8 Memory : 140.40 GB
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
With the same training script and hardware setup:
• Vanilla mbridge training loss converges normally
• Megatron-Bridge training loss does not converge
✅ Vanilla mbridge Setup (training progresses clearly; mean reward increases steadily)
With megatron-core==0.15.0 and the script examples/grpo_trainer/run_qwen3_vl-30b-megatron.sh, training is stable and behaves normally.
❌ Megatron-Bridge Setup (mean reward does not increase)
Apply the following changes:
Package Version Changes
• megatron-core package version → @ main
• megatron-bridge package version → use the PR build: https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/1943
In examples/grpo_trainer/run_qwen3_vl-30b-megatron.sh, add:
actor_rollout_ref.actor.megatron.vanilla_mbridge=False
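For anyone trying to reproduce this, the changes above could be applied roughly as follows. This is only a sketch under several assumptions that are not confirmed by the report: that megatron-core is installed from the main branch of the NVIDIA/Megatron-LM repository, that the PR build is installed through GitHub's refs/pull/1943/head ref, and that the example script forwards extra command-line arguments to the trainer. The `repro_commands` and `run` helpers are hypothetical names used only for this illustration. By default the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of the Megatron-Bridge reproduction steps (assumptions noted inline).
# Dry-run by default: commands are printed; set RUN=1 to actually execute them.
set -euo pipefail

# Hypothetical helper: execute the command if RUN=1, otherwise print it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical helper that lists the three reproduction steps.
repro_commands() {
  # Assumption: megatron-core @ main means the main branch of NVIDIA/Megatron-LM.
  run pip install "git+https://github.com/NVIDIA/Megatron-LM.git@main"

  # Assumption: the PR build is installed from GitHub's pull-request ref.
  run pip install "git+https://github.com/NVIDIA-NeMo/Megatron-Bridge.git@refs/pull/1943/head"

  # Assumption: the script forwards extra arguments to the trainer. If it does
  # not, add the override to the argument list inside the script instead.
  run bash examples/grpo_trainer/run_qwen3_vl-30b-megatron.sh \
    actor_rollout_ref.actor.megatron.vanilla_mbridge=False
}

repro_commands
```

Running the same script without the override and with megatron-core==0.15.0 gives the vanilla mbridge baseline for comparison.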
Expected behavior
Vanilla mbridge Setup
With
megatron-core==0.15.0
