
[BUG] Mean reward and acc increasing steadily on Qwen-VL-MoE-32B-A3B with vanilla mbridge setup but not with Megatron-Bridge on example scripts (8×H200) #5187

@sensimintel-master

System Info

Environment
• GPU: 8×H200
• Docker image: verlai/verl:vllm011.latest

----------Python Info----------
Version : 3.12.11
Compiler : GCC 11.4.0
Build : ('main', 'Jun 4 2025 08:56:18')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 25.2
Directory : /usr/local/lib/python3.12/dist-packages/pip
vllm : 0.11.0
sglang : not found.
ray : 2.49.2
torch : 2.8.0+cu128
----------verl Info-----------
Version : 0.8.0.dev
Directory : /root/verl/verl
Commit Hash : b53f0f1
----------Platform Info----------
Platform : Linux-5.10.134-013.5.kangaroo.al8.x86_64-x86_64-with-glibc2.35
system : Linux
node : dsw-318478-6bdc478975-5r8xm
release : 5.10.134-013.5.kangaroo.al8.x86_64
version : #1 SMP Thu Nov 20 02:46:27 UTC 2025
----------Environment----------
CUDA Runtime : 12.8
CUDA compiler : Not found: [Errno 2] No such file or directory: 'nvcc'
----------System Info----------
CPU Memory : 1800.00 GB
GPU Count : 8
GPU 1 Type : NVIDIA L20X
GPU 1 Memory : 140.40 GB
GPU 2 Type : NVIDIA L20X
GPU 2 Memory : 140.40 GB
GPU 3 Type : NVIDIA L20X
GPU 3 Memory : 140.40 GB
GPU 4 Type : NVIDIA L20X
GPU 4 Memory : 140.40 GB
GPU 5 Type : NVIDIA L20X
GPU 5 Memory : 140.40 GB
GPU 6 Type : NVIDIA L20X
GPU 6 Memory : 140.40 GB
GPU 7 Type : NVIDIA L20X
GPU 7 Memory : 140.40 GB
GPU 8 Type : NVIDIA L20X
GPU 8 Memory : 140.40 GB

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

With the same training script and hardware setup:
• Vanilla mbridge training loss converges normally
• Megatron-Bridge training loss does not converge
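To make "converges" and "increasing steadily" comparable across the two runs, one option is to fit a straight line to the logged mean reward and look at its slope. The following is a minimal sketch, not part of verl: the function names, the `min_slope` threshold, and the sample numbers are all hypothetical, and the reward values would have to be exported from whatever logger the run uses.

```python
from statistics import mean


def reward_slope(rewards):
    """Least-squares slope of reward against step index (reward units per step)."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two logged steps")
    steps = range(n)
    step_mean = mean(steps)
    reward_mean = mean(rewards)
    cov = sum((s - step_mean) * (r - reward_mean) for s, r in zip(steps, rewards))
    var = sum((s - step_mean) ** 2 for s in steps)
    return cov / var


def is_improving(rewards, min_slope=1e-3):
    """True if the fitted trend rises faster than min_slope (an arbitrary threshold)."""
    return reward_slope(rewards) > min_slope


# Made-up numbers for illustration only, not data from the runs in this issue.
rising = [0.10, 0.14, 0.19, 0.22, 0.27, 0.31]
flat = [0.10, 0.11, 0.09, 0.10, 0.11, 0.10]

if __name__ == "__main__":
    print("rising improving:", is_improving(rising))
    print("flat improving:", is_improving(flat))
```

Running the same check on the mean-reward series of both setups, over the same number of steps, would give a single number per run to quote alongside the screenshots.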

✅ Vanilla mbridge setup (clear progress; mean reward increases steadily)
With

megatron-core==0.15.0

and the script

examples/grpo_trainer/run_qwen3_vl-30b-megatron.sh

training is stable and behaves normally.

Image

❌ Megatron-Bridge setup (mean reward does not increase)

Apply the following changes to the setup above:
Package version changes
• megatron-core → main branch (@ main)
• megatron-bridge → build from this PR:

https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/1943

In examples/grpo_trainer/run_qwen3_vl-30b-megatron.sh, add:

actor_rollout_ref.actor.megatron.vanilla_mbridge=False
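Because the two setups differ mainly in package versions, it may help to record what is actually installed in each environment before launching. The sketch below uses only the standard library; the distribution names in `PACKAGES` are assumptions (in particular `megatron-bridge` and `mbridge`) and should be adjusted to match what `pip list` shows in the container.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions; adjust them to match `pip list`.
PACKAGES = ("megatron-core", "megatron-bridge", "mbridge", "vllm", "torch")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting this output for both the working and the failing environment would make the exact version delta visible without relying on the install commands alone.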

Image

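Expected behavior

The Megatron-Bridge setup should behave like the vanilla mbridge setup with megatron-core==0.15.0, where mean reward and accuracy increase steadily:

Image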

Labels: bug (Something isn't working)
