chore(deps): upgrade mbridge from 0.15.1 to 310e8fb #1258

Merged

garrett4wade merged 5 commits into inclusionAI:main from Adiactive:feat/upgrade-mbridge on Apr 25, 2026
Conversation

@Adiactive (Contributor) commented Apr 24, 2026

Description

Upgrade mbridge from the PyPI release 0.15.1 to commit 310e8fb on main (~120 upstream commits). This picks up new model architectures (Qwen3-VL dense/MoE with CP, Qwen3.5 + MTP, Qwen3 Omni MoE, InternVL 3.5, DeepSeek NPU), compatibility fixes for megatron-core 0.14+/0.15/0.16 and transformers v5, and several bug fixes around RoPE, weight export, and distributed checkpointing. See the linked issue for the full rationale.

Changes:

  • Update the mbridge requirement in pyproject.toml and pyproject.vllm.toml (see the sketch below)
  • Regenerate uv.lock and uv.vllm.lock via scripts/uv_lock.sh (run with uv 0.10.9)
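For context, a sketch of what the requirement change looks like in PEP 508 terms. The commit, repository URL, and x86_64 marker come from this PR; the exact dependency-table layout in our pyproject files is an assumption:

```toml
# Sketch only; the real pyproject layout may differ.
# Before: PyPI wheel pin
#   "mbridge==0.15.1 ; platform_machine == 'x86_64'",
# After: direct git reference pinned to the upgrade commit
# (review feedback below suggests the full 40-character hash instead of 310e8fb)
dependencies = [
    "mbridge @ git+https://github.com/ISEEKYAN/mbridge.git@310e8fb ; platform_machine == 'x86_64'",
]
```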

Related Issue

Fixes #1257

Type of Change

  • ♻️ Refactoring

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Additional Context

Test Environment

  • Image: ghcr.io/inclusionai/areal-runtime:v1.0.3-vllm
  • mbridge installed via uv pip install --no-deps git+https://github.com/ISEEKYAN/mbridge.git@310e8fb to keep the existing torch/TE pins intact (verification sketch below)
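A minimal sketch of that install plus a sanity check; `uv pip show` is a standard uv subcommand, but treat the sequence as illustrative rather than the exact session used:

```bash
# Install the pinned commit without touching the torch/TransformerEngine pins.
uv pip install --no-deps "git+https://github.com/ISEEKYAN/mbridge.git@310e8fb"
# Sanity check that the git-sourced build replaced the 0.15.1 wheel.
uv pip show mbridge
```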

Tests Run

| Test | Result | Notes |
| --- | --- | --- |
| tests/test_estimate_num_params.py | ✅ 3/3 | Direct mbridge.AutoBridge API |
| tests/test_megatron_engine.py | ✅ 3/4 | DCP test: pre-existing failure (see below) |
| tests/test_megatron_engine_distributed.py | ⚠️ 4/7 | TP / PP / VPP / MoE-DCP pass; 3 pre-existing failures |
| tests/test_tree_training.py | ✅ 12/12 | flex/triton × {fsdp, megatron, archon} |
| tests/test_offload.py | ✅ 2/2 | FSDP and Megatron offload paths |
| tests/fp8/test_fp8_rmsnorm.py + test_fp8_bf16_comparison.py | ⚠️ 2/4 | 2 pre-existing failures (see below) |
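For reference, a hypothetical re-run of these suites inside the image above; the exact pytest invocations and any required launchers (e.g. torchrun for the distributed suite) are assumptions:

```bash
# Illustrative only; run inside ghcr.io/inclusionai/areal-runtime:v1.0.3-vllm.
pytest tests/test_estimate_num_params.py tests/test_megatron_engine.py
pytest tests/test_megatron_engine_distributed.py
pytest tests/test_tree_training.py tests/test_offload.py
pytest tests/fp8/test_fp8_rmsnorm.py tests/fp8/test_fp8_bf16_comparison.py
```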

Pre-existing failures unrelated to this upgrade

  1. test_dcp_save_load_weights (test_megatron_engine.py) — CheckpointingException: ShardedTensor.flattened_range is not supported. Known incompatibility between Megatron's distributed optimizer and dist-checkpointing format. Documented at cli_args.py:2177; workaround added in 7ca1fea0 for recovery. The test is @pytest.mark.slow and excluded from CI.

  2. test_qwen3_dcp_save_load (test_megatron_engine_distributed.py) — same flattened_range issue (uses with_optim=True).

  3. test_qwen3_context_parallel & test_qwen3moe_expert_parallel (test_megatron_engine_distributed.py) — KeyError: 'loss_mask' regression introduced by PR #1223 (feat: CP-local loss and configurable CUDA memory profiling; d58cca56, merged 2026-04-23). The new CP-local loss path requires loss_mask in padded_mb, but the test's mock_input only provides input_ids / attention_mask. Adding loss_mask to the mock uncovers a further downstream unpack_sequence shape mismatch in the same CP-local path — both belong to PR #1223 follow-up work and are out of scope for this PR.

  4. test_fp8_bf16_gradient_comparison & test_fp8_bf16_partial_layers_comparison (tests/fp8/test_fp8_bf16_comparison.py) — TypeError: sft_loss_fn() got an unexpected keyword argument 'vocab_min_logits'. Caused by PR #889 (fix(tree_attn): Fix some bugs in tree training for FSDP and Megatron engines; 127b0264), which added vocab_min_logits=... to the engine's loss_fn call, while the fp8 test fixture's sft_loss_fn at tests/fp8/model_hooks.py:127 doesn't accept **kwargs; see the sketch after this list.
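For failure 4 the shape of the eventual fix is straightforward. Below is a hypothetical sketch — the function and argument names come from the failure report above; the loss body is illustrative, not the actual fixture:

```python
import torch
import torch.nn.functional as F

# Hypothetical follow-up fix, not part of this PR: since PR #889 the engine
# calls loss_fn(..., vocab_min_logits=...), so the fp8 fixture's sft_loss_fn
# must absorb unknown keyword arguments instead of raising a TypeError.
def sft_loss_fn(logits: torch.Tensor, labels: torch.Tensor, **kwargs) -> torch.Tensor:
    # **kwargs swallows engine-side extras such as vocab_min_logits.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
```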

Lock file diff

The lock files lose a number of wheel entries for non-x86_64 platforms (ppc64le, s390x, armv7l, riscv64, musllinux). Switching mbridge from a PyPI wheel to a git source — combined with the existing platform_machine == 'x86_64' marker on mbridge in our pyproject — lets uv prove those wheels are unreachable through the mbridge subtree and prune them. AReaL targets x86_64 Linux clusters, so there is no real platform coverage loss.

Commit: Upgrade mbridge to support more model architectures and improve compatibility with megatron-core 0.16.0.

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the mbridge dependency in pyproject.toml and pyproject.vllm.toml to a specific git commit. The review recommends pinning to the full 40-character commit hash rather than a short SHA for better security and reproducibility, and removing an extra space for consistency with neighboring entries.

Resolved review threads (outdated): pyproject.toml, pyproject.vllm.toml
garrett4wade and others added 4 commits on April 25, 2026, co-authored by gemini-code-assist[bot].
@garrett4wade (Collaborator) left a comment


LGTM!

FYI, in the future we will fully migrate from mbridge to megatron-bridge, since mbridge does not support transformers>=5.3.

@garrett4wade garrett4wade merged commit 4629c4e into inclusionAI:main Apr 25, 2026
5 of 6 checks passed