chore(deps): upgrade mbridge from 0.15.1 to 310e8fb (#1258)
Merged

garrett4wade merged 5 commits into inclusionAI:main on Apr 25, 2026
Conversation
Upgrade mbridge to support more model architectures and improve compatibility with megatron-core 0.16.0.
Contributor
Code Review
This pull request updates the mbridge dependency in pyproject.toml and pyproject.vllm.toml to a specific git commit. The review feedback recommends using a full 40-character commit hash instead of a short SHA to ensure better security and reproducibility, and also suggests removing an extra space for consistency with other entries.
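The full-hash recommendation is easy to mechanize in CI. A minimal sketch of such a check (the function name and regex are ours, not part of the review tooling):

```python
import re

def is_full_commit_hash(ref: str) -> bool:
    """Return True only for a full 40-character lowercase hex SHA-1 hash."""
    return re.fullmatch(r"[0-9a-f]{40}", ref) is not None

# A short SHA like "310e8fb" resolves today, but it can become ambiguous as
# the repository grows; a full hash is unambiguous and reproducible.
print(is_full_commit_hash("310e8fb"))  # short SHA -> False
```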
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Collaborator

garrett4wade approved these changes on Apr 25, 2026 and left a comment:
LGTM!
FYI, in the future we will fully migrate from "mbridge" to "megatron-bridge", since mbridge does not support transformers>=5.3.
Description
Upgrade `mbridge` from PyPI release `0.15.1` to commit `310e8fb` on `main` (~120 upstream commits). Picks up new model architectures (Qwen3-VL dense/MoE with CP, Qwen3.5 + MTP, Qwen3 Omni MoE, InternVL 3.5, DeepSeek NPU), compatibility fixes for `megatron-core` 0.14+/0.15/0.16 and `transformers` v5, and several bug fixes around RoPE, weight export, and distributed checkpointing. See the linked issue for the full rationale.

Changes:

- `mbridge` requirement in `pyproject.toml` and `pyproject.vllm.toml`
- `uv.lock` and `uv.vllm.lock` regenerated via `scripts/uv_lock.sh` (ran with uv `0.10.9`)

Related Issue
Fixes #1257
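The requirement change itself amounts to a one-line edit per pyproject file. A hypothetical fragment, for illustration only (the dependency-table layout is an assumption; the git URL and platform marker do appear elsewhere in this PR):

```toml
# Hypothetical sketch of the edit in pyproject.toml / pyproject.vllm.toml;
# the exact table layout in AReaL's pyproject is not shown on this page.
[project]
dependencies = [
    # before: a PyPI release
    # "mbridge==0.15.1 ; platform_machine == 'x86_64'",
    # after: a pinned git commit (review suggests using the full 40-char hash)
    "mbridge @ git+https://github.com/ISEEKYAN/mbridge.git@310e8fb ; platform_machine == 'x86_64'",
]
```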
Type of Change

Dependency upgrade.

Checklist

- `pre-commit run --all-files`
- `./docs/build_all.sh`
- `main`
- `/review-pr` command
- `/create-pr`

Additional Context
Test Environment

- Image: `ghcr.io/inclusionai/areal-runtime:v1.0.3-vllm`
- Installed via `uv pip install --no-deps git+https://github.com/ISEEKYAN/mbridge.git@310e8fb` to keep the existing torch/TE pin intact

Tests Run
- `tests/test_estimate_num_params.py` (`mbridge.AutoBridge` API)
- `tests/test_megatron_engine.py`
- `tests/test_megatron_engine_distributed.py`
- `tests/test_tree_training.py`
- `tests/test_offload.py`
- `tests/fp8/test_fp8_rmsnorm.py` + `test_fp8_bf16_comparison.py`

Pre-existing failures unrelated to this upgrade
- `test_dcp_save_load_weights` (`test_megatron_engine.py`): `CheckpointingException: ShardedTensor.flattened_range is not supported.` Known incompatibility between Megatron's distributed optimizer and the dist-checkpointing format. Documented at `cli_args.py:2177`; workaround added in `7ca1fea0` for recovery. The test is `@pytest.mark.slow` and excluded from CI.
- `test_qwen3_dcp_save_load` (`test_megatron_engine_distributed.py`): same `flattened_range` issue (uses `with_optim=True`).
- `test_qwen3_context_parallel` & `test_qwen3moe_expert_parallel` (`test_megatron_engine_distributed.py`): `KeyError: 'loss_mask'` regression introduced by PR "feat: CP-local loss and configurable CUDA memory profiling" #1223 (`d58cca56`, merged 2026-04-23). The new CP-local loss path requires `loss_mask` in `padded_mb`, but the test's `mock_input` only provides `input_ids`/`attention_mask`. Adding `loss_mask` to the mock further uncovers a downstream `unpack_sequence` shape mismatch in the same CP-local path; both belong to PR #1223 follow-up work, out of scope for this PR.
- `test_fp8_bf16_gradient_comparison` & `test_fp8_bf16_partial_layers_comparison` (`tests/fp8/test_fp8_bf16_comparison.py`): `TypeError: sft_loss_fn() got an unexpected keyword argument 'vocab_min_logits'`. Caused by PR "fix(tree_attn): Fix some bugs in tree training for FSDP and Megatron engines" #889 (`127b0264`) adding `vocab_min_logits=...` to the engine's `loss_fn` call, but the fp8 test fixture's `sft_loss_fn` at `tests/fp8/model_hooks.py:127` doesn't accept `**kwargs`.

Lock file diff
The lock files lose a number of wheel entries for non-x86_64 platforms (`ppc64le`, `s390x`, `armv7l`, `riscv64`, `musllinux`). Switching `mbridge` from a PyPI wheel to a git source, combined with the existing `platform_machine == 'x86_64'` marker on `mbridge` in our pyproject, lets `uv` prove those wheels are unreachable through the mbridge subtree and prune them. AReaL targets x86_64 Linux clusters, so there is no real platform coverage loss.
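The pruning above follows from PEP 508 marker evaluation, which can be checked by hand with the `packaging` library (a sketch we wrote for illustration; `uv`'s resolver is not this code, but it honors the same marker semantics):

```python
from packaging.markers import Marker

# The environment marker placed on mbridge in our pyproject.
marker = Marker("platform_machine == 'x86_64'")

# On non-x86_64 platforms the marker evaluates False, so the resolver can
# prove nothing in the mbridge subtree is ever installed there and drop the
# corresponding wheels from the lock file.
for machine in ("x86_64", "s390x", "ppc64le", "riscv64"):
    print(machine, marker.evaluate({"platform_machine": machine}))
```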