
Conversation

@HonestDeng

Purpose

Resolves #314: add support for the MammothModa2 model (https://github.com/bytedance/mammothmoda).

Test Plan

Test Result



Signed-off-by: HonestDeng <[email protected]>
For simplicity, most of the DiT-stage code is copied from https://github.com/bytedance/mammothmoda.
This code will be simplified and reviewed after the pipeline runs successfully.

Signed-off-by: HonestDeng <[email protected]>
because the preview version of MammothModa2 only uses the last hidden state

Signed-off-by: HonestDeng <[email protected]>
@hsliuustc0106
Collaborator

Hi, will the model be ready before the 1230 release?

@HonestDeng
Author

HonestDeng commented Dec 20, 2025

Yes.

MammothModa2-Preview combines Qwen2.5-VL (with extra gen-experts in the MLP layers) with a DiT module for image generation. I have already implemented the Qwen2.5-VL part of MammothModa2-Preview by reusing vLLM code such as Qwen2Attention and Qwen2MLP, so the model can take text and images as input and generate text tokens.
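
A rough sketch of the gen-expert idea (hypothetical names and per-token routing; this is my own illustration, not the actual MammothModa2 code, and it stands in for vLLM's Qwen2MLP with plain PyTorch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    """Stand-in for a Qwen2-style gated MLP (roughly what Qwen2MLP implements in vLLM)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class DualExpertMLP(nn.Module):
    """Keeps the original understanding MLP and adds a generation expert."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.und_expert = SwiGLUMLP(hidden_size, intermediate_size)
        self.gen_expert = SwiGLUMLP(hidden_size, intermediate_size)

    def forward(self, x: torch.Tensor, gen_mask: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, hidden_size]; gen_mask: [num_tokens] bool tensor,
        # True for image-generation tokens, False for text/understanding tokens.
        out = torch.empty_like(x)
        out[~gen_mask] = self.und_expert(x[~gen_mask])
        out[gen_mask] = self.gen_expert(x[gen_mask])
        return out
```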

I'm currently working on the DiT part. Hopefully I will finish it this weekend and have the code reviewed before 1230.

I'm not very familiar with supporting new models. If there is any problem in my code, please correct me. Thanks!

@hsliuustc0106
Collaborator


The model seems quite similar to the Qwen-Image structure, with a Qwen-VL model for encoding and a DiT module for image generation.
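
A rough sketch of that two-stage shape (hypothetical class and method names; not the vllm-omni or Qwen-Image API):

```python
class ARStage:
    def generate(self, prompt: str) -> dict:
        # Stand-in for the Qwen-VL stage: returns generated text plus the
        # hidden states the DiT stage would use as conditioning.
        return {"text": f"caption for {prompt!r}", "cond_hidden_states": [[0.0] * 4]}


class DiTStage:
    def generate(self, cond_hidden_states: list) -> bytes:
        # Stand-in for the diffusion transformer decoding an image.
        return b"\x00" * len(cond_hidden_states)


def run_pipeline(prompt: str) -> tuple[str, bytes]:
    ar_out = ARStage().generate(prompt)                        # stage 1: AR encoding / text generation
    image = DiTStage().generate(ar_out["cond_hidden_states"])  # stage 2: conditioned image generation
    return ar_out["text"], image
```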

@HonestDeng changed the title from "[WIP] add support for MammothModa2 model" to "add support for MammothModa2 model" on Dec 27, 2025
@HonestDeng marked this pull request as ready for review on December 27, 2025 08:20

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +20 to +24
ar_outputs = stage_list[source_stage_id].engine_outputs

dit_inputs: list[OmniTokensPrompt] = []
for ar_output, prompt in zip(ar_outputs, prompts):
addi_info = prompt["additional_information"]


P1: Normalize prompts before zipping to avoid async crashes

In async orchestration, OmniStage.process_engine_inputs passes the original prompt object directly to custom processors; for a single multimodal request this is typically a dict, so zip(ar_outputs, prompts) iterates over the dict's keys and the next line, prompt["additional_information"], raises a TypeError or silently mismatches outputs. This breaks the MammothModa2 AR→DiT stage for async generation with a single prompt. Consider normalizing prompts to a list (as in other processors) before zipping; see the sketch below.
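
A minimal sketch of that normalization (assumed helper names and dict-shaped prompts; the real processor lives in the MammothModa2 stage code):

```python
# Hypothetical sketch: wrap a single prompt dict in a list before zipping so
# that async single-prompt requests do not iterate over dict keys.
def _as_prompt_list(prompts):
    # A single multimodal request typically arrives as a dict
    # (e.g. an OmniTokensPrompt); batched requests arrive as a list of them.
    if isinstance(prompts, dict):
        return [prompts]
    return list(prompts)


def build_dit_inputs(ar_outputs, prompts):
    prompts = _as_prompt_list(prompts)
    assert len(ar_outputs) == len(prompts), "AR outputs and prompts must align"
    dit_inputs = []
    for ar_output, prompt in zip(ar_outputs, prompts):
        addi_info = prompt["additional_information"]
        dit_inputs.append({"ar_output": ar_output, "additional_information": addi_info})
    return dit_inputs
```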


@HonestDeng force-pushed the add-mammoth-moda2-support branch 2 times, most recently from 3457e15 to 7154ca5 on December 27, 2025 11:51
logger = init_logger(__name__)


class OmniGPUModelRunner(GPUModelRunner):
Collaborator


@tzhouam PTAL whether the changes are necessary


# Sequential initialization on the same device to avoid memory calculation errors
# when multiple instances start simultaneously
# For TP/PP/DP/SP, we need to lock ALL devices that will be used by this stage
Collaborator


Why do you need to make changes to omni_stage?

Author


Sorry, I'll revert this.


scheduler: Scheduler | None = None
processes: list[mp.Process] | None = None

Collaborator


This is not related to your PR.

Author


Yes, this file is not related to my PR. The CI task (docs/readthedocs.org:vllm-omni) failed on this file, and Codex suggested that I modify it. I'll revert the change in this file in the next commit.

Author


I have just figured out why I modified this unrelated file. It was a merge accident.

Three days ago, I tried to merge the main branch into my working branch and there were some conflicts.

Probably because of my mistakes during that merge, many modifications by other people from the main branch got into my branch.

I'll revert these accidental modifications.

@HonestDeng force-pushed the add-mammoth-moda2-support branch from 7154ca5 to 79022c9 on December 27, 2025 13:37
@hsliuustc0106
Collaborator

Do we have any test results for performance and accuracy?

@HonestDeng
Author

I'm working on this.

@tzhouam
Collaborator

tzhouam commented Dec 27, 2025

Could you briefly explain the motivation for refactoring the model runner? It looks like most of the changes are around preparing the additional information, but I’m not sure why this is necessary.

@tzhouam self-requested a review on December 27, 2025 14:47

Successfully merging this pull request may close these issues: [New Model]: bytedance-research/MammothModa2-Preview