
Conversation

@HonestDeng

Purpose

Resolves #314: add support for the MammothModa2 model (https://github.com/bytedance/mammothmoda).

Test Plan

Test Result



Signed-off-by: HonestDeng <[email protected]>
For simplicity, most of the DiT-stage code is copied from https://github.com/bytedance/mammothmoda.
This code will be simplified and reviewed after the pipeline runs successfully.

Signed-off-by: HonestDeng <[email protected]>
because the preview version of MammothModa2 only uses the last hidden state

Signed-off-by: HonestDeng <[email protected]>
@hsliuustc0106
Collaborator

Hi, will the model be ready before the 1230 release?

@HonestDeng
Author

HonestDeng commented Dec 20, 2025

Yes.

MammothModa2-Preview combines Qwen2.5-VL (with extra gen-experts in the MLP layers) with a DiT module for image generation. I have already implemented the Qwen2.5-VL part of MammothModa2-Preview by reusing vLLM code such as Qwen2Attention and Qwen2MLP, so the model can take text and images as input and generate text tokens.
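
A rough sketch of the gen-expert idea (hypothetical names and per-token routing; this is my own illustration, not the actual MammothModa2 code, and it stands in for vLLM's Qwen2MLP with plain PyTorch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    """Stand-in for a Qwen2-style gated MLP (roughly what Qwen2MLP implements in vLLM)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class DualExpertMLP(nn.Module):
    """Keeps the original understanding MLP and adds a generation expert."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.und_expert = SwiGLUMLP(hidden_size, intermediate_size)
        self.gen_expert = SwiGLUMLP(hidden_size, intermediate_size)

    def forward(self, x: torch.Tensor, gen_mask: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, hidden_size]; gen_mask: [num_tokens] bool tensor,
        # True for image-generation tokens, False for text/understanding tokens.
        out = torch.empty_like(x)
        out[~gen_mask] = self.und_expert(x[~gen_mask])
        out[gen_mask] = self.gen_expert(x[gen_mask])
        return out
```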

I'm currently working on the DiT part. Hopefully I will finish it this weekend and have the code reviewed before 1230.

I'm not very familiar with supporting new models. If there is any problem in my code, please correct me. Thanks!

@hsliuustc0106
Collaborator


The model seems quite similar to the Qwen-Image structure, with a Qwen-VL model for encoding and a DiT module for image generation.
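
A rough sketch of that two-stage shape (hypothetical class and method names; not the vllm-omni or Qwen-Image API):

```python
class ARStage:
    def generate(self, prompt: str) -> dict:
        # Stand-in for the Qwen-VL stage: returns generated text plus the
        # hidden states the DiT stage would use as conditioning.
        return {"text": f"caption for {prompt!r}", "cond_hidden_states": [[0.0] * 4]}


class DiTStage:
    def generate(self, cond_hidden_states: list) -> bytes:
        # Stand-in for the diffusion transformer decoding an image.
        return b"\x00" * len(cond_hidden_states)


def run_pipeline(prompt: str) -> tuple[str, bytes]:
    ar_out = ARStage().generate(prompt)                        # stage 1: AR encoding / text generation
    image = DiTStage().generate(ar_out["cond_hidden_states"])  # stage 2: conditioned image generation
    return ar_out["text"], image
```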

@HonestDeng changed the title from "[WIP] add support for MammothModa2 model" to "add support for MammothModa2 model" on Dec 27, 2025
@HonestDeng marked this pull request as ready for review on December 27, 2025 08:20

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +20 to +24
ar_outputs = stage_list[source_stage_id].engine_outputs

dit_inputs: list[OmniTokensPrompt] = []
for ar_output, prompt in zip(ar_outputs, prompts):
addi_info = prompt["additional_information"]


P1: Normalize prompts before zipping to avoid async crashes

In async orchestration, OmniStage.process_engine_inputs passes the original prompt object directly to custom processors; for a single multimodal request this is typically a dict, so zip(ar_outputs, prompts) iterates over the dict's keys and the next line, prompt["additional_information"], raises a TypeError or silently mismatches outputs. This breaks the MammothModa2 AR→DiT stage for async generation with a single prompt. Consider normalizing prompts to a list (as in other processors) before zipping; see the sketch below.
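
A minimal sketch of that normalization (assumed helper names and dict-shaped prompts; the real processor lives in the MammothModa2 stage code):

```python
# Hypothetical sketch: wrap a single prompt dict in a list before zipping so
# that async single-prompt requests do not iterate over dict keys.
def _as_prompt_list(prompts):
    # A single multimodal request typically arrives as a dict
    # (e.g. an OmniTokensPrompt); batched requests arrive as a list of them.
    if isinstance(prompts, dict):
        return [prompts]
    return list(prompts)


def build_dit_inputs(ar_outputs, prompts):
    prompts = _as_prompt_list(prompts)
    assert len(ar_outputs) == len(prompts), "AR outputs and prompts must align"
    dit_inputs = []
    for ar_output, prompt in zip(ar_outputs, prompts):
        addi_info = prompt["additional_information"]
        dit_inputs.append({"ar_output": ar_output, "additional_information": addi_info})
    return dit_inputs
```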


@HonestDeng force-pushed the add-mammoth-moda2-support branch 2 times, most recently from 3457e15 to 7154ca5 on December 27, 2025 11:51
logger = init_logger(__name__)


class OmniGPUModelRunner(GPUModelRunner):
Collaborator


@tzhouam PTAL whether the changes are necessary


# Sequential initialization on the same device to avoid memory calculation errors
# when multiple instances start simultaneously
# For TP/PP/DP/SP, we need to lock ALL devices that will be used by this stage
Collaborator


Why do you need to make changes to omni_stage?

Author


Sorry, I'll revert this.


scheduler: Scheduler | None = None
processes: list[mp.Process] | None = None

Collaborator


This is not related to your PR.

Author


Yes, this file is not related to my PR. The CI task (docs/readthedocs.org:vllm-omni) failed on this file, and Codex suggested that I modify it. I'll revert the change in this file in the next commit.

Author


I have just figured out why I modified this unrelated file. It was a merge accident.

Three days ago, I tried to merge the main branch into my working branch and there were some conflicts.

Probably because of my mistakes during that merge, many modifications by other people from the main branch got into my branch.

I'll revert these accidental modifications.

@HonestDeng force-pushed the add-mammoth-moda2-support branch from 7154ca5 to 79022c9 on December 27, 2025 13:37
@hsliuustc0106
Collaborator

Do we have any test results for performance and accuracy?

@HonestDeng
Author

I'm working on this.

@tzhouam
Collaborator

tzhouam commented Dec 27, 2025

Could you briefly explain the motivation for refactoring the model runner? It looks like most of the changes are around preparing the additional information, but I’m not sure why this is necessary.

@tzhouam self-requested a review on December 27, 2025 14:47

Successfully merging this pull request may close these issues: [New Model]: bytedance-research/MammothModa2-Preview