add support for MammothModa2 model #336
base: main
Conversation
For simplicity, most of the DiT-stage code is copied from https://github.com/bytedance/mammothmoda. This code will be simplified and reviewed once the pipeline is running successfully.
The preview version of MammothModa2 only uses the last hidden state.
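For context, a minimal sketch of what "only use the last hidden state" can look like in practice (illustrative only; the tensor layout and function name are assumptions, not the actual MammothModa2 code):

```python
import torch

def last_hidden_state(hidden_states: tuple[torch.Tensor, ...]) -> torch.Tensor:
    # hidden_states: per-layer outputs of the AR encoder, each (batch, seq_len, hidden_dim).
    # The preview pipeline keeps only the final layer as conditioning for the DiT stage.
    return hidden_states[-1]
```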
Hi, will the model be ready before the 1230 release?
Yes. MammothModa2-Preview combines Qwen2.5-VL (with extra gen-experts in its MLP layers) with a DiT module for image generation. I have already implemented the Qwen2.5-VL part of MammothModa2-Preview by reusing existing vLLM code, and I'm currently working on the DiT part. Hopefully I will finish the DiT part this weekend and review my code before 1230. I'm not very familiar with supporting new models, so if there is any problem in my code, please correct me. Thanks!
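For a rough picture, here is a minimal sketch of the "extra gen-experts in MLP layers" idea; the class, parameter names, and mask-based routing are my assumptions for illustration, not the actual MammothModa2 implementation:

```python
import torch
import torch.nn as nn

class DualExpertMLP(nn.Module):
    """Illustrative MLP block with a separate expert for generation tokens."""

    def __init__(self, hidden_size: int, intermediate_size: int) -> None:
        super().__init__()
        # Expert for ordinary understanding tokens.
        self.und_mlp = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size), nn.SiLU(),
            nn.Linear(intermediate_size, hidden_size),
        )
        # Extra expert for image-generation tokens.
        self.gen_mlp = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size), nn.SiLU(),
            nn.Linear(intermediate_size, hidden_size),
        )

    def forward(self, x: torch.Tensor, gen_mask: torch.Tensor) -> torch.Tensor:
        # gen_mask: bool tensor of shape (batch, seq_len) marking generation positions.
        und_out = self.und_mlp(x)
        gen_out = self.gen_mlp(x)
        return torch.where(gen_mask.unsqueeze(-1), gen_out, und_out)
```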
The model seems quite similar to the Qwen-Image structure, with a Qwen-VL model for encoding and a DiT module for image generation.
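To make the comparison concrete, a high-level sketch of that two-stage flow (all module interfaces below are hypothetical placeholders, not the Qwen-Image or MammothModa2 APIs):

```python
import torch

def generate_image(encoder, dit, vae_decoder, prompt_ids: torch.Tensor,
                   num_steps: int = 30) -> torch.Tensor:
    # Stage 1: the VL model encodes the prompt into conditioning features.
    cond = encoder(prompt_ids)
    # Stage 2: the DiT iteratively denoises latents conditioned on those features.
    latents = torch.randn(1, 16, 64, 64)  # latent shape is an arbitrary example
    for t in reversed(range(num_steps)):
        latents = dit(latents, timestep=t, cond=cond)
    # Finally, decode the latents back to pixels.
    return vae_decoder(latents)
```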
💡 Codex Review
Here are some automated review suggestions for this pull request.
ar_outputs = stage_list[source_stage_id].engine_outputs

dit_inputs: list[OmniTokensPrompt] = []
for ar_output, prompt in zip(ar_outputs, prompts):
    addi_info = prompt["additional_information"]
Normalize prompts before zipping to avoid async crashes
In async orchestration, OmniStage.process_engine_inputs passes the original prompt object directly to custom processors; for multimodal requests this is typically a dict, so zip(ar_outputs, prompts) iterates dict keys/characters and the next line prompt["additional_information"] raises a type error or mismatches outputs. This breaks the MammothModa2 AR→DiT stage for async generation with a single prompt. Consider normalizing prompts to a list (as in other processors) before zipping.
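A minimal sketch of that normalization, assuming the processor may receive either a single prompt (dict or string) or a list of prompts; the helper name is illustrative:

```python
def _ensure_prompt_list(prompts) -> list:
    # A single request may arrive as a bare dict (or string prompt); wrap it so that
    # zip(ar_outputs, prompts) pairs each output with a whole prompt rather than
    # iterating over dict keys or string characters.
    if isinstance(prompts, (dict, str)):
        return [prompts]
    return list(prompts)
```

The loop above would then read `for ar_output, prompt in zip(ar_outputs, _ensure_prompt_list(prompts)):`.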
Force-pushed from 3457e15 to 7154ca5.
Review thread on vllm_omni/model_executor/models/mammoth_moda2/mammothmoda2_dit_layer/__init__.py (resolved).
logger = init_logger(__name__)


class OmniGPUModelRunner(GPUModelRunner):
@tzhouam PTAL whether the changes are necessary
# Sequential initialization on the same device to avoid memory calculation errors
# when multiple instances start simultaneously
# For TP/PP/DP/SP, we need to lock ALL devices that will be used by this stage
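For illustration, one way to implement that kind of per-device locking is with file locks; this is a sketch assuming the filelock package, and the lock path and helper name are my own, not the actual omni_stage code:

```python
import contextlib
from filelock import FileLock

@contextlib.contextmanager
def lock_devices(device_ids, lock_dir="/tmp"):
    # Acquire one lock per device, in a fixed order, so that concurrently starting
    # stages initialize sequentially instead of racing on memory measurements.
    locks = [FileLock(f"{lock_dir}/vllm_omni_device_{d}.lock") for d in sorted(device_ids)]
    for lock in locks:
        lock.acquire()
    try:
        yield
    finally:
        for lock in reversed(locks):
            lock.release()
```

Sorting the device ids keeps the acquisition order consistent across processes, which avoids deadlocks when two stages share devices.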
Why do you need to make changes to omni_stage?
Sorry, I'll revert this.
scheduler: Scheduler | None = None
processes: list[mp.Process] | None = None
this is not related to your PR
Yes, this file is not related to my PR. The CI task (docs/readthedocs.org:vllm-omni) failed on this file, and Codex suggested that I modify it. I'll revert the change to this file in the next commit.
I have just figured out why I modified this unrelated file: it was a merge accident.
Three days ago I tried to merge the main branch into my working branch and there were some conflicts.
Probably because of a mistake on my part, many modifications by other people from the main branch ended up in my branch.
I'll revert these accidental modifications.
Review thread on examples/offline_inference/mammothmodal2_preview/mammoth_moda2_image_summary.yaml (resolved).
Force-pushed from 7154ca5 to 79022c9.
Do we have any test results for performance and accuracy?
I'm working on this.
Could you briefly explain the motivation for refactoring the model runner? It looks like most of the changes are around preparing the additional information, but I'm not sure why this is necessary.
Purpose
Resolves #314: add support for the MammothModa2 model (https://github.com/bytedance/mammothmoda).
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for the new model.