
Conversation

@princepride (Contributor) commented Dec 15, 2025

Purpose

Resolves #203

This PR introduces support for the Bagel model (BAGEL-7B-MoT) in vllm-omni.
Specifically, it implements the txt2img inference capability using the BagelPipeline.

Subsequently, I will implement Bagel within the Model Executor. I plan to decompose the model into two stages, AR and DiT: the AR stage will directly reuse the implementation from the main repository, while the DiT stage will use the Model Executor's implementation. This approach will enable text2text, text2img, img2text, and img2img capabilities. A rough sketch of the intended decomposition is below.
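All names in the sketch are illustrative placeholders, not the final vllm-omni API:

# Hypothetical sketch of the planned AR + DiT decomposition.
# StageSpec and the backend/task names are illustrative only.
from dataclasses import dataclass

@dataclass
class StageSpec:
    name: str                # "ar" or "dit"
    backend: str             # where the stage implementation lives
    tasks: tuple[str, ...]   # tasks routed through this stage

BAGEL_STAGES = [
    # The AR stage reuses the language-model implementation from the main repo.
    StageSpec("ar", "vllm", ("text2text", "img2text", "text2img", "img2img")),
    # The DiT stage uses the Model Executor implementation and serves image outputs.
    StageSpec("dit", "model_executor", ("text2img", "img2img")),
]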

Test Plan

To verify the correctness of the implementation, a reproduction script was created to initialize the model and perform a simple text-to-image generation.

Test Script:

from vllm_omni.entrypoints.omni_diffusion import OmniDiffusion

def main():
    model_path = "../models/BAGEL-7B-MoT"
    prompt = "A futuristic city skyline at twilight, cyberpunk style"
    # Initialize the Bagel pipeline through the OmniDiffusion entrypoint.
    pipeline = OmniDiffusion(model=model_path)
    # Run text-to-image generation and save the first output image.
    result = pipeline.generate(prompt)
    output_file = "bagel_output.png"
    result.images[0].save(output_file)

if __name__ == "__main__":
    main()

Test Result

(generated image attached)

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

@hsliuustc0106 (Collaborator)

@natureofnature PTAL

@princepride (Contributor, Author)

Sorry, I forgot to install pre-commit on the computer I used over the weekend.😂

f"W={db_cache_config.max_warmup_steps}, "
)

transformer = pipeline.language_model.model
Collaborator (inline review comment on the diff snippet above):

If we add self.transformer = self.language_model.model in the Bagel pipeline __init__, can we just reuse the regular DiT enabler enable_cache_for_dit?
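A minimal sketch of that suggestion, assuming the attribute and function names referenced above (not verified against the final code):

# Sketch only: alias the inner transformer in the pipeline __init__ so the
# generic DiT cache enabler can find it under the attribute name it expects.
class BagelPipeline:
    def __init__(self, language_model, vae, scheduler):
        self.language_model = language_model
        self.vae = vae
        self.scheduler = scheduler
        # Expose the inner transformer where enable_cache_for_dit looks for it.
        self.transformer = self.language_model.model

# With the alias in place, the Bagel-specific wiring could become:
#   enable_cache_for_dit(pipeline, db_cache_config)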

princepride and others added 3 commits December 18, 2025 05:58
@hsliuustc0106 (Collaborator)

Hi, will the model be ready before the 12.30 release?

@princepride (Contributor, Author)

I believe we can make it!

@hsliuustc0106 (Collaborator)

@natureofnature @princepride currently, can we have an e2e example for AR+DiT?

@princepride (Contributor, Author)

Since Bagel's DiT component does not follow a traditional architecture, I am currently unable to implement the Cache DiT functionality for it. I have provided a more detailed explanation in this issue: vipshop/cache-dit#598

@princepride (Contributor, Author)

@hsliuustc0106 Can you help review it? I only kept the code for the diffusion part.

@hsliuustc0106 (Collaborator)

> @hsliuustc0106 Can you help review it? I only kept the code for the diffusion part.

Definitely. Please fix the docs and pre-commit.

od_config.model,
)
od_config.tf_model_config = TransformerConfig.from_dict(tf_config_dict)
# Diffusers-style models expose `model_index.json` with `_class_name`.
Collaborator (inline review comment on the snippet above):

@ZJY0516 PTAL

Collaborator:

If we want to support models that don't follow the standard diffusers file structure, we have to add specific handling logic here :(
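For context, the dispatch being discussed would look roughly like the sketch below; the function name and the Bagel fallback branch are hypothetical, not the actual loader code:

import json
import os

def resolve_pipeline_class(model_path: str) -> str:
    # Rough sketch of the config dispatch discussed above (not the real loader).
    index_file = os.path.join(model_path, "model_index.json")
    if os.path.exists(index_file):
        # Diffusers-style models expose model_index.json with `_class_name`.
        with open(index_file) as f:
            return json.load(f)["_class_name"]
    # Models that do not follow the standard diffusers layout (e.g. Bagel)
    # need model-specific handling here.
    return "BagelPipeline"  # hypothetical fallback for illustration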

@zhangzef

Hi! This is a great PR — I went through Bagel’s adaptation process and the code in detail, but I still have a few questions that I’m unclear about:
1. From what I’ve observed, model_executor uses stage parallelism. However, Bagel’s understanding and generation parameters don’t seem to be easily decoupled into two independent stages for parallel execution. Would it be feasible to treat Bagel as a single stage during adaptation? If so, would that reduce the overall degree of parallelism and therefore increase the inference cost?
2. Is CacheDiT not applicable because Bagel’s special MoT architecture makes it incompatible? Also, does the Bagel diffusion branch that is about to be merged already use CacheDiT?
3. When you say “batch inference” here, do you mean the model directly computes over the entire batch in one forward pass, or is it referring to something else?

@princepride (Contributor, Author)

> Hi! This is a great PR — I went through Bagel’s adaptation process and the code in detail, but I still have a few questions that I’m unclear about: 1. From what I’ve observed, model_executor uses stage parallelism. However, Bagel’s understanding and generation parameters don’t seem to be easily decoupled into two independent stages for parallel execution. Would it be feasible to treat Bagel as a single stage during adaptation? If so, would that reduce the overall degree of parallelism and therefore increase the inference cost? 2. Is CacheDiT not applicable because Bagel’s special MoT architecture makes it incompatible? Also, does the Bagel diffusion branch that is about to be merged already use CacheDiT? 3. When you say “batch inference” here, do you mean the model directly computes over the entire batch in one forward pass, or is it referring to something else?

  1. I do not intend to implement stage parallelism within model_executor, because such logic does not exist in Bagel. The current code is still quite messy 😂 as I am currently trying to integrate my code with @natureofnature's code. I will remove a lot of unnecessary content later.
  2. That is correct: MoT causes the architectural incompatibility. I reached out to the Cache-DiT team in vipshop/cache-dit#598 ("[Feature] vLLM-Omni Bagel support but don't know how to use cache-dit, I need your help🙏"), and they suggested implementing it via transformers, but we do not support that currently.
  3. Yes, I am referring to computing the entire batch in a single forward pass. I also hope that, with the future Dynamic Stage Transitions feature (#504), the entire batch can pass through the AR computation first; the I2T and T2T tasks can then exit directly from the AR stage, while the remaining batch continues through the DiT stage to complete the I2I and T2I tasks.

One concern I have is that I2I in Bagel requires computing an additional VAE KV cache during the AR stage, which also needs to be keyed on stage tags during batch inference. I suspect that relying entirely on the multi-modal support in vLLM might not be feasible, as I haven't seen any configuration for multiple vision modules in it yet. Please correct me if I'm wrong. @Isotr0py
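To illustrate point 3 and the VAE KV cache concern, here is a purely hypothetical sketch of per-request routing after the AR stage; the task tags and the needs_vae_kv_cache flag are invented for this example and are not part of this PR:

# Hypothetical sketch of dynamic stage transitions after the AR stage.
def route_after_ar(batch):
    finished, to_dit = [], []
    for req in batch:
        if req.task in ("text2text", "img2text"):
            # T2T and I2T requests exit directly after the AR stage.
            finished.append(req)
        else:
            # T2I and I2I continue to the DiT stage; I2I additionally relies on
            # the VAE KV cache produced during the AR stage.
            req.needs_vae_kv_cache = req.task == "img2img"
            to_dit.append(req)
    return finished, to_dit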

@princepride (Contributor, Author)

I have provided a more intuitive description of the Bagel DiT attention in vipshop/cache-dit#598. Specifically, Bagel computes a single step over the following layout:

<vision_token_start> (AR weights computation) <image_token> * 4096 (DiT weights computation) <vision_token_end> (AR weights computation)

with bidirectional attention over the whole span.
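In code, the layout above roughly corresponds to the schematic sketch below (not the actual qwen2_navit.py implementation); it only shows which positions would be routed to the generation (DiT) weights and that attention over the span is bidirectional:

import torch

# Schematic layout of a single Bagel generation step, as described above.
num_image_tokens = 4096
tokens = ["<vision_token_start>"] + ["<image_token>"] * num_image_tokens + ["<vision_token_end>"]

# True where the generation (DiT) expert weights apply,
# False where the AR (understanding) weights apply.
gen_mask = torch.tensor([tok == "<image_token>" for tok in tokens])

# Bidirectional attention over the whole span: every position may attend to
# every other position (no causal mask within the step).
seq_len = len(tokens)
attn_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)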

@zhangzef

I noticed that the current PR does not implement MoT, only gen's attention.

@princepride (Contributor, Author)

> I noticed that the current PR does not implement MoT, only gen's attention.

The logic is simplified here: https://github.com/princepride/vllm-omni/blob/9bf1ef49033de8df9c6edf36f9af2b7a5d67013a/vllm_omni/diffusion/models/bagel/qwen2_navit.py#L356

@zhangzef


Thank you for your answer. I am also very interested in this work at the moment. Could you add me on WeChat for further communication? If you agree, you could send your WeChat ID to my email: [email protected]

@ZJY0516 (Collaborator) left a review

overall, LGTM

@ZJY0516 requested a review from SamitHuang on December 30, 2025 at 14:36
@ZJY0516 (Collaborator) commented Dec 30, 2025

@princepride please fix the doc build error

@hsliuustc0106 added the "ready" label (label to trigger buildkite CI) on Dec 31, 2025
@hsliuustc0106 (Collaborator)

  • docs/readthedocs.org:vllm-omni

It looks like you need to add an __init__.py under the bagel folder.

@princepride requested a review from ZJY0516 on December 31, 2025 at 03:15
@hsliuustc0106 (Collaborator) left a review

lgtm, looking forward to the follow-up PRs

@hsliuustc0106 merged commit 23bf317 into vllm-project:main on Dec 31, 2025
7 checks passed
@hsliuustc0106 (Collaborator)

@princepride please also submit a PR to vllm/recipe


Development

Successfully merging this pull request may close these issues: [New Model]: ByteDance-Seed/BAGEL-7B-MoT
