[New Model]Bagel model(Diffusion Only) #319
Conversation
@natureofnature PTAL

Sorry, I forgot to install pre-commit on the computer I used over the weekend. 😂
| f"W={db_cache_config.max_warmup_steps}, " | ||
| ) | ||
|
|
||
| transformer = pipeline.language_model.model |
If we add `self.transformer = self.language_model.model` in the Bagel pipeline's `__init__`, can we just reuse the regular DiT enabler `enable_cache_for_dit`?
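For what it's worth, a minimal sketch of that idea; apart from `language_model` and `enable_cache_for_dit`, which are named above, everything here is an illustrative assumption rather than code from this PR:

```python
# Sketch only: alias the inner transformer in __init__ so the generic DiT
# cache enabler can find it. The constructor signature is assumed.
class BagelPipeline:
    def __init__(self, language_model, vae):
        self.language_model = language_model
        self.vae = vae
        # Generic utilities look up `pipeline.transformer`; this alias lets
        # Bagel reuse them without a model-specific branch.
        self.transformer = self.language_model.model
```

With that alias in place, `enable_cache_for_dit(pipeline)` could presumably resolve `pipeline.transformer` the same way it does for standard DiT pipelines.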
Hi, will the model be ready before the 1230 release?

I believe we can make it!
@natureofnature @princepride currently, can we have an e2e example for AR+DiT?
Since Bagel's DiT component does not follow a traditional architecture, I am currently unable to implement the Cache DiT functionality for it. I have provided a more detailed explanation in this issue: vipshop/cache-dit#598

@hsliuustc0106 Can you help review it? I only kept the code for the diffusion part.

Definitely, please fix the docs and pre-commit.
> od_config.model,
> )
> od_config.tf_model_config = TransformerConfig.from_dict(tf_config_dict)
> # Diffusers-style models expose `model_index.json` with `_class_name`.
@ZJY0516 PTAL
If we want to support models that don't follow the standard diffusers file structure, we have to add specific handling logic here :(
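As a rough illustration of the dispatch this implies (the helper name and the fallback branch are assumptions, not this PR's code):

```python
# Hypothetical sketch: classify a checkpoint directory by the config files it
# ships, falling back to model-specific handling for non-diffusers layouts.
import json
import os

def resolve_model_layout(model_path: str) -> str:
    index_file = os.path.join(model_path, "model_index.json")
    if os.path.exists(index_file):
        with open(index_file) as f:
            class_name = json.load(f).get("_class_name")
        if class_name is not None:
            return f"diffusers:{class_name}"
    # Bagel-style checkpoints don't ship model_index.json, so they would
    # need a dedicated branch here.
    return "custom"
```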
Hi! This is a great PR. I went through Bagel's adaptation process and the code in detail, but I still have a few questions that I'm unclear about:

One concern I have is that I2I in Bagel requires computing an additional VAE KV cache during the AR stage. This also needs to be based on stage tags during batch inference. I suspect that relying entirely on the multi-modal support in vLLM might not be feasible, as I haven't seen any configuration for multiple vision modules in it yet. Please correct me if I'm wrong. @Isotr0py

I have provided a more intuitive implementation of Bagel's DiT attention in vipshop/cache-dit#598. Specifically, Bagel computes a single step as follows: <vision_token_start> (AR weights computation), then <image_token> * 4096 (DiT weights computation), then <vision_token_end> (AR weights computation); see the sketch below.

I noticed that the current PR does not implement MoT, only gen's attention.
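To make that token routing concrete, here is a hedged sketch of a dual-branch projection in the spirit of the step described above; module and tensor names are assumptions, not code from qwen2_navit.py:

```python
import torch
import torch.nn as nn

class DualBranchQKV(nn.Module):
    """Route <image_token> positions through the generation (DiT) projections
    and all other positions through the AR projections. Illustrative only."""

    def __init__(self, hidden: int):
        super().__init__()
        self.qkv_ar = nn.Linear(hidden, 3 * hidden)   # understanding / AR branch
        self.qkv_gen = nn.Linear(hidden, 3 * hidden)  # generation / DiT branch

    def forward(self, x: torch.Tensor, is_gen_token: torch.Tensor) -> torch.Tensor:
        # x: [seq, hidden]; is_gen_token: [seq] bool mask over the 4096
        # <image_token> positions.
        out = self.qkv_ar(x)
        out[is_gen_token] = self.qkv_gen(x[is_gen_token])
        return out
```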
Simplify the logic: https://github.com/princepride/vllm-omni/blob/9bf1ef49033de8df9c6edf36f9af2b7a5d67013a/vllm_omni/diffusion/models/bagel/qwen2_navit.py#L356 |
Thank you for your answer. I am also very interested in this work. Could you add me on WeChat for further communication? If you agree, please send your WeChat account to my email: [email protected]
ZJY0516 left a comment:

Overall, LGTM.
@princepride please fix the doc build error.

It looks like you need to add an `__init__.py` under the `bagel` folder.
hsliuustc0106 left a comment:

LGTM, looking forward to the follow-up PRs.
@princepride please also submit a PR to vllm/recipe.
Purpose

Resolves #203

This PR introduces support for the Bagel model (BAGEL-7B-MoT) in vllm-omni. Specifically, it implements the txt2img inference capability using the BagelPipeline.

Subsequently, I will implement Bagel within the Model Executor. I plan to decompose the model into multiple stages: AR and DiT. The AR stage will directly utilize the implementation from the main repository, while the DiT stage will use the Model Executor's implementation. This approach will enable text2text, text2img, img2text, and img2img capabilities.
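As a speculative sketch of that staged routing (every name here is an assumption for illustration, not code in this PR):

```python
from dataclasses import dataclass

@dataclass
class StagePlan:
    name: str     # "ar" or "dit"
    backend: str  # which executor runs the stage

def plan_bagel_request(task: str) -> list[StagePlan]:
    """Map a task type onto the AR and/or DiT stages described above."""
    if task in ("text2text", "img2text"):
        return [StagePlan("ar", "vllm-main")]
    if task in ("text2img", "img2img"):
        # The AR stage produces the conditioning; the DiT stage denoises.
        return [StagePlan("ar", "vllm-main"), StagePlan("dit", "model-executor")]
    raise ValueError(f"unknown task: {task}")
```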
Test Plan
To verify the correctness of the implementation, a reproduction script was created to initialize the model and perform a simple text-to-image generation.
Test Script:
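The script itself is not preserved in this page capture; the following is a hedged reconstruction of what such a script might look like, where the import path, model ID, and output fields are all assumptions:

```python
# Hypothetical reproduction sketch; the actual test script is collapsed in
# the original page, so every name below is an assumption.
from vllm_omni.diffusion.models.bagel import BagelPipeline  # import path assumed

pipeline = BagelPipeline.from_pretrained("ByteDance-Seed/BAGEL-7B-MoT")
result = pipeline(prompt="a corgi astronaut floating above Earth")
result.images[0].save("bagel_txt2img.png")  # diffusers-style output assumed
```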
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.