Description
Motivation.
This living page describes the roadmap to the v0.12.0 release of vllm-omni, which accompanies vLLM v0.12.0. Items marked 🙋 are areas where the committer group is seeking more dedicated contributions.
Proposed Change.
CI/CD
P0: E2E test
- online serving
- Qwen3-Omni [CI] Qwen3-Omni online test #257
- Z-Image & Qwen-Image DALL-E compatible image generation endpoint #292
- offline serving
- Qwen2.5-Omni [E2E] Add Qwen2.5-Omni model test with OmniRunner #168
- Qwen3-Omni [CI] Add Qwen3-omni offline UT #216
- Z-Image [CI] add diffusion ci #174
P1: UT/ST and CI improvements
- UT/ST for current and new features.
- CI workflows for NPU/ROCm, etc.
- CI for wheel package compilation. [Feature]: CI for wheel package compilation #238 @congw729
- CI improvements [ci] Refactor CI files to use new CI pipeline generator #246
Model Support 🙋
P0:
- MiMo-Audio [New Model]: MiMo-Audio from Xiaomi #151
- HunyuanImage-3.0 [New Model]: Add HunYuanImage3.0 #42
- Bagel [New Model]: ByteDance-Seed/BAGEL-7B-MoT #203 [New Model]Bagel model(Diffusion Only) #319
P1:
- LongCat-Flash-Omni [New Model]: LongCat-Flash-Omni #213
- Step-Audio2 [New Model]: Step Audio 2 #271
- Step-Audio-EditX [New Model]: Step Audio EditX #272
- MammothModa2-Preview [New Model]: bytedance-research/MammothModa2-Preview #314
Docs Refinement
P0:
- vLLM-Omni main architecture update arch overview #258
- how to add a new model into vLLM-Omni @R2-Y @ywang96
- EntryPoints design @fake0fan
- AR module design @Gaohan123
- DiT module design @SamitHuang
- Cache acceleration & Attention backend @ZJY0516 @SamitHuang
Core 🙋
P0:
- Support streaming input and output for both offline and online inference. @fake0fan @Gaohan123
- streaming input [Feature] add session based streaming input support to v1 vllm#28973
- streaming output @Gaohan123
- Flexible and robust input processing for mixed modalities. (e.g. `use_audio_in_video`) @ywang96
- Flexible output modality control and support vllm cli args for online serving [RFC]: Support parameter-controlled audio response via vllm serve for Qwen-Omni series #162 @Gaohan123
- Support single card deployment and memory profiling with auto gpu mem util. [RFC]: Automatic GPU Mem Utilization Tuning #160 @tzhouam
- Endpoints
- /v1/images/generation [Feature]: API - OpenAI API for image generation #197
- /v1/audio/speech [Feature]: Add openai entrypoint for /v1/audio/speech #218 add openai create speech endpoint #305
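As a sketch of how the planned image-generation endpoint might be exercised once it lands, the snippet below builds a request body whose field names mirror OpenAI's images API; the actual vllm-omni schema may differ once #197 is merged, so treat every field here as an assumption:

```python
import json

def build_image_request(prompt: str, model: str = "Z-Image",
                        size: str = "1024x1024", n: int = 1) -> str:
    """Build a JSON body for the planned /v1/images/generation endpoint.

    Field names follow OpenAI's images API by analogy; the confirmed
    vllm-omni schema is defined by #197, not by this sketch.
    """
    return json.dumps({"model": model, "prompt": prompt, "size": size, "n": n})

# A client would POST this body to http://<host>:8000/v1/images/generation
body = build_image_request("a watercolor fox in a forest")
```

Keeping the request shape OpenAI-compatible lets existing OpenAI client SDKs talk to the endpoint unchanged.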
P1:
- Abstract request related state updates away from model implementation.
- Support async computation and communication across stages by chunks. @R2-Y [RFC]: Support async computation and communication across stages by chunks #268
Disaggregation
P0:
- Support basic OmniConnector for disaggregated stages within one node. [Feature] Omni Connector + ray supported #215
Mode:
P0:
- (EPD)G
P1:
- E(PD)G
- EPDG
Model adaptation:
- Bagel
- HunyuanImage-3.0
- Qwen3-Omni & LongCat-Omni
Hardware:
P0:
- plugin platform abstraction for multiple hardware registry.
Benchmark 🙋
- Implement `vllm benchmark --omni` for offline serving benchmarks comparing against HF [Benchmark] Benchmark Running Samples for Qwen3 Omni and Dataset Preparation #212
- support both online and offline benchmarks [RFC]: DiT models Performance benchmark(T2I/I2I/T2V/TI2V) #344
- t2i
- t2v
- i2v
- ti2v
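A rough sketch of the kind of per-modality latency measurement an offline benchmark harness could report. The `lambda` request stubs below stand in for actual t2i/t2v model stages and are purely illustrative:

```python
import time
from typing import Callable, Dict

def benchmark(run_once: Callable[[], None], iters: int = 5) -> Dict[str, float]:
    """Time a single-request callable and report mean/max latency in seconds."""
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)
    return {"mean_s": sum(latencies) / len(latencies), "max_s": max(latencies)}

# Example: compare hypothetical t2i vs t2v request stubs.
results = {
    "t2i": benchmark(lambda: time.sleep(0.001)),
    "t2v": benchmark(lambda: time.sleep(0.002)),
}
```

A real harness would additionally report throughput and quality metrics against the HF baseline per modality (t2i/t2v/i2v/ti2v).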
vLLM alignment and verification: 🙋
P0:
P1:
- caching
- parallelism
- lora
- multimodal input processing
- PD/EPD disaggregation
Refactor 🙋
P0:
- vLLM 0.12.0 alignment after CI prepared [Rebase] Rebase to vllm 0.12.0 #335
- Stage configs and model implementation optimization and simplification. [RFC]: simplify model stage config for end users #74
P1:
- Simple and Unified init and running arguments setting for both offline and online inference. @tzhouam
- Unified implementation of stage_worker across offline, async online and multi-node. @Gaohan123
For diffusion support, please check the separate issue #85
Feedback Period.
No response
CC List.
@Gaohan123 @ywang96 @Isotr0py @DarkLight1337 @david6666666 @ZJY0516
Any Other Things.
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.