🚀 The feature, motivation and pitch
verl is a production-ready RLHF framework that orchestrates distributed training (FSDP/Megatron) with high-throughput rollout generation via vLLM. Its core design relies on:
- Ray-based vLLM rollout workers (VLLMRolloutActor) for parallel inference
- Zero-copy weight synchronization from training workers to inference engines (sketched after this list)
- DataProto batch protocol for bidirectional data transfer
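
For context, here is a minimal sketch of the weight-sync path this design implies. The `TrainingWorker` and `RolloutWorker` classes below are simplified stand-ins, not verl's real API; the only grounded detail is that vLLM models expose a `load_weights` method taking an iterable of `(name, tensor)` pairs.

```python
from typing import Dict, Iterable, Tuple
import torch

# Hypothetical stand-in for a vLLM-side rollout engine; real vLLM models
# expose load_weights(weights: Iterable[Tuple[str, torch.Tensor]]).
class RolloutWorker:
    def __init__(self) -> None:
        self.weights: Dict[str, torch.Tensor] = {}

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> None:
        # In verl this step avoids host round-trips (zero-copy where possible);
        # here we simply store the tensors to illustrate the protocol.
        for name, tensor in weights:
            self.weights[name] = tensor

# Hypothetical stand-in for an FSDP/Megatron training worker.
class TrainingWorker:
    def __init__(self, state_dict: Dict[str, torch.Tensor]) -> None:
        self.state_dict = state_dict

    def sync_to(self, rollout: RolloutWorker) -> None:
        # Stream (name, tensor) pairs straight into the inference engine
        # instead of serializing a full checkpoint to disk.
        rollout.load_weights(self.state_dict.items())

trainer = TrainingWorker({"lm_head.weight": torch.zeros(4, 4)})
rollout = RolloutWorker()
trainer.sync_to(rollout)
assert torch.equal(rollout.weights["lm_head.weight"], torch.zeros(4, 4))
```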
Currently, verl works seamlessly with native vLLM for text models but cannot leverage vllm-omni's multi-modal/audio generation capabilities, blocking RLHF for Omni-series models (Qwen2.5-Omni, etc.).
vllm-omni, while powerful for distributed multi-modal inference, has several gaps that prevent direct integration. Currently, only the weights of the autoregressive stage, which calls back into the vLLM mainstream code path, support these features. For example, in Qwen2.5-Omni only the Thinker currently supports them; they are missing for the Talker.
To achieve full multimodal inference capability, vllm-omni needs to support:
1. model weight offload and reload;
2. model weight synchronization (from training workers);
3. DiT cache offload.
A hypothetical interface sketch follows.
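
To make the request concrete, here is a sketch of the hooks vllm-omni could expose to cover these three points. Every name in it (`OmniEngine`, `offload_weights`, `reload_weights`, `sync_weights`, `offload_dit_cache`) is an assumption for illustration, not an existing vllm-omni API.

```python
from typing import Dict, Iterable, Tuple
import torch

# Hypothetical interface only: none of these methods exist in vllm-omni today;
# this is what an RLHF framework like verl would need to call.
class OmniEngine:
    def __init__(self) -> None:
        self.weights: Dict[str, torch.Tensor] = {}    # Thinker + Talker weights
        self.dit_cache: Dict[str, torch.Tensor] = {}  # DiT-stage cache
        self._cpu_weights: Dict[str, torch.Tensor] = {}

    # 1. Weight offload/reload: free accelerator memory during the training
    #    phase, restore it before the next rollout.
    def offload_weights(self) -> None:
        self._cpu_weights = {k: v.cpu() for k, v in self.weights.items()}
        self.weights.clear()

    def reload_weights(self, device: str = "cpu") -> None:
        self.weights = {k: v.to(device) for k, v in self._cpu_weights.items()}

    # 2. Weight sync: accept updated weights from the trainer for all stages
    #    (Thinker and Talker), not just the autoregressive one.
    def sync_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> None:
        for name, tensor in weights:
            self.weights[name] = tensor

    # 3. DiT cache offload: release the diffusion-stage cache between rollouts.
    def offload_dit_cache(self) -> None:
        self.dit_cache.clear()

engine = OmniEngine()
engine.sync_weights([("talker.proj.weight", torch.randn(2, 2))])
engine.offload_weights()   # training phase: memory released
engine.reload_weights()    # rollout phase: weights restored
engine.offload_dit_cache()
```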
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.