[Feature][RL]: Support Model weight offload, reload and sync model weight & Offload DIT cache #316

@wwwbby

Description

🚀 The feature, motivation and pitch

verl is a production-ready RLHF framework that orchestrates distributed training (FSDP/Megatron) with high-throughput rollout generation via vLLM. Its core design relies on:

  • Ray-based vLLM rollout workers (VLLMRolloutActor) for parallel inference
  • Zero-copy weight synchronization from training workers to inference engines
  • DataProto batch protocol for bidirectional data transfer
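
A rough sketch of that sync path is shown below. The extension class and method names are hypothetical; only vLLM's `worker_extension_cls` hook, `collective_rpc`, and the per-model `load_weights` convention are assumed, and verl's real implementation adds zero-copy transport on top of this.

```python
from typing import Dict

import torch
from vllm import LLM


class WeightSyncExtension:
    """Hypothetical extension mixed into each vLLM worker process."""

    def update_weights(self, state_dict: Dict[str, torch.Tensor]) -> None:
        # Inside a worker, model_runner.model is the loaded nn.Module;
        # vLLM models accept an iterable of (name, tensor) pairs here.
        self.model_runner.model.load_weights(state_dict.items())


# Inference side: the vLLM engine wrapped by a rollout worker.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # model name for illustration
    worker_extension_cls="my_module.WeightSyncExtension",  # hypothetical path
)

# Trainer side: after an FSDP/Megatron update, push weights to every worker.
# trainer_model: the training-side nn.Module (assumed in scope).
updated = {name: p.detach() for name, p in trainer_model.named_parameters()}
llm.collective_rpc("update_weights", args=(updated,))
```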

Currently, verl works seamlessly with native vLLM for text models but cannot leverage vllm-omni's multi-modal/audio generation capabilities, blocking RLHF for Omni-series models (Qwen2.5-Omni, etc.).

vllm-omni, while powerful for distributed multi-modal inference, has several gaps that prevent direct integration. Currently, only the weights of the autoregressive stage, which falls back to the vLLM mainline code path, support these features. In Qwen2.5-Omni, for example, only the Thinker module is covered today; the Talker module is not.
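
For context, mainline vLLM already exposes this offload/reload pair for a single autoregressive engine via sleep mode (a real vLLM API; the model name is chosen for illustration). This is what the Thinker stage inherits through the mainline code path, and what the Talker and DiT stages currently lack:

```python
from vllm import LLM

# Sleep mode must be enabled at construction time.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)

# Offload: level=1 moves weights to CPU RAM and drops the KV cache;
# level=2 discards the weights entirely (reload them before serving).
llm.sleep(level=1)

# ... the trainer can now use the freed GPU memory for a training step ...

# Reload: restore the weights (and reallocate the KV cache) on the GPU.
llm.wake_up()
```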

To achieve full multimodal inference capability, vllm-omni needs to support:

  1. Model weight offload and reload
  2. Model weight synchronization from the training workers
  3. DiT cache offload

A sketch of what this could look like follows.
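
One possible shape for the requested surface. Every name below is hypothetical and shown only to make the request concrete; nothing here exists in vllm-omni today.

```python
import torch

# Hypothetical vllm-omni entrypoint and methods -- illustrative only.
from vllm_omni import OmniLLM

omni = OmniLLM(model="Qwen/Qwen2.5-Omni-7B")

# 1. Offload/reload weights for every stage (Thinker, Talker, DiT),
#    mirroring vLLM's sleep/wake_up semantics instead of covering
#    only the autoregressive stage.
omni.sleep(level=1)
omni.wake_up()

# 2. Sync updated weights from the trainer into a chosen stage.
talker_state_dict: dict[str, torch.Tensor] = {}  # gathered from the trainer
omni.sync_weights(stage="talker", state_dict=talker_state_dict)

# 3. Free the DiT cache between rollout phases.
omni.offload_dit_cache()
```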

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels

enhancement (New feature or request)
