[Feature][RL]: Support Model weight offload, reload and sync model weight & Offload DIT cache #316

@wwwbby

Description

🚀 The feature, motivation and pitch

verl is a production-ready RLHF framework that orchestrates distributed training (FSDP/Megatron) with high-throughput rollout generation via vLLM. Its core design relies on:

  • Ray-based vLLM rollout workers (VLLMRolloutActor) for parallel inference
  • Zero-copy weight synchronization from training workers to inference engines
  • DataProto batch protocol for bidirectional data transfer
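
A rough sketch of that sync path is shown below. The extension class and method names are hypothetical; only vLLM's `worker_extension_cls` hook, `collective_rpc`, and the per-model `load_weights` convention are assumed, and verl's real implementation adds zero-copy transport on top of this.

```python
from typing import Dict

import torch
from vllm import LLM


class WeightSyncExtension:
    """Hypothetical extension mixed into each vLLM worker process."""

    def update_weights(self, state_dict: Dict[str, torch.Tensor]) -> None:
        # Inside a worker, model_runner.model is the loaded nn.Module;
        # vLLM models accept an iterable of (name, tensor) pairs here.
        self.model_runner.model.load_weights(state_dict.items())


# Inference side: the vLLM engine wrapped by a rollout worker.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # model name for illustration
    worker_extension_cls="my_module.WeightSyncExtension",  # hypothetical path
)

# Trainer side: after an FSDP/Megatron update, push weights to every worker.
# trainer_model: the training-side nn.Module (assumed in scope).
updated = {name: p.detach() for name, p in trainer_model.named_parameters()}
llm.collective_rpc("update_weights", args=(updated,))
```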

Currently, verl works seamlessly with native vLLM for text models but cannot leverage vllm-omni's multi-modal/audio generation capabilities, blocking RLHF for Omni-series models (Qwen2.5-Omni, etc.).

vllm-omni, while powerful for distributed multi-modal inference, has several gaps that prevent direct integration. Currently, only the weights of the autoregressive stage, which falls back to the vLLM mainline code path, support these features. In Qwen2.5-Omni, for example, only the Thinker module is covered today; the Talker module is not.
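
For context, mainline vLLM already exposes this offload/reload pair for a single autoregressive engine via sleep mode (a real vLLM API; the model name is chosen for illustration). This is what the Thinker stage inherits through the mainline code path, and what the Talker and DiT stages currently lack:

```python
from vllm import LLM

# Sleep mode must be enabled at construction time.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)

# Offload: level=1 moves weights to CPU RAM and drops the KV cache;
# level=2 discards the weights entirely (reload them before serving).
llm.sleep(level=1)

# ... the trainer can now use the freed GPU memory for a training step ...

# Reload: restore the weights (and reallocate the KV cache) on the GPU.
llm.wake_up()
```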

To achieve full multimodal inference capability, vllm-omni needs to support:

  1. Model weight offload and reload
  2. Model weight synchronization from the training workers
  3. DiT cache offload

A sketch of what this could look like follows.
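
One possible shape for the requested surface. Every name below is hypothetical and shown only to make the request concrete; nothing here exists in vllm-omni today.

```python
import torch

# Hypothetical vllm-omni entrypoint and methods -- illustrative only.
from vllm_omni import OmniLLM

omni = OmniLLM(model="Qwen/Qwen2.5-Omni-7B")

# 1. Offload/reload weights for every stage (Thinker, Talker, DiT),
#    mirroring vLLM's sleep/wake_up semantics instead of covering
#    only the autoregressive stage.
omni.sleep(level=1)
omni.wake_up()

# 2. Sync updated weights from the trainer into a chosen stage.
talker_state_dict: dict[str, torch.Tensor] = {}  # gathered from the trainer
omni.sync_weights(stage="talker", state_dict=talker_state_dict)

# 3. Free the DiT cache between rollout phases.
omni.offload_dit_cache()
```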

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels

enhancement (New feature or request)
