
[RFC]: CPU offloading support #412

@LawJarp-A

Motivation

While validating large diffusion models in vLLM-Omni (e.g., during #302), model initialization and execution exceeded available device memory even on high-memory GPUs such as the H100, making CPU offloading necessary.

More generally, CPU offloading also enables:

  • Execution of larger models on constrained GPU setups
  • Improved utilization of host memory for large or infrequently accessed components

Proposed Change

Add a hook-based CPU offloading mechanism that integrates with vLLM-Omni’s existing HookRegistry.

  • Introduce CPUOffloadHook (extends ModelHook); a minimal sketch follows this list

    • Intercepts module execution via new_forward()
    • Performs per-forward device transfers (CPU ↔ GPU)
    • Supports coordinated offloading across multiple modules
  • Introduce CPUOffloadBackend; a config-driven sketch appears at the end of this section

    • Registers offload hooks on diffusion pipeline components (text_encoder, transformer, VAE, image_encoder)
    • Controlled via OmniDiffusionConfig flags (e.g., dit_cpu_offload, text_encoder_cpu_offload)
    • Initialized during GPUWorker pipeline construction
  • No diffusion pipeline code changes required

    • Device placement handled entirely via hooks
    • Compatible with existing hook-based features (e.g., TeaCache)
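
For concreteness, here is a minimal sketch of the per-forward offload pattern CPUOffloadHook would implement. The ModelHook stand-in, its attach() mechanics, and the execution_device parameter are illustrative assumptions, not the actual vLLM-Omni interface; see #405 for the real one.

```python
import torch

class ModelHook:
    """Simplified stand-in for vLLM-Omni's hook base class (illustrative only)."""

    def attach(self, module: torch.nn.Module) -> None:
        # Save the original forward and route all calls through new_forward().
        original_forward = module.forward
        module.forward = lambda *args, **kwargs: self.new_forward(
            module, original_forward, *args, **kwargs
        )

    def new_forward(self, module, original_forward, *args, **kwargs):
        raise NotImplementedError


class CPUOffloadHook(ModelHook):
    """Keeps a module's weights in host memory, moving them to the GPU
    only for the duration of each forward pass."""

    def __init__(self, execution_device: str = "cuda"):
        self.execution_device = execution_device

    def new_forward(self, module, original_forward, *args, **kwargs):
        # Inputs are assumed to already live on the execution device.
        module.to(self.execution_device)   # CPU -> GPU just before compute
        try:
            return original_forward(*args, **kwargs)
        finally:
            module.to("cpu")               # GPU -> CPU right after
```

The synchronous transfer on every call matches the stated priority of correctness over performance; overlapping transfers with compute (e.g., pinned host memory, non_blocking=True) is left as a later optimization.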

Please refer to #405 for the initial implementation.

The initial implementation prioritizes correctness and extensibility over performance optimizations.
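
To show how the pieces could fit together, a hypothetical sketch of the backend wiring follows. Only the flag names dit_cpu_offload and text_encoder_cpu_offload appear in this RFC; OmniDiffusionConfig's shape, the pipeline attribute names, and register_hooks() are assumptions, and CPUOffloadHook is the sketch above.

```python
class CPUOffloadBackend:
    """Illustrative backend: installs CPUOffloadHook (sketched above) on
    whichever pipeline components the config flags enable."""

    # Config flag -> pipeline attribute (attribute names assumed for illustration).
    FLAG_TO_COMPONENT = {
        "dit_cpu_offload": "transformer",
        "text_encoder_cpu_offload": "text_encoder",
    }

    def __init__(self, config, execution_device: str = "cuda"):
        self.config = config
        self.execution_device = execution_device

    def register_hooks(self, pipeline) -> None:
        for flag, attr in self.FLAG_TO_COMPONENT.items():
            if getattr(self.config, flag, False):
                CPUOffloadHook(self.execution_device).attach(getattr(pipeline, attr))

# Hypothetical call site during GPUWorker pipeline construction:
# config = OmniDiffusionConfig(dit_cpu_offload=True, text_encoder_cpu_offload=True)
# CPUOffloadBackend(config).register_hooks(pipeline)
```

Because the hooks attach from the outside, the pipeline code stays device-agnostic, which is also what keeps this composable with other hook-based features such as TeaCache.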


Feedback Period

Open for discussion.


CC List

@SamitHuang @ZJY0516


