
Conversation

@LawJarp-A
Contributor

Purpose

Add CPU offloading support for all diffusion models in vllm-omni. This enables memory-efficient inference by automatically moving model components (text encoder, DiT transformer, VAE) between CPU and GPU based on usage, reducing GPU memory requirements.

Key features:

  • Alternating offload strategy: offload the DiT during text encoding and offload the text encoder during DiT inference (see the sketch after this list)
  • Configurable via existing config flags (dit_cpu_offload, text_encoder_cpu_offload, vae_cpu_offload)
  • Enabled by default for all diffusion models


@hsliuustc0106
Collaborator

Are there any similar features in diffusers? If so, please link related issues or PRs.

Remove all manual .to(device) calls for CPU offloading from 8 pipeline files.
Device transfers are now handled automatically by the hook-based system.

Signed-off-by: Prajwal A <[email protected]>
@ZJY0516 self-requested a review on December 22, 2025, 12:15
@ZJY0516
Collaborator

ZJY0516 commented Dec 22, 2025

> Are there any similar features in diffusers? If so, please link related issues or PRs.

https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline.enable_sequential_cpu_offload

@ZJY0516
Collaborator

ZJY0516 commented Dec 22, 2025

@LawJarp-A will write an RFC soon.
