
Conversation

@LawJarp-A
Contributor

Purpose

Add CPU offloading support for all diffusion models in vllm-omni. This enables memory-efficient inference by automatically moving model components (text encoder, DiT transformer, VAE) between CPU and GPU based on usage, reducing GPU memory requirements.

Key features:

  • Alternating offload strategy: offload the DiT during text encoding and offload the text encoder during DiT inference (see the sketch after this list)
  • Configurable via existing config flags (dit_cpu_offload, text_encoder_cpu_offload, vae_cpu_offload)
  • Enabled by default for all diffusion models


@hsliuustc0106
Collaborator

Are there any similar features in diffusers? If so, please link related issues or PRs.

Remove all manual .to(device) calls for CPU offloading from 8 pipeline files.
Device transfers are now handled automatically by the hook-based system.

Signed-off-by: Prajwal A <[email protected]>
@ZJY0516 self-requested a review on December 22, 2025, 12:15
@ZJY0516
Collaborator

ZJY0516 commented Dec 22, 2025

> Are there any similar features in diffusers? If so, please link related issues or PRs.

https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline.enable_sequential_cpu_offload

@ZJY0516
Collaborator

ZJY0516 commented Dec 22, 2025

@LawJarp-A will write an RFC soon.
