
Conversation


@LifeJiggy commented Oct 14, 2025

  • Add fix_memory_issue.py with comprehensive memory optimization
  • Implement chunked loading to reduce peak memory usage
  • Add CPU offload and xFormers memory efficient attention
  • Provide automatic device selection based on available memory
  • Include diagnostic tools for memory-related issues

Addresses the issue where loading Qwen-Image-Edit causes system memory to spike and the process to be killed due to insufficient RAM (128 GB needed).
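A minimal sketch of what such a low-memory loader could look like with diffusers is shown below. The model id "Qwen/Qwen-Image-Edit", the 24 GiB GPU threshold, and the helper names are illustrative assumptions; the actual fix_memory_issue.py in this PR may be organized differently.

```python
# fix_memory_issue.py -- illustrative sketch, not the PR's actual contents.
import psutil
import torch
from diffusers import DiffusionPipeline

MODEL_ID = "Qwen/Qwen-Image-Edit"  # assumed model id for illustration


def load_pipeline_low_memory(model_id: str = MODEL_ID) -> DiffusionPipeline:
    """Load the pipeline with memory-saving options enabled."""
    # Half precision roughly halves the weight footprint on supported hardware.
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    # low_cpu_mem_usage loads weights shard by shard instead of
    # materializing a second full copy in system RAM.
    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=dtype,
        low_cpu_mem_usage=True,
    )

    if torch.cuda.is_available():
        gpu_mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if gpu_mem_gb < 24:  # threshold is an assumption, tune per setup
            # Keep only the active sub-module on the GPU; offload the rest to CPU.
            pipe.enable_model_cpu_offload()
        else:
            pipe.to("cuda")
        try:
            # Optional: memory-efficient attention if xFormers is installed.
            pipe.enable_xformers_memory_efficient_attention()
        except Exception:
            pass

    return pipe


def report_memory() -> None:
    """Simple diagnostic: print current system and GPU memory usage."""
    vm = psutil.virtual_memory()
    print(f"System RAM: {vm.used / 1024**3:.1f} / {vm.total / 1024**3:.1f} GiB used")
    if torch.cuda.is_available():
        print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")


if __name__ == "__main__":
    report_memory()
    pipe = load_pipeline_low_memory()
    report_memory()
```

Calling report_memory() before and after loading gives a quick check that peak usage stays within the available RAM instead of triggering the OOM kill described in the issue.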

@hsliuustc0106

This feature looks great. Would you please also submit a PR to vllm-omni?

@LifeJiggy (Author)

Thanks for the feedback!
Yes, I’m happy to work on a corresponding PR for vllm-omni.
I’ll review the current model loading and memory flow there and adapt the same chunked loading + offload strategy where applicable.
I’ll follow up with a PR once ready.
