Hi, I have a question about UniWorld‑V2.
I noticed that the paper “Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback” mentions two released models, UniWorld‑Qwen‑Image‑Edit‑2509 and UniWorld‑FLUX.1‑Kontext‑Dev.
My questions are:
For the best results reported in the paper (UniWorld-V2), are the backbone and pretrained weights based on Qwen-Image-Edit (i.e., did you use Qwen-Image-Edit as the base model for UniWorld-V2)?
When will the full UniWorld-V2 model weights be released?
If I want to train a model of this size (20B + 7B) on 1024×1024 image-editing data pairs using a single node of NVIDIA A100 GPUs (80 GB variant), would the GPU memory be sufficient, or would I need a distributed multi-node setup or memory-saving techniques (e.g., fully sharded data parallelism, gradient checkpointing)?
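For context on the memory question, here is a rough back-of-the-envelope sketch of the training-state memory. Everything in it is an assumption rather than something taken from the paper: that both the 20B and 7B parts are fully fine-tuned, that training uses mixed-precision Adam at roughly 16 bytes per parameter (bf16 weights and gradients plus fp32 master weights and two fp32 Adam moments), and that the node has 8 GPUs. Activations for 1024×1024 inputs are not counted, so the real requirement would be higher.

```python
# Rough estimate of training-state memory (weights + grads + optimizer state).
# All constants below are assumptions for illustration, not values from the paper.

GB = 1e9  # decimal gigabytes, matching the "80 GB" A100 label

# Mixed-precision Adam: 2 (bf16 weights) + 2 (bf16 grads)
# + 4 (fp32 master weights) + 4 + 4 (fp32 Adam moments) = 16 bytes/param.
BYTES_PER_PARAM = 16


def training_state_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for weights, gradients, and optimizer state, in GB (no activations)."""
    return num_params * bytes_per_param / GB


def per_gpu_gb(total_gb: float, num_gpus: int) -> float:
    """Per-GPU share if the training state is sharded evenly across all GPUs."""
    return total_gb / num_gpus


# "20B + 7B" as stated in the question; assumes both parts are trainable.
total_params = 20e9 + 7e9
num_gpus = 8          # assumed GPUs in a single node
gpu_capacity_gb = 80  # A100 80 GB variant

total_gb = training_state_gb(total_params)
sharded_gb = per_gpu_gb(total_gb, num_gpus)

print(f"Total training state: {total_gb:.0f} GB")
print(f"Per GPU when fully sharded over {num_gpus} GPUs: {sharded_gb:.0f} GB")
print(f"Fits unsharded on one GPU: {total_gb <= gpu_capacity_gb}")
print(f"Fits per GPU when sharded (before activations): {sharded_gb <= gpu_capacity_gb}")
```

Under these assumptions the training state alone is far larger than a single 80 GB GPU, and full sharding across the node leaves limited headroom for activations, which is why I am asking about sharding and gradient checkpointing.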
Thanks in advance for your clarification.