A web-based interface for Qwen image generation and editing models with LoRA support, multi-image compositing, and flexible pipeline management.
- Image Generation - Text-to-image generation with Qwen-Image models
- Image Editing - Instruction-based image editing with multiple model options
- Multi-Image Compositing - Combine multiple images into one scene (2509/2511 models)
- LoRA Support - Load LoRAs from HuggingFace or local files
- Preset System - Configurable LoRA presets for quick loading
- Pipeline Swapping - Memory-efficient GPU management for limited VRAM
- Job Queue - Background processing with progress tracking
| Model | Flag | Description |
|---|---|---|
| Qwen-Image | `--generation-model original` | Original generation model |
| Qwen-Image-2512 | `--generation-model 2512` | Latest generation model (default) |
| Model | Flag | Description |
|---|---|---|
| Qwen-Image-Edit | `--edit-model original` | Original edit model, supports Lightning LoRA |
| Qwen-Image-Edit-2509 | `--edit-model 2509` | Enhanced consistency, multi-image support |
| Qwen-Image-Edit-2511 | `--edit-model 2511` | Reduced drift, better geometric reasoning (default) |
```bash
# Clone the repository
git clone <repo-url>
cd qwen-image-studio

# Install dependencies
pip install torch torchvision
pip install diffusers transformers accelerate
pip install fastapi uvicorn python-multipart
pip install pillow

# Optional: for quantization support
pip install bitsandbytes
```

```bash
# Start with defaults (2512 generation + 2511 edit)
python server.py

# Open http://localhost:8000 in your browser
```

```
python server.py [OPTIONS]
```
```
Model Selection:
  --generation-model {original,2512}   Generation model (default: 2512)
  --edit-model {original,2509,2511}    Edit model (default: 2511)

Memory Management:
  --quantize                    Enable 4-bit quantization for reduced VRAM
  --cpu-offload                 Enable CPU offloading (default: True)
  --pipeline-swap               Swap pipelines between CPU/GPU on demand
  --keep-in-vram {generation,edit}
                                Pin specific pipeline in VRAM

Multi-GPU:
  --device DEVICE               Default device: auto, cpu, cuda, cuda:0, cuda:1, etc.
  --generation-device DEVICE    Device for generation (e.g., cuda:0)
  --edit-device DEVICE          Device for edit (e.g., cuda:1)
  --device-map                  Distribute model across all GPUs (model parallelism)

Pipeline Control:
  --disable-generation          Disable generation pipeline
  --disable-edit                Disable edit pipeline

Other:
  --host HOST                   Host to bind to (default: 0.0.0.0)
  --port PORT                   Port to bind to (default: 8000)
  --max-pixels N                Max pixels for editing (default: 1048576)
```
```bash
# Keep the edit pipeline pinned in VRAM, swap the generation pipeline on demand
python server.py --pipeline-swap --keep-in-vram edit

# Generation on GPU 0, Edit on GPU 1
python server.py --generation-device cuda:0 --edit-device cuda:1

# Distribute each model across all available GPUs
python server.py --device-map

# Use the original edit model
python server.py --edit-model original
# Then load Lightning LoRA from the UI for 8-step inference

# Run the edit pipeline on CPU only
python server.py --device cpu --disable-generation
```

```bash
# From a single .safetensors file (ComfyUI format)
python server.py --edit-model /path/to/model.safetensors

# From a local diffusers-format directory
python server.py --edit-model /path/to/model-directory/

# Specify pipeline type for custom models
python server.py --edit-model /path/to/model.safetensors --edit-model-type plus
```

The 2509 and 2511 edit models support combining multiple images:
1. Switch to Edit mode in the UI
2. Upload multiple images (drag & drop or click to add)
3. Write a prompt describing how to combine them:
   "The cat from the first image is sitting next to the dog from the second image in a sunny garden"
4. Submit and wait for processing
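The same multi-image edit can be scripted against the `/edit` endpoint. The multipart field names below (`prompt`, `images`) are assumptions for illustration — check `server.py` or the browser's network tab for the actual names. A minimal stdlib sketch that only builds the request body:

```python
import uuid

def build_multipart(prompt: str, images: list[tuple[str, bytes]]) -> tuple[bytes, str]:
    """Build a multipart/form-data body for POST /edit.

    `images` is a list of (filename, raw_bytes) pairs. The field names
    'prompt' and 'images' are assumed, not confirmed from server.py.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Text field carrying the combine instruction
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="prompt"\r\n\r\n{prompt}\r\n'.encode()
    )
    # One file part per uploaded image
    for filename, data in images:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="images"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
            + data + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

body, ctype = build_multipart(
    "The cat from the first image sits next to the dog from the second image",
    [("cat.png", b"<png bytes>"), ("dog.png", b"<png bytes>")],
)
```

Pass `body` with a `Content-Type: {ctype}` header to any HTTP client to submit the job.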
Two strategies are available for systems with multiple GPUs:
Load generation and edit pipelines on different GPUs so both can remain in VRAM:

```bash
python server.py --generation-device cuda:0 --edit-device cuda:1
```

| Pros | Cons |
|---|---|
| Both pipelines always ready | Requires 2+ GPUs with enough VRAM each |
| No swapping overhead | Each GPU must fit its full model |
| Simple to understand | |

Best for: Systems with 2+ mid-range GPUs (e.g., 2x RTX 3090)
Distribute a single large model's layers across all available GPUs:

```bash
python server.py --device-map
```

| Pros | Cons |
|---|---|
| Can fit very large models | Cross-GPU communication overhead |
| Uses all available VRAM | Higher latency per inference |
| Automatic layer distribution | More complex debugging |

Best for: Running models that don't fit on a single GPU

You can use `--device-map` with only one pipeline enabled:

```bash
# Distribute edit model across GPUs, disable generation
python server.py --device-map --disable-generation
```

To check which GPUs are visible to PyTorch:

```python
import torch

print(f"GPUs available: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} - {torch.cuda.get_device_name(i)}")
    print(f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1024**3:.1f} GB")
```

You can load custom edit models from local files instead of HuggingFace:
Load `.safetensors` files that contain all model weights in one file:

```bash
python server.py --edit-model /path/to/Qwen-Edit-Custom.safetensors
```

The loader expects ComfyUI-style key prefixes:

- `model.diffusion_model.*` → transformer
- `text_encoders.*` → text encoder
- `vae.*` → VAE
Load from a directory containing the standard diffusers model structure:

```bash
python server.py --edit-model /path/to/model-directory/
```

For custom models, specify the pipeline architecture if auto-detection fails:
```bash
# Use QwenImageEditPlusPipeline (for 2509/2511-style models)
python server.py --edit-model /path/to/model.safetensors --edit-model-type plus

# Use QwenImageEditPipeline (for original-style models)
python server.py --edit-model /path/to/model.safetensors --edit-model-type original
```

From HuggingFace:

1. Open "Manage LoRAs" panel
2. Select "HuggingFace" source
3. Enter repo ID (e.g., `lightx2v/Qwen-Image-Lightning`)
4. Optionally specify weight file name
5. Click "Load LoRA"
From Local Files:

1. Place `.safetensors` files in the `loras/` folder
2. Open "Manage LoRAs" panel
3. Select "Local File" source
4. Choose from available local LoRAs
5. Click "Load"
Edit `loras/presets.json` to configure quick-load presets:
```json
{
  "presets": [
    {
      "name": "Lightning",
      "source": "huggingface",
      "repo_id": "lightx2v/Qwen-Image-Lightning",
      "weight_name": "Qwen-Image-Lightning-8steps-V1.1.safetensors",
      "pipeline": "edit",
      "description": "8-step fast inference (for --edit-model original only)",
      "recommended_steps": 8
    }
  ]
}
```

Loaded LoRAs can be managed from the UI:

- Activate/Deactivate - Toggle LoRA without unloading
- Scale - Adjust LoRA strength (0.0 - 2.0)
- Unload - Remove LoRA from memory
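A typo in a preset key can make the preset silently fail to load. Here is a small stdlib check you could run on `loras/presets.json` before restarting; the required/optional key sets are inferred from the example preset above, not from `server.py`:

```python
import json

REQUIRED = {"name", "source", "pipeline"}
OPTIONAL = {"repo_id", "weight_name", "description", "recommended_steps"}

def check_presets(raw: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    data = json.loads(raw)
    for i, preset in enumerate(data.get("presets", [])):
        missing = REQUIRED - preset.keys()
        unknown = preset.keys() - REQUIRED - OPTIONAL
        if missing:
            problems.append(f"preset {i}: missing keys {sorted(missing)}")
        if unknown:
            problems.append(f"preset {i}: unknown keys {sorted(unknown)}")
    return problems

sample = '{"presets": [{"name": "Lightning", "source": "huggingface", "pipeline": "edit"}]}'
print(check_presets(sample))  # []
```

Run it with `check_presets(open("loras/presets.json").read())` from the repo root.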
| Method | Endpoint | Description |
|---|---|---|
| POST | `/generate` | Submit generation job |
| POST | `/edit` | Submit edit job (supports multiple images) |
| GET | `/status/{job_id}` | Get job status |
| GET | `/jobs` | List all jobs |
| GET | `/queue` | Get queue info |
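A typical client submits a job and then polls `/status/{job_id}` until it finishes. The sketch below hides the HTTP call behind a `fetch_status` callable so the retry logic stays visible; the terminal status values (`completed`, `failed`) are assumptions about the server's job states:

```python
import time
from typing import Callable

def wait_for_job(fetch_status: Callable[[], dict],
                 poll_interval: float = 1.0,
                 timeout: float = 600.0) -> dict:
    """Poll until the job reaches a terminal state or the timeout expires.

    `fetch_status` should GET /status/{job_id} and return the parsed JSON.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")

# Stub fetcher standing in for the real HTTP call
states = iter([{"status": "queued"}, {"status": "processing"}, {"status": "completed"}])
result = wait_for_job(lambda: next(states), poll_interval=0.0)
print(result)  # {'status': 'completed'}
```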
| Method | Endpoint | Description |
|---|---|---|
| POST | `/lora/load` | Load a LoRA |
| POST | `/lora/unload/{name}` | Unload a LoRA |
| POST | `/lora/activate/{name}` | Activate a LoRA |
| POST | `/lora/deactivate/{name}` | Deactivate a LoRA |
| GET | `/lora/list` | List loaded LoRAs |
| GET | `/lora/available` | List local LoRA files |
| GET | `/lora/presets` | Get configured presets |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/system/info` | Server configuration and GPU stats |
```
qwen-image-studio/
├── server.py             # FastAPI server
├── static/
│   └── index.html        # Web UI
├── loras/
│   ├── presets.json      # LoRA preset configuration
│   └── *.safetensors     # Local LoRA files
├── generated_images/     # Output images
└── uploaded_images/      # Input images
```
- Enable `--pipeline-swap` to swap models between CPU/GPU
- Use `--quantize` with the original edit model (note: incompatible with `--pipeline-swap`)
- Reduce `--max-pixels` to process smaller images
- Disable the unused pipeline with `--disable-generation` or `--disable-edit`
- `--quantize` works with all models (generation and edit)
- Quantizes both the transformer and text encoder to 4-bit NF4
- Quantized models are pinned to CUDA and cannot use `--pipeline-swap`
- Requires the `bitsandbytes` package
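4-bit NF4 loading in diffusers/transformers is driven by a `BitsAndBytesConfig`; a sketch of what `--quantize` plausibly sets up (the exact arguments used by `server.py` are an assumption) — shown as a config fragment, not runnable on its own:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config, as commonly passed to from_pretrained()
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```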
Multi-image editing requires `--edit-model 2509` or `--edit-model 2511`. The original model only supports single images.
MIT - See Qwen-Image model cards for model-specific licensing.
- Qwen-Image by Alibaba
- Diffusers by Hugging Face