
Conversation

black4der commented Oct 10, 2025


🎯 PR Title

feat: Add LoRA support for Qwen Image models

📝 Description

Overview

This PR adds comprehensive LoRA (Low-Rank Adaptation) support for Qwen Image models, enabling users to leverage LoRA fine-tuned models within Nunchaku's quantized inference framework for more flexible model customization and reduced memory footprint.

Key Features

1. New Qwen Image LoRA Module (nunchaku/lora/qwenimage/)

  • Format Conversion (diffusers_converter.py, nunchaku_converter.py)

    • Bidirectional conversion between Diffusers ↔ Nunchaku formats
    • Support for extracting LoRA parameters from full model weights
    • Automatic weight mapping and format compatibility handling
  • LoRA Composition (compose.py)

    • Support for combining multiple LoRA models
    • Configurable strength weights for each LoRA
    • Optimized merging algorithm ensuring numerical stability
  • Utility Functions (utils.py, packer.py)

    • Format detection and validation
    • LoRA packing and unpacking utilities
    • Weight management helper functions

2. Core Model Enhancements

  • Attention Module (models/attention.py)

    • Added LoRA-enabled attention layers (+73 lines)
    • Dynamic LoRA loading while maintaining original performance
  • Linear Layer (models/linear.py)

    • LoRA-compatible linear layer implementation (+69 lines)
    • Support for low-rank matrix decomposition and efficient computation (a sketch follows this list)
  • Transformer (models/transformers/transformer_qwenimage.py)

    • Deep integration of LoRA into Qwen Image Transformer architecture (+315 lines)
    • Support for layer-wise LoRA application and selective loading
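
For intuition, the low-rank computation such a LoRA-compatible linear layer performs can be sketched as follows. This is a minimal illustration; the class and attribute names are not the PR's actual implementation:

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    # Computes y = W x + strength * (alpha / rank) * B (A x), where A is the
    # low-rank "down" projection and B the "up" projection.
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)  # A
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)   # B
        self.scale = alpha / rank
        self.strength = 1.0  # adjustable at runtime, cf. set_lora_strength below

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.lora_up(self.lora_down(x))
        return self.base(x) + self.strength * self.scale * delta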

API Examples

Using Single LoRA (Adjustable Strength)

from nunchaku import NunchakuQwenImageTransformer2DModel

# Load quantized model
transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "nunchaku-tech/nunchaku-qwen-image/svdq-int4_r32-qwen-image.safetensors"
)

# Load and apply LoRA
transformer.update_lora_params("path/to/lora.safetensors")

# Adjust LoRA strength (0.0 to 2.0)
transformer.set_lora_strength(0.8)

# Reset to original model (remove LoRA)
transformer.reset_lora()

Using Multiple Composed LoRAs (Independent Strengths)

from nunchaku.lora.qwenimage import compose_lora

# Compose multiple LoRAs with different strengths
composed_lora = compose_lora([
    ("lora1.safetensors", 1.0),  # First LoRA, strength 1.0
    ("lora2.safetensors", 0.6),  # Second LoRA, strength 0.6
])

# Apply composed LoRA (num_loras parameter is important!)
transformer.update_lora_params(composed_lora, num_loras=2)

# Note: Composed LoRA strengths are pre-baked, do not call set_lora_strength()
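
Conceptually, composition bakes each strength into the LoRA factors and stacks them along the rank dimension, which is why set_lora_strength() no longer applies afterwards. A sketch under that assumption (not the PR's exact implementation):

import torch

def compose_two_loras_sketch(a1, b1, s1, a2, b2, s2):
    # a_i: (rank_i, in_features) down-projections; b_i: (out_features, rank_i) up-projections.
    # Scaling one factor of each pair by its strength bakes the strength in, and
    # concatenating along the rank dimension yields one LoRA of rank r1 + r2:
    #   B A = s1 * (b1 @ a1) + s2 * (b2 @ a2)
    a = torch.cat([s1 * a1, s2 * a2], dim=0)  # (r1 + r2, in_features)
    b = torch.cat([b1, b2], dim=1)            # (out_features, r1 + r2)
    return a, b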

Format Conversion

from nunchaku.lora.qwenimage import to_nunchaku, to_diffusers

# Diffusers format → Nunchaku format (single LoRA)
lora_nunchaku = to_nunchaku("lora.safetensors", output_path="lora_nunchaku.safetensors")

# Composed LoRA → Nunchaku format (skip base merge)
composed = compose_lora([("lora1.safetensors", 0.8), ("lora2.safetensors", 0.6)])
lora_nunchaku = to_nunchaku(composed, output_path="composed.safetensors", skip_base_merge=True)

# Nunchaku format → Diffusers format
lora_diffusers = to_diffusers("lora_nunchaku.safetensors", output_path="lora_diffusers.safetensors")

Technical Highlights

  • ✅ Efficient loading with safetensors format support
  • ✅ Full compatibility with Diffusers ecosystem
  • ✅ Zero-copy LoRA weight management
  • ✅ Quantization-aware LoRA computation
  • ✅ Support for multiple LoRA composition with independent strength control
  • ✅ Automatic handling of .alpha parameters and various naming conventions (correctly handled in both to_diffusers and compose_lora)
  • ✅ Intelligent QKV fusion and dimension validation (a sketch follows this list)
  • ✅ Complete type annotations and documentation
  • ✅ num_loras parameter support: fixed the composed-LoRA base-model double-merging issue, ensuring correct multi-LoRA usage in Diffusers pipelines
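
As a rough illustration of the QKV fusion mentioned above (purely a sketch; the function and argument names are not from the PR): separate q/k/v LoRAs can be merged into one LoRA for a fused QKV projection by stacking the down-projections along the rank dimension and arranging the up-projections block-diagonally over the output dimension.

import torch

def fuse_qkv_lora_sketch(a_q, b_q, a_k, b_k, a_v, b_v):
    # a_*: (rank_*, in_features) down-projections; b_*: (out_*, rank_*) up-projections.
    a = torch.cat([a_q, a_k, a_v], dim=0)  # (r_q + r_k + r_v, in_features)
    b = torch.block_diag(b_q, b_k, b_v)    # (d_q + d_k + d_v, r_q + r_k + r_v)
    # Fused delta: b @ (a @ x) stacks [b_q a_q x ; b_k a_k x ; b_v a_v x]
    return a, b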

Code Statistics

  • New Files: 6
  • Modified Files: 3
  • Total: 9 files, +2012 lines of code

📂 File List

New Files:

  • nunchaku/lora/qwenimage/__init__.py - LoRA module interface
  • nunchaku/lora/qwenimage/compose.py - Multi-LoRA composition logic
  • nunchaku/lora/qwenimage/diffusers_converter.py - Diffusers format conversion
  • nunchaku/lora/qwenimage/nunchaku_converter.py - Nunchaku format conversion
  • nunchaku/lora/qwenimage/packer.py - Weight packing utilities
  • nunchaku/lora/qwenimage/utils.py - Utility functions

Modified Files:

  • nunchaku/models/attention.py - Added LoRA update methods (+73 lines)
  • nunchaku/models/linear.py - Added LoRA strength and scaling support (+69 lines)
  • nunchaku/models/transformers/transformer_qwenimage.py - Deep LoRA integration (+315 lines)

Testing

  • Unit tests passed (via ComfyUI functional testing)
  • Integration tests passed (single/multiple LoRA, ControlNet combination tests)
  • Performance benchmarks completed (lazy loading and caching optimization verified)

Compatibility

  • Backward compatible with existing Qwen Image inference code
  • No impact on workflows not using LoRA

⚠️ Known Limitations

CPU Offload Compatibility (ComfyUI Integration)

Some LoRA files (e.g., Qwen-Image-Lightning) train different layers for different transformer blocks, resulting in inconsistent internal structures (ranks) across blocks. In ComfyUI, QwenImage performs CPU offload through a Python-based CPUOffloadManager that requires all blocks to have identical structure, so these LoRAs cannot be used together with CPU offload.

Scope: Only affects ComfyUI scenarios with CPU offload enabled; it does not affect standard inference.

Symptom:

RuntimeError: The size of tensor a (128) must match the size of tensor b (192) at non-singleton dimension 1

Solutions:

  1. Disable CPU offload (recommended for high-VRAM users)
  2. Use LoRAs with consistent block structures

Technical Reason:

  • Flux models use a C++ implementation in which each block manages its memory independently, so rank inconsistencies are supported
  • QwenImage uses the Python-based CPUOffloadManager, which copies parameters through fixed buffer blocks and therefore requires all blocks to have identical structure (a toy illustration follows this list)
  • This limitation may be resolved in the future through C++ layer implementation
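
The failure can be reproduced in miniature: copying a block's parameters into a fixed-size offload buffer fails as soon as one block's LoRA rank differs. A toy illustration (not Nunchaku's actual offload code):

import torch

buffer_lora_down = torch.empty(128, 3072)  # offload buffer sized for rank 128
block_lora_down = torch.empty(192, 3072)   # a block whose LoRA was trained at rank 192
buffer_lora_down.copy_(block_lora_down)    # raises RuntimeError: size mismatch, as in the symptom above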

📚 Related Issues

Closes #[issue number] (if applicable)


✅ Checklist

  • Code follows project coding standards
  • Added necessary documentation and comments
  • Code tested locally and passes
  • Updated relevant documentation
  • Functional testing completed (via ComfyUI integration tests)

- Implemented `update_lora_params` method in `NunchakuFeedForward`, `NunchakuQwenAttention`, and `NunchakuQwenImageTransformerBlock` to handle LoRA weights.
- Added `set_lora_strength` method in `SVDQW4A4Linear` for dynamic adjustment of LoRA scaling.
- Enhanced `NunchakuQwenImageTransformer2DModel` with methods to update and restore original parameters for LoRA.
- Introduced handling for unquantized and quantized parts of the model in LoRA updates.
black4der changed the title from "feat: add LoRA support for attention and transformer models" to "feat: Add LoRA and ControlNet support for Qwen Image models" on Oct 11, 2025

m0rph3us1987 commented Oct 11, 2025

Merged this into my local repo, both nunchaku and ComfyUI-nunchaku.

Some of my trained loras produce wrong results, as if the weights were applied twice. Some others work well.
Lowering the strength to something like .5 or .6 worked.
Also, generating images at 1920x720 does not work at all.

In the last couple of days I also played around with Bluear7878's PRs:

nunchaku-PR
ComfyUI-nunchaku

With Bluear7878's changes everything seems to be working as expected, so there might be some issues.

black4der (author) replied, quoting m0rph3us1987's comment above:

Thanks a lot for testing and for your detailed feedback!

Regarding the issue where the LoRA weights seem to be applied twice — that might be because my code currently uses AMPLIFICATION_FACTOR = 2.0 to compensate for potential loss from W4A4 quantization. When I was testing my own LoRA, the results looked too weak without this amplification, so I kept it.

If others are finding that the LoRA effect is too strong, I’ll remove or adjust this compensation factor. So far, I’ve only trained one LoRA myself, so I don’t yet have a good sense of the overall strength balance.
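
For context, here is a minimal sketch of how such a compensation factor would scale the LoRA delta when it is applied. The constant name AMPLIFICATION_FACTOR comes from this discussion; the surrounding function is illustrative only:

import torch

AMPLIFICATION_FACTOR = 2.0  # compensates for W4A4 quantization loss, per the comment above

def apply_lora_delta_sketch(weight, lora_down, lora_up, alpha, strength):
    # lora_down: (rank, in_features), lora_up: (out_features, rank).
    # Standard LoRA merge W' = W + strength * (alpha / rank) * (up @ down),
    # additionally scaled by the amplification factor.
    rank = lora_down.shape[0]
    scale = strength * (alpha / rank) * AMPLIFICATION_FACTOR
    return weight + scale * (lora_up @ lora_down)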

As for the issue with generating 1920x720 images — I just tested it on my end, and it seems to be working correctly. I’m currently using the FP4 R128 model.

m0rph3us1987 replied:

You are welcome.

I would say AMPLIFICATION_FACTOR = 2.0 might be too high, since the loras work fine without nunchaku, but who am I to judge? :)

Regarding the resolution: I am using svdq-int4_r128-qwen-image-lightningv1.1-8steps, and when running at 1920x1080 or 1080x1920 I got a tensor size error in the KSampler step. Sorry, I do not have the exact error message anymore.

black4der (author) replied, quoting the comment above:

You can try using a model without the fused Lightning LoRA, and instead load the Lightning LoRA separately through the LoRA Loader node for testing.
I’ve been testing with a model that doesn’t include any fused LoRAs, and it works fine on my side — maybe give that a try to see if your LoRA runs normally.

- Added logic to remove '.default.' from PEFT-style naming for LoRA weights, ensuring compatibility with models trained using PEFT/Kohya.
- Updated key transformation to handle cases where '.default.weight' appears at the end of the key.
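
A minimal sketch of the key normalization these commits describe (the helper name is illustrative):

def normalize_peft_key_sketch(key: str) -> str:
    # PEFT/Kohya-style checkpoints may insert ".default" into module paths,
    # e.g. "...to_q.lora_A.default.weight" -> "...to_q.lora_A.weight".
    return key.replace(".default.weight", ".weight").replace(".default.", ".")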

GavChap commented Oct 11, 2025

[quoting the exchange above between m0rph3us1987 and black4der]

I'd probably leave the amplification factor at 1.0, since weights can always be turned up, but turning them down when they're already amplified by 2 would mean losing precision.

black4der (author) replied, quoting GavChap's comment above:

You can test it. When I changed AMPLIFICATION_FACTOR from 2 to 1, I found that the LoRA was applied too weakly. You could compare at which strengths 2 and 1 each give exactly correct results.

GavChap commented Oct 11, 2025, quoting black4der's reply above:

I still stand by my original point: it's easier to get people to turn up a lora weight of 1.0 than it is to get everyone to turn it down to 0.5. The issue I've found with Qwen is that a lot of LoRAs are actually very weak-acting anyway, even on the base model, and when you add lightning into the mix it gets worse.

I've trained around 10 loras now on Qwen so I'm fairly up to speed with how to get them working correctly and what they should look like.

Findings are below with AMPLIFICATION_FACTOR = 2, all the same seed.

No lora - 30 steps / cfg 2 / euler / simple
[image]

JibMixQwen (extracted lora, 0.5 strength) - same sampler settings
This looks correct to me and is similar to what I got from the other lora PR
[image]

JibMixQwen (extracted lora, 1 strength) - same sampler settings
This is very overcooked and has started to get the grid coming through and the image has changed completely
[image]

Maybe your amplification factor is correct for FP4, but not for INT4?

I did this on a completely fresh copy of ComfyUI with only sageattention and your nunchaku pull requests as the custom nodes.

As an aside, the lightning lora only worked correctly at strength 0.5 as well. 1.0 was very cooked.

Lightning 1.1 bf16 @ 1.0 strength, 8 steps, 1.0 CFG - Very overcooked
[image]

Lightning 1.1 bf16 @ 0.5 strength, 8 steps, 1.0CFG - Perfect!
[image]

I then adjusted AMPLIFICATION_FACTOR to 1.0 AND RERAN THE LIGHTNING LORA

Lightning 1.1 bf16 @ 1.0 strength, 8 steps, 1.0CFG - Perfect!
[image]

EDIT: Another thing!

I've also noticed a marked increase in doubling of people in images when using non-square ratios; please test non-square ratios and 2-megapixel sizes.

See my config below:

Checkpoint files will always be loaded safely.
Total VRAM 15948 MB, total RAM 128674 MB
pytorch version: 2.8.0+cu128
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Ti : cudaMallocAsync
Using pytorch attention
Python version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
ComfyUI version: 0.3.64
ComfyUI frontend version: 1.27.10
[Prompt Server] web root: /home/me/TestComfyUI/venv/lib/python3.12/site-packages/comfyui_frontend_package/static
======================================== ComfyUI-nunchaku Initialization ========================================
Nunchaku version: 1.0.1
ComfyUI-nunchaku version: 1.0.1
ComfyUI-nunchaku 1.0.1 is not compatible with nunchaku 1.0.1. Please update nunchaku to a supported version in ['v1.0.0'].
=================================================================================================================

Import times for custom nodes:
   0.0 seconds: /home/me/TestComfyUI/custom_nodes/websocket_image_save.py
   1.4 seconds: /home/me/TestComfyUI/custom_nodes/ComfyUI-nunchaku

Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://0.0.0.0:8189


GavChap commented Oct 11, 2025

Image doubling / Unusual placement issue.

Both of these images use the same seed, same sampler settings, same everything; both are two-stage runs.

The prompt was only "A woman standing in the middle of a large open room".

Stage 1 has NO loras on it for the first 4 steps; Stage 2 has just the 8-step lightning lora on it for the last 4 steps. These two images should be identical (more or less).

This PR:
[image]

The other PR for LoRAs (#680):
[image]

From my own experience, the second one is correct: Qwen prefers to put objects dead-center, and I've consistently seen this PR's version of nunchaku placing objects on the right when using wide ratios.

More examples of this PR's right-placement of objects (all different seeds):
[images]


black4der commented Oct 12, 2025

[quoting GavChap's comment above]

I’ve addressed the issue with AMPLIFICATION_FACTOR = 1 (LoRA strength) and the position ID generation that caused characters to
ComfyUI_temp_tguaz_00004_
be off-center, especially in certain resolutions. I tested the fix on a 1664x1216 resolution using the Qwen-Image-Lightning-8steps-V2.0 model with strength 1, seed 489237653520767, and 10 steps. The result is now a properly centered image.
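
A plausible illustration of the kind of centering fix described, offered purely as a sketch (the PR's actual position-ID code may differ):

import torch

def centered_position_ids_sketch(height: int, width: int) -> torch.Tensor:
    # Position IDs that start at 0 can bias composition toward one side;
    # centering the coordinate grid around the image midpoint avoids the
    # off-center placement seen at non-square resolutions.
    rows = torch.arange(height) - height // 2
    cols = torch.arange(width) - width // 2
    grid = torch.stack(torch.meshgrid(rows, cols, indexing="ij"), dim=-1)
    return grid.reshape(-1, 2)  # one (row, col) pair per latent position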

- Changed the amplification factor from 2.0 to 1.0 to address quantization precision loss in W4A4 models.

GavChap commented Oct 12, 2025

I’ve addressed the issue with AMPLIFICATION_FACTOR = 1 (LoRA strength) and the position ID generation that caused characters to ComfyUI_temp_tguaz_00004_ be off-center, especially in certain resolutions. I tested the fix on a 1664x1216 resolution using the Qwen-Image-Lightning-8steps-V2.0 model with strength 1, seed 489237653520767, and 10 steps. The result is now a properly centered image.

Thank you! I've done some retesting and it's perfect. :D

[image]

m0rph3us1987 replied:

Thank you for the fixes, works great now.

ntiyachkon-design commented:

Sorry for the noob question, I'm fairly new to ComfyUI. Is there any tutorial on how to implement this in my ComfyUI? I'm using the ComfyUI-Easy-Install build with the latest nunchaku 1.0.1.

dimitribarbot commented:

Hi,

Thank you for this PR.

I'm trying to get this code to work with a diffusers pipeline, but without success so far.
I'm using one of the Qwen-Image examples in the example folder of this repository, where I'm adding a call to transformer.update_lora_params with one LoRA or 2 composed LoRAs (after a call to your compose_lora function).

With one LoRA, I get this error:

python3: /nunchaku/src/kernels/zgemm/gemm_w4a4_launch_impl.cuh:482: static void nunchaku::kernels::GEMM_W4A4_Launch<Config, USE_FP4>::quantize_w4a4_act_fuse_lora(Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, bool, bool) [with Config = nunchaku::kernels::GEMMConfig_W4A4<true>; bool USE_FP4 = false]: Assertion `lora_down.shape[0] == N' failed.

With two composed LoRAs, I get an image full of noise.

Could you please provide an example of:

  • A use case with one LoRA, for which we may change its strength (using the transformer set_lora_strength function?),
  • A use case with two composed LoRAs, each one having a dedicated strength,
  • A use case where we reset the transformer back to its original state (should we use the transformer restore_original_params function, or the reset_lora function?)

I have the feeling that code from your ComfyUI PR (especially in the models/qwenimage.py file) should be moved to this repository, no?

- Added functionality to handle .alpha parameters for scaling lora_A weights.
- Extracted alpha values from tensors and applied scaling based on the rank of lora_A weights.
- Updated `update_lora_params` method to support multiple LoRA compositions, allowing for proper handling of composed LoRAs.
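
A minimal sketch of the .alpha handling these commits describe (the helper and key layout are illustrative):

import torch

def apply_alpha_scaling_sketch(state_dict: dict) -> dict:
    # For each "<prefix>.alpha" scalar, scale the matching lora_A ("down")
    # weight by alpha / rank, where rank is lora_A's first dimension.
    out = {}
    for key, tensor in state_dict.items():
        if key.endswith(".alpha"):
            continue  # consumed here rather than copied through
        if ".lora_A." in key:
            prefix = key.split(".lora_A.")[0]
            alpha = state_dict.get(prefix + ".alpha")
            if alpha is not None:
                rank = tensor.shape[0]
                tensor = tensor * (alpha.item() / rank)
        out[key] = tensor
    return out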
black4der (author) replied, quoting dimitribarbot's comment above:

Hello! Thank you for your feedback.

Regarding the issues you encountered:

  • Assertion error with a single LoRA: This may be an issue with the LoRA file itself. Please ensure the LoRA file is compatible with the model.
  • Noise issue with combined LoRAs: This occurs because the official update_lora_params function repeatedly merges the base model.

I have fixed the combined LoRAs issue in this PR:

  1. Added the num_loras parameter to the update_lora_params method.
  2. When using combined LoRAs, pass num_loras=2 (or more) — the system will automatically skip base model merging.

Usage examples have been added to the PR description, including:

  • Usage and strength adjustment of a single LoRA
  • Correct usage of multiple combined LoRAs (with the num_loras parameter)
  • Format conversion example

Please refer to the "API Examples" section in the updated PR description.
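
A sketch of the control flow described above (the method body and loading helper are illustrative, not the PR's exact code):

from nunchaku.lora.qwenimage import to_nunchaku  # converter added by this PR

def update_lora_params_sketch(self, lora, num_loras: int = 1):
    # For composed LoRAs (num_loras >= 2), compose_lora has already baked in
    # each strength and handled the base branch, so the base-model merge is
    # skipped here to avoid merging it twice.
    state_dict = to_nunchaku(lora, skip_base_merge=(num_loras > 1))
    self._load_lora_state_dict(state_dict)  # hypothetical loading helper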

dimitribarbot replied:

Thank you for your answer.

After further testing with your new version, it appears that my problem occurs whenever I use a lightning LoRA.

This is the full code I use:

import torch
from diffusers import QwenImagePipeline
from huggingface_hub import hf_hub_download
from nunchaku import NunchakuQwenImageTransformer2DModel


transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "nunchaku-tech/nunchaku-qwen-image/svdq-int4_r32-qwen-image.safetensors"
)

lora_path = hf_hub_download(
    repo_id="lightx2v/Qwen-Image-Lightning",
    filename="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
)

transformer.update_lora_params(lora_path)

pipeline = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipeline.enable_model_cpu_offload()

generator = torch.Generator(device="cpu").manual_seed(42)

output_image = pipeline(
    prompt="GHIBSKY style painting, sign saying 'Flux Ghibsky'",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=generator,
).images[0]

output_image.save(f"qwen-image.png")

With this code, it crashes with the following error:

python3: /nunchaku/src/kernels/zgemm/gemm_w4a4_launch_impl.cuh:482: static void nunchaku::kernels::GEMM_W4A4_Launch<Config, USE_FP4>::quantize_w4a4_act_fuse_lora(Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, bool, bool) [with Config = nunchaku::kernels::GEMMConfig_W4A4<true>; bool USE_FP4 = false]: Assertion `lora_down.shape[0] == N' failed.

I also tried without CPU offloading just in case (as I saw it may not work with lightning LoRAs, but I think you're referring to offloading at the transformer level, not the pipeline one). Since I have an RTX 4090, I can do without CPU offloading by using a quantized text encoder:

import torch
from diffusers import QwenImagePipeline
from transformers import Qwen2_5_VLForConditionalGeneration
from huggingface_hub import hf_hub_download
from nunchaku import NunchakuQwenImageTransformer2DModel


text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit",
    dtype=torch.bfloat16,
    device_map="auto"
)

transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "nunchaku-tech/nunchaku-qwen-image/svdq-int4_r32-qwen-image.safetensors"
)

lora_path = hf_hub_download(
    repo_id="lightx2v/Qwen-Image-Lightning",
    filename="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
)

transformer.update_lora_params(lora_path)

pipeline = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cpu").manual_seed(42)

output_image = pipeline(
    prompt="GHIBSKY style painting, sign saying 'Flux Ghibsky'",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=generator,
).images[0]

output_image.save(f"qwen-image.png")

But the result is the same. I still have the previous error. Adding a call to set_lora_strength after update_lora_params gives the same result. Also, when using the lightning LoRA with another LoRA and using compose_lora, I get an image full of noise.

When using the other PR's branch, it's working and I get this image:

[image]

In addition to this issue, I noticed that when using a LoRA other than a lightning LoRA, it works as if the LoRA had not been applied, even though there are no errors. For instance, with this code:

import torch
from diffusers import QwenImagePipeline
from huggingface_hub import hf_hub_download
from nunchaku import NunchakuQwenImageTransformer2DModel


transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
    "nunchaku-tech/nunchaku-qwen-image/svdq-int4_r32-qwen-image.safetensors"
)

lora_path = hf_hub_download(
    repo_id="Raelina/Raena-Qwen-Image",
    filename="raena_qwen_image_lora_v0.1_diffusers_fix.safetensors",
)

transformer.update_lora_params(lora_path)

pipeline = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipeline.enable_model_cpu_offload()

generator = torch.Generator(device="cpu").manual_seed(42)

output_image = pipeline(
    prompt="anime illustration of a girl with long black hair with blunt bangs and purple eyes, wearing a blue kimono with purple floral prints and a purple obi. She is looking at the viewer with a slight smile, standing outdoors at night. The background features a brightly lit food stand with lanterns and a blurred figure in the distance. The girl is positioned slightly to the right, with a three-quarter view from the front. The scene has a festival atmosphere, with warm yellow and orange lights from the lanterns. The viewing angle is slightly below eye level, focusing on her upper body.",
    negative_prompt=" ",
    width=1024,
    height=1408,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=generator,
).images[0]

output_image.save(f"qwen-image.png")

I get the following image:

[image]

But without applying any LoRA, I get:

[image]

And when using the other PR's branch, I get:

[image]

This last image seems more in line with the expected result when using this LoRA.

Do you experience the same behaviour, or do you think it's a configuration issue on my end?

Finally, I would also like to add that there appears to be a small import error in nunchaku/models/attention.py, line 141.

Shouldn't it be:

from .linear import SVDQW4A4Linear

Instead of:

from ..linear import SVDQW4A4Linear

?

This line could even be removed completely, since there is already an identical import at the top of the file, right? (line 11)

GavChap commented Oct 13, 2025, quoting dimitribarbot's comment above:

I've just tested the ComfyUI version of this PR with the lora from your example, and it seems to be working as expected. Both runs use the lightning lora and the same seed.

With lora:
[image]

Without lora:
[image]


FlowDownTheRiver commented Oct 15, 2025

Working great. Tried with the int4 model with a baked-in 4-step lora. Thank you very much for your work!

lisi31415926 commented:

If there are compatibility issues with CPU Offload (ComfyUI integration), would graphics cards with 8 GB VRAM be less suitable for your branch? Thanks.

JackDainzh replied, quoting lisi31415926's question above:

[image]

About the 8 GB of VRAM: I have a 4070 with 12 GB, and I can't launch it without CPU offload.


vgabbo commented Nov 7, 2025

Thank you for all your work! I just wanted to ask if this will get merged and what is stopping it. It seems very important, now that many LoRAs are popular (like Multi-View, Fusion, Relight).


RenZhou0327 commented Nov 10, 2025

[quoting dimitribarbot's earlier comment in full]

I'm having the same issue. May I ask how you solved it?

moveforever commented:

Hi, I deployed this PR and then used the code below. The resulting image is blurry:

angles_lora_path = 'dx8152/Qwen-Edit-2509-Multiple-angles/multi_angles.safetensors'
fusion_lora_path = 'dx8152/Qwen-Image-Edit-2509-Fusion/fusion.safetensors'
from nunchaku.lora.qwenimage import compose_lora
composed_lora = compose_lora([
    (angles_lora_path, 0.8),  # first LoRA, strength 0.8
    (fusion_lora_path, 0),    # second LoRA, strength 0
])

# Apply the composed LoRA (passing num_loras is important!)
self.transformer.update_lora_params(composed_lora, num_loras=2)

# self.pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", transformer=self.transformer, scheduler=scheduler, torch_dtype=torch.bfloat16)
self.pipeline = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit", transformer=self.transformer, scheduler=scheduler, torch_dtype=torch.bfloat16)

dimitribarbot replied, quoting RenZhou0327's question above:

Unfortunately, I was unable to get this PR to work with diffusers. For my part, I am using the code from PR #680 with the suggested changes proposed here.

ykj467422034 replied, quoting moveforever's comment above:

I ran into the same problem. Have you solved it?
