
Conversation

black4der commented Oct 10, 2025

Overview

Add LoRA support for Nunchaku Qwen Image and Qwen Image Edit models, enabling users to flexibly apply and compose multiple LoRA weights in ComfyUI across both text-to-image (T2I) and image-editing (I2I) workflows.

Key Changes

🎯 New Nodes
  1. NunchakuQwenImageLoraLoader

    • Single LoRA loader node
    • Supports adjustable LoRA strength (-100.0 to 100.0)
    • Supports chaining multiple LoRA nodes
  2. NunchakuQwenImageLoraStack

    • Multi-LoRA stack node
    • Configure up to 15 LoRAs in a single node
    • Simplifies workflows and reduces visual clutter
🔧 Core Implementation
  • ComfyQwenImageWrapper (New)
    • Wraps NunchakuQwenImageTransformer2DModel for ComfyUI integration
    • Implements lazy LoRA composition and application strategy
    • Supports LoRA caching and intelligent recomposition
    • Provides advanced caching strategies for performance optimization
📦 Model Enhancements
  • Added LoRA-related methods to NunchakuQwenImageTransformer2DModel:
    • update_lora_params() - Update LoRA parameters
    • reset_lora() - Reset to original weights
    • forward() - Adapt ComfyUI parameter format
    • process_img() - Handle 4D/5D input tensors (supports both T2I and I2I models)
    • Automatic conversion between Nunchaku and Diffusers format LoRAs
    • Fixed position encoding for non-square aspect ratios using center-aligned position IDs, consistent with the Diffusers pipeline (a usage sketch of the new methods follows below)
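
A minimal usage sketch of these methods. The method names come from this PR, but the exact signatures (e.g., a strength argument on update_lora_params()) are assumptions here, not the PR's confirmed API:

    # Hypothetical usage sketch; exact signatures in the PR may differ.
    from safetensors.torch import load_file

    # transformer: an already-loaded NunchakuQwenImageTransformer2DModel
    lora_sd = load_file("my_style_lora.safetensors")       # any LoRA checkpoint
    transformer.update_lora_params(lora_sd, strength=0.8)  # strength arg is assumed
    # ... run sampling as usual; the LoRA is composed lazily in forward() ...
    transformer.reset_lora()                               # restore original weights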
📂 Modified Files

New:

  • wrappers/qwenimage.py - ComfyQwenImageWrapper implementation
  • nodes/lora/qwenimage.py - LoRA loader nodes

Modified:

  • models/qwenimage.py - Added LoRA support methods
  • nodes/models/qwenimage.py - Wrap transformer for LoRA support

Technical Features

  • Lazy Composition: LoRAs are applied dynamically during the forward pass, avoiding unnecessary weight merging (see the sketch after this list)
  • Format Compatibility: Automatic detection and conversion of Nunchaku/Diffusers format LoRAs
  • Smart Caching: Caches loaded LoRA state dicts for improved performance
  • Shared Weights: Uses Flux-style deepcopy strategy to share transformer instances, saving memory
  • Multi-LoRA Stacking: Support applying multiple LoRAs simultaneously with independent strength control
  • Chaining Support: Supports chaining multiple LoRA nodes or using stack node
  • .alpha Parameter Handling: Automatically recognizes and applies LoRA .alpha scaling parameters
  • ControlNet Support: Full support for Qwen Image ControlNet (compatible with standard ComfyUI ControlNet nodes)
  • Non-Square Aspect Ratio Support: Fixed position encoding offset issue using center-aligned position IDs, ensuring generated content is properly centered in non-square aspect ratios (consistent with official Diffusers pipeline behavior)
  • Qwen Image Edit Support: Full support for image editing models (I2I), automatically handles 5D tensor inputs for ref_latents
  • Type Checking: Strict model type validation with user-friendly error messages
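
As a rough illustration of the lazy composition and caching described above (illustrative class and key names, not the PR's actual code): queue (state dict, strength) pairs and only compute the combined delta sum_i strength_i * (B_i @ A_i) when a weight first needs it, caching the result until the stack changes.

    import torch

    class LazyLoraStack:
        """Sketch: queue LoRAs, compose their weight deltas only on demand."""

        def __init__(self):
            self.entries = []    # (state_dict, strength) pairs
            self._cache = None   # composed deltas, dropped when the stack changes

        def add(self, lora_sd, strength):
            self.entries.append((lora_sd, strength))
            self._cache = None   # any change forces recomposition

        def composed_delta(self, down_key, up_key):
            # Lazily compute sum_i strength_i * (B_i @ A_i) for one weight,
            # then cache it until the stack is modified again.
            if self._cache is None:
                self._cache = {}
            key = (down_key, up_key)
            if key not in self._cache:
                delta = None
                for sd, strength in self.entries:
                    a, b = sd.get(down_key), sd.get(up_key)  # lora_A (r, in), lora_B (out, r)
                    if a is None or b is None:
                        continue
                    d = strength * (b.float() @ a.float())   # (out, in) delta
                    delta = d if delta is None else delta + d
                self._cache[key] = delta
            return self._cache[key]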

Usage

Single LoRA:

Nunchaku Qwen Image DiT Loader → Nunchaku Qwen Image LoRA Loader → KSampler

Multiple LoRAs (Chaining):

Nunchaku Qwen Image DiT Loader 
  → Nunchaku Qwen Image LoRA Loader (LoRA 1, strength=1.0)
  → Nunchaku Qwen Image LoRA Loader (LoRA 2, strength=0.6)
  → KSampler

Multiple LoRAs (Stack):

Nunchaku Qwen Image DiT Loader → Nunchaku Qwen Image LoRA Stack → KSampler

(Configure multiple LoRAs and strengths in the stack node)

Qwen Image Edit (I2I) with LoRA:

Nunchaku Qwen Image Edit DiT Loader 
  → Nunchaku Qwen Image LoRA Loader 
  → KSampler (with reference latents)

Note: Image Edit models use the same LoRA nodes; the 5D ref_latents input is handled automatically (see the sketch below).
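
A sketch of the 5D handling, based on the commit note about squeezing the middle dimension; the actual process_img() in the PR does more than this:

    import torch

    def normalize_ref_latents(x: torch.Tensor) -> torch.Tensor:
        # Image Edit models may deliver ref_latents as (B, 1, C, H, W);
        # squeeze the middle dimension down to the usual 4D (B, C, H, W).
        if x.dim() == 5:
            x = x.squeeze(1)
        assert x.dim() == 4, f"expected 4D latents, got {x.dim()}D"
        return x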

ControlNet:

Nunchaku Qwen Image DiT Loader → ControlNet Apply Advanced → KSampler
                                          ↑
                      ControlNet Loader + Control Image

💡 Recommended Parameters

LoRA Strength:

  • Standard strength: 1.0 (direct use, no extra amplification)
  • Adjustment range: 0.5 - 1.5
  • Most LoRAs work best at 1.0 strength

ControlNet Strength:

  • Recommended range: 0.3 - 0.8 (same as original model)
  • Use the same strength values as the original model
  • Note: High strength (> 0.8) may affect image quality

Note: Based on user testing, LoRAs for INT4 quantized models do not require an additional amplification factor; a strength of 1.0 achieves optimal results.

Testing

  • ✅ Verified single LoRA loading and application
  • ✅ Verified multiple LoRA chaining
  • ✅ Verified LoRA Stack node functionality
  • ✅ Tested different LoRA strength values
  • ✅ Verified ControlNet support and strength application
  • ✅ Tested LoRA + ControlNet combined usage
  • ✅ Verified position centering in non-square aspect ratios (1664x1216, etc.)
  • ✅ Tested LoRA support for Qwen Image Edit models

⚠️ Known Limitations

CPU Offload Compatibility

Some LoRA files (e.g., Qwen-Image-Lightning) train different layers for different transformer blocks, resulting in inconsistent internal structures (ranks) across blocks. Since QwenImage uses a Python-based CPUOffloadManager for CPU offload, and that manager requires all blocks to have identical structure, these LoRAs cannot be used together with CPU offload.

Symptom:

RuntimeError: The size of tensor a (128) must match the size of tensor b (192) at non-singleton dimension 1

Solutions:

  1. Disable CPU offload (recommended for high-VRAM users)
    • Uncheck the CPU offload option in the model loader node
  2. Use different LoRAs (choose LoRAs with consistent block structures)

Technical Reason:

  • Flux models use C++ implementation where each block independently manages memory, supporting rank inconsistencies
  • QwenImage uses a Python-based CPUOffloadManager, which copies parameters through fixed buffer blocks, requiring all blocks to have identical structure (illustrated below)
  • This limitation may be resolved in the future through C++ layer implementation
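
A toy illustration of the buffer-copy constraint (not the actual CPUOffloadManager code): the offload buffer is allocated once with one block's shapes, so a block whose LoRA rank differs cannot be copied in.

    import torch

    # Buffer sized for a rank-128 LoRA weight in one transformer block.
    buffer = torch.empty(128, 3072)

    buffer.copy_(torch.randn(128, 3072))   # rank-128 block: copies fine
    buffer.copy_(torch.randn(192, 3072))   # rank-192 block: raises RuntimeError,
    # e.g. "The size of tensor a (128) must match the size of tensor b (192)"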

✅ Checklist

  • Code follows project coding standards
  • Added necessary documentation and comments
  • Code tested locally and passes
  • Updated relevant documentation
  • Functional testing completed (manual tests for all features)

- Introduced `NunchakuQwenImageLoraLoader` and `NunchakuQwenImageLoraStack` nodes for applying LoRA weights.
- Wrapped transformer in `ComfyQwenImageWrapper` for enhanced integration with ComfyUI.
- Implemented LoRA composition and caching strategies in the transformer model.
- Added methods for updating and restoring original parameters in transformer blocks.
- Enhanced model processing to handle dynamic LoRA inputs and strengths.
GavChap commented Oct 10, 2025

This seems to completely wreck image quality even without loras in the pipeline. This is one of the images I made with this pull request and the corresponding nunchaku pull request.

image

Compared with base nunchaku qwen image with no loras. Both are the same seed.
image

@black4der (Author)

This seems to completely wreck image quality even without loras in the pipeline. This is one of the images I made with this pull request and the corresponding nunchaku pull request.

image Compared with base nunchaku qwen image with no loras. Both are the same seed. image

Are you using the INT4 model or the FP4 model? I've only tested the FP4 model.
ComfyUI_temp_hlvlo_00011_
This is the result tested on fp4 r128.
ComfyUI_temp_hlvlo_00010_

- Update model implementation in models/qwenimage.py

- Improve LoRA node functionality in nodes/lora/qwenimage.py

- Refine model loading in nodes/models/qwenimage.py

- Update wrapper implementation in wrappers/qwenimage.py
- Added support for `controlnet_block_samples` in `NunchakuQwenImageTransformer2DModel` to maintain backward compatibility.
- Updated logic to convert old control format to a dictionary for improved flexibility.
- Refined control parameter handling in `ComfyQwenImageWrapper` to streamline integration with the transformer model.
black4der changed the title from add-lora-support-for-qwen to "feat: Add LoRA and ControlNet support for Qwen Image models" on Oct 11, 2025
GavChap commented Oct 11, 2025

This seems to completely wreck image quality even without loras in the pipeline. image
Compared with base nunchaku qwen image with no loras. Both are the same seed. image

Are you using the INT4 model or the FP4 model? I've only tested the FP4 model. ComfyUI_temp_hlvlo_00011_ This is the result tested on fp4 r128. ComfyUI_temp_hlvlo_00010_

I'm using int4

…r2DModel

- Adjusted calculations for height and width lengths to use padded dimensions after applying padding.
- Modified return statement to include both padded and original dimensions for better unpatchify functionality.
- Cleaned up whitespace in ComfyQwenImageWrapper for consistency.
@black4der (Author)

This seems to completely wreck image quality even without loras in the pipeline. image
Compared with base nunchaku qwen image with no loras. Both are the same seed. image

Are you using the INT4 model or the FP4 model? I've only tested the FP4 model. ComfyUI_temp_hlvlo_00011_ This is the result tested on fp4 r128. ComfyUI_temp_hlvlo_00010_

I'm using int4

Please try testing again with the latest code from Git.
Also, you can try using a model without the fused Lightning LoRA, and load the Lightning LoRA separately through the LoRA Loader node for testing.
I’ve been testing with a model that doesn’t include any fused LoRAs, and it works fine on my side.

@isaac-mcfadyen

This PR seems like it might be a duplicate of #642?

@black4der (Author)

This PR seems like it might be a duplicate of #642?

No, this was implemented independently, based on my own ideas.

…uQwenImageTransformer2DModel

- Updated the calculation of txt_start to utilize padded height and width dimensions for improved accuracy.
- Ensured compatibility with previous padding adjustments for consistent model behavior.
…nImageTransformer2DModel

- Introduced a VAE scale factor for image shape calculations to align with the diffusers pipeline.
- Updated patch grid dimension calculations to ensure consistency with the original model behavior.
- Added a RoPE-based position embedding method for improved positional encoding.
- Enhanced image processing methods to support packed latents and maintain compatibility with the official diffusers pipeline.
- Refined the forward method to directly implement original ComfyUI logic for better integration.
- Updated ComfyQwenImageWrapper to unpack model output correctly.
… for Image Edit models

- Added support for handling both Qwen Image (T2I) and Qwen Image Edit (I2I) models.
- Updated input tensor documentation to reflect new shape handling for Image Edit models.
- Implemented logic to squeeze the middle dimension for 5D input tensors.
@zhangyi90825-tech

Is the node ready yet? Where can I download it? I've been waiting a long time.

@henrylaobai

Awesome! 66666666666

@zhangyi90825-tech

@black4der When will this be usable?

@black4der (Author)

@black4der When will this be usable?

The simplest way is to git-clone our branches of nunchaku and ComfyUI-nunchaku, recompile nunchaku, then clone my ComfyUI-nunchaku branch into the custom_nodes folder and restart ComfyUI. Note that this is still a test version, so stability cannot be guaranteed. If the installation is unclear, look at how nunchaku is compiled; I only modified the Python code and never touched the C++ code, so compilation should not be a problem. There is also a simpler method: clone my nunchaku branch; the download contains a nunchaku Python folder, and you can replace the nunchaku Python files in your environment with those files, then download ComfyUI-nunchaku into your ComfyUI custom-node folder, and it will work the same way. If anything is unclear, ask here.

@zhangyi90825-tech

@black4der It would be great if there were an archive I could just extract and overwrite. I have basic computer skills, but this is genuinely too complicated for me to figure out. Thanks anyway; I hope to see a release.

@zhangyi90825-tech

@lmxyy Could you please update the LoRA support first? Thank you!

@aaaxulei

@black4der When will this be usable?

The simplest way is to git-clone our branches of nunchaku and ComfyUI-nunchaku, recompile nunchaku, then clone my ComfyUI-nunchaku branch into the custom_nodes folder and restart ComfyUI. […]

I don't see a Python folder named nunchaku.

black4der (Author) commented Oct 13, 2025

@black4der When will this be usable?

The simplest way is to git-clone our branches of nunchaku and ComfyUI-nunchaku, recompile nunchaku, then clone my ComfyUI-nunchaku branch into the custom_nodes folder and restart ComfyUI. […]

I don't see a Python folder named nunchaku.

No-Compile Plugin Installation Guide

The installation has two core parts:

  1. Install the core library (nunchaku): the underlying Python library the plugin depends on.
  2. Install the plugin nodes (ComfyUI-nunchaku): the custom nodes you see in the ComfyUI interface.

Step 1: Replace the Core Library's Python Files

The goal of this step is to place the nunchaku library files into ComfyUI's embedded Python environment.

1. Get the nunchaku library files

You have two ways to get them:

2. Copy the core files

This is the most critical step, so follow it carefully:

Open the nunchaku folder you just cloned or extracted; inside it you will find another folder with the same name, nunchaku.

nunchaku/       <-- the top-level folder you downloaded/cloned
└── nunchaku/   <-- [target] this is the inner folder we need
    ├── __init__.py
    └── ... (all other files)

Copy everything inside this inner nunchaku folder.

3. Paste into ComfyUI's Python environment

  • Locate your ComfyUI installation directory and navigate to:
    ...\ComfyUI\python_embeded\Lib\site-packages\nunchaku

    Tip: ... stands for the drive where your ComfyUI lives, e.g. E:\

Paste all the copied files and folders into this nunchaku folder inside the Python environment.

At this point, the core library replacement is complete.


Step 2: Install the ComfyUI Plugin (ComfyUI-nunchaku)

This step installs the plugin's node files into ComfyUI's custom-node directory.

1. Navigate to the custom_nodes directory

  • Go to your ComfyUI installation directory and find the ComfyUI\custom_nodes\ folder.
  • Important: if a ComfyUI-nunchaku folder already exists there, delete it first.

2. Get the plugin files

Again, you have two options:

When done, your custom_nodes directory structure should look like this:

- ComfyUI
  - custom_nodes
    - ... (other plugins)
    - ComfyUI-nunchaku  <-- [plugin folder]
      ├── __init__.py
      ├── nunchaku_qwen.py
      └── ... (other plugin files)

Step 3: Restart and Use

  1. Restart ComfyUI
    Fully close the running ComfyUI, then start it again.

  2. Use it in a workflow
    Load a sample workflow that uses Nunchaku Qwen Image,
    add the new Nunchaku Qwen Image LoRA Stack node, and link it to the Nunchaku Qwen Image DiT Loader:

    Nunchaku Qwen Image DiT Loader → Nunchaku Qwen Image LoRA Stack → KSampler

GavChap commented Oct 13, 2025

Another thing that may need looking at is whether the lora application code in this PR uses the lora's alpha at all; it should use the alpha setting from the lora, or default alpha to the rank. I'm not 100% sure what it does at the moment, as some loras seem to apply differently than expected compared with running the non-nunchaku model.

@black4der (Author)

Another thing I think may need looking at is if the lora application code in this PR uses the lora's Alpha at all, it should use the alpha setting from the lora or set the alpha to the same as the rank. I'm not 100% sure what this does at the moment as some loras seem to apply themselves differently to expected from running the non-nunchaku model.

During my tests on FP4, I noticed that the strength of FP4 is completely insufficient when AMPLIFICATION_FACTOR is set to 1.0. I hope you can collaborate with me on further testing: use a LoRA with a strong style, paired with Lightning-8steps-V2.0, and run the test with both LoRAs set to a strength of 1. This way, we can compare the sensitivity of INT4 and FP4 models to LoRA strength.
According to my test results, only when AMPLIFICATION_FACTOR is set to 2.0 does the LoRA strength match the results obtained with the Qwen-Image FP8 model. Therefore, I'm confused: is there a possibility that INT4 and FP4 require different AMPLIFICATION_FACTOR values?

GavChap commented Oct 13, 2025

Another thing I think may need looking at is if the lora application code in this PR uses the lora's Alpha at all, it should use the alpha setting from the lora or set the alpha to the same as the rank. I'm not 100% sure what this does at the moment as some loras seem to apply themselves differently to expected from running the non-nunchaku model.

During my tests on FP4, I noticed that the strength of FP4 is completely insufficient when AMPLIFICATION_FACTOR is set to 1.0. I hope you can collaborate with me on further testing: use a LoRA with a strong style, paired with Lightning-8steps-V2.0, and run the test with both LoRAs set to a strength of 1. This way, we can compare the sensitivity of IN4 and FP4 models to LoRA strength. According to my test results, only when AMPLIFICATION_FACTOR is set to 2.0 can the LoRA strength match the results obtained with the Qwen-Image FP8 model. Therefore, I’m confused: is there a possibility that IN4 and FP4 require different AMPLIFICATION_FACTOR values?

If you're pairing it with Lightning 2.0 then that already gives problems with using any strong style loras. Your test should be with a single lora, and I suggest using the Lightning 8 Step - 1.1 as in my testing that gives consistently good results at AMPLIFICATION_FACTOR = 1.0. I will post an image with workflow attached, then you can run it with FP4 and see if your result is the same.

I suggest using a lightning lora because it's very obvious when one is too strong, and your code HAS to work at strength 1.0 with a lightning lora, since that is how the lightning lora was designed; otherwise it's a problem.

Nunchaku r128 Qwen Image with AMPLIFICATION_FACTOR=1.0 / Lightning Lora @ 1.0 / 8 Steps, workflow attached.
img_00280_

Standard FP16 Qwen Image / Lightning Lora @ 1.0 / 8 Steps, workflow attached.
img_00284_

You can see the images above are very similar, this is correct.

Nunchaku r128 Qwen Image with simulated AMPLIFICATION_FACTOR=2.0 / Lightning Lora @ 1.0 / 8 Steps, workflow attached.
img_00281_

You can see the image has changed significantly and some noise has started to creep in.

Edit: Here's an interesting idea: why not add an "amplification factor" input to the lora nodes so the user can change it? All it does is multiply the weight by the factor, right? Make it user-configurable.

@black4der (Author)

Another thing I think may need looking at is if the lora application code in this PR uses the lora's Alpha at all… […]

If you're pairing it with Lightning 2.0 then that already gives problems with using any strong style loras. Your test should be with a single lora… […]

Edit: Here's an interesting idea, why not add an "amplification factor" to the lora nodes so it can be changed by the user all it does is multiply the weight by the factor right? Make it user configurable

Based on my testing, you've probably also run into other LoRAs (apart from the Lightning LoRA) that either fail to work or come out too weak. During my tests, I noticed that the Lightning LoRA has an α=8; after applying our scale = alpha / rank, its weights need no additional amplification.

This led me to modify the LoRA strength loading logic (sketched below):

  • The alpha itself already weakens the weights significantly (e.g., ×0.125)
  • That weakening coincidentally compensates for the INT4 quantization loss (×0.5)
  • Result: no additional AMPLIFICATION_FACTOR is needed

However, for other LoRAs that have either no alpha or a large alpha (>16):

  • Their weights keep their normal strength
  • INT4 quantization causes a 50% performance loss
  • That quantization loss needs to be compensated

Therefore, I've adopted a dynamic LoRA loading approach to adjust the weights correctly:

  • LoRAs with small alpha (≤16) → AMPLIFICATION = 1.0
  • LoRAs without alpha, or with large alpha (>16) → AMPLIFICATION = 2.0

With this, all LoRAs should now load properly.
Thank you for all your testing on this PR. :)
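
A sketch of that dynamic rule (illustrative function names; the PR's actual implementation may differ):

    def amplification_for(alpha):
        # No alpha, or a large alpha (> 16): the LoRA weights stay near full
        # strength, so add a 2.0x boost to offset the INT4 quantization loss.
        if alpha is None or alpha > 16:
            return 2.0
        # Small alpha (<= 16): scale = alpha / rank already weakens the weights,
        # which happens to offset the quantization loss; no extra boost.
        return 1.0

    def effective_scale(alpha, rank, strength):
        base = (alpha / rank) if alpha is not None else 1.0
        return strength * base * amplification_for(alpha)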

GavChap commented Oct 13, 2025

Based on my testing findings… Therefore, I've adopted a dynamic LoRA loading approach to adjust weights correctly: For LoRAs with small alpha (≤16) → AMPLIFICATION = 1.0; For LoRAs without alpha or with large alpha (>16) → AMPLIFICATION = 2.0. […]

The ComfyUI default for a lora missing alpha is to set the rank multiplier to 1.0 (i.e. alpha = rank). Remember the scale is the ratio alpha / rank, so defaulting to a fixed multiplier for larger alphas won't always work: if I have a lora with alpha 32 and rank 32, I'd expect a 1.0x multiplier, not 2.0x, so you're actually doubling its effect.

Comfy snippet:

        if v[2] is not None:
            alpha = v[2] / mat2.shape[0]   # stored alpha divided by the rank
        else:
            alpha = 1.0                    # missing alpha: multiplier defaults to 1.0

        ...more code...

        weight += function(((strength * alpha) * lora_diff).type(weight.dtype))  # alpha scales the diff once

@zhangyi90825-tech

@black4der Hi! I don't see a python_embeded folder in my ComfyUI folder. Is it the python folder?
image

@zhangyi90825-tech

@black4der Hi! I don't see a python_embeded folder in my ComfyUI folder. Is it the python folder? image

I tried copying the files from Step 1 of your guide to D:\Comfy UI\python\Lib\site-packages\nunchaku, but after restarting ComfyUI the LoRA node still doesn't show up.

NielsGx mentioned this pull request Oct 14, 2025
@zhangyi90825-tech

@black4der Could you walk me through it?

@m0rph3us1987

@black4der Unfortunately, rank 32 / alpha 32 LoRAs do not work as expected.

Without nunchaku and strength 1.0 -> works fine.
With nunchaku and strength 1.0 -> the LoRA has no effect at all.
With nunchaku and strength >= 1.5 -> the effect is visible, but the result is too heavy.

I would expect it to have the same or similar effect at strength 1.0 that I get without nunchaku.

@kajfblsdkgnlsndg

Looking forward to the official release and the official wheels.

@AnimeArtAlchemist

Seems like it's working perfectly with the handful of LoRAs I've tested so far with the int4 lightning model. Thanks for your efforts on this; it's finally made Qwen usable for me.

@lhucklen

Please release…

risenh commented Oct 22, 2025

@black4der When will this be usable?

The simplest way is to git-clone our branches of nunchaku and ComfyUI-nunchaku… […]

I don't see a Python folder named nunchaku.

No-Compile Plugin Installation Guide […]

Could someone advise? With this method, the nunchaku qwen image lora stack node never shows up. The diagnostics include the section below, but I don't know how to handle it:
======================================== ComfyUI-nunchaku Initialization ========================================
Nunchaku version: 1.0.0
ComfyUI-nunchaku version: 1.0.1
W1022 22:14:56.860000 20020 python\Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
Multiple distributions found for package optimum. Picked distribution: optimum
'nunchaku_versions.json' not found. Node will start in minimal mode.

risenh commented Oct 22, 2025

Got the installation working; the problem was that my earlier git clone didn't specify the branch with -b add-lora-support-for-qwen.
