fix: accelerate the first inference speed on low-level NPUs. #10050
This PR implements a binary mode to accelerate the first-inference speed on NPUs.

When using the NPU, the compile mode defaults to `jit_compile=True`. This causes chips like the 310P to compile required operators in real time during the first inference rather than using existing operators from the operator library. As a result, ComfyUI becomes extremely slow during the first generation. For example, with SD1.5, the first generation takes approximately 600 seconds without setting `jit_compile=False`, but only about 40 seconds after setting it to False.
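For reference, here is a minimal sketch of how the binary mode can be enabled through torch_npu's `set_compile_mode`. The import guard and the placement (e.g. during ComfyUI startup) are illustrative assumptions, not taken verbatim from this PR's diff.

```python
# Sketch: enable binary operator mode on Ascend NPUs so the first
# inference uses pre-compiled kernels from the operator library
# instead of JIT-compiling each operator on first use.
try:
    import torch_npu  # Ascend NPU backend for PyTorch

    if torch_npu.npu.is_available():
        # Default is jit_compile=True (operators compiled at first use);
        # jit_compile=False selects pre-compiled binary operators.
        torch_npu.npu.set_compile_mode(jit_compile=False)
except ImportError:
    # torch_npu not installed; not running on an NPU, nothing to do.
    pass
```

On chips whose operator library already ships binary kernels, such as the 310P, this skips the lengthy real-time compilation described above, which is where the ~600 s to ~40 s improvement in first-generation time comes from.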