muxuezzz (Contributor)
This PR implements a binary mode (pre-compiled operators) to speed up the first inference on NPUs.
When using the NPU:

  • For Atlas A2 training series products, jit_compile is disabled by default (jit_compile=False).
  • For Atlas training series products/Atlas inference series products, jit_compile is enabled by default (jit_compile=True).

This causes chips like the 310P to compile the required operators on the fly during the first inference instead of loading pre-built operators from the operator library, which makes ComfyUI extremely slow on the first generation. For example, with SD1.5 the first generation takes approximately 600 seconds without jit_compile=False, but only about 40 seconds with it set to False.
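
For reference, a minimal sketch of the kind of change described, assuming the torch_npu extension is installed (the guard and its placement here are illustrative, not necessarily the PR's exact code):

```python
import torch

try:
    import torch_npu  # Ascend NPU backend; registers the torch.npu device namespace
    npu_available = torch_npu.npu.is_available()
except ImportError:
    npu_available = False

if npu_available:
    # Disable just-in-time operator compilation so that chips such as the
    # 310P load pre-compiled binary kernels from the operator library
    # instead of compiling each operator during the first sampling run.
    torch_npu.npu.set_compile_mode(jit_compile=False)
```

On Atlas A2 training series devices jit_compile is already False by default, so the call is effectively a no-op there.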

@Kosinkadink (Collaborator)

Are there any performance penalties with jit_compile set to False from the second run onwards, or is the sampling speed of subsequent runs the same regardless of the jit_compile setting?

If there are no performance downsides to flipping it off, we'll merge this in, but I'd like your confirmation first!
