[Performance] kokoro onnx performance issues #23384
Comments
I experience the same issue. To reproduce:

```sh
pip install kokoro-onnx soundfile
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
ONNX_PROVIDER=CoreMLExecutionProvider LOG_LEVEL=DEBUG uv run main.py
ONNX_PROVIDER=CPUExecutionProvider LOG_LEVEL=DEBUG uv run main.py
```

main.py:

```python
import soundfile as sf
from kokoro_onnx import Kokoro

kokoro = Kokoro("kokoro-v0_19.onnx", "voices.json")
samples, sample_rate = kokoro.create(
    "Hello. This audio generated by kokoro!", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
```
I don't know what Kokoro does around onnxruntime, but typically you should create the inference session once and reuse it. Session creation is expensive, as is the first inference. Any performance measurement should start from the second inference onward to be meaningful.
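The warm-up point above can be sketched as a small timing helper. This is a generic sketch, not Kokoro's actual code; `run` stands in for any zero-argument inference callable you want to measure:

```python
import time

def benchmark(run, warmup=1, iters=5):
    """Time a zero-argument callable, discarding warm-up runs.

    The first inference includes one-time costs (graph optimization,
    kernel setup, memory arena growth), so meaningful numbers start
    from the second run onward.
    """
    for _ in range(warmup):
        run()  # pay the one-time costs here, unmeasured
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return min(times), sum(times) / len(times)  # (best, mean) in seconds
```

With an ONNX Runtime session created once up front, usage would look like `best, mean = benchmark(lambda: sess.run(None, inputs))`, where `sess` and `inputs` are your session and feed dict.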
I create the session only once. There's an option to add a while loop in the line with
There's a lot happening inside
Describe the issue
Hello.
I'm trying to use a Kokoro ONNX model, and I see a large performance difference between PyTorch CPU and ONNX Runtime CPU (no special execution provider specified).
I've seen that https://www.ui.perfetto.dev/ is useful for investigating performance issues.
I'm attaching a trace of 3 inferences in a row; I don't have the skill to understand what the problem is.
I've used CPU without any other SessionOptions specified, in C# with ONNX Runtime.
onnxruntime_profile__2025-01-15_15-22-14.zip
Using the query

```sql
select name, (dur/1000000) as ms, ts from slice where parent_id=3 AND category = 'Node' order by dur desc
```

where 3 is the slice id of the SequentialExecutor of a single inference run (which I used to filter the info of a specific inference run; I don't know if there was a better way to get it), I was able to sort the node executions by time spent. But I can't go further, because I don't know how to evaluate these timings against expected ones. I'd like someone to point me to a resource on how to detect bottlenecks in a model, or someone who has the skill to help with the issue.
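As an alternative to Perfetto's SQL, similar per-operator totals can be aggregated directly from the profile JSON that ONNX Runtime writes, since it is in Chrome-trace format. A minimal sketch, assuming the usual event layout (kernel events with `cat == "Node"`, durations in microseconds, and the operator type under `args["op_name"]`):

```python
from collections import Counter

def top_ops(events, n=10):
    """Aggregate per-operator time from ONNX Runtime profile events.

    `events` is the parsed profile JSON: a list of Chrome-trace events.
    Kernel events are assumed to have cat == "Node", dur in microseconds,
    and the operator type under args["op_name"] (e.g. Conv, MatMul).
    """
    totals = Counter()
    for ev in events:
        if ev.get("cat") == "Node":
            op = ev.get("args", {}).get("op_name", ev.get("name", "?"))
            totals[op] += ev.get("dur", 0)
    return totals.most_common(n)  # [(op_type, total_microseconds), ...]
```

Typical use would be `top_ops(json.load(open(profile_path)))` on a profile file like the attached one. In C#, if I recall the API correctly, the profile can be produced by setting `EnableProfiling` on `SessionOptions` and calling `InferenceSession.EndProfiling()` to get the output path.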
To reproduce
thanks @thewh1teagle
Urgency
Could you please suggest how to properly understand the performance problems of a model?
It's hard for me to find documentation that guides me to a deep understanding of how things work; any links to docs/tutorials are appreciated (keep in mind I'm mainly a C# guy).
Thanks
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
nuget package 1.20.1
ONNX Runtime API
C#
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No