REPL, tracing and performance counters #11
Conversation
@GandalfTea heyo, #12 introduces some conflicts with this one, but we can merge your PR later together; it's just some code locations that were changed and some typings.
Should be an easy fix, but I'll need to wrap more code in tracer frames to get detailed timings. I'll resolve with master later tonight; I still have quite a few local changes to push.
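"Wrapping code in tracer frames" could look roughly like the sketch below. This is a hedged illustration, not the PR's actual API: the `Frame` dataclass, the `TRACE` list, and the `frame()` context manager are all hypothetical names.

```python
# Illustrative tracer-frame sketch; Frame, TRACE, and frame() are made-up names.
import time
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass
class Frame:
    name: str
    start: float
    end: float = 0.0

# Completed frames accumulate here for later aggregation.
TRACE: list[Frame] = []

@contextmanager
def frame(name: str):
    """Record a named timing frame around a block of code."""
    f = Frame(name=name, start=time.perf_counter())
    try:
        yield f
    finally:
        f.end = time.perf_counter()
        TRACE.append(f)

# Usage: any section wrapped this way shows up in the trace with its duration.
with frame("decode_step"):
    time.sleep(0.01)  # stand-in for real work
```

The more code paths are wrapped like this, the finer-grained the timing breakdown gets, which is what the comment above is pointing at.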
The llama3 model spends most of its time stalling on the comm queue; the same is true for gpt_oss. qwen3 doesn't have this problem. Looking into it.
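One way to quantify that kind of stall is to time every blocking `get()` on the queue and accumulate the total wait. A minimal sketch, assuming a standard-library `queue.Queue`; the `InstrumentedQueue` name and `wait_total` counter are illustrative, not from this codebase:

```python
# Hypothetical instrumentation: measure time spent blocked on a comm queue.
import queue
import time

class InstrumentedQueue(queue.Queue):
    def __init__(self):
        super().__init__()
        self.wait_total = 0.0  # seconds spent blocked inside get()

    def get(self, block=True, timeout=None):
        t0 = time.perf_counter()
        item = super().get(block=block, timeout=timeout)
        self.wait_total += time.perf_counter() - t0
        return item

q = InstrumentedQueue()
q.put("token")
q.get()  # item is already there, so this returns with near-zero wait
print(f"stalled {q.wait_total:.6f}s on comm queue")
```

Comparing `wait_total` against wall-clock time per node is one way to see whether a model is compute-bound or stuck waiting on communication.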
Force-pushed from 7c3934b to 9456d61.
For now, frames are aggregated into groups. @erhant, @andthattoo let me know if you have any other default metrics to add here. I have more info that I don't surface yet, like network statistics, that I would like to add.
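Aggregating frames into groups presumably means summing low-level frame durations under a coarser label. A sketch under that assumption; the frame names, the dotted-prefix grouping convention, and `aggregate()` are all made up for illustration:

```python
# Hypothetical aggregation of (frame_name, duration) pairs into metric groups.
from collections import defaultdict

# Low-level frames as a tracer might emit them; names are illustrative.
frames = [
    ("comm.send", 0.4),
    ("comm.recv", 0.6),
    ("compute.matmul", 2.0),
    ("compute.attention", 1.5),
]

def aggregate(frames):
    """Sum frame durations by their top-level prefix (the 'group')."""
    totals = defaultdict(float)
    for name, dur in frames:
        group = name.split(".", 1)[0]
        totals[group] += dur
    return dict(totals)

print(aggregate(frames))
```

Grouping by a name prefix keeps the default metric set small while the raw frames stay available for detailed drill-down.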
Correctly aggregating the lower-level frames to get the time distribution per node. There's now a staging buffer for events that arrive before the
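The staging-buffer idea, as I read it, is to park events whose parent context the aggregator hasn't seen yet and flush them once it registers. A hedged sketch of that pattern; `known_frames`, `staged`, `on_event`, and `register_frame` are hypothetical names, not this PR's API:

```python
# Illustrative staging buffer: hold early events until their frame registers.
from collections import defaultdict

known_frames: set[str] = set()
staged: dict[str, list[dict]] = defaultdict(list)  # frame_id -> parked events
delivered: list[dict] = []

def on_event(event: dict) -> None:
    """Deliver an event now, or stage it until its frame is registered."""
    if event["frame"] in known_frames:
        delivered.append(event)
    else:
        staged[event["frame"]].append(event)

def register_frame(frame_id: str) -> None:
    """Register a frame and flush any events that arrived before it."""
    known_frames.add(frame_id)
    delivered.extend(staged.pop(frame_id, []))

on_event({"frame": "f1", "t": 0.1})  # arrives early, so it is staged
register_frame("f1")                 # flushes the staged event
on_event({"frame": "f1", "t": 0.2})  # frame is known, delivered immediately
```

This keeps out-of-order arrivals from being dropped or misattributed, which matters when events from multiple nodes race over the network.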
Rough metrics for the default config, llama 3.3 70b 4bit, on 2 Macs (M4, M4 Pro) and a MacBook (M3), 56 GB total RAM, TB4:
This reverts commit a12eefb6f3807ff9f1812cd755743bc4664a8714.
Fixes #10.
This PR implements: