Enable per-conversation loading states to allow having parallel conversations #16327
Conversation
Would it be possible to push the new index?

@ggerganov here you go (17e80c5) 😉
Noticed 2 issues:

LLama-Svelte-Parallel-4-AVC500Kbps.mp4

Yep. Will fix these.

I have a small feature request, if possible: the list of conversations on the left should have some indicator showing which conversations are currently being generated.

Ah, of course, that's a very reasonable request :)
Force-pushed from 5164657 to 9bfceda
Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states. This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed. Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.
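A minimal sketch of what such a store layout could look like (the names `loadingConversations` and `setConversationLoading` below are illustrative assumptions, not the actual identifiers in this PR):

```ts
// Hypothetical sketch: per-conversation loading flags kept in one map,
// with the global `isLoading` flag derived from it rather than toggled
// independently, so the two can never disagree.
import { writable, derived } from 'svelte/store';

// conversation id -> currently streaming a response
export const loadingConversations = writable<Set<string>>(new Set());

// Global flag stays synchronized with the per-conversation map.
export const isLoading = derived(loadingConversations, (ids) => ids.size > 0);

export function setConversationLoading(convId: string, loading: boolean): void {
  loadingConversations.update((ids) => {
    const next = new Set(ids);
    if (loading) next.add(convId);
    else next.delete(convId);
    return next;
  });
}
```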
Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent. This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.
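A rough illustration of that ordering (the function names here are assumptions for the sketch, not the PR's actual API):

```ts
// Hypothetical sketch: persist the partial response BEFORE aborting,
// so an onError handler fired by the abort cannot clear it first, and
// re-check the abort signal after every await in the read loop.
async function stopGeneration(
  convId: string,
  controller: AbortController,
  partialResponse: string,
  savePartial: (id: string, text: string) => Promise<void>
): Promise<void> {
  await savePartial(convId, partialResponse); // 1. save the partial response first
  controller.abort();                         // 2. only then send the abort signal
}

async function readStream(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  signal: AbortSignal,
  onChunk: (chunk: Uint8Array) => void
): Promise<void> {
  while (!signal.aborted) {
    const { done, value } = await reader.read();
    if (done || signal.aborted) break; // stop processing after abortion
    onChunk(value);
  }
}
```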
Rebased on https://github.com/allozaur/llama.cpp/tree/16133-parallel-streaming plus 1 commit, dfd3ab8 (for the model JSON payload). `--parallel 5` UI monkey test: ParallelUIMonkeyTest-AVC400kbps.mp4. Nothing to report, except that it takes a short moment for the slot to be released.
Thanks, @ServeurpersoCom, for testing this on your end. @ggerganov, it would be great to hear an update from you on this as well. LMK!
Works great. One problem: using the "Regenerate" button does not trigger the "processing" indicator in the side panel. It is only triggered when submitting a new message.
Ah, I see! I will push a patch for that and then merge 🙂
@ggerganov just pushed an update addressing the loading indicator for regenerated messages.
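The fix is presumably along these lines (a hedged sketch; `setConversationLoading` is the illustrative helper from the earlier sketch and `streamCompletion` is an assumed streaming entry point, declared here only so the sketch type-checks):

```ts
// Assumed helpers, not the PR's actual API:
declare function setConversationLoading(convId: string, loading: boolean): void;
declare function streamCompletion(convId: string, messageId: string): Promise<void>;

// Regeneration has to flip the same per-conversation flag as a fresh
// submission, otherwise the sidebar indicator never shows for
// regenerated messages.
async function regenerateMessage(convId: string, messageId: string): Promise<void> {
  setConversationLoading(convId, true);
  try {
    await streamCompletion(convId, messageId);
  } finally {
    setConversationLoading(convId, false);
  }
}
```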
Enable per-conversation loading states to allow having parallel conversations (ggml-org#16327)

* feat: Per-conversation loading states and tracking streaming stats
* chore: update webui build output
* refactor: Chat state management

  Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states. This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed. Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.

* feat: Adds loading indicator to conversation items
* chore: update webui build output
* fix: Fix aborting chat streaming

  Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent. This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.

* refactor: Remove redundant comments
* chore: build webui static output
* refactor: Cleanup
* chore: update webui build output
* chore: update webui build output
* fix: Conversation loading indicator for regenerating messages
* chore: update webui static build
* feat: Improve configuration
* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI


Closes #16133
Closes #16398
Introduces granular loading state management for individual conversations, allowing concurrent message processing. This prevents UI lockup and improves the user experience when multiple conversations are active.