
Conversation

@allozaur
Collaborator

@allozaur commented Sep 29, 2025

Close #16133
Close #16398

Introduces granular loading-state management for individual conversations, allowing messages to be processed concurrently. This prevents the UI from locking up and improves the user experience when multiple conversations are active.
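
A minimal sketch of the idea, with hypothetical names rather than the actual webui code: each conversation tracks its own loading flag instead of sharing one global boolean, so a streaming conversation no longer blocks the rest of the UI.

```ts
// Hypothetical per-conversation loading registry (illustrative names only).
type ConversationId = string;

class LoadingStateRegistry {
  private loading = new Map<ConversationId, boolean>();

  start(id: ConversationId): void {
    this.loading.set(id, true);
  }

  stop(id: ConversationId): void {
    this.loading.set(id, false);
  }

  isLoading(id: ConversationId): boolean {
    return this.loading.get(id) ?? false;
  }

  // True if any conversation is still streaming; handy for a global indicator.
  anyLoading(): boolean {
    return [...this.loading.values()].some(Boolean);
  }
}
```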

@ggerganov
Member

Would it be possible to push the new index?

@allozaur
Collaborator Author

@ggerganov here you go (17e80c5) 😉

@ggerganov
Member

Noticed 2 issues:

  • The "regenerate" button stops working after I switch to another conv
  • The displayed stats do not correspond to the selected conv while another conv is being generated

@ServeurpersoCom
Collaborator

LLama-Svelte-Parallel-4-AVC500Kbps.mp4

@allozaur
Collaborator Author

Noticed 2 issues:

  • The "regenerate" button stops working after I switch to another conv
  • The displayed stats do not correspond to the selected conv while another conv is being generated

Yep. Will fix these.

@ggerganov
Member

I have a small feature request, if possible:

The list of conversations on the left should have an indicator showing which ones are currently being generated.

@allozaur
Collaborator Author

I have a small feature request, if possible:

The list of conversations on the left should have an indicator showing which ones are currently being generated.

Ah, of course, that's a very reasonable request :)
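
For illustration, one way such an indicator could be driven from a Svelte store (a minimal sketch under assumed names, not the actual implementation):

```ts
import { derived, writable } from 'svelte/store';

// Assumed store shape: conversation ID -> currently generating?
export const loadingByConversation = writable<Record<string, boolean>>({});

// Set of conversation IDs that are generating, for the sidebar list.
export const generatingIds = derived(loadingByConversation, ($map) =>
  new Set(Object.keys($map).filter((id) => $map[id]))
);

// A conversation list item can then show a spinner whenever
// $generatingIds.has(conversation.id) is true.
```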

@allozaur force-pushed the 16133-parallel-streaming branch from 5164657 to 9bfceda on October 17, 2025 at 11:11

Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states.

This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of the displayed statistics.

Additionally, the slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.
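
A minimal sketch of what this consolidation could look like (store and function names are assumptions based on the description above, not the actual code):

```ts
import { derived, writable } from 'svelte/store';

// Per-conversation loading map (assumed shape).
const conversationLoading = writable<Record<string, boolean>>({});

// Global flag derived from the per-conversation map, so components that
// only care about "is anything streaming?" stay in sync automatically.
export const isLoading = derived(conversationLoading, ($m) =>
  Object.values($m).some(Boolean)
);

// Slots service state keyed by conversation ID instead of a single global
// entry, so concurrent conversations cannot clobber each other's state.
const slotByConversation = new Map<string, number>();

export function assignSlot(conversationId: string, slotId: number): void {
  slotByConversation.set(conversationId, slotId);
}

export function releaseSlot(conversationId: string): void {
  slotByConversation.delete(conversationId);
}
```
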
Improves the chat stream abort process by ensuring that partial responses are saved before the abort signal is sent.

This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks now check for abort signals to prevent further processing after an abort.
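
As a sketch of the ordering described above (helper names such as `savePartialResponse` are assumptions, not the webui's actual API): persist the partial response first, only then signal the abort, and have the read loop bail out once the signal fires.

```ts
// Assumed persistence helper; stands in for whatever the webui actually uses.
declare function savePartialResponse(conversationId: string, text: string): Promise<void>;

async function stopGeneration(
  conversationId: string,
  controller: AbortController,
  partial: string
): Promise<void> {
  // 1. Save the partial response first, so an onError/abort handler cannot
  //    clear the streaming state before the text is persisted.
  await savePartialResponse(conversationId, partial);
  // 2. Only then send the abort signal.
  controller.abort();
}

async function readStream(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  signal: AbortSignal,
  onChunk: (chunk: Uint8Array) => void
): Promise<void> {
  while (!signal.aborted) {
    const { done, value } = await reader.read();
    if (done || signal.aborted) break; // stop processing once aborted
    if (value) onChunk(value);
  }
}
```
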
@ServeurpersoCom
Collaborator

Rebased on https://github.com/allozaur/llama.cpp/tree/16133-parallel-streaming plus one commit, dfd3ab8 (for the model JSON payload).

`--parallel 5` UI monkey test:

ParallelUIMonkeyTest-AVC400kbps.mp4

Nothing to report, except that it takes a short moment for the slot to be released.

@allozaur marked this pull request as ready for review October 19, 2025 20:43
@allozaur
Collaborator Author

Thanks, @ServeurpersoCom, for testing this on your end. @ggerganov, it would be great to hear an update from you on this as well. LMK!

@ggerganov
Member

Works great. One problem: using the "Regenerate" button does not trigger the "processing" indicator in the side panel. It is only triggered when submitting a new message.

@allozaur
Collaborator Author

Works great. One problem: using the "Regenerate" button does not trigger the "processing" indicator in the side panel. It is only triggered when submitting a new message.

Ah, I see! I will push a patch for that and then merge 🙂

@allozaur
Collaborator Author

@ggerganov just pushed an update addressing the loading indicator for regenerated messages.
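
For illustration, the shape of fix this implies (function names are hypothetical): the regenerate path sets the same per-conversation loading flag that a new message submission does, so the sidebar indicator reacts in both cases.

```ts
// Hypothetical helpers standing in for the webui's actual state and API calls.
declare function setConversationLoading(id: string, loading: boolean): void;
declare function streamCompletion(conversationId: string, messageId: string): Promise<void>;

async function regenerateMessage(conversationId: string, messageId: string): Promise<void> {
  setConversationLoading(conversationId, true); // previously missing on regenerate
  try {
    await streamCompletion(conversationId, messageId);
  } finally {
    setConversationLoading(conversationId, false);
  }
}
```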

@allozaur
Collaborator Author

There are some issues with npm's availability right now, so the CI can't finish the E2E tests, which use `npx http-server`.

Screenshot 2025-10-20 at 11:45:54

@allozaur
Collaborator Author

There are some issues with npm's availability right now, so the CI can't finish the E2E tests, which use `npx http-server`.
Screenshot 2025-10-20 at 11:45:54

Okay, I've managed to work around it by installing `http-server` as a dev dependency.
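
For reference, a minimal sketch of the workaround (the version and script name are illustrative): declaring `http-server` as a devDependency makes CI install it alongside the other packages instead of fetching it ad hoc through `npx`.

```json
{
  "devDependencies": {
    "http-server": "^14.1.1"
  },
  "scripts": {
    "serve": "http-server ./build -p 8080"
  }
}
```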

@allozaur merged commit 13f2cfa into ggml-org:master on Oct 20, 2025
14 checks passed
@allozaur deleted the 16133-parallel-streaming branch October 20, 2025 10:41
FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025
…rsations (ggml-org#16327)

* feat: Per-conversation loading states and tracking streaming stats

* chore: update webui build output

* refactor: Chat state management

Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states.

This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed.

Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.

* feat: Adds loading indicator to conversation items

* chore: update webui build output

* fix: Fix aborting chat streaming

Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent.

This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.

* refactor: Remove redundant comments

* chore: build webui static output

* refactor: Cleanup

* chore: update webui build output

* chore: update webui build output

* fix: Conversation loading indicator for regenerating messages

* chore: update webui static build

* feat: Improve configuration

* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
…rsations (ggml-org#16327)