
Conversation

@allozaur
Collaborator

@allozaur commented Sep 29, 2025

Close #16133
Close #16398

Introduces granular loading-state management for individual conversations, allowing messages to be processed concurrently. This prevents the UI from locking up and improves the user experience when multiple conversations are active.
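
A minimal sketch of the idea, with hypothetical names rather than the actual webui code: each conversation tracks its own loading flag instead of sharing one global boolean, so a streaming conversation no longer blocks the rest of the UI.

```ts
// Hypothetical per-conversation loading registry (illustrative names only).
type ConversationId = string;

class LoadingStateRegistry {
  private loading = new Map<ConversationId, boolean>();

  start(id: ConversationId): void {
    this.loading.set(id, true);
  }

  stop(id: ConversationId): void {
    this.loading.set(id, false);
  }

  isLoading(id: ConversationId): boolean {
    return this.loading.get(id) ?? false;
  }

  // True if any conversation is still streaming; handy for a global indicator.
  anyLoading(): boolean {
    return [...this.loading.values()].some(Boolean);
  }
}
```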

@ggerganov
Member

Would it be possible to push the new index?

@allozaur
Collaborator Author

@ggerganov here you go (17e80c5) 😉

@ggerganov
Member

Noticed 2 issues:

  • The "regenerate" button stops working after I switch to another conv
  • The displayed stats do not correspond to the selected conv while another conv is being generated

@ServeurpersoCom
Collaborator

LLama-Svelte-Parallel-4-AVC500Kbps.mp4

@allozaur
Collaborator Author

Noticed 2 issues:

  • The "regenerate" button stops working after I switch to another conv
  • The displayed stats do not correspond to the selected conv while another conv is being generated

Yep. Will fix these.

@ggerganov
Member

I have a small feature request, if possible:

The list of conversations on the left should have an indicator showing which ones are currently being generated.

@allozaur
Collaborator Author

I have a small feature request, if possible:

The list of conversations on the left should have an indicator showing which ones are currently being generated.

Ah, of course, that's a very reasonable request :)
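
For illustration, one way such an indicator could be driven from a Svelte store (a minimal sketch under assumed names, not the actual implementation):

```ts
import { derived, writable } from 'svelte/store';

// Assumed store shape: conversation ID -> currently generating?
export const loadingByConversation = writable<Record<string, boolean>>({});

// Set of conversation IDs that are generating, for the sidebar list.
export const generatingIds = derived(loadingByConversation, ($map) =>
  new Set(Object.keys($map).filter((id) => $map[id]))
);

// A conversation list item can then show a spinner whenever
// $generatingIds.has(conversation.id) is true.
```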

@allozaur force-pushed the 16133-parallel-streaming branch from 5164657 to 9bfceda on October 17, 2025 at 11:11

Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states.

This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of the displayed statistics.

Additionally, the slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.
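
A minimal sketch of what this consolidation could look like (store and function names are assumptions based on the description above, not the actual code):

```ts
import { derived, writable } from 'svelte/store';

// Per-conversation loading map (assumed shape).
const conversationLoading = writable<Record<string, boolean>>({});

// Global flag derived from the per-conversation map, so components that
// only care about "is anything streaming?" stay in sync automatically.
export const isLoading = derived(conversationLoading, ($m) =>
  Object.values($m).some(Boolean)
);

// Slots service state keyed by conversation ID instead of a single global
// entry, so concurrent conversations cannot clobber each other's state.
const slotByConversation = new Map<string, number>();

export function assignSlot(conversationId: string, slotId: number): void {
  slotByConversation.set(conversationId, slotId);
}

export function releaseSlot(conversationId: string): void {
  slotByConversation.delete(conversationId);
}
```
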
Improves the chat stream abort process by ensuring that partial responses are saved before the abort signal is sent.

This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks now check for abort signals to prevent further processing after an abort.
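
As a sketch of the ordering described above (helper names such as `savePartialResponse` are assumptions, not the webui's actual API): persist the partial response first, only then signal the abort, and have the read loop bail out once the signal fires.

```ts
// Assumed persistence helper; stands in for whatever the webui actually uses.
declare function savePartialResponse(conversationId: string, text: string): Promise<void>;

async function stopGeneration(
  conversationId: string,
  controller: AbortController,
  partial: string
): Promise<void> {
  // 1. Save the partial response first, so an onError/abort handler cannot
  //    clear the streaming state before the text is persisted.
  await savePartialResponse(conversationId, partial);
  // 2. Only then send the abort signal.
  controller.abort();
}

async function readStream(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  signal: AbortSignal,
  onChunk: (chunk: Uint8Array) => void
): Promise<void> {
  while (!signal.aborted) {
    const { done, value } = await reader.read();
    if (done || signal.aborted) break; // stop processing once aborted
    if (value) onChunk(value);
  }
}
```
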
@ServeurpersoCom
Collaborator

Rebased on https://github.com/allozaur/llama.cpp/tree/16133-parallel-streaming plus one commit, dfd3ab8 (for the model JSON payload).

`--parallel 5` UI monkey test:

ParallelUIMonkeyTest-AVC400kbps.mp4

Nothing to report, except that it takes a short moment for the slot to be released.

@allozaur marked this pull request as ready for review October 19, 2025 20:43
@allozaur
Collaborator Author

Thanks, @ServeurpersoCom, for testing this on your end. @ggerganov, it would be great to hear an update from you on this as well. LMK!

@ggerganov
Member

Works great. One problem: using the "Regenerate" button does not trigger the "processing" indicator in the side panel. It is only triggered when submitting a new message.

@allozaur
Collaborator Author

Works great. One problem: using the "Regenerate" button does not trigger the "processing" indicator in the side panel. It is only triggered when submitting a new message.

Ah, I see! I will push a patch for that and then merge 🙂

@allozaur
Collaborator Author

@ggerganov just pushed an update addressing the loading indicator for regenerated messages.
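
For illustration, the shape of fix this implies (function names are hypothetical): the regenerate path sets the same per-conversation loading flag that a new message submission does, so the sidebar indicator reacts in both cases.

```ts
// Hypothetical helpers standing in for the webui's actual state and API calls.
declare function setConversationLoading(id: string, loading: boolean): void;
declare function streamCompletion(conversationId: string, messageId: string): Promise<void>;

async function regenerateMessage(conversationId: string, messageId: string): Promise<void> {
  setConversationLoading(conversationId, true); // previously missing on regenerate
  try {
    await streamCompletion(conversationId, messageId);
  } finally {
    setConversationLoading(conversationId, false);
  }
}
```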

@allozaur
Collaborator Author

There are some issues with npm's availability right now, so the CI can't finish the E2E tests, which use `npx http-server`.

Screenshot 2025-10-20 at 11:45:54

@allozaur
Collaborator Author

There are some issues with npm's availability right now, so the CI can't finish the E2E tests, which use `npx http-server`.
Screenshot 2025-10-20 at 11:45:54

Okay, I've managed to work around it by installing `http-server` as a dev dependency.
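
For reference, a minimal sketch of the workaround (the version and script name are illustrative): declaring `http-server` as a devDependency makes CI install it alongside the other packages instead of fetching it ad hoc through `npx`.

```json
{
  "devDependencies": {
    "http-server": "^14.1.1"
  },
  "scripts": {
    "serve": "http-server ./build -p 8080"
  }
}
```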

@allozaur merged commit 13f2cfa into ggml-org:master on Oct 20, 2025
14 checks passed
@allozaur deleted the 16133-parallel-streaming branch October 20, 2025 10:41
FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025
…rsations (ggml-org#16327)

* feat: Per-conversation loading states and tracking streaming stats

* chore: update webui build output

* refactor: Chat state management

Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states.

This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed.

Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.

* feat: Adds loading indicator to conversation items

* chore: update webui build output

* fix: Fix aborting chat streaming

Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent.

This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.

* refactor: Remove redundant comments

* chore: build webui static output

* refactor: Cleanup

* chore: update webui build output

* chore: update webui build output

* fix: Conversation loading indicator for regenerating messages

* chore: update webui static build

* feat: Improve configuration

* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
…rsations (ggml-org#16327)