Implement botOutput event handling with fallback support for older server versions #80
base: main
Conversation
Force-pushed from 51f8672 to 56cc4a9
mattieruth left a comment
My only other thought: I'm wondering if it's possible to have the luxury of requiring a newer RTVI version vs. trying to back-support.
If not (and I'm guessing we can't), we should have comments and a plan for removing the deprecated logic in the future.
```ts
export interface UseBotMessagesCallbacks {
  /**
   * Called when a bot message stream starts for a given type.
   * @param type - The message type: "llm" for unspoken content, "tts" for spoken content
   */
  // …
```
I think spoken or not spoken is really the preferred terminology. The content received could have been manipulated since the LLM. Also, the sentence-level events with spoken: false are technically still coming from the TTS, so it's really not an accurate mapping.
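For illustration, a hypothetical pair of events (the payload shape and field names here are assumptions, not the actual RTVI schema):

```ts
// Hypothetical botOutput payloads; field names are assumptions for illustration.
type BotOutputEvent = {
  text: string;
  spoken: boolean;     // whether this text is spoken aloud
  aggregate?: boolean; // e.g. a sentence-level aggregate vs. a word-level event
};

// Word-level event emitted while the TTS speaks:
const word: BotOutputEvent = { text: "Hello", spoken: true };

// Sentence-level aggregate: still produced by the TTS stage, but flagged unspoken:
const sentence: BotOutputEvent = { text: "Hello there.", spoken: false, aggregate: true };
```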
Also, in general, we've tried to shy away from using TTS and LLM in our terminology unless the thing really, truly corresponds to TTS/LLM. The whole point of calling it bot output was to disambiguate.
I see, so there's actually a mental shift, which isn't really represented yet, to be kind of "backwards-compatible" with the older tts/llm events. This actually has some broader impact on the way we handle these events in the Voice UI Kit, e.g. when it comes to message rendering. Currently we display a TTS/LLM toggle, but in fact this would go away when handling BotOutput events. Perhaps I'll have to move this distinction higher up and switch the logic in a couple more components. If I understand correctly, the BotOutput event allows pre-rendering an entire sentence and highlighting only the spoken words; at least that's what I remember Mark telling me a while back. I'm not sure I can wrap this up before the season, but I'll see how far I can get.
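For what it's worth, a minimal sketch of that rendering idea, assuming hypothetical word-level messages that carry a spoken flag (the names are illustrative, not the actual Voice UI Kit API):

```tsx
import React from "react";

// Hypothetical word-level message shape, an assumption for illustration.
type BotWord = { text: string; spoken: boolean };

// Pre-render the entire sentence and highlight only the words spoken so far.
function SentenceOverlay({ words }: { words: BotWord[] }) {
  return (
    <p>
      {words.map((w, i) => (
        <span key={i} className={w.spoken ? "spoken" : "pending"}>
          {w.text}{" "}
        </span>
      ))}
    </p>
  );
}
```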
Since this is all legacy and not the preferred way of doing things, the naming here feels too nice/encouraging. Maybe call it UseLegacyBotMessageCallbacks? We should be extra clear that this is only here for backwards compatibility. Which brings back my question: if people have to update their client to support older pipelines, shouldn't they just update their pipelines? Is there a version of this where we don't add all this complexity?
Though... one could argue that they really want the raw llm and tts events for whatever reason, in which case this isn't actually legacy. It's more of a way to support raw events. So maybe UseRawServiceMessagesCallbacks?
I like that, actually. That also makes the use of "llm" and "tts" more sensible and accurately named.
- remove unnecessary caching of llm/tts events
- simplify botOutputSupported checks
This PR updates the UI Kit to use the latest client versions, which add support for the new preferred `botOutput` event.

The `ConversationProvider` and `TranscriptOverlay` components are updated to use a new `useBotMessages` hook, which abstracts away a probing mechanism for detecting whether the `botOutput` event is supported by the connected bot, by checking the `botReady` event for `pipecat-ai` version information. In case the client doesn't receive a `botOutput` event within a configurable timeout (default is 300 ms), it falls back to `botLlm*` and `botTts*` events for collecting messages.

The `EventsPanel` will also log `botOutput` events now.
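A rough sketch of how such a probe-and-fallback could look (the emitter API and the exact event names `botOutput`, `botLlmText`, `botTtsText` are simplified assumptions, not the actual client API; the real logic lives in the `useBotMessages` hook):

```ts
// Simplified sketch of the probing/fallback mechanism described above.
type Listener = (data: { text: string }) => void;

interface BotEventEmitter {
  on(event: string, listener: Listener): void;
  off(event: string, listener: Listener): void;
}

function collectBotMessages(
  client: BotEventEmitter,
  onText: (text: string) => void,
  fallbackTimeoutMs = 300, // default timeout from the PR description
) {
  let botOutputSeen = false;

  const onBotOutput: Listener = ({ text }) => {
    botOutputSeen = true;
    onText(text);
  };
  client.on("botOutput", onBotOutput);

  // If no botOutput event arrives in time, fall back to the legacy events.
  const timer = setTimeout(() => {
    if (botOutputSeen) return;
    client.off("botOutput", onBotOutput);
    const onLegacy: Listener = ({ text }) => onText(text);
    client.on("botLlmText", onLegacy);
    client.on("botTtsText", onLegacy);
  }, fallbackTimeoutMs);

  // Cleanup for unmount (listener teardown omitted for brevity).
  return () => clearTimeout(timer);
}
```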