text input from datastream for v1.0 #1521
Changes from all commits: d371b70, a316ecb, 07e79ac, 6ff00cb, fffa2dc
```diff
@@ -17,6 +17,8 @@
 @dataclass(frozen=True)
 class RoomInputOptions:
+    text_enabled: bool = True
+    """Whether to subscribe to text input"""
     audio_enabled: bool = True
     """Whether to subscribe to audio"""
     video_enabled: bool = False
```
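For illustration, a minimal sketch of how the new option might be toggled when wiring up room input. It assumes `RoomInputOptions` and `RoomInput` come from the module touched by this diff, and that `RoomInput` accepts `room` and `options` keyword arguments (suggested by the `__init__` shown further down); the exact constructor may differ.

```python
from livekit import rtc

async def setup_input(room: rtc.Room) -> "RoomInput":
    # Hypothetical usage sketch (not part of this diff): disable text input
    # while keeping the audio subscription enabled.
    options = RoomInputOptions(
        text_enabled=False,   # skip registering the text stream handler
        audio_enabled=True,   # keep subscribing to audio
    )
    return RoomInput(room=room, options=options)  # constructor args assumed
```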
```diff
@@ -48,6 +50,7 @@ class RoomOutputOptions:
 DEFAULT_ROOM_INPUT_OPTIONS = RoomInputOptions()
 DEFAULT_ROOM_OUTPUT_OPTIONS = RoomOutputOptions()
 LK_PUBLISH_FOR_ATTR = "lk.publish_for"
+LK_TEXT_INPUT_TOPIC = "lk.room_text_input"


 class BaseStreamHandle:
```
```diff
@@ -226,6 +229,7 @@ def __init__(
         """
         self._options = options
         self._room = room
+        self._agent: Optional["PipelineAgent"] = None
         self._tasks: set[asyncio.Task] = set()

         # target participant
```
```diff
@@ -263,6 +267,12 @@ def __init__(
         for participant in self._room.remote_participants.values():
             self._on_participant_connected(participant)

+        # text input from datastream
+        if options.text_enabled:
+            self._room.register_text_stream_handler(
+                LK_TEXT_INPUT_TOPIC, self._on_text_input
+            )
+
     @property
     def audio(self) -> AsyncIterator[rtc.AudioFrame] | None:
         if not self._audio_handle:
```
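For context, a hedged sketch of the client side: a participant writes a text stream on the same topic so that the handler registered above fires. The `send_text(..., topic=...)` call is an assumption about the client SDK; the exact method name and signature may vary by livekit-rtc version.

```python
from livekit import rtc

LK_TEXT_INPUT_TOPIC = "lk.room_text_input"

async def send_text_to_agent(room: rtc.Room, text: str) -> None:
    # Hypothetical client sketch: publish one text message on the topic the
    # agent's RoomInput registered a handler for. The agent side reads the
    # full stream with reader.read_all(), so one send per message is enough.
    await room.local_participant.send_text(text, topic=LK_TEXT_INPUT_TOPIC)
```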
```diff
@@ -287,7 +297,9 @@ async def start(self, agent: Optional["PipelineAgent"] = None) -> None:
             # link to the first connected participant if not set
             self.set_participant(participant.identity)

-        if not agent:
+        # TODO(long): should we force the agent to be set or provide a set_agent method?
+        self._agent = agent
+        if not self._agent:
             return

         agent.input.audio = self.audio
```
```diff
@@ -399,6 +411,28 @@ async def _capture_text():
             self._tasks.add(task)
             task.add_done_callback(self._tasks.discard)

+    def _on_text_input(
+        self, reader: rtc.TextStreamReader, participant_identity: str
+    ) -> None:
+        if participant_identity != self._participant_identity:
+            return
+
+        async def _read_text():
+            if not self._agent:
+                return
+
+            text = await reader.read_all()
+            logger.debug(
+                "received text input",
+                extra={"text": text, "participant": self._participant_identity},
+            )
+            self._agent.interrupt()
+            self._agent.generate_reply(user_input=text)
+
+        task = asyncio.create_task(_read_text())
+        self._tasks.add(task)
+        task.add_done_callback(self._tasks.discard)
+
     async def aclose(self) -> None:
         self._room.off("participant_connected", self._on_participant_connected)
         self._room.off("participant_disconnected", self._on_participant_disconnected)
```

Review comments on this hunk:

- we are calling
- Let's make `agent` required.
- Ok, I'll do that when merging the RoomInput and Output.
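A rough sketch of what the "make `agent` required" suggestion could look like; this is an assumption about a possible signature change, not something in this PR.

```python
class RoomInput:
    # Hypothetical sketch: the agent is a required argument to start(), so
    # _on_text_input no longer needs to guard against a missing self._agent.
    async def start(self, agent) -> None:  # agent: "PipelineAgent", required
        self._agent = agent
        agent.input.audio = self.audio
        # ... remaining audio/video wiring unchanged
```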
Review discussion on the choice of topic:

- we should be clear about what topics we want to support. If the goal is to have this work out of the box with the chat components, then choosing a custom topic here might not be the best choice.
- I think the question is whether we are going to keep the chat components in the Python/JS SDKs and support both the original chat and the datastream, or only the datastream. If so, I can adjust here accordingly. cc @davidzhao
- The chat components are for the client side, but Python/Node agents should agree on the same topic so that it works with the client-side components. @lukasIO, what do you recommend we use? Is the client-side component listening to both the transcription topic and chat?
- Currently the chat components only listen to the chat topic and also send their messages only on the chat topic.
- How would this work with how we are sending transcriptions? Do you suggest also sending transcriptions to the chat topic?
- That was my understanding, yes. But maybe I misunderstood or am forgetting something. Why wouldn't you want it on the chat topic?
- Mostly wondering whether there are conflicts between what the agent wants to use as input versus what it emits as output. For example, if there are two agents in the room, would that cause cross talk? Or if the agent is added to a livestream with a chat feature, would it automatically start interpreting random transcripts? For that reason it seems like a good idea to be explicit about what is being sent to the agent as input.
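One way to keep the agent's input explicit, per the last comment, would be to make the input topic configurable instead of hard-coding `LK_TEXT_INPUT_TOPIC`. A hypothetical sketch follows; the `text_input_topic` field name and default are assumptions, not part of this PR.

```python
from dataclasses import dataclass

# Hypothetical sketch: expose the text input topic on RoomInputOptions so a
# deployment can opt into the shared chat topic or keep a dedicated one.
@dataclass(frozen=True)
class RoomInputOptions:
    text_enabled: bool = True
    """Whether to subscribe to text input"""
    text_input_topic: str = "lk.room_text_input"
    """Datastream topic to read text input from (could be set to the chat topic)"""
    audio_enabled: bool = True
    """Whether to subscribe to audio"""
    video_enabled: bool = False
```

Registration would then pass `options.text_input_topic` to `register_text_stream_handler` instead of the module-level constant, making it obvious per deployment which stream the agent treats as user input.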