Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[audio] Add pcm audio websocket with dialog support #4032

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

GiviMAD
Copy link
Member

@GiviMAD GiviMAD commented Jan 10, 2024

These PR adds a WebSocket Adapter that allows transferring PCM audio to a sink and source whose existence will be tied to the WebSocket connection (register/unregister the components on connection/disconnection) also allows to spawn a dialog processor instance connected to those (will also be tear down on disconnection).

The PR is incomplete and untested, I'll try to add a client to the UI and develop both PR in parallel.

Also IDK if placing the code in the voice bundle is correct, I did it there because it already requires the audio bundle. For me it makes more sense to do it on the audio bundle but I'll have to require both the voice and websocket bundles there.

Best regards!

@GiviMAD GiviMAD requested a review from a team as a code owner January 10, 2024 20:45
@kaikreuzer
Copy link
Member

Also IDK if placing the code in the voice bundle is correct

As the websocket adapter is for audio streams in general, I would say that it should ideally be placed in the audio bundle.
The dependency on the voice manager is very small; maybe you could come up with some small hook, which could be used by the voice bundle to register the dialog processing logic into the websocket adapter?

@GiviMAD
Copy link
Member Author

GiviMAD commented Jan 13, 2024

Also IDK if placing the code in the voice bundle is correct

As the websocket adapter is for audio streams in general, I would say that it should ideally be placed in the audio bundle. The dependency on the voice manager is very small; maybe you could come up with some small hook, which could be used by the voice bundle to register the dialog processing logic into the websocket adapter?

Yes, having it on the voice bundle is quite unintuitive, I haven't though on that solution seems like the correct one.

Thank you for the feedback!

@GiviMAD GiviMAD force-pushed the voice/audio_pcm_websocket branch from 11d3025 to 17970d5 Compare January 14, 2024 11:33
@GiviMAD GiviMAD changed the title [WIP] [audio/voice] Add pcm audio websocket with dialog support [WIP] [audio] Add pcm audio websocket with dialog support Jan 14, 2024
Signed-off-by: Miguel Álvarez <[email protected]>
Signed-off-by: Miguel Álvarez <[email protected]>
Signed-off-by: Miguel Álvarez <[email protected]>
Signed-off-by: Miguel Álvarez <[email protected]>
@GiviMAD
Copy link
Member Author

GiviMAD commented Jan 14, 2024

I have to refactor the code into the websocket.io bundle for it to build without modifying the BOM features, but I think it also makes sense to have it there with the other WebSocketAdapter implementations.

Signed-off-by: Miguel Álvarez <[email protected]>
Signed-off-by: Miguel Álvarez <[email protected]>
@wborn wborn added the enhancement An enhancement or new feature of the Core label Jan 16, 2024
Signed-off-by: Miguel Álvarez <[email protected]>
@GiviMAD
Copy link
Member Author

GiviMAD commented Jan 16, 2024

I think these is ready for the review.

I was able to write a POC in the UI and everything seems to work.

A quick video recorded with my mobile:

video_2024-01-16.mp4

Still need a lot of work on the UI side, I'm going to create a linked issue there asking for some guidance, I wanted to first have a POC to know there are no big issues with the integration.

Best regards!

@GiviMAD GiviMAD changed the title [WIP] [audio] Add pcm audio websocket with dialog support [audio] Add pcm audio websocket with dialog support Jan 16, 2024
Signed-off-by: Miguel Álvarez <[email protected]>
@florian-h05
Copy link
Contributor

@openhab/core-maintainers Would be great if this could be reviewed for openHAB 5, as it would allow us to have a voice assistant inside the UI.

@florian-h05
Copy link
Contributor

What would be nice is to have the same WebSocket API format as Home Assistant so we could use ESPHome voice satellites. I don’t know how feasible that is, I think I found HA‘s voice assistant WS endpoint implementation here: https://github.com/home-assistant/core/blob/dev/homeassistant/components/assist_pipeline/websocket_api.py

@kaikreuzer
Copy link
Member

Would be great if this could be reviewed for openHAB 5, as it would allow us to have a voice assistant inside the UI.

I fully agree. Sorry @GiviMAD for having neglected that topic over the past months - I'll do my very best to drive this forward again for 5.0 (i.e. right after the 4.3 release)!

@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/creation-an-voice-audio-satellite-with-the-help-of-an-esp32/160591/3

@GiviMAD GiviMAD force-pushed the voice/audio_pcm_websocket branch from 4c5af80 to 779f5c8 Compare January 8, 2025 21:48
@GiviMAD
Copy link
Member Author

GiviMAD commented Jan 8, 2025

Hey @kaikreuzer and @florian-h05 sorry for the late reply. I have rebased the PR and done some changes to simplify its usage. Basically I have moved the audio conversion to the server. If you have any time to review it will be great having it available on the snapshots.

Also @dalgwen, I forgot to mention you here but if you can take a look in case you see something weird, it will be great.

Best regards!

@florian-h05
Copy link
Contributor

Thanks!

Did you have a look at HA‘s voice protocol implementation? API compatibility on the openHAB side would have the benefit that we could make use of HA‘s work in voice satellites, which would make things a lot easier for us as we would have voice satellites already available without the need to take care of developing them.

@GiviMAD
Copy link
Member Author

GiviMAD commented Jan 9, 2025

Thanks!

Did you have a look at HA‘s voice protocol implementation? API compatibility on the openHAB side would have the benefit that we could make use of HA‘s work in voice satellites, which would make things a lot easier for us as we would have voice satellites already available without the need to take care of developing them.

Yes, forgot to answer that sorry.

I will prefer to keep to the current implementation, over all because after the latest changes I think its easy to consume. As far as I see they use the ws binary protocol to transmit the source data and they require it to have a specific spec (16000hz, 16bits, 1channel) and the string protocol to send the sink data because they need to transmit identity of the stream and the spec of the format, that requires them to convert the data to base64 so it don't get corrupted. I use 8 bytes as a header for each chunk of audio data and always use the binary format to send the audio data. So the client can choose what spec send to the source it just needs to generate the correct header for the data, and the sink work in a similar way, you remove the first 8 bytes and they contain the format and the stream identifier. I also allow to force the format of the sink so you don't have to make the audio conversion and force a specific spec for the sink data.

Apart from that I think that supporting those devices will be nice, but I think is better to have it in a specific bundle that expose that exact ws protocol and also allow you to discover the connected devices to interact with them through oh instead of as part of the core. I need to look for one of those devices to see if I can do it.

@GiviMAD GiviMAD force-pushed the voice/audio_pcm_websocket branch from 43cef7f to f5c22ae Compare January 9, 2025 17:21
@florian-h05
Copy link
Contributor

Thanks for the explanation - I'm fine with that , just wanted to know what you think about it.

Signed-off-by: Miguel Álvarez Díez <[email protected]>
@GiviMAD GiviMAD force-pushed the voice/audio_pcm_websocket branch from f5c22ae to 631153e Compare January 9, 2025 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature of the Core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants