[audio] Add PCM audio WebSocket with dialog support #4032
base: main
Conversation
As the websocket adapter is for audio streams in general, I would say that it should ideally be placed in the audio bundle.
Yes, having it in the voice bundle is quite unintuitive. I hadn't thought of that; your solution seems like the correct one. Thank you for the feedback!
I had to refactor the code into the websocket.io bundle for it to build without modifying the BOM features, but I think it also makes sense to have it there with the other WebSocketAdapter implementations.
I think this is ready for review. I was able to write a POC in the UI and everything seems to work. A quick video recorded with my mobile: video_2024-01-16.mp4. The UI side still needs a lot of work; I'm going to create a linked issue there asking for some guidance. I wanted to have a POC first to confirm there are no big issues with the integration. Best regards!
@openhab/core-maintainers It would be great if this could be reviewed for openHAB 5, as it would allow us to have a voice assistant inside the UI.
What would be nice is to have the same WebSocket API format as Home Assistant, so we could use ESPHome voice satellites. I don't know how feasible that is; I think I found HA's voice assistant WS endpoint implementation here: https://github.com/home-assistant/core/blob/dev/homeassistant/components/assist_pipeline/websocket_api.py
I fully agree. Sorry @GiviMAD for having neglected that topic over the past months - I'll do my very best to drive this forward again for 5.0 (i.e. right after the 4.3 release)!
This pull request has been mentioned on openHAB Community. There might be relevant details there: https://community.openhab.org/t/creation-an-voice-audio-satellite-with-the-help-of-an-esp32/160591/3
Hey @kaikreuzer and @florian-h05, sorry for the late reply. I have rebased the PR and made some changes to simplify its usage; basically, I have moved the audio conversion to the server. If you have any time to review, it would be great to have it available in the snapshots. Also @dalgwen, I forgot to mention you here, but if you could take a look in case you see something weird, that would be great. Best regards!
Thanks! Did you have a look at HA's voice protocol implementation? API compatibility on the openHAB side would have the benefit that we could make use of HA's work on voice satellites. That would make things a lot easier for us, as we would already have voice satellites available without needing to develop them ourselves.
Yes, sorry, I forgot to answer that. I would prefer to keep the current implementation, above all because after the latest changes I think it's easy to consume. As far as I can see, they use the WS binary protocol to transmit the source data and require it to have a specific spec (16000 Hz, 16 bits, 1 channel), and they use the string protocol to send the sink data, because they need to transmit the identity of the stream and the spec of the format, which requires them to convert the data to base64 so it doesn't get corrupted. I use 8 bytes as a header for each chunk of audio data and always use the binary format to send it. So the client can choose what spec to send to the source; it just needs to generate the correct header for the data. The sink works in a similar way: the first 8 bytes contain the format and the stream identifier, and the rest is the audio data. I also allow forcing the format of the sink, so you don't have to do the audio conversion and can rely on a specific spec for the sink data. Apart from that, I think supporting those devices would be nice, but it's better to have that in a specific bundle that exposes that exact WS protocol and also allows discovering the connected devices to interact with them through openHAB, rather than having it as part of the core. I need to get one of those devices to see if I can do it.
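To make the framing concrete, here is a minimal Java sketch of how a client could prefix a PCM chunk with such an 8-byte header. The byte layout below (sample rate, bit depth, channels, stream id) is an assumption for illustration only; the actual layout is defined by the PR's implementation.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of an 8-byte PCM chunk header, assuming a layout of
 * sample rate (4 bytes), bit depth (1 byte), channels (1 byte) and a
 * stream identifier (2 bytes). The PR's actual layout may differ.
 */
public final class PcmChunkHeader {

    public static byte[] prefix(byte[] pcm, int sampleRate, byte bitDepth, byte channels, short streamId) {
        ByteBuffer buffer = ByteBuffer.allocate(8 + pcm.length);
        buffer.putInt(sampleRate);  // e.g. 16000
        buffer.put(bitDepth);       // e.g. 16
        buffer.put(channels);       // e.g. 1 (mono)
        buffer.putShort(streamId);  // identifies the stream the chunk belongs to
        buffer.put(pcm);            // raw PCM samples follow the header
        return buffer.array();
    }
}
```

A sink client would do the inverse: strip the first 8 bytes of each binary frame to recover the format and stream identifier, and treat the remainder as raw PCM.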
Thanks for the explanation - I'm fine with that, just wanted to know what you think about it.
This PR adds a WebSocket adapter that allows transferring PCM audio to a sink and source whose existence is tied to the WebSocket connection (the components are registered/unregistered on connection/disconnection). It also allows spawning a dialog processor instance connected to those, which will likewise be torn down on disconnection.
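As a rough illustration of that lifecycle (all class and method names below are hypothetical placeholders, not the actual API introduced by this PR):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of components whose lifetime is bound to the WebSocket connection.
// Names are illustrative only, not the PR's real classes.
class PcmWebSocketConnection {

    interface Registration { void unregister(); }

    private final List<Registration> registrations = new ArrayList<>();

    // On connect: register a sink and a source tied to this connection,
    // and optionally spawn a dialog processor using them.
    void onOpen() {
        registrations.add(registerSink());
        registrations.add(registerSource());
        registrations.add(startDialog());
    }

    // On disconnect: stop the dialog and unregister the components.
    void onClose() {
        registrations.forEach(Registration::unregister);
        registrations.clear();
    }

    private Registration registerSink() { return () -> { /* remove sink */ }; }
    private Registration registerSource() { return () -> { /* remove source */ }; }
    private Registration startDialog() { return () -> { /* stop dialog */ }; }
}
```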
The PR is incomplete and untested; I'll try to add a client to the UI and develop both PRs in parallel.
Also, I don't know if placing the code in the voice bundle is correct. I did it there because it already requires the audio bundle. To me it makes more sense to do it in the audio bundle, but then I'd have to require both the voice and websocket bundles there.
Best regards!