Small WebRTC transport for Android #18
Conversation
```kotlin
val connectionUrl = "http://localhost:7860/api/offer"

val client = RTVIClient(SmallWebRTCTransport.Factory(context, connectionUrl), callbacks, options)
```
Maybe in the future we'll need to change this so that we receive the connectionUrl as a result from the authBundle, the same way it currently happens with Daily. But if we get to that point, we'll also need to change the iOS SDK and Web SDK.
For now, this is the way to go.
```kotlin
import kotlinx.serialization.Serializable

@Serializable
@ConsistentCopyVisibility
```
Just double-checking that these changes are intentional.
Yes, this was a fix for a Kotlin compiler warning.
```kotlin
    private const val TAG = "SmallWebRTCTransport"
}

object AudioDevices {
```
Should we also support Bluetooth and wired headsets?
Since Android provides APIs for managing this, I think it's best to avoid abstracting it in the transport and potentially conflicting with what the user is doing. Even the "speakerphone" device just sets a single boolean in the audio service, which it would probably be better for the user to do themselves.
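To illustrate the point above: routing audio to the loudspeaker on Android is essentially one flag on `AudioManager`, so an app can do it directly without any transport support. A minimal sketch (the extension-function shape is illustrative, not part of the SDK):

```kotlin
import android.content.Context
import android.media.AudioManager

// Hypothetical helper: routes call audio to the loudspeaker via the platform
// AudioManager directly, instead of going through the transport.
fun Context.setSpeakerphone(enabled: Boolean) {
    val audioManager = getSystemService(Context.AUDIO_SERVICE) as AudioManager
    // WebRTC call audio normally runs in communication mode
    audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
    @Suppress("DEPRECATION") // setCommunicationDevice() supersedes this on API 31+
    audioManager.isSpeakerphoneOn = enabled
}
```

On API 31+ the `setCommunicationDevice()` API is the preferred replacement, which is another argument for leaving device routing to the app rather than baking one approach into the transport.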
```kotlin
private var client: WebRTCClient? = null
private var selectedCam = CameraMode.Front

override fun initDevices(): Future<Unit, RTVIError> = resolvedPromiseOk(thread, Unit)
```
Are we going to implement this to initialize the device state and report the initial available and selected devices?
This is what I am triggering on iOS.
With the current audio devices implementation, there shouldn't be anything to query.
```kotlin
transportContext.callbacks.onInputsUpdated(
    camera = false,
```
Should we respect the option for whether the camera is enabled or not?
Yes, actually I think this call can be removed entirely.
```kotlin
if (msgWithType.type == "signalling") {
    val msg = JSON_INSTANCE.decodeFromJsonElement<SignallingEvent>(msgJson)

    when (msg.message.type) {
```
We also have the "renegotiate" message, where the server asks the client to renegotiate.
I don't think this message is currently used on iOS (and neither is peerLeft, IIRC).
- The `renegotiate` one, it is.
- The `peerLeft`, not yet, because I have not yet implemented the reconnection logic on iOS, which we currently have on web. This is currently pending on my TODO list. 🙂
Thanks, now implemented!
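For reference, the dispatch discussed above can be sketched as plain Kotlin. The message type names (`renegotiate`, `peerLeft`) come from the discussion; the function shape and return values are hypothetical, not the SDK's actual API:

```kotlin
// Hypothetical dispatcher for the signalling message types discussed above.
// Returns a short description of the action taken, for illustration only.
fun dispatchSignallingMessage(type: String): String = when (type) {
    // Server asks the client to create and send a new SDP offer
    "renegotiate" -> "start renegotiation"
    // Server reports that the remote peer disconnected; the client can use
    // this to trigger its reconnection logic
    "peerLeft" -> "handle peer left"
    else -> "ignore unknown signalling message: $type"
}
```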
```kotlin
    )
}

enableMic(transportContext.options.enableMic)
```
Do we need to do the same for the camera?
Fixed, this line actually needed removing
```kotlin
Log.i(TAG, "Creating PeerConnection")

val iceServers = ArrayList<PeerConnection.IceServer>()
```
It would be useful to allow users to configure the iceServers.
I see that on iOS we let the user configure the URLs, however WebRTC exposes a number of other params:

```java
public Builder setUsername(String username) {
    this.username = username;
    return this;
}

public Builder setPassword(String password) {
    this.password = password;
    return this;
}

public Builder setTlsCertPolicy(TlsCertPolicy tlsCertPolicy) {
    this.tlsCertPolicy = tlsCertPolicy;
    return this;
}

public Builder setHostname(String hostname) {
    this.hostname = hostname;
    return this;
}

public Builder setTlsAlpnProtocols(List<String> tlsAlpnProtocols) {
    this.tlsAlpnProtocols = tlsAlpnProtocols;
    return this;
}

public Builder setTlsEllipticCurves(List<String> tlsEllipticCurves) {
    this.tlsEllipticCurves = tlsEllipticCurves;
    return this;
}
```

It might be best to wait until someone needs this functionality and then do a more comprehensive implementation.
Yes, this is probably something we will need to add soon enough. But I agree, we can add it in a follow-up PR.
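A sketch of what such a configuration surface might look like on Android, mapping a user-facing config onto the `PeerConnection.IceServer` builder from the WebRTC library. The `IceServerConfig` class and its field names are hypothetical, not part of the SDK:

```kotlin
import org.webrtc.PeerConnection

// Hypothetical user-facing config; field names are illustrative only.
data class IceServerConfig(
    val urls: List<String>,
    val username: String? = null,
    val password: String? = null,
)

// Map the hypothetical config onto the WebRTC IceServer builder, covering
// TURN credentials in addition to bare URLs.
fun buildIceServers(configs: List<IceServerConfig>): List<PeerConnection.IceServer> =
    configs.map { config ->
        PeerConnection.IceServer.builder(config.urls).apply {
            config.username?.let { setUsername(it) }
            config.password?.let { setPassword(it) }
        }.createIceServer()
    }
```

The remaining builder params (TLS policy, hostname, ALPN protocols, elliptic curves) could be added to the config class in the same way when someone needs them.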
```kotlin
localAudioTrack = peerConnectionFactory.createAudioTrack("mic", audioSource).apply {
    TrackRegistry.add(this)
    audioSender = peerConnection.addTrack(this)
}

localVideoTrack = peerConnectionFactory.createVideoTrack("cam", videoSource).apply {
    TrackRegistry.add(this)
    videoSender = peerConnection.addTrack(this)
}
```
The SmallWebRTCTransport on the server side always expects that we are going to work with transceivers:
- The first transceiver should send the audio track.
- The second transceiver should send the video track.
Like we are creating it here on iOS:
https://github.com/pipecat-ai/pipecat-client-ios-small-webrtc/blob/a6e4516b1fcbed772ca97a9616dddc9329097958/Sources/PipecatClientIOSSmallWebrtc/SmallWebRTCConnection.swift#L244C18-L251
This allows us to replace the tracks later without needing to renegotiate.
The transceiver order is always respected. So, for example, this is how we try to get the audio and video input tracks on Pipecat:
Currently the client doesn't replace tracks, and as far as I could tell the server doesn't support replacing tracks either. The audio/video tracks are fetched once in `_handle_client_connected()`, and then never again:

```python
async def _handle_client_connected(self):
    # There is nothing to do here yet, the pipeline is still not ready
    if not self._params:
        return
    self._audio_input_track = self._webrtc_connection.audio_input_track()
    self._video_input_track = self._webrtc_connection.video_input_track()
```

So I don't think right now there would be any benefit from creating the transceivers manually. Even so, I'll see if it's a quick change and do it if so.
As a side note, I think it's risky/fragile to rely on the transceiver order being preserved between the server and client. Ideally the server should dynamically detect tracks being started/stopped to avoid being dependent on this.
> Currently the client doesn't replace tracks, and as far as I could tell the server doesn't support replacing tracks either. The audio/video tracks are fetched once in `_handle_client_connected()`, and then never again.
When you're using transceivers, you can replace the track on the client side, and the server will continue receiving it under the same "mid". So, it just works. You don't need a new SDP renegotiation between the peers.
For example, you can start without sending a video track and later enable your camera to begin sending one. Since you're simply replacing the track on the transceiver, this process is transparent from the server's perspective.
One thing I've already implemented on the web, but not yet on iOS, is a signaling message to report the track’s status (whether it's enabled or not):
This shouldn't be necessary. In browsers, for example, if a track isn’t sending data, it automatically changes its state to muted. But this doesn’t happen with aiortc, so I had to implement a signaling message to explicitly inform aiortc when a track is enabled or not.
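A message like the one described could look roughly as follows on the wire. The type and field names (`trackStatus`, `trackKind`, `enabled`) are illustrative, not the actual protocol:

```kotlin
// Hypothetical signalling payload telling the server whether a track is
// currently enabled, so aiortc doesn't have to infer a muted state itself.
data class TrackStatusMessage(val trackKind: String, val enabled: Boolean) {
    // Hand-rolled JSON for illustration; a real client would use kotlinx.serialization
    fun toJson(): String =
        """{"type":"trackStatus","trackKind":"$trackKind","enabled":$enabled}"""
}
```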
Thanks, I've modified the code to create the transceivers explicitly, and only create the tracks as needed. Can confirm that replacing the track on the transceiver seems to work.
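As I understand the change, it looks roughly like this with the Android WebRTC API. A sketch only, assuming a `PeerConnection` and a factory-created `VideoTrack` are in scope; the function names are illustrative, not the PR's actual code:

```kotlin
import org.webrtc.MediaStreamTrack
import org.webrtc.PeerConnection
import org.webrtc.RtpTransceiver
import org.webrtc.VideoTrack

// Create the transceivers up front, in the order the server expects:
// audio first, then video. No tracks are needed at this point.
fun createTransceivers(peerConnection: PeerConnection): Pair<RtpTransceiver, RtpTransceiver> {
    val audio = peerConnection.addTransceiver(MediaStreamTrack.MediaType.MEDIA_TYPE_AUDIO)
    val video = peerConnection.addTransceiver(MediaStreamTrack.MediaType.MEDIA_TYPE_VIDEO)
    return audio to video
}

// Later, when the user enables the camera, attach a track to the existing
// video transceiver. The "mid" stays the same, so no renegotiation is needed.
fun attachVideoTrack(videoTransceiver: RtpTransceiver, videoTrack: VideoTrack) {
    // takeOwnership = false: the caller keeps responsibility for disposing the track
    videoTransceiver.sender.setTrack(videoTrack, false)
}
```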
```kotlin
    audioSender = peerConnection.addTrack(this)
}

localVideoTrack = peerConnectionFactory.createVideoTrack("cam", videoSource).apply {
```
I believe we should only create the video track if enableCam is true.
Same for the audio: create it or not based on enableMic.
I think enableCam and enableMic are just the initial states, and currently on iOS, setting them to false stops the client from ever sending those tracks. On Android we deliberately create the tracks regardless of whether they're hooked up to a source, to work around the server not being able to handle track changes later on.
I.e. if enableCam is initially set to false, but later the user calls showVideo(), this currently won't work on iOS.
> I.e. if enableCam is initially set to false, but later the user calls showVideo(), this currently won't work on iOS.
If this is happening, this is an issue that we need to fix on iOS.
> On Android we deliberately create the tracks regardless of whether they're hooked up to a source, to work around the server not being able to handle track changes later on.
You can change the tracks later on, which is why we're using transceivers on the server. We're using two transceivers. Currently one for audio and one for video. And you can simply replace the track on each transceiver.
You don't need to have a track to create a transceiver. You can just specify the kind ("audio" or "video").
Thanks, I've changed this. We should definitely revisit iOS though since, as far as I can tell, the tracks are only ever created in SmallWebRTCConnection.init -> createMediaSenders(), and only if self.enableCam is set (which means that if it's initially false, there will never be a track).
Yep, this is on my todo list. We need to fix it there for sure.
Thanks for the comments @filipi87! This is ready for re-review.
filipi87 left a comment:
Thanks for applying the changes, @marcus-daily. Looks great! 🚀
Great, thanks @filipi87!