Skip to content

Conversation

@marcus-daily
Copy link
Contributor

No description provided.

@marcus-daily marcus-daily requested a review from filipi87 May 6, 2025 13:49
@marcus-daily marcus-daily marked this pull request as ready for review May 6, 2025 13:49

val connectionUrl = "http://localhost:7860/api/offer"

val client = RTVIClient(SmallWebRTCTransport.Factory(context, connectionUrl), callbacks, options)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in the future, we’ll need to change this so we receive the connectionUrl as a result from the authBundle, kind of the same way it’s happening with Daily right now.
But if we get to that point, we’ll also need to change the iOS SDK and Web SDK.
For now, this is the way to go.

import kotlinx.serialization.Serializable

@Serializable
@ConsistentCopyVisibility
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking if these changes are intentional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was a fix for a Kotlin compiler warning

private const val TAG = "SmallWebRTCTransport"
}

object AudioDevices {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also support Bluetooth and Wired Headsets ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Android provides APIs for managing this, I think it's best to avoid trying to abstract this in the transport and potentially conflict with what the user is doing. Even the "speakerphone" device is just setting a single boolean in the audio service, which it would probably be better for the user to do themselves.

private var client: WebRTCClient? = null
private var selectedCam = CameraMode.Front

override fun initDevices(): Future<Unit, RTVIError> = resolvedPromiseOk(thread, Unit)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to implement it to initialize devices state and report initial available and selected devices ?
This is what I am triggering on iOS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current audio devices implementation there shouldn't be anything to query.



transportContext.callbacks.onInputsUpdated(
camera = false,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we respect the option if the camera is enabled or not ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually I think this call can be removed entirely.

if (msgWithType.type == "signalling") {
val msg = JSON_INSTANCE.decodeFromJsonElement<SignallingEvent>(msgJson)

when (msg.message.type) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have also the "renegotiate" message, where the server asks the client to renegotiate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this message is currently used on iOS (and neither is peerLeft IIRC)

Copy link

@filipi87 filipi87 May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now implemented!

)
}

enableMic(transportContext.options.enableMic)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do the same for the camera ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, this line actually needed removing


Log.i(TAG, "Creating PeerConnection")

val iceServers = ArrayList<PeerConnection.IceServer>()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to allow the users to configure the iceServers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that on iOS we let the user configure the URLs, however WebRTC exposes a number of other params:

      public Builder setUsername(String username) {
        this.username = username;
        return this;
      }

      public Builder setPassword(String password) {
        this.password = password;
        return this;
      }

      public Builder setTlsCertPolicy(TlsCertPolicy tlsCertPolicy) {
        this.tlsCertPolicy = tlsCertPolicy;
        return this;
      }

      public Builder setHostname(String hostname) {
        this.hostname = hostname;
        return this;
      }

      public Builder setTlsAlpnProtocols(List<String> tlsAlpnProtocols) {
        this.tlsAlpnProtocols = tlsAlpnProtocols;
        return this;
      }

      public Builder setTlsEllipticCurves(List<String> tlsEllipticCurves) {
        this.tlsEllipticCurves = tlsEllipticCurves;
        return this;
      }

It might be best to wait until someone needs this functionality and then do a more comprehensive implementation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is probably something we will need to add there soon enough. But I agree, we can add that in a follow up PR.

Comment on lines 176 to 184
localAudioTrack = peerConnectionFactory.createAudioTrack("mic", audioSource).apply {
TrackRegistry.add(this)
audioSender = peerConnection.addTrack(this)
}

localVideoTrack = peerConnectionFactory.createVideoTrack("cam", videoSource).apply {
TrackRegistry.add(this)
videoSender = peerConnection.addTrack(this)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SmallWebRTCTransport on the server side, it is always expecting that we are going to work with transceivers.

  • So, the first transceiver should send the audio track.
  • The second transceiver should send the video track.

Like we are creating it here on iOS:
https://github.com/pipecat-ai/pipecat-client-ios-small-webrtc/blob/a6e4516b1fcbed772ca97a9616dddc9329097958/Sources/PipecatClientIOSSmallWebrtc/SmallWebRTCConnection.swift#L244C18-L251

This allow us to replace the tracks later without the need to renegotiate.

The transceiver order is always respecting. So for example, this is how we try to get the audio and video input tracks on Pipecat:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the client doesn't replace tracks, and as far as I could tell the server doesn't support replacing tracks either. The audio/video tracks are fetched once in _handle_client_connected(), and then never again:

async def _handle_client_connected(self):
        # There is nothing to do here yet, the pipeline is still not ready
        if not self._params:
            return

        self._audio_input_track = self._webrtc_connection.audio_input_track()
        self._video_input_track = self._webrtc_connection.video_input_track()

So I don't think right now there would be any benefit from creating the transceivers manually. Even so, I'll see if it's a quick change and do it if so.

As a side note, I think it's risky/fragile to rely on the transceiver order being preserved between the server and client. Ideally the server should dynamically detect tracks being started/stopped to avoid being dependent on this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the client doesn't replace tracks, and as far as I could tell the server doesn't support replacing tracks either. The audio/video tracks are fetched once in _handle_client_connected(), and then never again:

When you're using transceivers, you can replace the track on the client side, and the server will continue receiving it under the same "mid". So, it just works. You don't need a new SDP renegotiation between the peers.

For example, you can start without sending a video track and later enable your camera to begin sending one. Since you're simply replacing the track on the transceiver, this process is transparent from the server's perspective.

One thing I've already implemented on the web, but not yet on iOS, is a signaling message to report the track’s status (whether it's enabled or not):

This shouldn't be necessary. In browsers, for example, if a track isn’t sending data, it automatically changes its state to muted. But this doesn’t happen with aiortc, so I had to implement a signaling message to explicitly inform aiortc when a track is enabled or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've modified the code to create the transceivers explicitly, and only create the tracks as needed. Can confirm that replacing the track on the transceiver seems to work.

audioSender = peerConnection.addTrack(this)
}

localVideoTrack = peerConnectionFactory.createVideoTrack("cam", videoSource).apply {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should only create the video track if enableCam is true.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the audio, create it or not based on the enableMic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enableCam and enableMic are just the initial states, and currently on iOS, setting them to false stops the client from ever sending those tracks. On Android we deliberately create the tracks regardless of whether they're hooked up to a source, to work around the server not being able to handle track changes later on.

I.e. if enableCam is initially set to false, but later the user calls showVideo(), this currently won't work on iOS.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. if enableCam is initially set to false, but later the user calls showVideo(), this currently won't work on iOS.

If this is happening, this is an issue that we need to fix on iOS.

On Android we deliberately create the tracks regardless of whether they're hooked up to a source, to work around the server not being able to handle track changes later on.

You can change the tracks later on, which is why we're using transceivers on the server. We're using two transceivers. Currently one for audio and one for video. And you can simply replace the track on each transceiver.

You don't need to have a track to create a transceiver. You can just specify the kind ("audio" or "video").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've changed this. We should definitely revisit iOS though as, as far as I can tell, the tracks are only ever created in SmallWebRTCConnection.init -> createMediaSenders(), and only if self.enableCam is set (which means that if it's initially false, then there will never be a track).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is on my todo list. We need to fix it there for sure.

@marcus-daily
Copy link
Contributor Author

Thanks for the comments @filipi87! This is ready for re-review.

@marcus-daily marcus-daily requested a review from filipi87 May 29, 2025 12:34
Copy link

@filipi87 filipi87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying the changes, @marcus-daily. Looks great! 🚀

@marcus-daily
Copy link
Contributor Author

Great, thanks @filipi87!

@marcus-daily marcus-daily merged commit a152231 into main May 30, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants