Description
From my understanding, the current NACK reporter generates a NACK only after at least 33 ms (NACK_MIN_INTERVAL) have passed since the last NACK. Each missing packet can be reported up to 5 times (MAX_NACKS). The length of the NACK window is determined by MAX_MISORDER: when a new highest sequence number arrives, the sliding window advances and drops any incoming packet with
sequence_number < new_highest_sequence_number - MAX_MISORDER.
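To make this understanding concrete, here is a minimal Rust model of the reporter as described above. It is a sketch, not the actual implementation: `NackReporterModel`, `on_packet`, and `poll_nacks` are hypothetical names, sequence numbers are assumed to be already unwrapped to `u64`, and the value of `MAX_MISORDER` (100) is an assumption because it is not stated here. Note the single global throttle (`last_nack`), which gates every NACK report, including the first one for a freshly detected gap.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

// Values as described in this issue. MAX_MISORDER's value is NOT given
// here; 100 is an illustrative assumption.
const NACK_MIN_INTERVAL: Duration = Duration::from_millis(33);
const MAX_NACKS: u8 = 5;
const MAX_MISORDER: u64 = 100;

/// Hypothetical model of the reporter as understood in this issue.
/// Sequence numbers are assumed to be unwrapped to u64 already.
struct NackReporterModel {
    highest: Option<u64>,
    /// Missing sequence number -> number of NACKs sent for it so far.
    missing: BTreeMap<u64, u8>,
    /// Time of the last NACK report (one global throttle).
    last_nack: Option<Instant>,
}

impl NackReporterModel {
    fn new() -> Self {
        Self { highest: None, missing: BTreeMap::new(), last_nack: None }
    }

    /// Registers an incoming packet. Returns false if it is dropped
    /// because it fell behind the window.
    fn on_packet(&mut self, seq: u64) -> bool {
        let highest = match self.highest {
            None => {
                self.highest = Some(seq);
                return true;
            }
            Some(h) => h,
        };
        if seq > highest {
            // New highest: skipped sequence numbers become "missing",
            // but only those inside the window.
            let window_start = seq.saturating_sub(MAX_MISORDER);
            for s in (highest + 1).max(window_start)..seq {
                self.missing.insert(s, 0);
            }
            self.highest = Some(seq);
            // Window advances: forget gaps that fell out of it.
            self.missing.retain(|&s, _| s >= window_start);
            true
        } else if seq < highest.saturating_sub(MAX_MISORDER) {
            // seq < highest - MAX_MISORDER: dropped.
            false
        } else {
            // Reordered or retransmitted packet inside the window fills a gap.
            self.missing.remove(&seq);
            true
        }
    }

    /// Called periodically. Returns the sequence numbers to NACK now.
    fn poll_nacks(&mut self, now: Instant) -> Vec<u64> {
        // Global throttle: nothing is sent until NACK_MIN_INTERVAL has
        // elapsed since the previous NACK report.
        if let Some(last) = self.last_nack {
            if now.duration_since(last) < NACK_MIN_INTERVAL {
                return Vec::new();
            }
        }
        let mut nacks = Vec::new();
        for (&seq, count) in self.missing.iter_mut() {
            // Each missing packet is reported at most MAX_NACKS times.
            if *count < MAX_NACKS {
                *count += 1;
                nacks.push(seq);
            }
        }
        if !nacks.is_empty() {
            self.last_nack = Some(now);
        }
        nacks
    }
}

fn main() {
    let t0 = Instant::now();
    let mut reporter = NackReporterModel::new();
    for seq in [1, 2, 4] {
        reporter.on_packet(seq);
    }
    println!("{:?}", reporter.poll_nacks(t0)); // prints [3]
}
```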
Unfortunately, the current NACK reporter becomes ineffective under several network conditions that are common on wireless links such as LTE and 5G.
Low Jitter and Low Packet Loss
Assume:
- jitter ≈ 30 ms
- packet_loss ≈ 1%
In this scenario, the receiver's jitter buffer is typically small (around 30–50 ms). Because the NACK reporter waits at least 33 ms before sending a NACK, there is already a minimum delay of 33 ms just to notify the sender about a missing packet. In practice, the retransmitted packet reaches the SFU only after 33 ms + 1 RTT.
Given the small jitter buffer on the client side, the retransmitted packet is likely to arrive too late to be usable.
Proposal:
The NACK reporter should send the first NACK as soon as a sequence gap is detected, or at least use the observed jitter as a baseline before applying throttling.
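One way to express this proposal, sketched under assumptions: track NACK timing per missing packet instead of with one global timestamp, let the first NACK go out after `first_delay` (zero for "as soon as the gap is detected", or the observed jitter as a baseline), and throttle only the repeats with `retry_interval`. `MissingEntry` and `nack_due` are hypothetical names, and how jitter is measured is left out of the sketch.

```rust
use std::time::{Duration, Instant};

const MAX_NACKS: u8 = 5;

/// Hypothetical per-packet state for one missing sequence number.
pub struct MissingEntry {
    /// When the gap was first detected.
    pub detected_at: Instant,
    /// When this packet was last NACKed, if ever.
    pub last_sent: Option<Instant>,
    /// How many NACKs have been sent for it.
    pub count: u8,
}

/// Decides whether a NACK for this entry is due at `now`.
/// `first_delay`: Duration::ZERO to NACK immediately, or e.g. observed jitter.
/// `retry_interval`: throttles only the repeated NACKs.
pub fn nack_due(
    e: &MissingEntry,
    now: Instant,
    first_delay: Duration,
    retry_interval: Duration,
) -> bool {
    if e.count >= MAX_NACKS {
        return false;
    }
    match e.last_sent {
        // First NACK: gated only by first_delay, not by a global timer.
        None => now.duration_since(e.detected_at) >= first_delay,
        // Repeats: spaced by retry_interval, measured per packet.
        Some(t) => now.duration_since(t) >= retry_interval,
    }
}

fn main() {
    let t0 = Instant::now();
    let entry = MissingEntry { detected_at: t0, last_sent: None, count: 0 };
    // Immediate mode: the first NACK is due the moment the gap is seen.
    let due = nack_due(&entry, t0, Duration::ZERO, Duration::from_millis(33));
    println!("first NACK due immediately: {due}");
}
```

Keeping the throttle per packet means a new gap never waits behind the timer of an older gap, while repeats are still rate-limited.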
High RTT and Low Packet Loss
Assume:
- RTT ≈ 200 ms
- packet_loss ≈ 1%
- received sequence: P1, P2, _, P4
When the NACK reporter sees the gap after receiving P4, it includes P3 in the next NACK report. It then keeps NACKing P3 until the packet is received or the MAX_NACKS limit is reached. Once MAX_NACKS is hit, the window advances and the packet is treated as permanently lost, even if the retransmission eventually arrives.
This issue occurs whenever:
NACK_MIN_INTERVAL * MAX_NACKS < RTT
In these cases, the reporter exhausts all retry attempts before the retransmitted packet has any chance of arriving, since the retransmission itself requires at least one RTT.
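A quick check of this condition with the numbers from this issue: 33 ms × 5 gives a 165 ms retry budget, which is shorter than the 200 ms RTT in the example above. The function names are hypothetical; the constants are the values stated earlier in this issue.

```rust
use std::time::Duration;

// Values as stated in this issue. MAX_NACKS is a u32 here only so it can
// multiply a Duration directly.
const NACK_MIN_INTERVAL: Duration = Duration::from_millis(33);
const MAX_NACKS: u32 = 5;

/// Time span covered by all NACK attempts for one packet,
/// using the formula NACK_MIN_INTERVAL * MAX_NACKS.
fn nack_budget() -> Duration {
    NACK_MIN_INTERVAL * MAX_NACKS
}

/// True when every retry is spent before a retransmission could arrive.
fn retries_exhausted_before_rtt(rtt: Duration) -> bool {
    nack_budget() < rtt
}

fn main() {
    let rtt = Duration::from_millis(200);
    println!(
        "budget = {:?}, exhausted at RTT {:?}: {}",
        nack_budget(),
        rtt,
        retries_exhausted_before_rtt(rtt)
    );
}
```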
Proposal:
The NACK reporter should be RTT-aware. Instead of using a static NACK_MIN_INTERVAL, it could use a dynamic value such as RTT / 2, or otherwise base the interval on the estimated RTT.
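A sketch of the RTT-aware variant, assuming the RTT estimate is smoothed before use. It applies an exponentially weighted moving average with gain 1/8 (the smoothing TCP uses for SRTT in RFC 6298) and derives the retry interval as RTT / 2. `RttEstimator`, `nack_interval`, and the 10 ms / 500 ms clamp bounds are hypothetical choices for illustration, not values from the existing code.

```rust
use std::time::Duration;

// Hypothetical bounds so a single bad RTT sample cannot make the
// interval absurdly small or large.
const MIN_INTERVAL: Duration = Duration::from_millis(10);
const MAX_INTERVAL: Duration = Duration::from_millis(500);

/// Smoothed RTT: EWMA with gain 1/8, like TCP's SRTT (RFC 6298).
pub struct RttEstimator {
    srtt: Option<Duration>,
}

impl RttEstimator {
    pub fn new() -> Self {
        Self { srtt: None }
    }

    /// Feeds one RTT sample and returns the updated smoothed RTT.
    pub fn update(&mut self, sample: Duration) -> Duration {
        let srtt = match self.srtt {
            // The first sample initializes the estimate.
            None => sample,
            // SRTT = 7/8 * SRTT + 1/8 * sample
            Some(s) => s * 7 / 8 + sample / 8,
        };
        self.srtt = Some(srtt);
        srtt
    }
}

/// RTT-aware retry interval: RTT / 2, clamped to the bounds above.
pub fn nack_interval(srtt: Duration) -> Duration {
    (srtt / 2).clamp(MIN_INTERVAL, MAX_INTERVAL)
}

fn main() {
    let mut est = RttEstimator::new();
    let srtt = est.update(Duration::from_millis(200));
    println!("srtt = {:?}, NACK interval = {:?}", srtt, nack_interval(srtt));
}
```

With a 200 ms RTT this gives a 100 ms interval, so five NACKs span 500 ms instead of 165 ms, which leaves room for a retransmission to arrive before the retries run out.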
Expected Outcome: Less bandwidth spent on retransmissions and a smoother stream. Each NACK has a higher chance of recovering the packet, and the reporter responds faster to genuine packet loss.
Context: I've been observing these unrecoverable packet-loss issues, typically while I'm on LTE. I have an SFU implementation with RTP mode and no jitter buffer; the SFU just acts as a dumb pipe. This is the actual obfuscated ping result (definitely not a friendly network, but great for an experiment 🙂):
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=1 ttl=47 time=132 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=2 ttl=47 time=226 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=3 ttl=47 time=246 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=4 ttl=47 time=206 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=5 ttl=47 time=99.3 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=6 ttl=47 time=97.9 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=7 ttl=47 time=90.0 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=8 ttl=47 time=103 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=9 ttl=47 time=152 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=10 ttl=47 time=282 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=11 ttl=47 time=178 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=12 ttl=47 time=250 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=13 ttl=47 time=98.4 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=14 ttl=47 time=192 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=15 ttl=47 time=96.2 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=16 ttl=47 time=237 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=17 ttl=47 time=363 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=18 ttl=47 time=502 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=19 ttl=47 time=322 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=20 ttl=47 time=228 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=21 ttl=47 time=251 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=22 ttl=47 time=171 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=23 ttl=47 time=193 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=24 ttl=47 time=155 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=25 ttl=47 time=238 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=26 ttl=47 time=157 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=27 ttl=47 time=181 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=28 ttl=47 time=204 ms
64 bytes from 123-77-192-44.example.net (123.77.192.44): icmp_seq=29 ttl=47 time=227 ms
--- example-host ping statistics ---
29 packets transmitted, 29 received, 0% packet loss, time 28035ms
rtt min/avg/max/mdev = 90.028/202.622/502.172/88.494 ms
Please let me know what you guys think or correct my understanding. I'm happy to make the contribution too!