Diagnosing Network Problems with WebRTC Applications

Anton Venema Mar 13, 2022 4:44:27 PM

What happens when a video call works perfectly in one environment, but degrades rapidly in another? What about an audio call that sounds good most of the time, but occasionally cuts out?

Sometimes these issues are the result of a bug. Android, in particular, is notorious for cases where code that executes flawlessly on one device can fall apart or even crash on another.

Unlike iOS, where software and hardware are designed in tandem according to strict standards from a single manufacturer, Android’s approach allows for variation at both the hardware and software level that sometimes results in quirky and unpredictable behaviour. This has a particularly notable effect on applications that push the limits of what smartphones are capable of and interface with a wide range of hardware and software components (e.g. anything that uses real-time communications).

Sometimes, however, these issues are the result of “bad” network conditions, which is to say one or more measurable characteristics are outside what we would call the “good” range. If you are experiencing issues where things work well on one network but not another, then a few of the key metrics to look at are:

  • Bandwidth
  • Latency (Lag)
  • Loss
  • Jitter

 

Bandwidth

Bandwidth refers to the rate at which data can be transferred between two endpoints.

You can think of it as the width of the narrowest section of the Internet pipeline between you and the remote party, which is to say that it is the capacity of the most constrained network leg between the two. You could have an extremely fast connection, but that won’t matter much if the other side is connected to a WiFi hotspot or corporate VPN with only a few KB/s available.
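As a minimal sketch of the "most constrained leg" idea, the end-to-end capacity of a path is simply the smallest capacity among its legs. The function name and the example figures are hypothetical, chosen only for illustration.

```python
def bottleneck_kbps(leg_capacities_kbps):
    """Return the end-to-end capacity of a path: the slowest leg on it."""
    if not leg_capacities_kbps:
        raise ValueError("path must contain at least one leg")
    return min(leg_capacities_kbps)


# Hypothetical path: fast local connection, fast backbone, congested hotspot.
path_kbps = [100_000, 1_000_000, 300]
print(bottleneck_kbps(path_kbps))  # 300
```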

An out-of-control sender that exceeds the available bandwidth of a receiver will quickly overwhelm the network and cause severe degradation in the stream quality. The symptoms of an overloaded connection are:

  • Permanent video freezing
  • Permanent video frame-rate drops
  • Choppy audio
  • Dropped connections

If you experience symptoms like this, it’s easy to test bandwidth using websites like fast.com or speedtest.net.

In your application, you can run quick bandwidth tests by simply hosting a large file on a server and seeing how much can be downloaded in a second or two. It’s a simple approach, but highly effective in troubleshooting a problematic network.
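A rough sketch of such a test is shown below. It reads from any file-like stream for up to a couple of seconds and reports the observed throughput. The function name, the default chunk size, and the injectable `clock` parameter are assumptions made for this example, not part of any particular SDK.

```python
import time


def measure_throughput_kbps(stream, max_seconds=2.0, chunk_size=64 * 1024,
                            clock=time.monotonic):
    """Read from `stream` for up to `max_seconds`; return kilobits per second."""
    start = clock()
    total_bytes = 0
    elapsed = 0.0
    while elapsed < max_seconds:
        chunk = stream.read(chunk_size)
        if not chunk:  # reached the end of the file before the time limit
            break
        total_bytes += len(chunk)
        elapsed = clock() - start
    if elapsed <= 0:
        return 0.0  # nothing was read, so there is nothing to measure
    return total_bytes * 8 / 1000 / elapsed
```

In practice the stream could be the response object returned by `urllib.request.urlopen` for a large file you host yourself. The result is only an estimate of download capacity at that moment, and a file that finishes downloading too quickly will understate a fast connection.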

While the connection is active, the best solution is for senders to match their send rate as closely as possible to the bandwidth available to the receiver, using RTCP feedback and heuristics. IceLink exposes the RTCP traffic as part of its API to allow applications to fine-tune the media streams in real time to adapt to their needs (e.g. prioritizing audio over video, preferring to scale image size over encoder quality, etc.).
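To illustrate what "prioritizing audio over video" might look like, here is a small sketch that splits an estimated bandwidth budget between the two. It does not use the IceLink API. The function name, the 32 kbps audio target, the 100 kbps video floor, and the 85% headroom figure are all assumed values for the example.

```python
def allocate_bitrates(estimated_kbps, audio_kbps=32, min_video_kbps=100,
                      headroom_percent=85):
    """Split an estimated bandwidth budget between audio and video.

    Audio is funded first. Video gets whatever remains, or nothing at all
    if the remainder is too small to produce usable video.
    """
    # Leave some headroom so the sender stays below the estimate.
    budget = estimated_kbps * headroom_percent // 100
    audio = min(audio_kbps, budget)
    video = budget - audio
    if video < min_video_kbps:
        video = 0  # assumed policy: drop to audio-only rather than send poor video
    return {"audio_kbps": audio, "video_kbps": video}


print(allocate_bitrates(1000))  # {'audio_kbps': 32, 'video_kbps': 818}
print(allocate_bitrates(100))   # {'audio_kbps': 32, 'video_kbps': 0}
```

A real application would feed this from a bandwidth estimate derived from RTCP feedback and re-run it whenever the estimate changes.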

 

Latency

Latency is the amount of time (usually measured in milliseconds) it takes a packet to get from one network interface to another.

Round-trip time (RTT) is closely related, as it is the time it takes to get from one network interface to another and then back again. If x is the latency from A to B and y is the latency from B to A, then the RTT is x + y.

By itself, the primary effect latency has on a call is simply a delay in the time to hear and see the sender’s audio and video. A small latency (less than 100ms) may not even be noticeable in a two-way call. If the stream is a one-way audio broadcast, even a significant latency may not be a problem depending on the use case.

Video, however, is an entirely different story.

An efficient video stream relies heavily on the use of negative acknowledgements (NACKs) sent by the receiver to the sender whenever a packet is lost or dropped. The sender has the opportunity to resend the missing packet, avoiding the need for a full frame refresh (keyframe) to be sent.

A high-latency network drastically reduces the effectiveness of this approach, since the round-trip time may very well exceed the length of time the receiver can wait. In order for the video to stay synchronized with audio, the receiver can only wait up to the length of time the audio is buffered for playback, at which point it has to give up and move on. Audio is not typically affected by this, since it does not rely on keyframes.
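The timing constraint can be sketched as a simple check: a NACK has to travel to the sender and the retransmitted packet has to travel back, so recovery costs roughly one round trip. The function names and the `detection_delay_ms` parameter are assumptions for this example.

```python
def round_trip_ms(a_to_b_ms, b_to_a_ms):
    """RTT is the sum of the one-way latencies in each direction."""
    return a_to_b_ms + b_to_a_ms


def retransmission_can_arrive(rtt_ms, playout_buffer_ms, detection_delay_ms=0):
    """Return True if a NACKed packet can plausibly arrive before playout.

    detection_delay_ms is the (assumed) time the receiver needs to notice
    the gap before it can send the NACK.
    """
    return detection_delay_ms + rtt_ms < playout_buffer_ms


# Low-latency path: the retransmission fits inside a 100 ms playout buffer.
print(retransmission_can_arrive(round_trip_ms(20, 20), 100))    # True
# High-latency path: the packet would arrive after the receiver has moved on.
print(retransmission_can_arrive(round_trip_ms(150, 100), 100))  # False
```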

The symptoms of a high-latency connection are:

  • Lag in audio/video playback
  • Occasional video freezing
  • Occasional video frame-rate drops
  • Smooth audio

If you are looking at your IceLink logs, you will also typically see a high number of picture loss indications (PLIs), which are used by the receiver to request a full frame refresh.

 

Loss

Loss is the number, or percentage, of packets that are dropped or lost in a stream over a period of time.
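As a minimal sketch, a loss percentage can be computed from the sequence numbers seen over a measurement window. The example assumes 16-bit sequence numbers that wrap around, as RTP uses, and a window shorter than 65,536 packets. The function name is hypothetical.

```python
def loss_percent(first_seq, last_seq, received_count):
    """Percentage of packets lost between two sequence numbers (inclusive).

    Sequence numbers are treated as 16-bit values that wrap from 65535 to 0.
    """
    expected = ((last_seq - first_seq) & 0xFFFF) + 1
    # Duplicates can push received_count above expected, so clamp at zero.
    lost = max(expected - received_count, 0)
    return 100.0 * lost / expected


print(loss_percent(100, 199, 95))  # 5.0
print(loss_percent(65530, 5, 9))   # 25.0 (window spans the wraparound)
```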

Most media streams will drop a few packets here and there, especially on WiFi networks. In a real-time media stream, packet loss is generally preferred to delay, as it allows the connected devices to drop data/frames rather than introduce lag into the playback.

How loss is handled depends on the nature of the data being sent. If forward error correction (FEC) is enabled for a media stream, a lost packet can sometimes be recovered automatically from the data already received.

If FEC is not available or fails, an intelligent audio decoder (like Opus) can look at the playout waveform thus far and generate data that closely approximates what the missing audio packet may have contained - a technique known as packet loss concealment (PLC).

In cases where the audio decoder doesn’t support this or the packet loss is too great, zero-byte packets can be generated to fill the gap. This causes the audio to cut out, but keeps the playback buffer ready to handle whatever comes next.
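The fallback order described above can be sketched as follows, assuming decoded audio frames of a fixed size in which zeroed bytes represent silence. The `conceal` callback stands in for a decoder's concealment routine. Repeating the previous frame, as in the usage lines, is only a crude placeholder and is not how Opus performs PLC.

```python
def fill_gaps(frames, first_seq, last_seq, frame_bytes, conceal=None):
    """Return a gap-free list of frames for first_seq..last_seq.

    frames: dict mapping sequence number -> frame bytes for packets received.
    conceal: optional function(previous_frame) -> substitute frame.
    """
    output = []
    previous = None
    for seq in range(first_seq, last_seq + 1):
        frame = frames.get(seq)
        if frame is None:
            if conceal is not None and previous is not None:
                frame = conceal(previous)   # approximate the missing audio
            else:
                frame = bytes(frame_bytes)  # zero-filled frame: the audio cuts out
        output.append(frame)
        previous = frame
    return output


received = {1: b"\x01\x01", 3: b"\x03\x03"}
print(fill_gaps(received, 1, 3, frame_bytes=2))
print(fill_gaps(received, 1, 3, frame_bytes=2, conceal=lambda prev: prev))
```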

While audio packets can be recovered through the use of NACKs and retransmission, this is not generally recommended for real-time media for a few reasons:

  1. The recovered packets often arrive too late to be useful (audio waits for no one).
  2. Since audio is already low bitrate, the NACK requests can increase the audio bandwidth requirements by a significant percentage.
  3. Unless the packet loss is extreme (in which case NACKs probably wouldn’t help anyway), the cutouts are often not significant enough to impact the conversation.
  4. Unlike video, recovery from packet loss is immediate with no need for a keyframe.

Video loss is typically handled through NACKs, as described in the previous section, since the network cost of a full frame refresh is so high.

If the NACKs fail, however, we typically have no choice and must send a PLI to request one.

A modest amount of packet loss is often unnoticeable on an otherwise “good” network. A high amount of packet loss will result in the following symptoms:

  • Frequent video freezing
  • Frequent video frame-rate drops
  • Choppy audio

If you are looking at your IceLink logs, you will typically see a lot of NACKs being sent, probably a few picture loss indications (PLIs), and lots of generated PLC.

If the loss is excessive and recovery attempts are unable to compensate, the video will start to freeze, the frame rate will drop, and the audio will become difficult to understand with frequent cuts in and out. Packet loss has the most severe impact when combined with a high-latency network, which cripples the effectiveness of NACK-based retransmissions.

If the root problem is an overloaded network device, then the solution is to lower the bitrate, which will in turn reduce network demands.

This is a frequent problem with WiFi networks that are overloaded or suffering interference from neighbouring networks.

 

Jitter

Jitter is a measure of the consistency of timing within a network stream. In other words, it is how much packet delivery deviates from the expected arrival time.

Since UDP guarantees neither delivery order nor delivery timing, some jitter occurs on every connection. Each hop on the network path is an opportunity for jitter to occur, so higher-latency networks or networks with higher hop counts are more likely to experience high levels of jitter.
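One common way to put a number on this is the smoothed interarrival jitter estimate defined for RTP in RFC 3550, which tracks how much the transit time changes between consecutive packets. The sketch below works in milliseconds for readability, whereas RTP itself uses timestamp units. The function name and input format are assumptions for the example.

```python
def interarrival_jitter(packets):
    """Smoothed jitter estimate in the style of RFC 3550.

    packets: iterable of (send_time_ms, arrival_time_ms) in arrival order.
    """
    jitter = 0.0
    previous = None
    for sent, arrived in packets:
        if previous is not None:
            prev_sent, prev_arrived = previous
            # Change in transit time between this packet and the last one.
            d = (arrived - prev_arrived) - (sent - prev_sent)
            # Move the estimate 1/16 of the way toward the new sample.
            jitter += (abs(d) - jitter) / 16
        previous = (sent, arrived)
    return jitter


# Packets sent every 20 ms and delivered every 20 ms: no jitter.
print(interarrival_jitter([(0, 50), (20, 70), (40, 90)]))  # 0.0
# Second packet delivered 16 ms later than expected.
print(interarrival_jitter([(0, 50), (20, 86)]))            # 1.0
```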

Since a media decoder/playback component must process packets in order, problems arise if each packet received is processed immediately.

Once a packet has been processed by the decoder, any “older” packets must be discarded. This could very easily result in throwing out packets that could have improved audio quality or eliminated the need to retransmit a video packet.

Eliminating the effects of jitter requires the media receiver to run received packets through a “jitter buffer”.

A jitter buffer is responsible for delaying the processing of media packets just enough to smooth out delivery times and ensure correct packet order for the next stage in the processing pipeline. A greater delay will do a better job at eliminating the effects of network jitter, but at the cost of introducing additional latency to the pipeline. Since network conditions can vary widely over the course of a call, the best jitter buffers are variable: increasing or decreasing their internal delay as needed.
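A minimal sketch of the idea follows. It holds each packet for a fixed delay and releases packets in sequence order, discarding anything older than what the decoder has already consumed. The class and method names are hypothetical, sequence-number wraparound is ignored, and the delay is fixed rather than adaptive to keep the example short.

```python
import heapq


class JitterBuffer:
    """Hold packets for delay_ms, then release them in sequence order."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self._heap = []  # entries are (seq, arrival_ms, payload)
        self._last_released = None

    def push(self, seq, arrival_ms, payload):
        """Queue a packet. Returns False if it arrived too late to be used."""
        if self._last_released is not None and seq <= self._last_released:
            return False  # the decoder has already moved past this packet
        heapq.heappush(self._heap, (seq, arrival_ms, payload))
        return True

    def pop_ready(self, now_ms):
        """Release, in order, packets that have waited at least delay_ms."""
        ready = []
        while self._heap and now_ms - self._heap[0][1] >= self.delay_ms:
            seq, _, payload = heapq.heappop(self._heap)
            if self._last_released is not None and seq <= self._last_released:
                continue  # duplicate of a packet that was already released
            self._last_released = seq
            ready.append((seq, payload))
        return ready


buf = JitterBuffer(delay_ms=50)
buf.push(2, 0, b"second")   # arrives first, out of order
buf.push(1, 10, b"first")
print(buf.pop_ready(60))    # [(1, b'first'), (2, b'second')]
```

A variable jitter buffer would adjust `delay_ms` over the course of the call based on the jitter it measures, trading added latency for smoother playback.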

Assuming the jitter buffer can adapt quickly to the changing network conditions, the symptoms of high jitter will be:

  • Bursts of video freezing
  • Bursts of video frame-rate drops
  • Bursts of choppy audio

The symptoms are essentially the same as those for packet loss, but go away as the jitter buffer adapts its internal delay. Looking at your IceLink logs, you should see the size of the jitter buffer increasing/decreasing to meet changing demands.

 

What About Other Network Issues?

See symptoms that don’t match up with anything described so far?

If a network measures poorly in more than one area, like loss combined with high latency, the symptoms can be a bit more complicated. In cases like these, looking at how the symptoms change over time is often the key to a better understanding.