
Voice Detection with the Client SDK

by Anton Venema, on January 24, 2019

Voice Detection 
Voice detection is an audio-processing technique for determining whether human speech is present in a signal. Being able to monitor and mute inactive speakers has many benefits, but primarily it helps:

  • Reduce background noise for an improved user experience
  • Save network bandwidth by avoiding the unnecessary transmission of audio packets


How you can monitor and mute inactive speakers

The LiveSwitch and IceLink SDKs make it easy to monitor audio levels. It’s as simple as wiring up a single event handler to your `LocalMedia` instance:
 

localMedia.OnAudioLevel += (level) =>
{
    Log.WriteLine($"Local media {localMedia.Id} audio level is {level}.");
};

Adding the above snippet to your app will flood your log with local microphone capture levels as soon as you start the local media.

The same event is available for monitoring inbound remote audio levels using `RemoteMedia`:


remoteMedia.OnAudioLevel += (level) =>
{
    Log.WriteLine($"Remote media {remoteMedia.Id} audio level is {level}.");
};

You can also work with audio tracks directly, whether you are creating your own or using the prebuilt tracks that underpin `LocalMedia` and `RemoteMedia`:


audioTrack.OnLevel += (level) =>
{
    Log.WriteLine($"Audio track {audioTrack.Id} level is {level}.");
};

Warning: this event is raised for every single audio frame on a time-sensitive audio thread. Unless you’ve got a big battery or a really fast device, keep any work done in your event handler as short as possible.
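One way to keep a per-frame handler cheap is to do the expensive part (logging, UI updates) only every so often. The following is a minimal sketch of that idea, not part of the LiveSwitch or IceLink SDKs: `LevelThrottle` and its members are hypothetical names, and the caller is assumed to supply a monotonic millisecond timestamp.

```csharp
using System;

// Hypothetical helper (not an SDK type): lets expensive work run at most
// once per interval, so a per-frame handler stays cheap.
// Not thread-safe; intended to be called from a single audio thread.
public sealed class LevelThrottle
{
    private readonly long _intervalMs;
    private long _lastReportMs;
    private bool _hasReported;

    public LevelThrottle(long intervalMs)
    {
        if (intervalMs < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(intervalMs));
        }
        _intervalMs = intervalMs;
    }

    // Returns true on the first call, and afterwards only when at least
    // intervalMs has elapsed since the last call that returned true.
    public bool ShouldReport(long nowMs)
    {
        if (_hasReported && nowMs - _lastReportMs < _intervalMs)
        {
            return false;
        }
        _lastReportMs = nowMs;
        _hasReported = true;
        return true;
    }
}
```

Inside an `OnAudioLevel` or `OnLevel` handler, the `Log.WriteLine` call could then be wrapped in `if (throttle.ShouldReport(stopwatch.ElapsedMilliseconds))`, where `stopwatch` is a `System.Diagnostics.Stopwatch` started alongside the media. The interval is a judgment call; something like 250 to 500 ms is usually plenty for a level meter.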

Let’s try muting our microphone when we’re not speaking.

Muting itself is trivial:


localMedia.MuteAudio();

… and we can easily do it whenever the audio level drops below a threshold:


localMedia.OnAudioLevel += (level) =>
{
    // InactiveThreshold is an app-defined constant; tune it for your environment.
    if (level < InactiveThreshold)
    {
        localMedia.MuteAudio();
    }
    Log.WriteLine($"Local media {localMedia.Id} audio level is {level}.");
};

There’s a problem with this, though. Once we call `MuteAudio()`, every subsequent audio level reads 0.0. This is technically correct, since the muted audio buffers are perfectly silent, but it means we have no way to detect when we start speaking again.
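To see why a silenced buffer can only ever report 0.0, it helps to look at how a level might be computed from raw samples. The sketch below is an illustration only and is not the SDK’s implementation: `PcmLevel.Rms` is a hypothetical helper, and it assumes 16-bit signed PCM samples and a level normalized to the range 0.0 to 1.0 (the SDK’s actual scale may differ).

```csharp
using System;

// Hypothetical helper (not an SDK type): root-mean-square level of a
// 16-bit signed PCM buffer, normalized to roughly 0.0..1.0.
public static class PcmLevel
{
    public static double Rms(short[] samples)
    {
        if (samples == null || samples.Length == 0)
        {
            return 0.0;
        }

        double sumOfSquares = 0.0;
        foreach (short sample in samples)
        {
            // Scale each sample to -1.0..1.0 before squaring.
            double normalized = sample / 32768.0;
            sumOfSquares += normalized * normalized;
        }

        return Math.Sqrt(sumOfSquares / samples.Length);
    }
}
```

A buffer of all zeros contributes nothing to the sum, so its level is exactly 0.0 no matter how loud the room is. Any measurement taken after the samples have been silenced carries no information about the microphone.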

Let’s try it a different way, using the `OnRaiseFrame` event of the `AudioSource`. This event gives direct access to the audio buffers as they are raised by the source:


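localMedia.AudioSource.OnRaiseFrame += (frame) =>
{
    var buffer = frame.LastBuffer;
    // Measure the level first, so we always see the real (unmuted) value.
    var level = buffer.CalculateLevel();
    if (level < InactiveThreshold)
    {
        // Silence this buffer only; the next frame is evaluated afresh.
        buffer.Mute();
    }
    Log.WriteLine($"Local media {localMedia.Id} audio level is {level}.");
};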

Because the level is calculated from each buffer before that buffer is muted, we still get a real reading for every frame, even while muting. The audio also “unmutes” automatically (the handler simply does nothing) as soon as the level rises above our threshold.
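A plain per-frame threshold can clip the quiet tails of words and the short pauses between them. A common refinement is a hold (or “hangover”) period that keeps the audio open for a few frames after the level drops. The sketch below shows one way to do that; it is not part of the SDK, `HoldGate` and its members are hypothetical names, and the hold is counted in frames on the assumption that frames arrive at a steady rate.

```csharp
using System;

// Hypothetical helper (not an SDK type): decides per frame whether to mute,
// keeping audio open for holdFrames quiet frames after the last active one.
// Not thread-safe; intended to be called from a single audio thread.
public sealed class HoldGate
{
    private readonly double _threshold;
    private readonly int _holdFrames;
    private int _quietFrames;

    public HoldGate(double threshold, int holdFrames)
    {
        if (holdFrames < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(holdFrames));
        }
        _threshold = threshold;
        _holdFrames = holdFrames;
        // Start "closed": stay muted until the first active frame is seen.
        _quietFrames = holdFrames;
    }

    // Returns true when the current frame should be muted.
    public bool ShouldMute(double level)
    {
        if (level >= _threshold)
        {
            // Active frame: pass it through and restart the hold period.
            _quietFrames = 0;
            return false;
        }
        if (_quietFrames < _holdFrames)
        {
            // Quiet frame, but still inside the hold period: pass it through.
            _quietFrames++;
            return false;
        }
        return true;
    }
}
```

In the `OnRaiseFrame` handler, the condition `level < InactiveThreshold` could be swapped for `gate.ShouldMute(level)`, with `gate` created once as `new HoldGate(InactiveThreshold, holdFrames)`. If frames were 20 ms long (an assumption; check your audio configuration), a `holdFrames` value of 15 would give a hold of about 300 ms.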

The LiveSwitch and IceLink client SDKs are designed for maximum flexibility and provide unprecedented media pipeline access, so you can build virtually anything you can dream up. Can you shoot yourself in the foot? Absolutely. All that power comes with a bit of a learning curve, so we are always hard at work enhancing our documentation to make it easier to find the solution you’re looking for.

Can’t figure out how to do something? Got a big dream but need some assistance? Contact our support team to get answers to your questions or reach out to our professional services team for help today.



Anton Venema

As Frozen Mountain’s CTO, Anton is one of the world’s foremost experts on RTC solutions, as well as the technical visionary and prime architect of our products, IceLink and WebSync, and our custom solutions. Anton is responsible for ensuring that Frozen Mountain’s products exceed the needs of today and predict the needs of tomorrow.