Practical Concerns in Acoustic Echo Control for PC

Echo control differs from many other audio processing tasks in that it depends on two streams: the farend (audio to be played out on the speaker) and the nearend (audio recorded from the microphone). In a hands-free call, the nearend typically contains an echoed version of the farend. To identify and remove this echo component, the two signals must be time-aligned in some fashion. This need for time-alignment is at the root of many practical difficulties in acoustic echo control (AEC) that do not arise in single-stream tasks such as noise suppression and coding.
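As a rough illustration of what time-alignment involves, the sketch below estimates the farend-to-nearend delay by brute-force normalized cross-correlation. It is a minimal sketch under stated assumptions, not GIPS's method: the function name `estimate_delay`, the `max_delay` search bound, and the synthetic signals are hypothetical, and it assumes a single fixed delay with the echo arriving after the farend is played.

```python
import math
import random


def estimate_delay(farend, nearend, max_delay):
    """Estimate how many samples the nearend lags the farend (hypothetical helper).

    For each candidate lag d, compare farend[n] with nearend[n + d] over the
    overlapping samples using normalized cross-correlation, and keep the best.
    Only non-negative lags are searched, assuming the echo can only arrive
    after the farend has been played.
    """
    best_lag, best_score = 0, -math.inf
    for d in range(max_delay + 1):
        overlap = min(len(farend), len(nearend) - d)
        if overlap <= 0:
            break
        x = farend[:overlap]
        y = nearend[d:d + overlap]
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = d, score
    return best_lag


# Usage: a synthetic echo, attenuated and delayed by 37 samples, plus a little noise.
rng = random.Random(0)
farend = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_delay = 37
nearend = [0.0] * true_delay + [0.5 * s for s in farend]
nearend = [s + rng.gauss(0.0, 0.01) for s in nearend]
print(estimate_delay(farend, nearend, max_delay=200))  # prints 37
```

Because the delay on a PC is not fixed, a real implementation would have to keep re-estimating it on recent blocks rather than computing it once, and this brute-force search is far too slow for real-time use as written.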

There are two crucial and sometimes unmentioned factors which contribute to this time-alignment problem for AEC on a PC platform:

1. The AEC will be running on a non-real-time operating system. Processing is therefore not guaranteed to take place at any particular time; it is performed on a best-effort basis. As a result, the delay between the farend and nearend signals is unknown a priori and is likely to change over time. Heavy CPU load aggravates this, and buffers may even overflow, losing some of the stream data. This delay must be compensated for to achieve the desired time-alignment.

2. A wide array of hardware devices is available, and they can be used in combination. Recording and playout devices (often soundcards, but alternatively webcams and other USB devices) run on their own hardware clocks, just as the CPU does, and each device's clock controls the rate at which it records or plays out data. If these clocks differ, data is recorded at a slightly different rate than it is played out, and the farend and nearend streams drift away from each other. This phenomenon is aptly labeled clock drift. Again, it must be compensated for to achieve time-alignment.
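The scale of the drift problem is easy to estimate with a little arithmetic, and one simple way to compensate is to resample one stream by the measured clock ratio. The sketch below is an assumption-laden illustration, not a description of GIPS's approach: it assumes the drift in parts per million (ppm) is already known, the function names `drift_samples` and `resample_linear` and the 100 ppm figure are hypothetical, and linear interpolation stands in for the higher-quality resamplers a production system would likely use.

```python
def drift_samples(sample_rate_hz, drift_ppm, seconds):
    """Samples by which two streams separate when their clocks differ by drift_ppm."""
    return sample_rate_hz * drift_ppm * seconds / 1e6


def resample_linear(signal, ratio):
    """Resample `signal` by `ratio` (output rate / input rate) via linear interpolation.

    Hypothetical helper: to undo a recording clock that runs fast by drift_ppm,
    one could pass ratio = 1 / (1 + drift_ppm * 1e-6).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out_len = int((len(signal) - 1) * ratio) + 1
    out = []
    for m in range(out_len):
        t = m / ratio            # fractional position in the input signal
        i = int(t)
        frac = t - i
        if i + 1 < len(signal):
            out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
        else:
            out.append(signal[i])  # last input sample, nothing to interpolate with
    return out


# An illustrative 100 ppm mismatch at 16 kHz, accumulated over one minute:
print(drift_samples(16000, 100, 60))  # prints 96.0 (i.e. 6 ms of misalignment)
```

How the drift is measured in the first place (for example, from how buffer fill levels evolve over time) is outside this sketch and is itself an assumption.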

The wide variety of hardware leads to another issue. In any practical scenario there will be some amount of non-linear distortion in the echo path, usually caused by poor or overdriven speakers and microphones. This type of distortion can be heard, for instance, when a user speaks very loudly into the microphone and saturates the signal. A traditional echo canceller uses a linear filter to remove echo, and a linear filter by definition cannot model this distortion. An effective algorithm must be prepared for this eventuality.
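To see why a linear canceller struggles with distortion, the sketch below runs a normalized least-mean-squares (NLMS) adaptive filter, a common textbook form of the linear echo canceller, on a synthetic echo path twice: once purely linear and once with the echo hard-clipped to mimic saturation. The echo path `h`, the clipping level, the step size `mu`, and all function names are hypothetical choices for illustration; hard clipping is a simple stand-in for real loudspeaker or microphone non-linearities, and the nearend here contains echo only (no near-end speech).

```python
import random


def apply_echo_path(farend, h):
    """Convolve farend with a hypothetical echo-path impulse response h."""
    return [
        sum(h[k] * farend[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(farend))
    ]


def nlms_cancel(farend, nearend, taps=8, mu=0.5, eps=1e-6):
    """Linear NLMS echo canceller; returns the residual (nearend minus echo estimate)."""
    w = [0.0] * taps          # adaptive estimate of the echo path
    x_buf = [0.0] * taps      # most recent farend samples, newest first
    residual = []
    for x, d in zip(farend, nearend):
        x_buf = [x] + x_buf[:-1]
        y_hat = sum(wi * xi for wi, xi in zip(w, x_buf))  # linear echo estimate
        e = d - y_hat
        step = mu * e / (eps + sum(xi * xi for xi in x_buf))  # normalized step
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual


def residual_ratio(nearend, residual, tail):
    """Residual energy divided by echo energy over the last `tail` samples."""
    return sum(e * e for e in residual[-tail:]) / sum(d * d for d in nearend[-tail:])


rng = random.Random(1)
farend = [rng.gauss(0.0, 1.0) for _ in range(4000)]
h = [0.6, 0.3, -0.2, 0.1]                          # hypothetical echo path
echo = apply_echo_path(farend, h)
clipped = [max(-0.3, min(0.3, s)) for s in echo]   # crude saturation model

linear_ratio = residual_ratio(echo, nlms_cancel(farend, echo), tail=1000)
clipped_ratio = residual_ratio(clipped, nlms_cancel(farend, clipped), tail=1000)
print(f"linear echo:  residual/echo energy = {linear_ratio:.2e}")
print(f"clipped echo: residual/echo energy = {clipped_ratio:.2e}")
```

In the linear case the residual falls to essentially zero once the filter converges; with clipping, a sizeable fraction of the echo energy remains, because the clipped component is not a linear function of the farend and no setting of the filter weights can cancel it.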

In the literature these practical considerations are sometimes not given their due weight, and an AEC algorithm that performs well "in the lab" can be surprisingly underwhelming in a real scenario. It is therefore important to test performance on actual field recordings that include time-alignment mismatches.

GIPS relies on an integration between our AEC algorithm and VoiceEngine's cross-platform sound device handling to effectively contend with these practical considerations.
