Amulet Devices Voice Remote for Windows Media Center

Amulet Devices Blog

Kinect Audio Smarts are seriously impressive!

Thu, August 4, 2011 by

Anyone who’s ever used the latest speech recognition technologies with a plain old microphone will tell you that the results can be surprisingly good, as long as you stay close to the mic, have no background noise and turn the mic off when you’re not speaking.

But if you wanted to place a mic under your TV and have it listen to you while the TV is on, then good luck! The mic won’t be able to hear you over the TV audio and worse still, the TV content will trigger the speech recognition causing it to execute all sorts of spurious commands.

Well that’s now changed. There are several audio “smarts” built into Microsoft’s Kinect game controller that allow it to sit next to a TV and make speech recognition possible at a range of several meters! I’ve made a couple of videos that show just such a scenario:

I think the Kinect video capabilities such as skeletal tracking and depth sensing have tended to overshadow the audio capabilities of the unit. From my experience with Kinect so far, I think its audio smarts are seriously impressive.

The beam-forming technology uses four microphones and some digital signal processing to form a narrow beam that can be steered towards the person talking, so that sound within the beam is dramatically amplified relative to sound outside it. Also, Acoustic Echo Cancellation (AEC) means that the Kinect takes the sound produced by the TV and subtracts it from the sound picked up by the mics. The TV can therefore be on and playing media while you’re saying speech commands, and the speech recogniser can still understand what was said.
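To give a feel for the beam idea, here is a rough delay-and-sum sketch in Python. It is not the Kinect's actual algorithm, which I haven't seen; the mic spacing, sample rate, and all function names are assumptions made up for illustration. Each mic hears the same voice slightly later than its neighbour, so undoing those lags for one chosen direction makes sound from that direction add up while sound from elsewhere partly cancels.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly, at room temperature


def arrival_lags(mic_positions_m, angle_deg, sample_rate):
    """Whole-sample lag of each mic relative to the mic the wavefront hits first.

    Mics sit on a line; angle_deg is measured from straight ahead, positive
    towards increasing position.
    """
    proj = [x * math.sin(math.radians(angle_deg)) for x in mic_positions_m]
    nearest = max(proj)
    return [round((nearest - p) / SPEED_OF_SOUND * sample_rate) for p in proj]


def simulate_array(signal, lags):
    """Fake what each mic hears: the same signal, delayed by that mic's lag."""
    return [[signal[i - lag] if i >= lag else 0.0 for i in range(len(signal))]
            for lag in lags]


def delay_and_sum(channels, lags):
    """Undo the lags for the chosen direction, then average the channels."""
    n = len(channels[0])
    out = []
    for i in range(n):
        total = sum(ch[i + lag] for ch, lag in zip(channels, lags) if i + lag < n)
        out.append(total / len(channels))
    return out


def energy(samples):
    return sum(s * s for s in samples)


rate = 16000                      # assumed sample rate in Hz
mics = [0.0, 0.05, 0.10, 0.15]    # hypothetical mic positions in metres
voice = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(1600)]

# A talker 45 degrees to one side, then the beam steered at and away from them.
heard = simulate_array(voice, arrival_lags(mics, 45, rate))
towards = delay_and_sum(heard, arrival_lags(mics, 45, rate))
away = delay_and_sum(heard, arrival_lags(mics, -45, rate))
print(energy(towards), energy(away))
```

With the beam pointed at the talker, the output keeps nearly all of the voice's energy; pointed the other way, most of it is lost. A real array would use fractional delays and adaptive weighting rather than whole-sample shifts.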

It’s not all plain sailing. I found the echo cancellation to be problematic at first; it turned out that AEC requires sound to be coming from the speakers at all times while it is enabled. If there was a gap of a couple of seconds between songs while the AEC was on, the Kinect would have a bit of a fit.

I tried to get around this by playing a looping silent audio WAV file and this did fool the Kinect and prevent it having a seizure when the sound stopped. But in practice this technique seemed to degrade the speech recognition performance. Now I’m turning the Kinect AEC on and off in sync with the audio output and things are much better.
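The on/off logic is simple enough to sketch. This is an illustration in Python rather than the code from the demo; `AecGate` and the `set_aec` callback are hypothetical names standing in for whatever call actually toggles echo cancellation on the sensor.

```python
class AecGate:
    """Keeps echo cancellation on only while audio is actually playing."""

    def __init__(self, set_aec):
        # set_aec is a hypothetical callback: set_aec(True) enables AEC on
        # the sensor, set_aec(False) disables it.
        self._set_aec = set_aec
        self._aec_on = False

    def on_playback_started(self):
        self._switch(True)

    def on_playback_stopped(self):
        self._switch(False)

    def _switch(self, wanted):
        # Only touch the sensor when the state really changes.
        if wanted != self._aec_on:
            self._aec_on = wanted
            self._set_aec(wanted)


calls = []
gate = AecGate(calls.append)
gate.on_playback_started()   # track begins
gate.on_playback_started()   # duplicate event, ignored
gate.on_playback_stopped()   # gap between songs
gate.on_playback_started()   # next track
print(calls)  # [True, False, True]
```

The point of the state check is that the sensor is only reconfigured on a real transition, so a burst of repeated player events doesn't keep flipping AEC.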

If you look at the video closely you will see the AEC text display that shows the AEC switching on and off. While it’s possible for the AEC to let you be heard over music playing at a reasonable volume, if you’re a heavy metal head who needs 100 dB or nothing, then it won’t be able to cope.

It’s difficult to convey the comparative sound levels accurately in the videos, as I had to point the camera mic at me so I could be heard over the music; this had the effect of making it sound like the music was set very low. It wasn’t; the music was at a good level for listening. The setup is very usable: you can see several commands being recognised from background noise etc., indicated by the red percentage values that pop up on screen from time to time, but the fact that they are displayed in red means they are being disregarded because they don’t meet the confidence threshold that I set.
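The thresholding itself amounts to something like the following Python sketch. The threshold value, the names, and the green colour for accepted commands are all assumptions for illustration; the only thing taken from the demo is that results below the threshold are shown in red and ignored.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    phrase: str
    confidence: float  # 0.0 to 1.0, as reported by the recogniser


CONFIDENCE_THRESHOLD = 0.70  # hypothetical value, not the one used in the demo


def handle(result, execute, threshold=CONFIDENCE_THRESHOLD):
    """Run the command only if the recogniser is confident enough.

    Returns (colour, label) for the on-screen percentage display.
    """
    label = f"{result.confidence:.0%}"
    if result.confidence >= threshold:
        execute(result.phrase)
        return ("green", label)   # accepted (colour is an assumption)
    return ("red", label)         # disregarded, as in the videos


executed = []
print(handle(Recognition("play artist", 0.92), executed.append))
print(handle(Recognition("stop", 0.35), executed.append))
print(executed)
```

Setting the threshold is a trade-off: too low and the TV audio starts issuing commands, too high and you end up repeating yourself.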

While the system does get the very occasional mis-recognition where I might be listening to something and it will suddenly play someone else, the inconvenience of telling it to resume what it was doing the odd time is far outweighed by the convenience of being able to bark at the TV from the couch!

I plan to make another couple of videos soon: one showing the beam set to one extreme, so that you can say a command from that direction and be recognised, then say the same command from the other extreme and be ignored; and another showing the edge of the distance limits. I’ve used it here at up to 20 feet.

We’re considering releasing the software used in the demo free if we get enough interest. It’s a cut-down version of the regular Amulet Devices software that’s tailored to work with Kinect but controls audio tracks only. A commercial version that allows voice control of all media, including the TV guide etc., will follow when Microsoft release a commercial version of the Kinect SDK.
