Amulet Devices Voice Remote for Windows Media Center

Amulet Devices Blog

Kinect Audio Smarts are seriously impressive!

Thu, August 4, 2011 by Steve

Anyone who’s ever used the latest speech recognition technologies with a plain old microphone will tell you that the results can be surprisingly good, as long as you stay close to the mic, have no background noise and turn the mic off when you’re not speaking.

But if you wanted to place a mic under your TV and have it listen to you while the TV is on, then good luck! The mic won’t be able to hear you over the TV audio and, worse still, the TV content will trigger the speech recognition, causing it to execute all sorts of spurious commands.

Well that’s now changed. There are several audio “smarts” built into Microsoft’s Kinect game controller that allow it to sit next to a TV and make speech recognition possible at a range of several meters! I’ve made a couple of videos that show just such a scenario:

I think the Kinect video capabilities such as skeletal tracking and depth sensing have tended to overshadow the audio capabilities of the unit. From my experience with Kinect so far, I think its audio smarts are seriously impressive.

The beam-forming technology uses four microphones and some digital signal processing to form a narrow beam that can be steered towards the person talking, so that sound within the beam is dramatically amplified relative to sound outside it. Also, the use of Acoustic Echo Cancellation (AEC) means that the Kinect takes the sound produced by the TV and subtracts it from the sound picked up by the mics. The TV can therefore be on and playing media while you’re saying speech commands, and the speech recogniser can still understand what was said.
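As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not the Kinect's actual signal processing, which the post does not describe: `delay_and_sum` is a textbook delay-and-sum beamformer using whole-sample delays, and `nlms_echo_cancel` is a textbook normalised LMS adaptive filter standing in for AEC. All function names, the delays, the 0.6 echo gain and the white-noise "voice" and "TV" signals are assumptions made up for the example.

```python
import random

def delay_and_sum(channels, delays):
    """Steer the array: advance each mic by its delay (in samples), then average."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]

def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the speaker (reference) signal from the mic."""
    weights = [0.0] * taps
    history = [0.0] * taps  # newest reference sample first
    residual = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what is left once the echo is removed
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def delayed(signal, d):
    """The signal delayed by d samples, zero-padded at the start."""
    return [0.0] * d + signal[:len(signal) - d]

rng = random.Random(0)
n = 4000
voice = [rng.gauss(0, 1) for _ in range(n)]  # stand-in for the person talking
tv = [rng.gauss(0, 1) for _ in range(n)]     # stand-in for the TV audio

# Assumed geometry: the voice reaches mic k after 2*k samples,
# while the TV sound arrives from the opposite side of the array.
steer = [0, 2, 4, 6]
mics = [
    [v + t for v, t in zip(delayed(voice, d), delayed(tv, 6 - d))]
    for d in steer
]
beam = delay_and_sum(mics, steer)
leak = [b - v for b, v in zip(beam, voice)]  # TV sound that survives the beam
print("TV power at one mic:", power(tv), "after beamforming:", power(leak))

# Echo cancellation: the mic hears a delayed, attenuated copy of the TV signal.
echo = [0.6 * x for x in delayed(tv, 3)]
residual = nlms_echo_cancel(tv, echo)
print("echo power:", power(echo[-500:]), "after AEC:", power(residual[-500:]))
```

Aligning the four channels on the talker adds the voice coherently while the off-beam TV sound is averaged across misaligned copies, which is the sense in which sound inside the beam is favoured. The adaptive filter needs the signal being played as its reference, which is why echo cancellation is tied to what the speakers are doing.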

It’s not all plain sailing. I found the echo cancellation to be problematic at first; the cause turned out to be a requirement that sound be coming from the speakers at all times while AEC is enabled. If there was a gap of a couple of seconds between songs and the AEC was on, the Kinect would have a bit of a fit.

I tried to get around this by playing a looping silent audio WAV file and this did fool the Kinect and prevent it having a seizure when the sound stopped. But in practice this technique seemed to degrade the speech recognition performance. Now I’m turning the Kinect AEC on and off in sync with the audio output and things are much better.
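The on/off approach can be sketched as a small gate that follows the player's state. This is only an illustration in Python, not the Amulet code: `AecGate`, `on_playback_state` and the `set_aec` callback are hypothetical names, and the callback stands in for whatever call actually switches echo cancellation in the audio stack.

```python
class AecGate:
    """Enable echo cancellation only while the player is producing sound."""

    def __init__(self, set_aec):
        self._set_aec = set_aec  # hypothetical callback into the audio stack
        self._enabled = None     # unknown until the first update

    def on_playback_state(self, is_playing):
        # Only act on transitions, so the audio stack is not reconfigured
        # on every status update.
        if is_playing != self._enabled:
            self._enabled = is_playing
            self._set_aec(is_playing)


calls = []
gate = AecGate(calls.append)
# A track plays, there is a gap between songs, then the next track starts.
for playing in [True, True, False, False, True]:
    gate.on_playback_state(playing)
print(calls)  # [True, False, True]
```

Acting only on changes keeps AEC off for the whole of a silent gap and back on as soon as the next track starts.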

If you look at the video closely you will see the AEC text display that shows the AEC switching on and off. While it’s possible for the AEC to let you be heard over music playing at a reasonable volume, if you’re a heavy metal head who needs 100 dB or nothing, then it won’t be able to cope.

It’s difficult to convey the comparative sound levels accurately in the videos, as I had to point the camera mic at me so I could be heard over the music; this had the effect of making it look like the music was set very low. It wasn’t; the music was at a good level for listening. The setup is very usable. You can see that several commands are recognised from background noise etc., indicated by the red percentage values that pop up on screen from time to time, but the fact that they are displayed in red means they are being disregarded because they don’t meet the confidence threshold that I set.
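The confidence check described above amounts to a simple filter on each recognition result. Here is a minimal Python sketch of that idea; the `Recognition` type, the `triage` function and the 0.80 threshold are all hypothetical, since the post does not say what threshold was used or how the software represents results.

```python
from dataclasses import dataclass

@dataclass
class Recognition:
    phrase: str
    confidence: float  # 0.0 to 1.0, as reported by the recogniser

CONFIDENCE_THRESHOLD = 0.80  # hypothetical value; the post does not state the one used

def triage(result, threshold=CONFIDENCE_THRESHOLD):
    """Decide whether to act on a result, and format its on-screen percentage."""
    percent = f"{result.confidence:.0%}"
    if result.confidence >= threshold:
        return ("execute", percent)
    return ("ignore", percent)  # would be shown in red and disregarded

print(triage(Recognition("play artist", 0.91)))    # ('execute', '91%')
print(triage(Recognition("background noise", 0.42)))  # ('ignore', '42%')
```

Raising the threshold cuts down spurious commands from TV audio at the cost of occasionally having to repeat a genuine one.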

While the system does get the very occasional mis-recognition where I might be listening to something and it will suddenly play someone else, the inconvenience of telling it to resume what it was doing the odd time is far outweighed by the convenience of being able to bark at the TV from the couch!

I plan to make another couple of videos soon: one which will show the beam being set to one extreme, so that the same command is recognised when spoken from that direction and ignored when spoken from the other extreme, and another showing the edge of the distance limits. I’ve used it here at up to 20 feet.

We’re considering releasing the software used in the demo free if we get enough interest. It’s a cut-down version of the regular Amulet Devices software that’s tailored to work with Kinect but controls audio tracks only. A commercial version that allows voice control of all media, including the TV guide etc., will follow when Microsoft release a commercial version of the Kinect SDK.

15 Responses to “Kinect Audio Smarts are seriously impressive!”

  1. [...] Amulet have posted a really interesting video showing a custom Windows Media Center addin being controlled by voice with a Microsoft Kinect. The video shows how the Kinect can locate where in the room the voice controls are coming from and how well it works with the custom UI. [...]

  2. Ian Dixon says:

    Very impressive, I would love to see you release a version of this. I will be bringing my kids’ Kinect down to my living room :)

  3. NetFloorLive says:

    Very interested in this free demo of the Kinect Media Center. Would love to get it installed on my Kinect Media Center PC now.

  4. Vu says:

    great demos! Can you go a bit into the voice recognition technology? I’ve used Dragon Naturally Speaking and Microsoft’s built-in speech application in Windows and you have to spend forever training it to your voice. And if someone else wanted to use the voice recognition, they would have to train their voice as well.

    When I saw the E3 demo where you could speak a search term to search for Bing, it kept going through my head how they’re doing that. Does the Kinect grab that section of audio, send it to Microsoft’s cloud for it to be processed and then for the Bing search engine to be pulled? How can you say words that have never been trained and still be recognized? Are you training it for your albums?

  5. TJ says:

    Although it is limited, the capability is awesome. I’d be very interested in trying this out on my media center.

  6. camcorder says:

    There is noticeably a lot to know about this. I believe you made some nice points in features also.

  7. stewart says:

    Great news, good videos, now get your products available in the UK, I’ve been waiting for months

  8. Wanilton says:

    WOW, very good…I liked

  9. kromseesall says:

    Very nice demo. Is there going to be a release date or beta of this?

  10. Terry Gore says:

    I would jump all over both the pre-release and the final release that I’m sure you would charge for. This is incredibly sexy and ohh so cool.

  11. SteveS says:

    Would love to see this released…

  12. Marty says:

    Just wanted to find out status and stay in loop on updates for amulet devices software to control media center using Kinect. I currently use xbox 360 and a media center extender for streaming (from computer) recorded tv shows and movies (and ripped movies). I’m considering a Kinect purchase for this 2011 Christmas. Would be nice to be able to use the voice capabilities of the Kinect to control media center instead of buying another voice capable hardware (ie Amulet devices remote) for this. And I have no idea if MS is ever planning media center voice control using Kinect.

  13. bill says:

    Please make it, and when will it be available?

  14. Eddy says:

    Hi all,

    The release version is now available for download at Amulet Voice Kinect

    Eddy
