daniel bigham.ca

This section lists all blog posts, regardless of topic.

People prefer angry faced cars
October 13, 2008

http://science.slashdot.org/science/08/10/13/0631240.shtml

Check out this story. I find it very interesting, and I would tend to agree that cars look a bit angry. Perhaps it's a testosterone look: Do we want our cars to look strong and masculine? I've always thought it curious how face-like car designs are. Is it coincidence?

Disappointment with voice recognition
October 13, 2008

My initial playing around with Microsoft's Speech API was quite successful... its speech recognition accuracy seemed very solid, even if I took my headset off and rested it beside the monitor, or put it on the ground. (Wow!) That was until I tried using it in dictation mode, which increases its vocabulary from a few words to a few thousand words. Suddenly it was recognizing a short sentence correctly less than 5% of the time.

I figured that by reducing its vocabulary to, say, 150 words I'd get the accuracy back, but I couldn't find any easy way to do that. As a workaround, I created a custom grammar that accepts statements of one to six words, each word coming from a list of 150 possibilities. Even then its accuracy was remarkably poor. For example, here is what it recognized on repeated attempts at two short sentences:

"My name is Daniel" was recognized as:

"my name is Daniel"

"I name is Daniel"

"my name is Daniel"

"nineteen's Daniel"

"I live in Waterloo" was recognized as:

"mon"

"man"

"I man name Waterloo"

"Bigham eighteen mon"

"mon"

"I live name one million"

...
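The attempts listed above can be tallied into a sentence-level accuracy figure. This is only a rough sketch in Python: it counts an attempt as correct when it matches the spoken sentence exactly (ignoring case), and it covers only the attempts actually listed, not the ones elided by the "...". The names `sentence_accuracy` and `RESULTS` are made up for illustration.

```python
def sentence_accuracy(expected, attempts):
    """Fraction of attempts that match the expected sentence exactly (case-insensitive)."""
    target = expected.lower()
    hits = sum(1 for attempt in attempts if attempt.lower() == target)
    return hits / len(attempts)


# Only the attempts listed in the post; elided attempts are not included.
RESULTS = {
    "My name is Daniel": [
        "my name is Daniel",
        "I name is Daniel",
        "my name is Daniel",
        "nineteen's Daniel",
    ],
    "I live in Waterloo": [
        "mon",
        "man",
        "I man name Waterloo",
        "Bigham eighteen mon",
        "mon",
        "I live name one million",
    ],
}

for sentence, attempts in RESULTS.items():
    print(f"{sentence!r}: {sentence_accuracy(sentence, attempts):.0%}")
```

On the listed attempts this gives 50% for the first sentence and 0% for the second.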

In other words, it got the whole sentence right about 50% of the time in the best cases, and less than 10% of the time in many other cases. I had hoped for at least 95% accuracy with such a small vocabulary.
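For reference, here is one way the workaround grammar described in this post (one to six words, each drawn from a fixed list) could be expressed. The post doesn't say what format the grammar was actually written in, so this is an assumption: the sketch uses Python to generate a W3C SRGS XML grammar, a format Microsoft's speech APIs can load. The 150-word list isn't given, so `VOCABULARY` is a short hypothetical stand-in, and `build_grammar` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_grammar(words, min_words=1, max_words=6):
    """Return an SRGS XML grammar accepting min_words..max_words words from `words`."""
    items = "\n".join(f"        <item>{escape(word)}</item>" for word in words)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<grammar version="1.0" xml:lang="en-US" root="statement"\n'
        '         xmlns="http://www.w3.org/2001/06/grammar">\n'
        '  <rule id="statement" scope="public">\n'
        # The repeat attribute lets the word choice occur 1 to 6 times in a row.
        f'    <item repeat="{min_words}-{max_words}">\n'
        '      <one-of>\n'
        f'{items}\n'
        '      </one-of>\n'
        '    </item>\n'
        '  </rule>\n'
        '</grammar>\n'
    )


# Hypothetical stand-in for the 150-word list, which the post does not give.
VOCABULARY = ["my", "name", "is", "Daniel", "I", "live", "in", "Waterloo"]

GRAMMAR_XML = build_grammar(VOCABULARY)

# Sanity check: the generated text parses as well-formed XML.
ET.fromstring(GRAMMAR_XML.encode("utf-8"))
print(GRAMMAR_XML)
```

Note that a grammar like this places no constraint on word order, so with 150 words and up to six positions the recognizer still has a very large number of candidate statements to choose from.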

Exercise 23: Grace
October 13, 2008

Summary

Construct a Windows application in C# that runs in the tray and constantly listens for audio input. If the word "Grace" is recognized, then accept audio commands for the next two seconds. Or, if the word "Grace" is detected as the first word in a recognized string, then act on the audio command that is found in the latter part of the string. In addition:

Use a grammar.

Log all activity to a log file using log4net.

Respond to the command "What time is it?", or "What is the time?", or "What's the time?", or "What's the current time?" by reading the time using the Expressivo Jennifer voice. The time should be spoken as, for example, "four oh nine".

Test the program in a kitchen or living room setup with a laptop where the laptop is across the room from the speaker.
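The command-handling part of the exercise can be sketched independently of the speech engine. The exercise itself calls for C#; what follows is only a language-neutral illustration in Python of two pieces: detecting "Grace" as the first recognized word, and turning a clock time into words like "four oh nine". It leaves out the tray application, audio capture, the two-second listening window, log4net, and the text-to-speech voice. All function names here are hypothetical, and saying "o'clock" on the hour is an assumption the exercise doesn't specify.

```python
import re

WAKE_WORD = "grace"

# The four phrasings of the time question listed in the exercise.
TIME_COMMANDS = {
    "what time is it",
    "what is the time",
    "what's the time",
    "what's the current time",
}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def normalize(text):
    """Lowercase, drop punctuation other than apostrophes, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z' ]", " ", text.lower()).split())


def extract_command(recognized):
    """Return the text after a leading wake word, or None if it is absent."""
    words = normalize(recognized).split()
    if not words or words[0] != WAKE_WORD:
        return None
    return " ".join(words[1:])


def spoken_time(hour, minute):
    """Spell out a 24-hour clock time in 12-hour form, e.g. (16, 9) -> 'four oh nine'."""
    hour12 = hour % 12 or 12
    if minute == 0:
        tail = "o'clock"  # Assumption: the exercise does not say how to read :00.
    elif minute < 10:
        tail = "oh " + ONES[minute]
    elif minute < 20:
        tail = ONES[minute]
    else:
        tens, ones = divmod(minute, 10)
        tail = TENS[tens] + (" " + ONES[ones] if ones else "")
    return f"{ONES[hour12]} {tail}"


def respond(recognized, hour, minute):
    """Return the words to speak for a recognized string, or None to stay silent."""
    command = extract_command(recognized)
    if command in TIME_COMMANDS:
        return spoken_time(hour, minute)
    return None


print(respond("Grace, what's the time?", 16, 9))
```

Passing the hour and minute in as arguments, rather than reading the clock inside `respond`, keeps the logic easy to test without a microphone.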
