Saturday, November 3, 2012

System.Speech - Talk to Your Computer


The .NET Framework (since version 3.0) includes the System.Speech namespace. Yes, that means your computer can talk.
Why would you want it? First of all, it's cool: with fewer than 10 lines of code your computer can say whatever you want. Secondly, we are living in a new era in which the physical keyboard and mouse are being replaced by touch screens (you are probably reading this post on your smartphone right now). The present brings touch screens, and the not-so-distant future will bring voice control and much more. "There is no such thing as science fiction any more."

So if you are building a new app and want to give it some extra features, voice control is a good place to start. To make the computer listen and recognize what you're saying, we'll use the SpeechRecognitionEngine class (add a reference to the System.Speech assembly first).
using System.Speech.Recognition;

// Listen on the default microphone and accept the phrase "hello computer".
SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine();
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(new Grammar(new GrammarBuilder("hello computer")));
The SetInputToDefaultAudioDevice method tells the recognizer to use your computer's built-in audio device; if you need a custom audio source, use SetInputToAudioStream instead.
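As a rough sketch of a custom input (the GetRawAudioStream helper here is hypothetical, standing in for your own capture code; any Stream that delivers raw PCM in the declared format will do, and for plain WAV files there is a ready-made shortcut):

using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

// Assumed: GetRawAudioStream() is your own capture code and returns
// 16 kHz, 16-bit, mono PCM samples.
Stream audioSource = GetRawAudioStream();
recognizer.SetInputToAudioStream(
    audioSource,
    new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));

// For a simple WAV file you can skip the stream handling entirely:
recognizer.SetInputToWaveFile("question.wav");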
The last method is the most interesting one: LoadGrammar defines the phrases the user can say to the computer. The SpeechRecognitionEngine class has two relevant events:
recognizer.SpeechDetected
recognizer.SpeechRecognized
The SpeechDetected event fires whenever the computer detects any sound that can be interpreted as speech (talking on the phone near the computer will raise it). SpeechRecognized is raised only when the detected speech matches the loaded grammar (a short sketch of both handlers follows the next snippet). You can supply a single string to LoadGrammar, or you can use the Choices class:
recognizer.LoadGrammar(new Grammar(new Choices(new[]
{
    "hello computer", "I am fine how are you", "goodbye"
})));
Clear, right?
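Here is a minimal sketch of the difference between the two events (both handlers just print to the console, and they assume the recognizer declared above):

recognizer.SpeechDetected += (sender, e) =>
    Console.WriteLine("Heard something at " + e.AudioPosition);

recognizer.SpeechRecognized += (sender, e) =>
    Console.WriteLine("Recognized \"" + e.Result.Text +
                      "\" with confidence " + e.Result.Confidence);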
The next thing we'll do is subscribe to the SpeechRecognized event:
recognizer.SpeechRecognized += Speak;
recognizer.RecognizeAsync(RecognizeMode.Multiple);
RecognizeAsync(RecognizeMode.Multiple) means that recognition keeps running instead of stopping after the first recognized phrase.
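If you only want a single command, or you need to end a continuous session later, a quick sketch:

// Recognize exactly one phrase and then stop:
recognizer.RecognizeAsync(RecognizeMode.Single);

// End a RecognizeMode.Multiple session gracefully
// (RecognizeAsyncCancel() aborts it immediately instead):
recognizer.RecognizeAsyncStop();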

The last thing left to do is make the computer respond to our input:
// Requires: using System.Speech.Synthesis;
private void Speak(object sender, SpeechRecognizedEventArgs e)
{
    SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer();

    // Respond according to the phrase that was recognized.
    switch (e.Result.Text)
    {
        case "hello computer":
        {
            speechSynthesizer.Speak("hello Dennis, how are you?");
            break;
        }
    }
}

Yes, the computer will say this :)
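SpeechSynthesizer has a few more knobs worth knowing about. A minimal sketch (the installed voices differ between machines, so the hint-based selection is the portable option):

using System.Speech.Synthesis;

SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer();
speechSynthesizer.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);
speechSynthesizer.Rate = -1;     // -10 (slowest) to 10 (fastest)
speechSynthesizer.Volume = 100;  // 0 to 100
speechSynthesizer.SpeakAsync("hello Dennis, how are you?"); // non-blocking version of Speak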

Earlier we talked about voice control. I want to tell my computer to open the Chrome browser, launch Solitaire, or start some complex calculation. To do that we'll need to add a new phrase to the grammar and a matching action in the method that is subscribed to the SpeechRecognized event (the sketch after the snippet shows the updated grammar):
case "open chrome":
           {
              Process.Start(
"chrome");
              
break;
           }
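One thing to keep in mind: the new phrase must also be part of the loaded grammar, otherwise SpeechRecognized will never fire for it. A minimal sketch of the updated grammar (Process.Start lives in System.Diagnostics, and "chrome" is assumed to be resolvable on the PATH):

recognizer.LoadGrammar(new Grammar(new Choices(new[]
{
    "hello computer", "I am fine how are you", "goodbye", "open chrome"
})));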

The future is here.

2 comments:

  1. Super cool. Added it to my code right now.

  2. Do you have any code example to show how to set audio input to something other than the default audio device? Using MMSYSTEM, I have access to the device ID of any of the wave audio devices installed on the system. How do I set the speech input ID to the wave-in device ID of any of the wave drivers installed on my system? For example, I use Microsoft TAPI and it can provide the wave/in and wave/out device IDs for the wave driver associated with any particular phone line. This is also the same device ID used in the WaveInOpen() Windows API function. Since speech is used in telephony applications, it seems very important to be able to use this standard method to get at wave input and output devices with installed wave drivers. Or do I just need to give up and program directly to SAPI 5.4 and ignore the .NET namespaces?
