Speech to Text in WPF

Nipun Tomar

One of the features available in .NET 3.5 and 4.0 is the System.Speech library, which was first introduced with .NET Framework 3.0. This library is a collection of classes that enables speech recognition (speech-to-text) and speech synthesis (text-to-speech).

As a follow-up to my previous contribution, Text to Speech in WPF, here is a small sample that recognizes speech and displays the resulting text. You can use the System.Speech.Recognition namespace to add speech recognition to desktop applications. There are two classes to choose from:

SpeechRecognizer
SpeechRecognitionEngine

The difference is that SpeechRecognizer uses the shared recognizer, the same recognizer that Windows Vista/7 uses for speech recognition, so the user can interact with it through the Windows speech toolbar. SpeechRecognitionEngine runs entirely in your application's own process, so the speech toolbar is not available and you must explicitly tell the engine when to start recognition.
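To make the contrast concrete, here is a minimal console sketch that creates both kinds of recognizer. It assumes a reference to System.Speech, an installed recognizer for the default culture, and a working microphone; the class and method names (RecognizerChoices, CreateShared, CreateInProcess) are made up for this illustration.

```csharp
using System;
using System.Speech.Recognition;

static class RecognizerChoices
{
    // Shared recognizer: Windows Speech Recognition owns the audio input
    // and decides when listening starts and stops (via its own UI).
    static SpeechRecognizer CreateShared()
    {
        var shared = new SpeechRecognizer();
        shared.LoadGrammar(new DictationGrammar());
        shared.SpeechRecognized += (s, e) => Console.WriteLine("Shared: " + e.Result.Text);
        return shared;
    }

    // In-process engine: the application picks the audio input
    // and starts recognition explicitly.
    static SpeechRecognitionEngine CreateInProcess()
    {
        var engine = new SpeechRecognitionEngine();
        engine.SetInputToDefaultAudioDevice();
        engine.LoadGrammar(new DictationGrammar());
        engine.SpeechRecognized += (s, e) => Console.WriteLine("In-process: " + e.Result.Text);
        engine.RecognizeAsync(RecognizeMode.Multiple);
        return engine;
    }

    static void Main()
    {
        // Swap in CreateShared() to try the shared recognizer instead.
        using (var engine = CreateInProcess())
        {
            Console.WriteLine("Speak, then press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```

Note that only the in-process version calls SetInputToDefaultAudioDevice and RecognizeAsync; with the shared recognizer those decisions belong to Windows.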

Managed applications access the speech recognition engine directly through the classes in System.Speech.Recognition; unmanaged applications use the Speech API (SAPI) instead.

Here is a small sample that uses System.Speech.Recognition. First, add a reference to the System.Speech assembly.

Then create a WPF window as shown below:

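<Window x:Class="Speech_to_Text.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Speech to Text" Height="300" Width="525"
        Loaded="Window_Loaded">
    <Grid>
        <!-- Row and column sizes are a reconstruction; the original definitions were empty -->
        <Grid.RowDefinitions>
            <RowDefinition Height="*" />
            <RowDefinition Height="Auto" />
            <RowDefinition Height="Auto" />
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="*" />
            <ColumnDefinition Width="*" />
            <ColumnDefinition Width="Auto" />
        </Grid.ColumnDefinitions>
        <!-- TextBox1 and ButtonStart are used by the code-behind below -->
        <TextBox Name="TextBox1" Grid.Row="0" Grid.Column="0" Grid.ColumnSpan="2" TextWrapping="Wrap" />
        <Button Name="ButtonStart" Grid.Row="0" Grid.Column="2" Click="ButtonStart_Click">Start</Button>
        <Label Name="LabelHypothesized" Grid.Row="1" Grid.Column="0" Foreground="Green">Hypothesized</Label>
        <Label Name="LabelRecognized" Grid.Row="1" Grid.Column="1" Foreground="Green">Recognized</Label>
        <Label Name="LabelStatus" Grid.Row="2" Grid.Column="0" FontSize="10" Foreground="Red">Status:</Label>
        <Label Name="Label1" Grid.Row="2" Grid.Column="2" FontSize="10">Speak "End Dictate" to stop.</Label>
    </Grid>
</Window>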

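Now let's move on to the code-behind.

Add the using directives:

using System;
using System.Linq;
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

Declare the recognizer, the synthesizer, and the state used by the event handlers:

private SpeechRecognitionEngine recognizer;
private SpeechSynthesizer synthesizer;
private enum State { Off, Accepting }
private State RecogState = State.Off;
private int Hypothesized, Recognized;   // event counters shown in the labels

Hook up the recognizer events when the window loads: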

private void Window_Loaded(object sender, RoutedEventArgs e)
{
    // Initialize the recognizer and synthesizer
    InitializeRecognizerSynthesizer();
}

/// <summary>
/// Initializes the recognizer and synthesizer along with their events.
/// </summary>
private void InitializeRecognizerSynthesizer()
{
    // Pick the installed recognizer that matches the current culture
    var selectedRecognizer = SpeechRecognitionEngine.InstalledRecognizers()
        .FirstOrDefault(r => r.Culture.Equals(Thread.CurrentThread.CurrentCulture));
    if (selectedRecognizer == null)
    {
        // Without a matching recognizer there is nothing to start
        LabelStatus.Content = "Status: no recognizer installed for the current culture";
        ButtonStart.IsEnabled = false;
        return;
    }

    recognizer = new SpeechRecognitionEngine(selectedRecognizer);
    recognizer.SetInputToDefaultAudioDevice();       // listen on the default microphone
    recognizer.LoadGrammar(new DictationGrammar());  // a grammar must be loaded before RecognizeAsync
    recognizer.AudioStateChanged += recognizer_AudioStateChanged;
    recognizer.SpeechHypothesized += recognizer_SpeechHypothesized;
    recognizer.SpeechRecognized += recognizer_SpeechRecognized;
    synthesizer = new SpeechSynthesizer();
}
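The window tells the user to speak "End Dictate" to stop. One way to make that phrase easier to tell apart from ordinary dictation is to load a small command grammar next to the dictation grammar and check which grammar produced a result. The sketch below is one possible approach, not part of the original sample; the GrammarSetup class, its methods, and the grammar names are hypothetical.

```csharp
using System.Speech.Recognition;

static class GrammarSetup
{
    // Hypothetical name used to identify results from the command grammar.
    public const string CommandGrammarName = "commands";

    public static void LoadGrammars(SpeechRecognitionEngine engine)
    {
        // Free-form dictation for the text that ends up in the text box.
        var dictation = new DictationGrammar();
        dictation.Name = "dictation";
        engine.LoadGrammar(dictation);

        // A one-phrase grammar for the stop command.
        // GrammarBuilder uses the current thread's culture by default,
        // which should match the culture of the selected recognizer.
        var commands = new Grammar(new GrammarBuilder(new Choices("End Dictate")));
        commands.Name = CommandGrammarName;
        engine.LoadGrammar(commands);
    }

    // True when the result was matched by the command grammar.
    public static bool IsStopCommand(RecognitionResult result)
    {
        return result.Grammar != null && result.Grammar.Name == CommandGrammarName;
    }
}
```

With this in place, a SpeechRecognized handler could call GrammarSetup.IsStopCommand(e.Result) instead of comparing the recognized text against a string.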
Add the event handlers:

private void recognizer_AudioStateChanged(object sender, AudioStateChangedEventArgs e)
{
    // Reflect the state of the audio input in the status label
    switch (e.AudioState)
    {
        case AudioState.Speech:
            LabelStatus.Content = "Listening";
            break;
        case AudioState.Silence:
            LabelStatus.Content = "Idle";
            break;
        case AudioState.Stopped:
            LabelStatus.Content = "Stopped";
            break;
    }
}

private void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    // Count every partial (hypothesized) result
    Hypothesized++;
    LabelHypothesized.Content = "Hypothesized: " + Hypothesized;
}

private void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Recognized++;
    LabelRecognized.Content = "Recognized: " + Recognized;

    if (RecogState == State.Off)
        return;

    float accuracy = e.Result.Confidence;   // recognizer's confidence; not used for filtering here
    string phrase = e.Result.Text;

    // The spoken command "End Dictate" stops dictation
    if (string.Equals(phrase, "End Dictate", StringComparison.OrdinalIgnoreCase))
    {
        RecogState = State.Off;
        ButtonStart.Content = "Start";
        recognizer.RecognizeAsyncStop();
        // The ReadAloud helper is not shown in this article, so speak through the synthesizer directly
        synthesizer.SpeakAsync("Dictation Ended");
        return;
    }

    TextBox1.AppendText(" " + phrase);
}
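The SpeechRecognized handler reads e.Result.Confidence but never acts on it. If noisy input produces stray words, one option is to drop results below a cutoff before appending them. The following is a minimal sketch of such a check; the ConfidenceFilter class and the 0.6 threshold are assumptions chosen for illustration, not values from the original sample.

```csharp
using System;

static class ConfidenceFilter
{
    // Hypothetical cutoff; tune it for your microphone and environment.
    public const float MinConfidence = 0.6f;

    // Accept a result only if it is confident enough and not blank.
    public static bool ShouldAccept(float confidence, string text)
    {
        return confidence >= MinConfidence && !string.IsNullOrWhiteSpace(text);
    }

    static void Main()
    {
        Console.WriteLine(ShouldAccept(0.92f, "hello world")); // True
        Console.WriteLine(ShouldAccept(0.35f, "hello world")); // False
    }
}
```

In the handler, this could be used as a guard such as if (!ConfidenceFilter.ShouldAccept(e.Result.Confidence, e.Result.Text)) return; placed before the text is appended.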
And finally, the ButtonStart_Click handler:

private void ButtonStart_Click(object sender, RoutedEventArgs e)
{
    // Toggle between accepting dictation and being off
    switch (RecogState)
    {
        case State.Off:
            RecogState = State.Accepting;
            ButtonStart.Content = "Stop";
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            break;
        case State.Accepting:
            RecogState = State.Off;
            ButtonStart.Content = "Start";
            recognizer.RecognizeAsyncStop();
            break;
    }
}

The resulting screen of the application looks like this: