One of the features available in .NET 3.5 and 4.0 is the System.Speech library, which first shipped with .NET Framework 3.0. This library is a collection of classes that enable speech recognition (speech-to-text) and speech synthesis (text-to-speech).
As a continuation of a previous contribution, Text to Speech in WPF, here is a small sample that recognizes speech and shows the resulting text. You can use the System.Speech.Recognition namespace to add speech recognition to desktop applications. There are two choices:
- SpeechRecognizer
- SpeechRecognitionEngine
The difference is that SpeechRecognizer uses the shared recognizer, the same recognizer that Windows Vista/7 uses for speech recognition. With it you can use the speech toolbar to interact with the user. SpeechRecognitionEngine runs entirely in your application's own process, so you cannot use the speech toolbar, and you must explicitly tell it when to start recognition.
The speech recognition engine is accessed directly in managed applications by using the classes in System.Speech.Recognition or, alternatively, by the Speech API (SAPI) when used in unmanaged applications.
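To make the difference between the two classes concrete, here is a minimal console sketch. It is an illustration under assumptions, not part of the sample that follows: it assumes a Windows machine with a recognizer installed and a working microphone, and it uses a DictationGrammar for free-form text. The class name RecognizerComparison and the handler OnRecognized are made up for this example.

```csharp
using System;
using System.Speech.Recognition;

static class RecognizerComparison
{
    static void Main()
    {
        // Shared recognizer: Windows Speech Recognition owns the audio input
        // and decides when listening starts, so there is no explicit start call.
        using (var shared = new SpeechRecognizer())
        {
            shared.LoadGrammar(new DictationGrammar());
            shared.SpeechRecognized += OnRecognized;
            Console.WriteLine("Shared recognizer loaded; press Enter to continue.");
            Console.ReadLine();
        }

        // In-process engine: the application picks the audio input, loads a
        // grammar, and starts and stops recognition itself.
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(new DictationGrammar());
            engine.SpeechRecognized += OnRecognized;
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("In-process engine listening; press Enter to stop.");
            Console.ReadLine();
            engine.RecognizeAsyncStop();
        }
    }

    // Prints whatever text either recognizer produced.
    static void OnRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine(e.Result.Text);
    }
}
```

The project needs a reference to System.Speech. The in-process half is the pattern the WPF sample in this article builds on.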
Here is a small sample of using System.Speech.Recognition. Add a reference to System.Speech.
Create a WPF window as shown below. The Loaded="Window_Loaded" attribute wires up the handler used in the code-behind.
<Window x:Class="Speech_to_Text.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Speech to Text" Height="300" Width="525" Loaded="Window_Loaded">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="*"/>
            <RowDefinition Height="30"/>
            <RowDefinition Height="25"/>
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="120"/>
            <ColumnDefinition Width="120"/>
            <ColumnDefinition Width="120"/>
            <ColumnDefinition Width="*"/>
        </Grid.ColumnDefinitions>
        <TextBox Name="TextBox1" Grid.Row="0" Grid.Column="0" Grid.ColumnSpan="4" TextWrapping="Wrap" />
        <Label Name="LabelHypothesized" Grid.Row="1" Grid.Column="0" Foreground="Green">Hypothesized</Label>
        <Label Name="LabelRecognized" Grid.Row="1" Grid.Column="1" Foreground="Green">Recognized</Label>
        <Button Name="ButtonStart" Grid.Row="1" Grid.Column="3" Content="Start" Click="ButtonStart_Click" Width="80" IsEnabled="False" />
        <Label Name="LabelStatus" Grid.Row="2" Grid.Column="0" FontSize="10" Foreground="Red">Status:</Label>
        <Label Name="Label1" Grid.Row="2" Grid.Column="3" FontSize="10">Speak "End Dictate" to stop.</Label>
    </Grid>
</Window>
Now let's start with the code.
- Add the using directives
using System.Speech.Recognition;
using System.Speech.Synthesis;
- Declare the SpeechRecognitionEngine field
private SpeechRecognitionEngine recognizer;
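The handlers in this article also use several members whose declarations are not shown: synthesizer, Hypothesized, Recognized, RecogState, the State enum, and ReadAloud. What follows is only a guess at what they might look like, inferred from how they are used; the types, initial values, and the body of ReadAloud are assumptions, not the author's code.

```csharp
using System.Speech.Synthesis;
using System.Windows;

namespace Speech_to_Text
{
    public partial class MainWindow : Window
    {
        // Assumed: dictation is either switched off or accepting speech.
        private enum State { Off, Accepting }

        // Current dictation state; starts switched off.
        private State RecogState = State.Off;

        // Counters displayed in LabelHypothesized and LabelRecognized.
        private int Hypothesized = 0;
        private int Recognized = 0;

        // Assigned in InitializeRecognizerSynthesizer.
        private SpeechSynthesizer synthesizer;

        // Assumed helper: speaks the text without blocking the UI thread.
        private void ReadAloud(string text)
        {
            synthesizer.SpeakAsync(text);
        }
    }
}
```

Because MainWindow is a partial class, these members can simply be merged into the existing code-behind file rather than kept in a separate declaration.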
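- Hook up the recognizer events on window load (the Window element must set Loaded="Window_Loaded" for this handler to run)
private void Window_Loaded(object sender, RoutedEventArgs e)
{
    // Initialize the recognizer and synthesizer
    InitializeRecognizerSynthesizer();
}

/// <summary>
/// Initializes the recognizer and synthesizer along with their events.
/// </summary>
private void InitializeRecognizerSynthesizer()
{
    // Pick the installed recognizer that matches the current culture (requires using System.Linq)
    var selectedRecognizer = (from r in SpeechRecognitionEngine.InstalledRecognizers()
                              where r.Culture.Equals(System.Threading.Thread.CurrentThread.CurrentCulture)
                              select r).FirstOrDefault();
    if (selectedRecognizer == null)
    {
        // FirstOrDefault returns null when no recognizer matches; the constructor would throw
        LabelStatus.Content = "No recognizer installed for the current culture";
        return;
    }
    recognizer = new SpeechRecognitionEngine(selectedRecognizer);
    recognizer.AudioStateChanged += recognizer_AudioStateChanged;
    recognizer.SpeechHypothesized += recognizer_SpeechHypothesized;
    recognizer.SpeechRecognized += recognizer_SpeechRecognized;
    // RecognizeAsync needs a loaded grammar and an audio input; this uses
    // free-form dictation from the default microphone
    recognizer.LoadGrammar(new DictationGrammar());
    recognizer.SetInputToDefaultAudioDevice();
    synthesizer = new SpeechSynthesizer();
    // The Start button is disabled in the XAML; enable it once the recognizer is ready
    ButtonStart.IsEnabled = true;
}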
- Add event handlers
private void recognizer_AudioStateChanged(object sender, AudioStateChangedEventArgs e)
{
    // Reflect the state of the audio input in the status label
    switch (e.AudioState)
    {
        case AudioState.Speech:
            LabelStatus.Content = "Listening";
            break;
        case AudioState.Silence:
            LabelStatus.Content = "Idle";
            break;
        case AudioState.Stopped:
            LabelStatus.Content = "Stopped";
            break;
    }
}

private void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    // Count the tentative results raised while the user is still speaking
    Hypothesized++;
    LabelHypothesized.Content = "Hypothesized: " + Hypothesized.ToString();
}
private void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Recognized++;
    LabelRecognized.Content = "Recognized: " + Recognized.ToString();
    // Ignore results that arrive after dictation has been switched off
    if (RecogState == State.Off)
        return;
    string phrase = e.Result.Text;
    // The spoken stop phrase ends dictation; compare case-insensitively
    // because the recognizer decides the casing of the returned text
    if (string.Equals(phrase, "End Dictate", StringComparison.OrdinalIgnoreCase))
    {
        RecogState = State.Off;
        ButtonStart.Content = "Start";
        recognizer.RecognizeAsyncStop();
        ReadAloud("Dictation Ended");
        return;
    }
    TextBox1.AppendText(" " + phrase);
}
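Each recognition result also carries a confidence score in e.Result.Confidence, a float between 0.0 and 1.0. One possible use is to drop low-confidence results before appending them to the text box. The sketch below is an assumption-laden illustration: the ConfidenceFilter class and the 0.5 threshold are invented for this example and would need tuning for a real microphone and speaker.

```csharp
static class ConfidenceFilter
{
    // Hypothetical cut-off; not a value recommended by the library.
    public const float MinConfidence = 0.5f;

    // Returns true when the recognizer was at least MinConfidence sure of the result.
    public static bool IsAcceptable(float confidence)
    {
        return confidence >= MinConfidence;
    }
}
```

Inside recognizer_SpeechRecognized, a check such as if (!ConfidenceFilter.IsAcceptable(e.Result.Confidence)) return; placed before the text is appended would skip doubtful results.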
- And finally, the ButtonStart_Click handler
private void ButtonStart_Click(object sender, RoutedEventArgs e)
{
    // Toggle between listening and stopped
    switch (RecogState)
    {
        case State.Off:
            RecogState = State.Accepting;
            ButtonStart.Content = "Stop";
            // Multiple keeps recognizing until it is stopped explicitly
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            break;
        case State.Accepting:
            RecogState = State.Off;
            ButtonStart.Content = "Start";
            recognizer.RecognizeAsyncStop();
            break;
    }
}
The resulting application window looks like this: