Introduction
In my previous article
Programming Speech in WPF - Speech Synthesis, I covered text-to-speech
functionality in WPF. This article is about speech to text, also known as speech
recognition.
Speech recognition is the reverse of speech synthesis: it converts speech to
text. There are two major applications for speech recognition. The first is
dictation for people who for some reason cannot type but can speak, so that the
system types the text for them. For example, in an endoscopy application a
surgeon can evaluate the patient and speak to the system. While the surgeon is
doing the evaluation, his or her hands are busy, but he or she can still speak.
The second is speech command enabled applications, where instead of using the
mouse we use voice to run an application's commands.
Windows Vista and Windows 7 come with built-in Speech Recognition controls that
allow you to set up speech-related options such as voice settings, microphone,
and other voice recognition settings. Let's take a quick look at what Control
Panel has to offer for Speech Recognition.
Go to Control Panel and open Speech Recognition Options. You will see a dialog
that looks like Figure 1.
Figure 1
As you can see from Figure 1, there are options to start speech recognition,
set up your microphone, take the speech tutorial, train your computer, and open
the reference card. You may want to click on these options one by one to
understand Speech Recognition better.
If you click the first link, Start Speech Recognition, it activates speech
recognition on the system, and the system starts listening to sounds around
your computer.
The next option is Set up Microphone. It lets you tell the system which
microphone to use if you have more than one. Otherwise, the system uses the
default microphone.
The next option, Take Speech Tutorial, is a step-by-step tutorial that teaches
you how to use various system controls.
The next option, Train your computer to better understand you, is very
important. Before you build and test your application, I recommend that you use
this option and follow the steps of the wizard. The wizard learns your voice,
which improves the accuracy of the commands you send to the system. If you do
not train your computer for your voice, it may not understand your commands
properly.
The component responsible for controlling and managing speech recognition is
called the Windows Desktop Speech Technology Recognition Engine (SR Engine).
When you build a Speech Recognition application without having set up the
microphone and voice settings, the system launches wizards that ask you to set
them up. On a Windows Vista machine, the first time you use Speech Recognition
controls, you will notice a Windows application like Figure 2.
Figure 2
That tells me that the SR Engine is ready. We just need to enable it by saying
the first command, start listening.
If you right-click on the Speech Recognition control, you will see various
options that allow you to turn speech recognition on or off or put it in sleep
mode, as you see in Figure 3.
Figure 3
Speech Recognition API
Speech Recognition functionality is defined in the System.Speech.Recognition
namespace. Before you start using it, add a reference to the System.Speech
assembly to your project and import these two namespaces in your application:
using System.Speech;
using System.Speech.Recognition;
SpeechRecognizer
The SpeechRecognizer class is the main component of the Speech Recognition API.
It listens for spoken input captured by the system and converts it to text or
text commands.
protected SpeechRecognizer spRecognizer = new SpeechRecognizer();
Enabling SpeechRecognizer
The State property returns the current state of the SpeechRecognizer, which is
either Stopped or Listening. The Enabled property controls whether the
SpeechRecognizer is enabled and ready to listen. Listing 9 enables the
SpeechRecognizer by setting its Enabled property to true.
SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.Enabled = true;
Listing 9
Reading Text
The SpeechRecognized event of SpeechRecognizer is raised when the recognition
engine detects speech and has found one or more phrases with sufficient
confidence levels. This event is used to get the speech detected by the speech
engine.
The code snippet in Listing 10 attaches a SpeechRecognized event handler, gets
the text recognized by the speech engine, and copies it into a string.
SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.Enabled = true;
spRecognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(spRecognizer_SpeechRecognized);
void spRecognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // e.Result.Text holds the recognized phrase as plain text
    string str = e.Result.Text;
}
Listing 10
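Listing 10 does not load a grammar, and a recognizer needs at least one loaded
grammar before it can report any phrases. The following is a minimal sketch of a
more complete flow, not a definitive implementation. It assumes free-form
dictation is wanted and therefore loads a DictationGrammar, and it filters
results with a hypothetical MinConfidence threshold of 0.6. The class name
DictationDemo and the helper IsAcceptable are illustrative names only.

```csharp
using System;
using System.Speech.Recognition;

static class DictationDemo
{
    // Hypothetical cut-off; tune it for your microphone and environment.
    public const float MinConfidence = 0.6f;

    // Pure helper so the filtering rule can be checked without audio hardware.
    public static bool IsAcceptable(float confidence)
    {
        return confidence >= MinConfidence;
    }

    // Call this from Main (or adapt it to a WPF window) on a machine where
    // Windows Speech Recognition has been set up.
    public static void Run()
    {
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // Assumption: free-form dictation, so a DictationGrammar is loaded.
            recognizer.LoadGrammar(new DictationGrammar());

            recognizer.SpeechRecognized += (sender, e) =>
            {
                // Ignore phrases the engine is not reasonably sure about.
                if (IsAcceptable(e.Result.Confidence))
                {
                    Console.WriteLine("{0} ({1:P0})", e.Result.Text, e.Result.Confidence);
                }
            };

            recognizer.Enabled = true;
            Console.WriteLine("Recognizer state: " + recognizer.State);
            Console.WriteLine("Speak, then press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Checking Confidence before using the text is a design choice rather than a
requirement; it trades a few missed phrases for fewer misrecognized ones.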
Grammar and GrammarBuilder
One of the key uses of speech-enabled applications is to build software that
listens to your commands and executes functionality based on them. For example,
instead of using menu items to open and close files, we can build a system that
opens and closes a file when the speech commands Open File and Close File are
sent to the speech system. A Grammar object loaded into the SpeechRecognizer
defines these commands, and the Grammar class is used to create it.
The Grammar object represents a grammar document. The Grammar object fully
supports the W3C Speech Recognition Grammar Specification (SRGS) and Context
Free Grammar (CFG) specifications. You can create a Grammar object by passing a
GrammarBuilder object to its constructor. Listing 11 creates a Grammar object
this way.
GrammarBuilder gBuilder = new GrammarBuilder();
// Construct GrammarBuilder here
// Create a Grammar from a GrammarBuilder
Grammar speechGrammar = new Grammar(gBuilder);
Listing 11
A GrammarBuilder object provides a simple mechanism to build a speech grammar.
The Add and Append methods of GrammarBuilder are used to combine speech text,
phrases, and other GrammarBuilder objects into a grammar.
Among other types, these methods accept a string or a Choices object. The
Choices object represents a list of alternative items that make up an element
in a speech grammar.
The code snippet in Listing 12 creates a GrammarBuilder using some Choices
objects and then builds a Grammar object that can be loaded into a
SpeechRecognizer.
private Grammar CreateGrammarDocument()
{
    GrammarBuilder gBuilder = new GrammarBuilder();
    // Construct GrammarBuilder here
    gBuilder.Append(new Choices("Phone", "Email", "Text"));
    gBuilder.Append("my");
    gBuilder.Append(new Choices("Mom", "Dad", "Brother", "Sister"));
    // Create a Grammar from a GrammarBuilder
    Grammar speechGrammar = new Grammar(gBuilder);
    return speechGrammar;
}
Listing 12
Here are a few of the sentences that can be constructed using Listing 12.
- Phone my Mom
- Text my Brother
- Email my Mom
- Phone my Brother
- Email my Dad
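The sentences above are only a sample. Because each Append call adds one
position to the sentence, the grammar in Listing 12 accepts every combination of
the three actions, the word "my", and the four contacts, which is 3 x 1 x 4 = 12
phrases. The sketch below enumerates them in plain C# without touching the
speech API. PhraseExpander and Expand are hypothetical helper names introduced
for illustration; they are not part of System.Speech.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PhraseExpander
{
    // Each array is one position in the sentence. A single-element array
    // plays the role of a fixed word appended as a string.
    public static List<string> Expand(params string[][] slots)
    {
        IEnumerable<string> phrases = new[] { "" };
        foreach (string[] slot in slots)
        {
            string[] current = slot; // local copy captured by the lambda below
            phrases = phrases.SelectMany(
                prefix => current,
                (prefix, word) => prefix.Length == 0 ? word : prefix + " " + word);
        }
        return phrases.ToList();
    }
}
```

For example, PhraseExpander.Expand(new[] { "Phone", "Email", "Text" },
new[] { "my" }, new[] { "Mom", "Dad", "Brother", "Sister" }) returns a list of
12 phrases that starts with "Phone my Mom".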
Loading and Unloading Grammar
The LoadGrammar method of SpeechRecognizer synchronously loads a specific
grammar into a SpeechRecognizer. The code snippet in Listing 13 calls the
LoadGrammar method and loads a grammar.
SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.LoadGrammar(CreateGrammarDocument());
Listing 13
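Once a command grammar is loaded, the application still has to map each
recognized phrase to an action. One possible way to do that, sketched here under
the assumption that commands are simple fixed phrases such as Open File and
Close File, is a dictionary from phrase to handler. VoiceCommands, Register,
Dispatch, and Attach are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Speech.Recognition;

class VoiceCommands
{
    // Phrase-to-action table; lookups ignore case.
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    public void Register(string phrase, Action handler)
    {
        handlers[phrase] = handler;
    }

    // Builds a grammar whose only alternatives are the registered phrases.
    public Grammar BuildGrammar()
    {
        if (handlers.Count == 0)
        {
            throw new InvalidOperationException("Register at least one command first.");
        }
        string[] phrases = new string[handlers.Count];
        handlers.Keys.CopyTo(phrases, 0);
        return new Grammar(new GrammarBuilder(new Choices(phrases)));
    }

    // Runs the handler for the phrase; returns false if the phrase is unknown.
    public bool Dispatch(string recognizedText)
    {
        Action handler;
        if (!handlers.TryGetValue(recognizedText, out handler))
        {
            return false;
        }
        handler();
        return true;
    }

    // Loads the grammar and routes recognized text to Dispatch.
    public void Attach(SpeechRecognizer recognizer)
    {
        recognizer.LoadGrammar(BuildGrammar());
        recognizer.SpeechRecognized += (sender, e) => Dispatch(e.Result.Text);
    }
}
```

A caller would register handlers for "Open File" and "Close File" and then pass
its SpeechRecognizer to Attach. Keeping Dispatch separate from the event handler
makes the mapping easy to exercise without a microphone.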
The LoadGrammarAsync method of SpeechRecognizer asynchronously loads a specific
grammar into a SpeechRecognizer. The code snippet in Listing 14 calls the
LoadGrammarAsync method and loads a grammar.
SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.LoadGrammarAsync(CreateGrammarDocument());
Listing 14
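Because LoadGrammarAsync returns before the grammar is ready, it can help to
listen for the recognizer's LoadGrammarCompleted event to learn when loading
finished and whether it failed. The sketch below is one way to do this, not the
only one. AsyncGrammarLoader, Load, and Describe are hypothetical names, and the
messages are illustrative.

```csharp
using System;
using System.Speech.Recognition;

static class AsyncGrammarLoader
{
    // Subscribe once per recognizer; calling Load repeatedly on the same
    // recognizer would attach the handler more than once.
    public static void Load(SpeechRecognizer recognizer, Grammar grammar)
    {
        recognizer.LoadGrammarCompleted += OnLoadGrammarCompleted;
        recognizer.LoadGrammarAsync(grammar);
    }

    private static void OnLoadGrammarCompleted(object sender, LoadGrammarCompletedEventArgs e)
    {
        // e.Error is null when the load succeeded.
        Console.WriteLine(Describe(e.Grammar.Name, e.Error));
    }

    // Formats the outcome; kept separate so it can be checked without a recognizer.
    public static string Describe(string grammarName, Exception error)
    {
        if (error == null)
        {
            return "Loaded grammar '" + grammarName + "'";
        }
        return "Failed to load grammar '" + grammarName + "': " + error.Message;
    }
}
```

Setting the Name property on a Grammar before loading it makes these messages
easier to read when several grammars are in use.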
The UnloadGrammar method unloads a given Grammar, and the UnloadAllGrammars
method unloads all grammars in a SpeechRecognizer object. The code snippet in
Listing 15 shows how to unload grammars using the UnloadGrammar and
UnloadAllGrammars methods.
// g is a Grammar that was previously loaded into spRecognizer
spRecognizer.UnloadGrammar(g);
spRecognizer.UnloadAllGrammars();
Listing 15
SRGS
Speech Recognition Grammar Specification (SRGS) is a W3C recommendation for
building grammars used in speech-enabled applications. More details about SRGS
can be found at http://www.w3.org/TR/speech-grammar/.
The System.Speech.Recognition.SrgsGrammar namespace defines all functionality
related to SRGS. The SrgsDocument class represents an SRGS document. The
namespace also has classes for grammar objects such as SrgsElement, SrgsItem,
SrgsOneOf, SrgsRule, SrgsText, SrgsToken and so on; each SRGS element has its
own class. A detailed discussion of these classes is out of the scope of this
article.
The following code snippet creates a Rule and sets its scope.
// Same id as in Listing 16; a rule id must not contain spaces
SrgsRule rootRule = new SrgsRule("MonthsandDays");
rootRule.Scope = SrgsRuleScope.Public;
The following code snippet adds an element to a Rule.
rootRule.Elements.Add(new SrgsItem("Months and Days Grammar "));
And the following code snippet adds a rule to a document.
SrgsText textItem = new SrgsText("Start of the Document.");
SrgsRule textRule = new SrgsRule("TextItem");
textRule.Elements.Add(textItem);
document.Rules.Add(textRule);
Listing 16 creates a complete SRGS document dynamically and saves this document
in an XML file. As you can see from Listing 16, the code adds rules for months
and days of week and some extra items as rules.
// Requires: using System.Speech.Recognition.SrgsGrammar; using System.Xml;
private SrgsDocument BuildDynamicSRGSDocument()
{
    // Create SrgsDocument
    SrgsDocument document = new SrgsDocument();
    // Create Root Rule
    SrgsRule rootRule = new SrgsRule("MonthsandDays");
    rootRule.Scope = SrgsRuleScope.Public;
    rootRule.Elements.Add(new SrgsItem("Months and Days Grammar "));
    // Create months
    SrgsOneOf oneOfMonths = new SrgsOneOf(
        new SrgsItem("January"),
        new SrgsItem("February"),
        new SrgsItem("March"),
        new SrgsItem("April"),
        new SrgsItem("May"),
        new SrgsItem("June"),
        new SrgsItem("July"),
        new SrgsItem("August"),
        new SrgsItem("September"),
        new SrgsItem("October"),
        new SrgsItem("November"),
        new SrgsItem("December")
    );
    SrgsRule ruleMonths = new SrgsRule("Months", oneOfMonths);
    SrgsItem of = new SrgsItem("of");
    SrgsItem year = new SrgsItem("year");
    SrgsItem ruleMonthsItem = new SrgsItem(new SrgsRuleRef(ruleMonths), of, year);
    // Create Days
    SrgsOneOf oneOfDays = new SrgsOneOf(
        new SrgsItem("Monday"),
        new SrgsItem("Tuesday"),
        new SrgsItem("Wednesday"),
        new SrgsItem("Thursday"),
        new SrgsItem("Friday"),
        new SrgsItem("Saturday"),
        new SrgsItem("Sunday")
    );
    SrgsRule ruleDays = new SrgsRule("Days", oneOfDays);
    SrgsItem week = new SrgsItem("week");
    SrgsItem ruleDaysItem = new SrgsItem(new SrgsRuleRef(ruleDays), of, week);
    // Add items to root Rule
    rootRule.Elements.Add(ruleMonthsItem);
    rootRule.Elements.Add(ruleDaysItem);
    // Add all Rules to Document
    document.Rules.Add(rootRule, ruleMonths, ruleDays);
    // Add some extra separate Rules
    SrgsText textItem = new SrgsText("Start of the Document.");
    SrgsRule textRule = new SrgsRule("TextItem");
    textRule.Elements.Add(textItem);
    document.Rules.Add(textRule);
    SrgsItem stringItem = new SrgsItem("Item as String.");
    SrgsRule itemRule = new SrgsRule("ItemRule");
    itemRule.Elements.Add(stringItem);
    document.Rules.Add(itemRule);
    SrgsItem elementItem = new SrgsItem();
    SrgsRule elementRule = new SrgsRule("ElementRule");
    elementRule.Elements.Add(elementItem);
    document.Rules.Add(elementRule);
    // Set Document Root
    document.Root = rootRule;
    // Save Created SRGS Document to XML file
    XmlWriter writer = XmlWriter.Create("DynamicSRGSDocument.Xml");
    document.WriteSrgs(writer);
    writer.Close();
    return document;
}
Listing 16
The document generated by the code in Listing 16 looks like Figure 4.
Figure 4
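To inspect the generated XML without opening the saved file, the document can
also be written to a string. The sketch below assumes a much smaller grammar
than Listing 16 (a single public Days rule with two alternatives) purely to keep
the example short. SrgsXmlPreview, ToXml, and BuildDaysDocument are hypothetical
names, and the exact XML text is whatever WriteSrgs produces.

```csharp
using System;
using System.IO;
using System.Speech.Recognition.SrgsGrammar;
using System.Xml;

static class SrgsXmlPreview
{
    // Serializes any SrgsDocument to an indented XML string.
    public static string ToXml(SrgsDocument document)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        using (StringWriter text = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                document.WriteSrgs(writer);
            }
            // The inner using block has flushed the writer at this point.
            return text.ToString();
        }
    }

    // Illustrative two-item grammar; not the document from Listing 16.
    public static SrgsDocument BuildDaysDocument()
    {
        SrgsRule days = new SrgsRule("Days", new SrgsOneOf("Saturday", "Sunday"));
        days.Scope = SrgsRuleScope.Public;

        SrgsDocument document = new SrgsDocument();
        document.Rules.Add(days);
        document.Root = days;
        return document;
    }
}
```

Calling Console.WriteLine(SrgsXmlPreview.ToXml(SrgsXmlPreview.BuildDaysDocument()))
prints the grammar, and the same ToXml helper should work for the document
returned by BuildDynamicSRGSDocument.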
We can pass an SRGS document to the Grammar constructor to create a grammar
from a SrgsDocument. The code snippet in Listing 17 builds the SRGS document
and loads the resulting Grammar by calling SpeechRecognizer's LoadGrammar.
SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.LoadGrammar(new Grammar(BuildDynamicSRGSDocument()));
Listing 17
Summary
Speech API (SAPI) 5.3 ships with Windows Vista, and the System.Speech
namespaces expose it to managed code. This article demonstrated how we can use
it in a WPF application to build speech-enabled applications. The first part,
Programming Speech in WPF - Speech Synthesis, covered text-to-speech (TTS), or
speech synthesis, where we built an application that converts text to speech.
This second part discussed speech recognition, where we built an application
that captures speech from a voice device and converts it to text. We also saw
how to build speech grammars and use these grammars in speech-enabled
applications.