Programming Speech in WPF - Speech Recognition

Mahesh Chand

Introduction

In my previous article, "Programming Speech in WPF - Speech Synthesis," I covered text-to-speech functionality in WPF. This article is about speech to text, also known as speech recognition.

Speech recognition is the reverse of speech synthesis: it converts speech to text. It has two major applications. The first serves people who, for some reason, cannot type but can speak to the system, which then types the text for them. For example, in endoscopic applications a surgeon can examine the patient and dictate to the system: the surgeon's hands are busy with the examination, but the surgeon can still speak. The second is voice-command applications, where instead of using the mouse, we use our voice to launch an application and execute its commands.

Windows Vista and Windows 7 come with built-in Speech Recognition controls that let you set up speech-related options such as voice settings, the microphone, and other recognition settings. Let's take a quick look at what Control Panel offers for Speech Recognition.

Go to Control Panel and open Speech Recognition Options. You will see a dialog like the one in Figure 1.

Figure 1

As you can see in Figure 1, there are options to start speech recognition, set up your microphone, take the speech tutorial, train your computer, and open the reference card. You may want to try these options one by one to understand Speech Recognition better.

The first link, Start Speech Recognition, activates speech recognition, and the system starts listening to sounds around your computer.

The next option, Set up Microphone, lets you tell the system which microphone to use if you have more than one. Otherwise, the system uses the default microphone.

The next option, Take Speech Tutorial, is a step-by-step tutorial that teaches you how to use various system controls.

The next option, Train your computer to better understand you, is very important. Before you build and test your application, I recommend running this option and following the wizard step by step. The wizard learns your voice, which improves the accuracy of the commands you send to the system. If you do not train the computer for your voice, it may not understand your commands properly.

The component that is responsible for controlling and managing speech recognition is called Windows Desktop Speech Technology Recognition Engine (SR Engine).

If you build a Speech Recognition application without first setting up the microphone and voice settings, the system launches wizards that ask you to configure them. On a Windows Vista machine, the first time you use speech recognition, you will see the Speech Recognition control shown in Figure 2.

Figure 2

That tells me that the SR Engine is ready. We just need to activate it by saying the first command, "start listening."

If you right-click the Speech Recognition control, you will see options to turn speech recognition on or off and to put it in sleep mode, as shown in Figure 3.

Figure 3

Speech Recognition API

Speech Recognition functionality is defined in the System.Speech.Recognition namespace, which lives in the System.Speech assembly, so your project needs a reference to System.Speech.dll. Before you use Speech Recognition functionality, import these two namespaces in your application:

using System.Speech;
using System.Speech.Recognition;

SpeechRecognizer

SpeechRecognizer is the main component of the Speech Recognition API. The SpeechRecognizer class listens for speech, captures the spoken words, and converts them to text or text commands.

protected SpeechRecognizer spRecognizer = new SpeechRecognizer();

Enabling SpeechRecognizer

The State property returns the current state of the SpeechRecognizer, which is either Stopped or Listening. The Enabled property controls whether the SpeechRecognizer is enabled and ready to listen. Listing 9 enables the SpeechRecognizer by setting its Enabled property to true.

SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.Enabled = true;

Listing 9
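To watch the State property change, a recognizer can subscribe to its StateChanged event before being enabled. The following is a minimal console sketch, assuming a project that references System.Speech; the class name RecognizerStateDemo and its Describe helper are hypothetical, added only for illustration.

```csharp
using System;
using System.Speech.Recognition;

class RecognizerStateDemo
{
    // Hypothetical helper: turns the two RecognizerState values into a readable message.
    public static string Describe(RecognizerState state)
    {
        return state == RecognizerState.Listening
            ? "Recognizer is listening"
            : "Recognizer is stopped";
    }

    static void Main()
    {
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // Raised whenever the shared recognizer's state changes.
            recognizer.StateChanged += (sender, e) => Console.WriteLine(Describe(e.RecognizerState));

            recognizer.Enabled = true;
            Console.WriteLine("Enabled: " + recognizer.Enabled + ". " + Describe(recognizer.State));
            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```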

Reading Text

The SpeechRecognized event of SpeechRecognizer is raised when the recognition engine detects speech and finds one or more phrases with a sufficient confidence level. Use this event to get the speech that the engine detected.

The code snippet in Listing 10 attaches a SpeechRecognized event handler, which gets the text recognized by the speech engine and copies it into a string.

SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.Enabled = true;

// Attach the handler (EventHandler<T> requires using System;)
spRecognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(spRecognizer_SpeechRecognized);

// Handler method, declared in the same class
void spRecognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Text of the phrase that the engine recognized
    string str = e.Result.Text;
}

Listing 10
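SpeechRecognized is raised for input that matches one of the application's loaded grammars, and Listing 10 loads none. The following is a minimal console sketch, assuming a project that references System.Speech. It loads a DictationGrammar for free-form text and ignores results whose Confidence falls below a cut-off. The 0.6 threshold, the class name DictationDemo, and the IsAccepted helper are hypothetical choices, not values defined by the API.

```csharp
using System;
using System.Speech.Recognition;

class DictationDemo
{
    // Hypothetical cut-off; tune it for your microphone and trained voice profile.
    public const float MinConfidence = 0.6f;

    // Accept a result only if the engine is confident enough about it.
    public static bool IsAccepted(float confidence)
    {
        return confidence >= MinConfidence;
    }

    static void Main()
    {
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // Free-form dictation so that arbitrary phrases can be recognized.
            recognizer.LoadGrammar(new DictationGrammar());

            recognizer.SpeechRecognized += (sender, e) =>
            {
                if (IsAccepted(e.Result.Confidence))
                    Console.WriteLine("Heard: " + e.Result.Text);
                else
                    Console.WriteLine("Ignored low-confidence result (" + e.Result.Confidence + ")");
            };

            recognizer.Enabled = true;
            Console.WriteLine("Speak into the microphone; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Filtering on Confidence in the handler keeps the threshold under the application's control instead of relying only on the engine's own acceptance level.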

Grammar and GrammarBuilder

One of the key uses of speech-enabled applications is to build software that listens to your commands and executes functionality based on them. For example, instead of using menu items to open and close files, we can build a system that opens or closes a file when the speech commands "Open File" and "Close File" are sent to the speech system. The Grammar objects loaded into a SpeechRecognizer define which commands it listens for, and the Grammar class is used to create them.
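The "Open File" / "Close File" scenario can be sketched with a two-phrase grammar built from the Choices and GrammarBuilder classes described in this section. This assumes a project that references System.Speech; the class FileCommands, the grammar name "FileCommands", and the Dispatch helper are hypothetical, and Dispatch only returns a message where a real application would call its own open and close logic.

```csharp
using System;
using System.Speech.Recognition;

class FileCommands
{
    // Grammar that accepts exactly two phrases.
    public static Grammar BuildGrammar()
    {
        Choices commands = new Choices("Open File", "Close File");
        return new Grammar(new GrammarBuilder(commands)) { Name = "FileCommands" };
    }

    // Hypothetical dispatcher: maps recognized text to an action description.
    public static string Dispatch(string text)
    {
        switch (text.ToLowerInvariant())
        {
            case "open file": return "Opening file";
            case "close file": return "Closing file";
            default: return "Unknown command";
        }
    }

    static void Main()
    {
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            recognizer.LoadGrammar(BuildGrammar());
            recognizer.SpeechRecognized += (sender, e) => Console.WriteLine(Dispatch(e.Result.Text));
            recognizer.Enabled = true;
            Console.WriteLine("Say \"Open File\" or \"Close File\"; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```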

The Grammar class represents a grammar document. It fully supports the W3C Speech Recognition Grammar Specification (SRGS) and Context Free Grammar (CFG) specifications. One way to create a Grammar object is to pass a GrammarBuilder object to its constructor, as Listing 11 does.

GrammarBuilder gBuilder = new GrammarBuilder();
// Construct GrammarBuilder here

// Create a Grammar from a GrammarBuilder
Grammar speechGrammar = new Grammar(gBuilder);

Listing 11

A GrammarBuilder object provides a simple mechanism for building a speech grammar. Its Append methods, and the static Add methods, combine speech text, phrases, and other GrammarBuilder objects into a grammar.

The Append methods of GrammarBuilder have overloads that take a string, a Choices object, or another GrammarBuilder, among others. A Choices object represents a list of alternative items, any one of which can fill a single element of the speech grammar.

The code snippet in Listing 12 creates a GrammarBuilder from some Choices objects and then builds a Grammar object that can be loaded into a SpeechRecognizer.

private Grammar CreateGrammarDocument()
{
    GrammarBuilder gBuilder = new GrammarBuilder();

    // Construct GrammarBuilder here
    gBuilder.Append(new Choices("Phone", "Email", "Text"));
    gBuilder.Append("my");
    gBuilder.Append(new Choices("Mom", "Dad", "Brother", "Sister"));

    // Create a Grammar from a GrammarBuilder
    Grammar speechGrammar = new Grammar(gBuilder);
    return speechGrammar;
}

Listing 12

Here are a few of the sentences that the grammar in Listing 12 accepts:

Phone my Mom
Text my Brother
Email my Mom
Phone my Brother
Email my Dad
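Because Listing 12 is a fixed sequence of a three-way choice, the word "my", and a four-way choice, the full set of accepted sentences is the Cartesian product of the two lists: 3 × 4 = 12 phrases. The following plain C# sketch enumerates them; it does not read the Grammar object (System.Speech exposes no such enumeration that this example relies on), so the word lists are copied by hand from Listing 12, and the class name PhraseEnumerator is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PhraseEnumerator
{
    // Word lists copied by hand from Listing 12.
    static readonly string[] Actions = { "Phone", "Email", "Text" };
    static readonly string[] People = { "Mom", "Dad", "Brother", "Sister" };

    // Every combination of one action, the word "my", and one person.
    public static List<string> AllPhrases()
    {
        return (from action in Actions
                from person in People
                select action + " my " + person).ToList();
    }

    static void Main()
    {
        List<string> phrases = AllPhrases();
        Console.WriteLine(phrases.Count + " phrases:");
        foreach (string phrase in phrases)
            Console.WriteLine(phrase);
    }
}
```

Each extra Choices element multiplies the number of accepted phrases, which is worth keeping in mind when a command grammar grows.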

Loading and Unloading Grammar

The LoadGrammar method of SpeechRecognizer synchronously loads a grammar into the SpeechRecognizer. The code snippet in Listing 13 calls LoadGrammar to load the grammar created by CreateGrammarDocument in Listing 12.

SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.LoadGrammar(CreateGrammarDocument());

Listing 13

The LoadGrammarAsync method of SpeechRecognizer asynchronously loads a grammar into the SpeechRecognizer: the call returns immediately and the grammar finishes loading in the background. The code snippet in Listing 14 calls LoadGrammarAsync to load a grammar.

SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.LoadGrammarAsync(CreateGrammarDocument());

Listing 14
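Since LoadGrammarAsync returns before the grammar is ready, an application usually listens for the LoadGrammarCompleted event to learn whether loading succeeded. A minimal sketch, assuming a console project that references System.Speech; the class AsyncGrammarDemo and the "Colors" grammar are hypothetical.

```csharp
using System;
using System.Speech.Recognition;

class AsyncGrammarDemo
{
    // Hypothetical grammar used only to demonstrate asynchronous loading.
    public static Grammar BuildColors()
    {
        return new Grammar(new GrammarBuilder(new Choices("red", "green", "blue"))) { Name = "Colors" };
    }

    static void Main()
    {
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            // LoadGrammarAsync returns immediately; this event reports the outcome later.
            recognizer.LoadGrammarCompleted += (sender, e) =>
            {
                if (e.Error != null)
                    Console.WriteLine("Could not load grammar: " + e.Error.Message);
                else
                    Console.WriteLine("Loaded " + e.Grammar.Name + " (Loaded = " + e.Grammar.Loaded + ")");
            };

            recognizer.LoadGrammarAsync(BuildColors());
            Console.WriteLine("LoadGrammarAsync returned; press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```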

The UnloadGrammar method unloads a given Grammar, and the UnloadAllGrammars method unloads all grammars from a SpeechRecognizer object. The code snippet in Listing 15 shows how to unload grammars using the UnloadGrammar and UnloadAllGrammars methods.

// g is a Grammar previously loaded with LoadGrammar or LoadGrammarAsync
spRecognizer.UnloadGrammar(g);
spRecognizer.UnloadAllGrammars();

Listing 15
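The effect of these calls can be observed through the recognizer's Grammars collection and each grammar's Loaded property. This sketch uses SpeechRecognitionEngine, the in-process counterpart of SpeechRecognizer that has the same LoadGrammar, UnloadGrammar, and UnloadAllGrammars methods, so it does not start the shared Windows Speech Recognition UI. It assumes a project that references System.Speech and a machine with at least one recognizer installed; the class UnloadDemo and its two grammars are hypothetical.

```csharp
using System;
using System.Speech.Recognition;

class UnloadDemo
{
    // Returns the number of loaded grammars after each step.
    public static int[] Run()
    {
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            Grammar colors = new Grammar(new GrammarBuilder(new Choices("red", "green", "blue")));
            Grammar answers = new Grammar(new GrammarBuilder(new Choices("yes", "no")));

            engine.LoadGrammar(colors);
            engine.LoadGrammar(answers);
            int afterLoad = engine.Grammars.Count;

            // Remove a single grammar; the other one stays loaded.
            engine.UnloadGrammar(colors);
            int afterUnloadOne = engine.Grammars.Count;
            Console.WriteLine("colors.Loaded = " + colors.Loaded + ", answers.Loaded = " + answers.Loaded);

            // Remove everything that is still loaded.
            engine.UnloadAllGrammars();
            int afterUnloadAll = engine.Grammars.Count;

            return new[] { afterLoad, afterUnloadOne, afterUnloadAll };
        }
    }

    static void Main()
    {
        Console.WriteLine("Grammar counts: " + string.Join(", ", Run()));
    }
}
```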

SRGS

Speech Recognition Grammar Specification (SRGS) is a W3C recommendation for defining the grammars used in speech-enabled applications. More details about SRGS can be found at http://www.w3.org/TR/speech-grammar/.

The System.Speech.Recognition.SrgsGrammar namespace defines all SRGS-related functionality. The SrgsDocument class represents an SRGS document, and the namespace also has a class for each kind of grammar object, such as SrgsElement, SrgsItem, SrgsOneOf, SrgsRule, SrgsText, and SrgsToken. A detailed discussion of these classes is beyond the scope of this article.
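The SRGS classes and GrammarBuilder are connected: the SrgsDocument(GrammarBuilder) constructor converts a builder like the one in Listing 12 into an SRGS document, which is a handy way to inspect the XML a GrammarBuilder corresponds to. A minimal sketch, assuming a project that references System.Speech; the class GrammarBuilderToSrgs and its ToSrgsXml helper are hypothetical, and the exact XML layout is produced by the library, so it may differ from hand-written SRGS.

```csharp
using System;
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;
using System.Text;
using System.Xml;

class GrammarBuilderToSrgs
{
    // Converts a GrammarBuilder into SRGS XML text.
    public static string ToSrgsXml(GrammarBuilder builder)
    {
        SrgsDocument document = new SrgsDocument(builder);
        StringBuilder xml = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(xml, new XmlWriterSettings { Indent = true }))
        {
            document.WriteSrgs(writer);
        }
        return xml.ToString();
    }

    static void Main()
    {
        // Same structure as Listing 12.
        GrammarBuilder builder = new GrammarBuilder();
        builder.Append(new Choices("Phone", "Email", "Text"));
        builder.Append("my");
        builder.Append(new Choices("Mom", "Dad", "Brother", "Sister"));
        Console.WriteLine(ToSrgsXml(builder));
    }
}
```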

The following code snippet creates a rule and sets its scope. Rule IDs must be valid XML names, so an ID cannot contain spaces.

SrgsRule rootRule = new SrgsRule("MonthsandDays");
rootRule.Scope = SrgsRuleScope.Public;

The following code snippet adds an element to a Rule.

rootRule.Elements.Add(new SrgsItem("Months and Days Grammar "));

And the following code snippet adds a rule to a document, where document is an existing SrgsDocument.

SrgsText textItem = new SrgsText("Start of the Document.");
SrgsRule textRule = new SrgsRule("TextItem");
textRule.Elements.Add(textItem);
document.Rules.Add(textRule);

Listing 16 dynamically creates a complete SRGS document and saves it to an XML file. As you can see, the code adds rules for the months and the days of the week, plus some extra items as separate rules.

// Requires: using System.Xml; and using System.Speech.Recognition.SrgsGrammar;
private SrgsDocument BuildDynamicSRGSDocument()
{
    // Create SrgsDocument
    SrgsDocument document = new SrgsDocument();

    // Create Root Rule
    SrgsRule rootRule = new SrgsRule("MonthsandDays");
    rootRule.Scope = SrgsRuleScope.Public;
    rootRule.Elements.Add(new SrgsItem("Months and Days Grammar "));

    // Create months
    SrgsOneOf oneOfMonths = new SrgsOneOf(
        new SrgsItem("January"), new SrgsItem("February"), new SrgsItem("March"),
        new SrgsItem("April"), new SrgsItem("May"), new SrgsItem("June"),
        new SrgsItem("July"), new SrgsItem("August"), new SrgsItem("September"),
        new SrgsItem("October"), new SrgsItem("November"), new SrgsItem("December"));
    SrgsRule ruleMonths = new SrgsRule("Months", oneOfMonths);

    // "<month> of year": a reference to the Months rule followed by two words
    SrgsItem of = new SrgsItem("of");
    SrgsItem year = new SrgsItem("year");
    SrgsItem ruleMonthsItem = new SrgsItem(new SrgsRuleRef(ruleMonths), of, year);

    // Create Days
    SrgsOneOf oneOfDays = new SrgsOneOf(
        new SrgsItem("Monday"), new SrgsItem("Tuesday"), new SrgsItem("Wednesday"),
        new SrgsItem("Thursday"), new SrgsItem("Friday"), new SrgsItem("Saturday"),
        new SrgsItem("Sunday"));
    SrgsRule ruleDays = new SrgsRule("Days", oneOfDays);

    // "<day> of week"
    SrgsItem week = new SrgsItem("week");
    SrgsItem ruleDaysItem = new SrgsItem(new SrgsRuleRef(ruleDays), of, week);

    // Add items to root Rule
    rootRule.Elements.Add(ruleMonthsItem);
    rootRule.Elements.Add(ruleDaysItem);

    // Add all Rules to Document
    document.Rules.Add(rootRule, ruleMonths, ruleDays);

    // Add some extra separate Rules
    SrgsText textItem = new SrgsText("Start of the Document.");
    SrgsRule textRule = new SrgsRule("TextItem");
    textRule.Elements.Add(textItem);
    document.Rules.Add(textRule);

    SrgsItem stringItem = new SrgsItem("Item as String.");
    SrgsRule itemRule = new SrgsRule("ItemRule");
    itemRule.Elements.Add(stringItem);
    document.Rules.Add(itemRule);

    SrgsItem elementItem = new SrgsItem();
    SrgsRule elementRule = new SrgsRule("ElementRule");
    elementRule.Elements.Add(elementItem);
    document.Rules.Add(elementRule);

    // Set Document Root
    document.Root = rootRule;

    // Save Created SRGS Document to XML file; "using" closes the writer even if WriteSrgs throws
    using (XmlWriter writer = XmlWriter.Create("DynamicSRGSDocument.Xml"))
    {
        document.WriteSrgs(writer);
    }

    return document;
}

Listing 16

The document generated by the code in Listing 16 looks like Figure 4.

Figure 4

We can pass an SrgsDocument to the Grammar constructor to create a grammar from it. The code snippet in Listing 17 builds a Grammar from the SRGS document in Listing 16 and loads it by calling SpeechRecognizer's LoadGrammar method.

SpeechRecognizer spRecognizer = new SpeechRecognizer();
spRecognizer.LoadGrammar(new Grammar(BuildDynamicSRGSDocument()));

Listing 17
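An SRGS document saved to disk, like DynamicSRGSDocument.Xml from Listing 16, can also be read back, either as an SrgsDocument (through its file-path constructor) or directly as a Grammar (through the Grammar file-path constructor). The following sketch round-trips a tiny one-rule document instead of the full Listing 16 grammar. It assumes a project that references System.Speech; the class SrgsFileDemo, the "YesNo" rule, and the YesNo.grxml file in the temp folder are hypothetical.

```csharp
using System;
using System.IO;
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;
using System.Xml;

class SrgsFileDemo
{
    // Builds a one-rule SRGS document that accepts "yes" or "no".
    public static SrgsDocument BuildYesNo()
    {
        SrgsRule rule = new SrgsRule("YesNo", new SrgsOneOf("yes", "no"));
        rule.Scope = SrgsRuleScope.Public;
        // This constructor makes the rule the document's root rule.
        return new SrgsDocument(rule);
    }

    // Saves the document as XML, reads the file back, and returns the root rule ID.
    public static string RoundTrip(string path)
    {
        using (XmlWriter writer = XmlWriter.Create(path))
        {
            BuildYesNo().WriteSrgs(writer);
        }
        SrgsDocument reloaded = new SrgsDocument(path);
        return reloaded.Root.Id;
    }

    static void Main()
    {
        // Hypothetical location for the grammar file.
        string path = Path.Combine(Path.GetTempPath(), "YesNo.grxml");
        Console.WriteLine("Root rule after reload: " + RoundTrip(path));

        // A Grammar can also be created directly from the saved file.
        Grammar grammar = new Grammar(path);
        using (SpeechRecognizer recognizer = new SpeechRecognizer())
        {
            recognizer.LoadGrammar(grammar);
            Console.WriteLine("Grammar loaded: " + grammar.Loaded);
        }
    }
}
```

Saving grammars as SRGS files lets you edit the word lists without recompiling the application.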

Summary

Speech API (SAPI) 5.3 ships with Windows Vista, and the .NET Framework exposes speech functionality to managed code through the System.Speech namespaces. This article demonstrated how to use these APIs in a WPF application to build speech-enabled applications. The first part, Programming Speech in WPF - Speech Synthesis, covered text-to-speech (TTS), or speech synthesis, where we built an application that converts text to speech. This second part discussed speech recognition, where we built an application that captures speech from a voice device and converts it to text. We also saw how to build speech grammars and use them in speech-enabled applications.
