Description
This article provides some elementary information about how to add speech recognition capabilities to your applications. The tool we will use to speech-enable an application is the Microsoft Speech SDK 5.1, the latest release in Microsoft's speech product line. Speech SDK 5.1 can be used from a variety of programming languages.
Introduction
Speech is one of the most natural ways to interact, and interacting with computers is no different. If an application can be controlled solely by voice commands, the opportunities are considerable. Even though the idea of using speech as an input mechanism is not new, few applications use speech as an input. In other words, speech is still a big opportunity that is yet to be explored.
The Microsoft Speech SDK is one of many tools that enable a developer to add speech capability to an application. It can be used from C#, C++, VB, or any other COM-compliant language.
Broadly, speech can be divided into two paradigms: text-to-speech conversion and speech recognition. In this article, I shall focus on speech recognition.
Command & Control Vs. Dictation
Speech recognition can be of two types, depending on the grammar that the recognition is based on. (A grammar is, in other words, the list of possible recognition outputs that can be generated.) An application can limit the possible combinations of spoken words by choosing a proper grammar.
In a command and control scenario, a developer provides a limited set of possible word combinations, and the speech recognition engine matches the words spoken by the user against that limited list. Because the list is small, the accuracy of recognition is very high. Where an application only needs a fixed set of commands, it is better to implement command and control, as the higher accuracy makes the application respond better.
In dictation mode, the recognition engine compares the input speech against the whole list of dictionary words. For dictation mode to achieve high accuracy, it is important that the user has previously trained the recognition engine by speaking to it. Training, that is, creating a profile, can be done from the speech properties in the Control Panel.
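As a rough sketch of how dictation mode might be switched on through the SpeechLib interop, the snippet below loads the default dictation topic and activates it. The names `DictationSketch` and `StartDictation` are illustrative and not part of the sample application, and the code assumes a project reference to the Microsoft Speech Object Library.

```csharp
using SpeechLib;

// Illustrative helper (hypothetical name); not part of the sample application.
public class DictationSketch
{
    private SpSharedRecoContext recoContext;
    private ISpeechRecoGrammar dictationGrammar;

    public void StartDictation()
    {
        // Use the shared recognizer so other applications can still receive speech input.
        recoContext = new SpSharedRecoContext();

        // The grammar ID (0) is an arbitrary choice for this sketch.
        dictationGrammar = recoContext.CreateGrammar(0);

        // An empty topic name asks for the default dictation topic.
        dictationGrammar.DictationLoad("", SpeechLoadOption.SLOStatic);

        // Start listening for free-form dictation.
        dictationGrammar.DictationSetState(SpeechRuleState.SGDSActive);
    }
}
```

Recognized text still arrives through the recognition context's events, so a handler must be attached before any dictated text can be used.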
Speech Recognition Engines
There are two different kinds of speech recognition engine: a shared recognition engine and an InProc recognition engine. A shared recognition engine can be shared across applications; this is the engine to use when multiple applications could be looking for speech input. A shared recognition context is recommended for most speech applications. For large speech applications that run alone on a server, an InProc recognition context is better suited.
The speech recognition engine interacts with applications through events that the application can subscribe to. Two of the most important are the Recognition event and the Hypothesis event, which are raised when the engine makes a good recognition or a hypothesis, respectively. The code accompanying this article shows how to subscribe to these events.
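For illustration, handlers for these two events might look roughly like the following. The handler names match those used in the listing later in this article, but the bodies are an assumption: the class name `RecoEventSketch` is hypothetical, and writing to the console stands in for whatever the real application does with the text.

```csharp
using System;
using SpeechLib;

// Illustrative container (hypothetical name) for the two event handlers.
public class RecoEventSketch
{
    // Raised while the engine is still forming a guess; the text may change later.
    public void Hypo_Event(int StreamNumber, object StreamPosition,
                           ISpeechRecoResult Result)
    {
        // GetText(0, -1, true) returns the whole phrase with replacements applied.
        string text = Result.PhraseInfo.GetText(0, -1, true);
        Console.WriteLine("Hypothesis: " + text);
    }

    // Raised once the engine has settled on a recognition.
    public void Reco_Event(int StreamNumber, object StreamPosition,
                           SpeechRecognitionType RecognitionType,
                           ISpeechRecoResult Result)
    {
        string text = Result.PhraseInfo.GetText(0, -1, true);
        Console.WriteLine("Recognized: " + text);
    }
}
```

In a Windows Forms application, the handlers would more likely update a status label or trigger a menu action than write to the console.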
Sample Application
The sample application accompanying this article gives a developer an idea of the steps needed to speech-enable the menus in an application. The listing below shows a few of the important steps.
// Requires a reference to the Microsoft Speech Object Library and "using SpeechLib;".
// Get an instance of RecoContext. I am using the shared RecoContext.
objRecoContext = new SpeechLib.SpSharedRecoContext();
// Assign an event handler for the Hypothesis event.
objRecoContext.Hypothesis += new _ISpeechRecoContextEvents_HypothesisEventHandler(Hypo_Event);
// Assign an event handler for the Recognition event.
objRecoContext.Recognition += new _ISpeechRecoContextEvents_RecognitionEventHandler(Reco_Event);
// Create an instance of the grammar object.
grammar = objRecoContext.CreateGrammar(0);
// Add a top-level, dynamic rule that will hold the menu commands.
menuRule = grammar.Rules.Add("MenuCommands", SpeechRuleAttributes.SRATopLevel | SpeechRuleAttributes.SRADynamic, 1);
object PropValue = "";
// Add one word transition per menu command; the property name and ID identify the command.
menuRule.InitialState.AddWordTransition(null, "New", " ", SpeechGrammarWordType.SGLexical, "New", 1, ref PropValue, 1.0F);
menuRule.InitialState.AddWordTransition(null, "Open", " ", SpeechGrammarWordType.SGLexical, "Open", 2, ref PropValue, 1.0F);
// Commit the grammar rules for recognition.
grammar.Rules.Commit();
// Activate the menu commands rule.
grammar.CmdSetRuleState("MenuCommands", SpeechRuleState.SGDSActive);
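Once a command has been recognized, the application still has to decide which menu item it corresponds to. One possible way to do this, sketched below, is to map the recognized text to a command value. The `MenuCommand` enum and `MenuCommandDispatcher` class are hypothetical names introduced for this example; the sample application may handle this differently.

```csharp
using System;

// Hypothetical command identifiers for the two menu entries in the grammar.
public enum MenuCommand { None, New, Open }

public static class MenuCommandDispatcher
{
    // Maps recognized text to a menu command; unknown phrases map to None.
    public static MenuCommand Resolve(string recognizedText)
    {
        if (recognizedText == null)
        {
            return MenuCommand.None;
        }

        // Normalize case and whitespace so "Open " and "open" are treated alike.
        switch (recognizedText.Trim().ToLowerInvariant())
        {
            case "new":
                return MenuCommand.New;
            case "open":
                return MenuCommand.Open;
            default:
                return MenuCommand.None;
        }
    }
}
```

A Recognition event handler could pass the phrase text it receives to `MenuCommandDispatcher.Resolve` and then invoke the matching menu item's click logic.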
The screenshot below shows the main form of the sample application.
Summary
This article gives an introduction to speech recognition using Speech SDK 5.1.