Text to Speech in Visual C# 2005


Introduction:

This article describes an application used to exercise some of the Text To Speech (TTS) features available to .NET developers through the Microsoft Speech 5.1 SDK.  It does not address the newer speech server libraries, nor does it address web-based deployments of speech technologies.

The application performs several functions, although all work in basically the same manner.  It is intended to provide an introduction to working with the TTS library by illustrating how to access and manipulate voices and how to play text as synthesized speech.  The application provides examples of generating speech as you type, passing canned phrases to TTS, and passing entire text files to TTS.

Getting Started:

To get started, unzip the included project and open the solution in Visual Studio 2005.  You will note that the project contains a file cleverly named "Form1.cs".  This form contains all of the code necessary to get started with programming TTS.

To begin, you may not have the necessary references on your machine, as the application requires the Microsoft Speech 5.1 SDK and the Microsoft sample TTS engine library.  Both may be downloaded with the SDK at no cost from this URL:

Speech 5.1 SDK:  http://www.microsoft.com/downloads/details.aspx?FamilyId=5E86EC97-40A7-453F-B0EE-6583171B4530&displaylang=en

You may also obtain a couple of additional voices (the SDK includes Microsoft Mary, Microsoft Mike, and Microsoft Sam) by downloading and installing Microsoft Reader and the additional TTS components found at the URL below.  This is not required, but you will gain two additional voices if you add these to your system:

http://www.microsoft.com/reader/downloads/default.asp

You do not need to activate the reader for this to work; however, you cannot install the additional voices unless the reader is installed.

If you have any other voices on your system, they may also be exposed to the application.  For example, my Toshiba laptop has an additional voice called "TOSHIBA male adult (U.S.)" and this voice also appears as available to this application at runtime.

If you need to update the project references, do so prior to attempting to run the application.  Once you have installed the speech SDK, go back to the project and run a build.  If the build reports the speech references as missing, remove these (highlighted) references: (Figure 1)


Figure 1:  Speech Related Project References

After removing the old references, right-click on the project and select "Add Reference".  Once the dialog opens, select the COM tab (then go get a cup of coffee while it takes forever to load).  When you get back, look for and add these two references (Figures 2 and 3).  Note that you really don't need the second reference, to the sample TTS engine:



Figure 2:  Add the Microsoft Speech Object Library Reference


Figure 3:  Add the Sample TTS Engine Type Library Reference

Having added these references, go ahead and do a build to see if anything else is missing.  If anything turns up, add the missing project references in the same manner and build again.  Once you have a good build, go ahead and run the application.  On start, you will see this form appear:


Figure 4:  The main form of the TTS Reader application

Looking at the form, note that it has five control groups:  "Configuration", "Speak As You Type", "Speak Specific Phrases", "Speak On Enter", and "Load a Text File and Read It".

Configuration.

This control group contains two controls: the speaker combo box and the speech rate track bar control.  The speaker combo box is populated with the names of each of the TTS speaker voices; you may change the current speaker by selecting a different option from this combo box.

The rate track bar control will speed up or reduce the cadence of the synthesized speech.  It is set to contain five positions and whenever its value is changed, the rate of speech will be altered to execute at the newly set rate.

Speak As You Type.

This control group contains a single text box configured so that, whenever the user hits the space bar, the speaker reads the contents of the text box and, once finished, clears it.  The intent was to see whether you could type as you go and speak through TTS.  It seemed like a nice idea, and it seems worthwhile for someone lacking the capacity for speech to use a function like this to speak by typing.  In reality, the action is a little choppy and the speech rendered is not too terrific.  With the application running, you may key in a word and listen to the results for yourself.  If you type slowly enough, it is adequate, but it is not quite quick enough to use as a form of conversation.

Speak Specific Phrases.

This control group contains a single combo box; whenever a new value is selected from the box, it will immediately be read by the speaker. 

Speak On Enter.

This appears to be a far more viable way to conduct a conversation using TTS as a voice medium.  This control works in a manner very similar to the "Speak As You Type" option; however, it reads and clears the text box only after the user hits the "Enter" key.  You may try typing in a sentence and then hitting the Enter key to get a feel for how that works.

Load a Text File and Read It.

This control group contains a single multi-line text box control and three buttons:  "Open File", "Stop", and "Read File".  Click on the "Open File" button and use the open file dialog box to navigate to any text file.  The text file will load into the text box and, with a file loaded, you may click on the "Read File" button to have the speaker read the contents of the text box end to end.  TTS does a fair job of this; however, I will point out that the 5.1 SDK does not handle punctuation and abbreviations too well.

You may also key text into the text box and invoke the "Read File" function to read the contents of the text box.
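
Since abbreviations are a weak spot, one possible workaround is to expand the common ones before the text is handed to the speaker.  The sketch below is only an illustration: the SpeechTextCleaner class, its Expand method, and the small abbreviation table are my own hypothetical names and are not part of the sample project or the speech SDK.

```csharp
using System;

// Hypothetical helper (not part of the sample project or the SDK):
// expands a few abbreviations so the TTS engine receives whole words.
public static class SpeechTextCleaner
{
    // Illustrative table only; extend it to suit your own text.
    private static readonly string[,] Abbreviations = {
        { "Dr.", "Doctor" },
        { "Mr.", "Mister" },
        { "e.g.", "for example" },
        { "etc.", "et cetera" }
    };

    public static string Expand(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        string result = text;

        // Plain substring replacement, applied in table order.
        for (int i = 0; i < Abbreviations.GetLength(0); i++)
        {
            result = result.Replace(Abbreviations[i, 0], Abbreviations[i, 1]);
        }
        return result;
    }
}
```

In the form, the result of SpeechTextCleaner.Expand(txtReadFile.Text) could be passed to the speaker in place of the raw text.  This is a naive substring replacement, so it also swallows the period when an abbreviation ends a sentence; a more careful version would match on word boundaries.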

The Code.

The code is pretty straightforward and easy to follow.  The class definition begins as follows:

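using System;
using System.Windows.Forms;
using SpeechLib;   // COM reference: Microsoft Speech Object Library

public partial class Form1 : Form
{

#region Declarations

    // The voice object that performs all of the speech synthesis.
    public SpVoice vox = new SpVoice();

    // Current rate of speech; updated by the track bar handler.
    public int RateOfSpeech = 3;

#endregion

    private void Form1_Load(object sender, EventArgs e)
    {
        // Add the description of every installed voice to the combo box.
        // C# 2.0 requires both optional COM arguments to be passed explicitly.
        foreach (ISpeechObjectToken token in vox.GetVoices("", ""))
        {
            cboVoxOptions.Items.Add(token.GetDescription(0));
        }
        cboVoxOptions.SelectedIndex = 0;

        // Greet the user by the current account name.
        string str = Environment.UserName;
        SayGreeting(str);
    }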

 

As you can see, the using directives include the speech library.  A declaration region is defined next, and two variables are declared within that region.  The first creates an instance of SpVoice (the "with events" declaration of the VB.NET original has no counterpart in C#, where event handlers are attached with +=).  The other variable, RateOfSpeech, keeps track of the current rate of speech selected using the rate of speech track bar control.

In form load, we begin by collecting all of the current voices and adding them to the combo box used to select a speaker.  The current index is set to zero such that, when the form loads, a current speaker will be defined.
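
One thing to watch for: setting SelectedIndex to zero on an empty combo box throws an ArgumentOutOfRangeException, which is what would happen on a machine with no voices installed.  The sketch below shows one way to guard against that and, optionally, to prefer a particular voice.  The VoicePicker class and its PickDefaultIndex method are hypothetical names of my own; they are not part of the sample project or the speech library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the sample project or SpeechLib).
public static class VoicePicker
{
    // Returns the index of the first description that contains
    // preferredName (ignoring case), 0 when nothing matches, or -1 when
    // the list is empty. -1 is the combo box value for "no selection".
    public static int PickDefaultIndex(IList<string> descriptions, string preferredName)
    {
        if (descriptions == null || descriptions.Count == 0)
        {
            return -1;
        }

        if (!string.IsNullOrEmpty(preferredName))
        {
            for (int i = 0; i < descriptions.Count; i++)
            {
                if (descriptions[i].IndexOf(preferredName, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return i;
                }
            }
        }
        return 0;
    }
}
```

The form could collect the voice descriptions into a list, assign the returned index to cboVoxOptions.SelectedIndex, and skip the greeting when the result is -1.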

The last two lines of the form load handler capture the user's name (however it may be defined on the target machine) and pass it to the SayGreeting method, which presents a welcome message to the user through TTS.  SayGreeting is written as follows:

    public void SayGreeting(string strUser)
    {
        vox.Voice = vox.GetVoices("", "").Item(cboVoxOptions.SelectedIndex);
        DateTime dt = DateTime.Now;
        vox.Rate = RateOfSpeech;

        // Discard anything still queued before the greeting starts.
        vox.Speak("", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
        try
        {
            vox.Speak("Greetings " + strUser + " from Text To Speech",
                SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
            vox.Speak("Today's Date is " + dt.ToShortDateString(),
                SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
            vox.Speak("The time is " + dt.ToShortTimeString(),
                SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.ToString(), "I'm Speechless",
                MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
        }
    }

 

As you can see, the method formats a message containing the passed-in user name as well as the date and time, and then reads that message aloud using the current speaker voice.  Note the use of the SVSFPurgeBeforeSpeak flag; it discards any speech still pending before the new text is spoken.  Because these calls are synchronous (the asynchronous flag is not set), each Speak call returns only after its text has been read, so the three phrases are still heard in order.

Next up is the track bar control's handler; it is written as follows:

    private void tbarRateOfSpeech_Scroll(object sender, System.EventArgs e)
    {
        // Remember the new rate; it is applied the next time text is spoken.
        this.RateOfSpeech = tbarRateOfSpeech.Value;
    }

This function merely sets the rate of speech variable to contain the current track bar value.  The variable is used to set the rate property for the speaker whenever the speaker is passed text to read.
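
The Rate property of the voice accepts values from -10 (slowest) to 10 (fastest), so the handler above is safe only as long as the track bar's range stays inside those limits.  The sketch below shows one possible way to clamp a value, or to spread the track bar's positions across the whole range.  The SpeechRate class and its method names are hypothetical and are not part of the sample project; the sample simply uses the track bar value directly.

```csharp
using System;

// Hypothetical helper (not part of the sample project or SpeechLib).
public static class SpeechRate
{
    public const int MinRate = -10;  // slowest rate the voice accepts
    public const int MaxRate = 10;   // fastest rate the voice accepts

    // Forces any integer into the supported rate range.
    public static int Clamp(int value)
    {
        return Math.Max(MinRate, Math.Min(MaxRate, value));
    }

    // Spreads the track bar range [minimum, maximum] evenly across
    // [MinRate, MaxRate] using integer arithmetic.
    public static int FromTrackBar(int value, int minimum, int maximum)
    {
        if (maximum <= minimum)
        {
            return 0;  // degenerate range: fall back to the normal rate
        }

        int clamped = Math.Max(minimum, Math.Min(maximum, value));
        return MinRate + (clamped - minimum) * (MaxRate - MinRate) / (maximum - minimum);
    }
}
```

For a track bar running from 0 to 4, the middle position maps to a rate of 0, which is the voice's normal speed.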

Following the track bar control handler, you will see the following code:

    private void TextBox1_KeyPress(object sender, System.Windows.Forms.KeyPressEventArgs e)
    {
        vox.Rate = RateOfSpeech;

        // Speak and clear the box when the user presses space or Enter.
        if (e.KeyChar == (char)Keys.Space || e.KeyChar == (char)Keys.Enter)
        {
            vox.Speak(TextBox1.Text, SpeechVoiceSpeakFlags.SVSFDefault);
            TextBox1.Text = "";
        }
    }

 

This bit of code drives the Speak As You Type function.  The rate of speech is set to the current value of the rate of speech variable, and the handler watches for a space (or Enter) key press; whenever one arrives, the code passes the contents of the text box to the speaker, the speaker reads the text, and then the text box is cleared and made ready for the next word to be typed.

The next bit of code drives the Speak On Enter function.  The code is nearly identical to that used in the Speak As You Type function, but rather than reading out the contents of the text box on space, the contents are read out only when the user hits the Enter key.  That code looks like this:

    private void TextBox2_KeyPress(object sender, System.Windows.Forms.KeyPressEventArgs e)
    {
        vox.Rate = RateOfSpeech;

        // Speak and clear the box only when the user presses Enter.
        if (e.KeyChar == (char)Keys.Enter)
        {
            vox.Speak(TextBox2.Text, SpeechVoiceSpeakFlags.SVSFDefault);
            TextBox2.Text = "";
        }
    }

 

The last pieces of code to look at manage the function used to read from a text file.  The first item shows an open file dialog and reads the selected text file into the control group's text box.  That code looks like this:

 

    private void btnOpenFile_Click(object sender, System.EventArgs e)
    {
        vox.Rate = RateOfSpeech;
        if (OpenFileDialog1.ShowDialog() == System.Windows.Forms.DialogResult.OK)
        {
            // The using statement closes the reader even if ReadToEnd throws.
            using (System.IO.StreamReader sr = new System.IO.StreamReader(OpenFileDialog1.FileName))
            {
                this.txtReadFile.Text = sr.ReadToEnd();
            }
        }
    }
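
The file loading step can also be pulled out of the click handler so that a locked or missing file does not bring down the form.  The following is only a sketch of that idea using File.ReadAllText, which is available in .NET 2.0; the TextFileLoader class and its TryLoad method are hypothetical names and are not part of the sample project.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the sample project).
public static class TextFileLoader
{
    // Returns true and the file's text on success. Returns false and an
    // empty string when the file cannot be read. A null or malformed path
    // still throws, since that points to a programming error.
    public static bool TryLoad(string path, out string text)
    {
        try
        {
            text = File.ReadAllText(path);
            return true;
        }
        catch (IOException)
        {
            // Missing file, missing directory, sharing violation, and so on.
            text = string.Empty;
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to read, or the path names a directory.
            text = string.Empty;
            return false;
        }
    }
}
```

The click handler could call TryLoad with OpenFileDialog1.FileName and show a message box when it returns false.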


The next bit reads the file; it looks like this:

 

    private void btnReadFile_Click(object sender, System.EventArgs e)
    {
        vox.Rate = RateOfSpeech;

        // Speak asynchronously so the call returns at once.
        vox.Speak(txtReadFile.Text, SpeechVoiceSpeakFlags.SVSFlagsAsync);
    }

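You will note that the function is basically the same as that used to read from one of the other form text boxes, except that the speak flag is set to the asynchronous mode.  With that flag the call returns immediately and the form stays responsive while the text is read, which is what allows the "Stop" button to be clicked.  The next item to look at is used to stop the speaker from continuing to read from the text; that code looks like this: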

    private void btnStop_Click(object sender, System.EventArgs e)
    {
        vox.Speak("", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
    }

This method passes an empty string to the speaker together with the SVSFPurgeBeforeSpeak flag.  The flag discards whatever is still being spoken or waiting to be spoken and, since the new text is empty, the speaker falls silent.

The last bit of code in the application changes the speaker's voice to the one selected from the speaker combo box; that code looks like this:
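
To hear how asynchronous speech and the purge flag interact outside of the form, a small console program can be used.  The sketch below is my own illustration and is not part of the sample project; it assumes a console project with a COM reference to the Microsoft Speech Object Library, and the PurgeDemo class name, the phrases, and the timings are arbitrary.

```csharp
using System;
using System.Threading;
using SpeechLib;   // COM reference: Microsoft Speech Object Library

// Hypothetical console demo (not part of the sample project).
public static class PurgeDemo
{
    [STAThread]
    public static void Main()
    {
        SpVoice vox = new SpVoice();

        // Start a long utterance without blocking the calling thread.
        vox.Speak("This sentence is long enough that it should still be " +
                  "playing when the purge request arrives a moment later.",
                  SpeechVoiceSpeakFlags.SVSFlagsAsync);

        // Let the voice talk for a moment (arbitrary delay).
        Thread.Sleep(1500);

        // An empty string with the purge flag discards the pending speech.
        vox.Speak("", SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);

        // Asynchronous requests are queued and spoken in order.
        vox.Speak("First.", SpeechVoiceSpeakFlags.SVSFlagsAsync);
        vox.Speak("Second.", SpeechVoiceSpeakFlags.SVSFlagsAsync);

        // Block until the queue drains or ten seconds pass.
        vox.WaitUntilDone(10000);
    }
}
```

If the purge works as described, the long sentence is cut off part way through and the two short phrases follow one after the other.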

    private void cboVoxOptions_SelectedIndexChanged(object sender, System.EventArgs e)
    {
        // C# 2.0 requires both optional COM arguments to be passed explicitly.
        vox.Voice = vox.GetVoices("", "").Item(cboVoxOptions.SelectedIndex);
    }

}

 

Summary.

This article and code sample were intended to provide a very easy introduction to TTS-based speech synthesis; there are a great many more things that you can do with the speech SDK than have been addressed in this document.  A review of the contents of the speech SDK will provide greater detail on the use of the speech libraries.

NOTE: THIS ARTICLE IS CONVERTED FROM VB.NET TO C# USING A CONVERSION TOOL. ORIGINAL ARTICLE CAN BE FOUND ON VB.NET Heaven (http://www.vbdotnetheaven.com/).