What is SSML?
Speech Synthesis Markup Language (SSML) is an XML-based markup language
for speech synthesis applications. It is a W3C recommendation
for voice browser applications. The Alexa Skills Kit (ASK) supports a subset of
SSML tags.
You can use SSML in your skill response to gain additional control
over speech generation. Alexa automatically handles normal
punctuation, such as pausing after a period or speaking a sentence
ending in a question mark as a question.
Let's understand how to construct output speech using SSML:
- // build the SSML response
- var speech = new SsmlOutputSpeech();
- speech.Ssml = "<speak>This is an SSML response.</speak>";
- // build the response using ResponseBuilder
- var finalResponse = ResponseBuilder.Tell(speech);
- return finalResponse;
In the above example, we created an instance of SsmlOutputSpeech and set its Ssml property to text wrapped in the speak tag. We then
passed that instance to ResponseBuilder.Tell to build the response.
Now let's go over some of the tags.
- <speak>
- I want to tell you a secret.
- <amazon:effect name="whispered">I am not a real human.</amazon:effect>.
- Can you believe it?
- </speak>
Now let's look at the same example in code:
- // build the SSML response
- var speech = new SsmlOutputSpeech();
- speech.Ssml = @"<speak>I want to tell you a secret. <amazon:effect name=""whispered"">I am not a real human.</amazon:effect>. Can you believe it?</speak>";
- // build the response using ResponseBuilder
- var finalResponse = ResponseBuilder.Tell(speech);
- return finalResponse;
In the above example, the name attribute is set to whispered. Alexa speaks
the first sentence as usual, but the text inside the <amazon:effect>
tag is whispered.
You can try the above sample under Voice & Tone in the Test tab of your skill.
Audio:
To play MP3 files in a response, use the audio tag. Its src attribute
takes the URL of the MP3 file. Keep the following points in mind
when using MP3 files:
- MP3 files must be hosted on an internet-accessible HTTPS endpoint with a trusted, non-self-signed SSL certificate.
- The audio file cannot be longer than 240 seconds.
- The bit rate must be 48 kbps.
- The sample rate must be 22050 Hz, 24000 Hz, or 16000 Hz.
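The requirements listed above can be checked before a self-hosted clip is placed in a response. The following is a minimal sketch, not part of Alexa.NET or the Alexa Skills Kit: the SsmlAudio class and its FindProblems and Tag methods are hypothetical names, and the duration, bit rate, and sample rate are assumed to be known already (for example from your encoding settings). The SSL certificate requirement cannot be verified this way, and clips referenced with a soundbank:// URL are not covered by the HTTPS check.

```csharp
using System;
using System.Collections.Generic;
using System.Security;

// Hypothetical helper for illustration; it is not an Alexa.NET or ASK type.
public static class SsmlAudio
{
    public const int MaxDurationSeconds = 240;
    public const int RequiredBitRateKbps = 48;
    private static readonly int[] AllowedSampleRatesHz = { 16000, 22050, 24000 };

    // Returns one message per requirement that the clip fails.
    // An empty list means all checked requirements are met.
    public static IReadOnlyList<string> FindProblems(
        string url, double durationSeconds, int bitRateKbps, int sampleRateHz)
    {
        var problems = new List<string>();

        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
            problems.Add("The file must be served from an HTTPS endpoint.");
        if (durationSeconds > MaxDurationSeconds)
            problems.Add("The file cannot be longer than 240 seconds.");
        if (bitRateKbps != RequiredBitRateKbps)
            problems.Add("The bit rate must be 48 kbps.");
        if (Array.IndexOf(AllowedSampleRatesHz, sampleRateHz) < 0)
            problems.Add("The sample rate must be 22050 Hz, 24000 Hz, or 16000 Hz.");

        return problems;
    }

    // Builds the audio tag, escaping XML special characters in the URL.
    public static string Tag(string url)
    {
        return "<audio src=\"" + SecurityElement.Escape(url) + "\"/>";
    }
}
```

A skill could call SsmlAudio.FindProblems first and only insert SsmlAudio.Tag(url) into the speak element when the returned list is empty.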
You need to use the audio tag inside the speak tag, as shown below:
- // build the SSML response
- var speech = new SsmlOutputSpeech();
- speech.Ssml = @"<speak>Welcome to Ride Hailer.<audio src=""soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01""/>You can order a ride, or request a fare estimate. Which will it be?</speak>";
- // build the response using ResponseBuilder
- var finalResponse = ResponseBuilder.Tell(speech);
- return finalResponse;
The above example uses a clip from the Alexa Skills Kit sound library. However, you can also store your own MP3
file on S3 and provide its URL in the src attribute.
Break:
With this tag you can add a pause to the speech. You can set the
length of the pause with the strength or time attribute, as described below:
- time: The length of the pause in seconds or milliseconds, up to a maximum of 10s or 10000ms.
- strength: One of the following values:
  - none
  - x-weak
  - weak
  - medium
  - strong
  - x-strong
If you do not specify any value, medium is used as the default.
Let's see the below example:
- // build the SSML response
- var speech = new SsmlOutputSpeech();
- speech.Ssml = @"<speak>There is a three second pause here <break time=""3s""/> then the speech continues. However, you can pause for up to ten seconds.</speak>";
- // build the response using ResponseBuilder
- var finalResponse = ResponseBuilder.Tell(speech);
- return finalResponse;
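The limits on the break tag described above (a time of at most 10 seconds, or one of six strength values with medium as the default) can also be enforced when the tag is built. This is a minimal sketch under stated assumptions: SsmlBreak, ForTime, and ForStrength are hypothetical names and are not part of Alexa.NET or the Alexa Skills Kit.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; it is not an Alexa.NET or ASK type.
public static class SsmlBreak
{
    private static readonly string[] Strengths =
        { "none", "x-weak", "weak", "medium", "strong", "x-strong" };

    public static readonly TimeSpan MaxPause = TimeSpan.FromSeconds(10);

    // Builds a break tag with a time attribute expressed in milliseconds.
    public static string ForTime(TimeSpan pause)
    {
        if (pause < TimeSpan.Zero || pause > MaxPause)
            throw new ArgumentOutOfRangeException(
                nameof(pause), "The pause must be between 0 and 10 seconds.");

        long milliseconds = (long)pause.TotalMilliseconds;
        return "<break time=\"" + milliseconds.ToString(CultureInfo.InvariantCulture) + "ms\"/>";
    }

    // Builds a break tag with a strength attribute; medium is the default.
    public static string ForStrength(string strength = "medium")
    {
        if (Array.IndexOf(Strengths, strength) < 0)
            throw new ArgumentException("Unknown strength value: " + strength, nameof(strength));

        return "<break strength=\"" + strength + "\"/>";
    }
}
```

For example, SsmlBreak.ForTime(TimeSpan.FromSeconds(3)) could be concatenated into the string assigned to speech.Ssml in place of a hand-written break tag.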