Using text-to-speech in announcements
Overview
You can configure Announcer and Menu applets to announce text in a voice of your choosing. You can either define the exact text to be announced or specify the data sources that will contain the text at run time. Likewise, you can either specify the voice in the applet or use a data source that will contain the name of the voice at run time. For information about the Announcer and Menu applets, see Announcer applet and Menu applet.
Voice interaction plans only
You can use this applet in an interaction plan that routes only calls, and not other types of interactions.
Neural voices
If enabled for your account, you can use neural voices to achieve more natural-sounding announcements in your text-to-speech applets. VCC currently supports Amazon Polly and Google Cloud TTS engines. This provides access to standard, neural, and generative voice models (including Wavenet, Neural2, Studio, and Chirp3-HD) across multiple languages and dialects.
Backward compatibility
Existing plans with unprefixed voice names continue to render via Amazon Polly.
Limitations
| Category | Limitation |
|---|---|
| Character limit | Maximum 3,000 characters per request for all engines. |
| SSML support | Chirp3-HD voices do not support SSML. If markup is provided, tags are stripped, and only plain text is synthesized. Other Google models and Amazon Polly fully support SSML. |
| Latency | Generative voices (Chirp3-HD) may have a longer render time when processing new text for the first time. |
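The character limit and the Chirp3-HD tag stripping in the table can be checked before you paste text into an applet. The following is a minimal sketch, not the service's own logic: the function names are hypothetical, the regular expression only approximates how tags might be stripped, and whether markup counts toward the 3,000-character limit is an assumption.

```python
import re

MAX_CHARS = 3000  # per-request limit taken from the table above

# Assumption: any <...> sequence is treated as an SSML tag.
_TAG = re.compile(r"<[^>]+>")


def preview_stripped_text(text: str) -> str:
    """Approximate the plain text left once SSML tags are removed."""
    without_tags = _TAG.sub("", text)
    # Collapse the whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", without_tags).strip()


def within_limit(text: str) -> bool:
    """Return True if the text fits in a single request."""
    return len(text) <= MAX_CHARS
```

For example, `preview_stripped_text("<speak>Hello <break time='1s'/> world</speak>")` gives a rough idea of what a Chirp3-HD voice would read out once the pause markup is dropped.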
Voice identification
Google voices use a Google- prefix (for example, Google-en-GB-Wavenet-A).
Amazon voices are either unprefixed or labeled in parentheses (for example, Burcu (Turkish, neural)).
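The naming rule above, together with the backward-compatibility note that unprefixed names render via Amazon Polly, can be expressed as a small lookup. This is an illustrative sketch only: the function name and return values are hypothetical, and treating the prefix match as case-sensitive is an assumption.

```python
GOOGLE_PREFIX = "Google-"


def engine_for_voice(voice_name: str) -> str:
    """Guess which engine renders a voice, based only on its name."""
    # Assumption: the prefix is matched exactly as written.
    if voice_name.startswith(GOOGLE_PREFIX):
        return "google"
    # Unprefixed names, including labeled ones, fall through to Amazon Polly.
    return "amazon"
```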
Selecting voices
Open the Menu applet or Announcer applet in your interaction plan.
Click the dropdown menu next to the Voice field and select the voice.
Manual entry & fallback
If the interface uses a text box instead of a dropdown, you must include the Google- prefix for Google voices.
If a chosen neural voice is unavailable or does not have a neural version, the applet defaults to the standard version.
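The fallback rule can be pictured as a simple lookup. The sketch below is hypothetical: the catalog contents, the voice name ExampleVoice, and the function name are invented for illustration and do not reflect the applet's real data or code.

```python
# Hypothetical catalog: voice name -> variants it offers. Not real VCC data.
CATALOG = {
    "Burcu": {"standard", "neural"},
    "ExampleVoice": {"standard"},
}


def resolve_variant(voice: str, requested: str = "neural") -> str:
    """Return the requested variant if the voice offers it, else 'standard'."""
    variants = CATALOG.get(voice, set())
    return requested if requested in variants else "standard"
```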
Speech Synthesis Markup Language (SSML)
If enabled for your account, you can use Speech Synthesis Markup Language (SSML), a W3C standard, in your text-to-speech applets. SSML lets you control aspects of the speech synthesis, for example, adding a pause between sentences or paragraphs, or spelling out a specific word.
For your text to be recognized as SSML, it must begin with a <speak> tag and end with a </speak> tag. The following is an example of recognized SSML:
<speak>
Here is a word spelled out:
<say-as interpret-as='spell-out'>hello</say-as>.
</speak>
Unsupported SSML tags
Amazon Polly does not support every SSML tag. For information on available and unavailable tags, see the Supported SSML tags page (Amazon help).
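Before pasting SSML into an applet, it can help to check that the text starts and ends with the speak tags and is well-formed XML. The following is a rough sketch using only the Python standard library. The function names are hypothetical, the check is a strict reading of the rule (no leading or trailing whitespace is tolerated), and it does not verify that individual tags are supported by the engine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_recognized_ssml(text: str) -> bool:
    """Strict reading of the rule: <speak> first, </speak> last."""
    return text.startswith("<speak>") and text.endswith("</speak>")


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def spell_out(word: str) -> str:
    """Build SSML that spells out a word, escaping XML special characters."""
    return (
        "<speak>Here is a word spelled out: "
        f"<say-as interpret-as='spell-out'>{escape(word)}</say-as>.</speak>"
    )
```

Escaping the word matters when it comes from a data source, because a stray `&` or `<` would otherwise make the markup invalid.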
How do I use text-to-speech in announcements?
The applet sends your configured text to the Amazon Polly service, which converts the text into speech in the specified voice. The service returns the speech, and the applet plays it.
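To get a feel for this round trip outside Vonage Contact Center, you can call Amazon Polly directly. The sketch below uses the `synthesize_speech` operation of the boto3 Polly client. It is not how the applet is implemented: the helper names are hypothetical, the voice in the usage note is just an example, and running `synthesize` assumes boto3 is installed and AWS credentials are configured.

```python
def build_polly_request(text: str, voice_id: str, engine: str = "standard") -> dict:
    """Assemble the keyword arguments for Polly's synthesize_speech call."""
    # Mirror the rule in this article: text wrapped in speak tags is SSML.
    is_ssml = text.startswith("<speak>") and text.endswith("</speak>")
    return {
        "Text": text,
        "TextType": "ssml" if is_ssml else "text",
        "VoiceId": voice_id,  # use the ID exactly as Amazon Polly lists it
        "Engine": engine,     # for example "standard" or "neural"
        "OutputFormat": "mp3",
    }


def synthesize(text: str, voice_id: str, engine: str = "standard") -> bytes:
    """Send the text to Amazon Polly and return the audio bytes."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, voice_id, engine))
    return response["AudioStream"].read()
```

For example, `synthesize("Welcome to our support line.", "Joanna")` would return MP3 audio that you can save to a file and play back.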
To use text-to-speech in announcements, perform the following steps:
In the same interaction plan as your data sources (if using data sources), create an Announcer or Menu applet.
In the Announcement type list, click Text to speech. Depending on which applet you are creating, you can configure the voice that announces the menu options in different ways.
Whether you use a static or dynamic value for the voice, it must match the voice ID provided by Amazon Polly. The voice is not case-sensitive. For a list of voices available in Amazon Polly and their IDs, see Voices in Amazon Polly (Amazon help).
In Text, define the text you want to be announced.
The Text field can contain text, digits, or a combination. You can include both static values and dynamic values from data sources.
To use a dynamic value from a data source, type the dollar sign ($). A list of available data sources appears. Click the data source that will contain the required value at run time.
For example, Balance is $(account(1)|balance).
The Text field can contain a maximum of 1,500 characters.
Optionally, you can validate the audio.
Optionally enable barge-in.
Configure the rest of your interaction plan as required and click Update.
Now when Vonage Contact Center routes a call through the applet, the caller hears the configured text in the specified Amazon Polly voice.
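The dynamic values in the Text field, such as $(account(1)|balance), are replaced with data source values at run time. The following sketch imitates that substitution so you can preview how an announcement might read. It is an approximation under stated assumptions: the regular expression, the `render` function, and the sample value are hypothetical, and the real resolution happens inside Vonage Contact Center.

```python
import re

# Matches $( ... ) and allows one level of nested parentheses, as in account(1).
PLACEHOLDER = re.compile(r"\$\(((?:[^()]|\([^()]*\))*)\)")


def render(template: str, values: dict) -> str:
    """Replace each $(...) reference with its value, if one is known."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Unknown references are left untouched rather than guessed.
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render("Balance is $(account(1)|balance)", {"account(1)|balance": "42.50"})` produces the text that would be sent for synthesis if the data source held 42.50.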