Using text-to-speech in announcements

Overview

You can configure Announcer and Menu applets to announce text in a voice of your choosing. You can either define the exact text to be announced or specify data sources that will contain the text at run time. Likewise, you can either specify the voice in the applet or use a data source that will contain the name of the voice at run time. For information about the Announcer and Menu applets, see Announcer applet and Menu applet.

Voice interaction plans only

You can use these applets only in interaction plans that route calls, not in plans that route other types of interactions.

Neural voices

If neural voices are enabled for your account, you can use them to achieve more natural-sounding announcements in your text-to-speech applets. VCC currently supports the Amazon Polly and Google Cloud TTS engines. These engines provide access to standard, neural, and generative voice models (including Wavenet, Neural2, Studio, and Chirp3-HD) across multiple languages and dialects.

Backward compatibility

Existing plans with unprefixed voice names continue to render via Amazon Polly.

For more information, see:

Limitations

Category

Limitation

Character limit

Maximum 3,000 characters per request for all engines.

SSML support

Chirp3-HD voices do not support SSML. If markup is provided, tags are stripped, and only plain text is synthesized. Other Google models and Amazon Polly fully support SSML.

Latency

Generative voices (Chirp3-HD) may have a longer render time when processing new text for the first time.

Voice identification

  • Google voices use a Google- prefix (for example, Google-en-GB-Wavenet-A).

  • Amazon voices are either unprefixed or labeled in parentheses (for example, Burcu (Turkish, neural)).
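The limitations above can be sketched as a small pre-check. This is only an illustration, not how VCC processes text internally: the function name prepare_text is hypothetical, the example Chirp3-HD voice name is illustrative, and the sketch assumes that the 3,000-character limit is counted on the raw text, including any markup.

```python
import re

MAX_CHARS = 3000  # per-request limit stated for all engines


def prepare_text(text: str, voice: str) -> str:
    """Return the text that would be synthesized, or raise if it is too long.

    Hypothetical helper: assumes the limit applies to the raw text,
    including any SSML markup.
    """
    if len(text) > MAX_CHARS:
        raise ValueError(f"Text has {len(text)} characters; the limit is {MAX_CHARS}.")
    if "chirp3-hd" in voice.lower():
        # Chirp3-HD voices do not support SSML: tags are stripped and
        # only the plain text is synthesized.
        text = re.sub(r"<[^>]+>", "", text)
    return text


# Illustrative voice name following the Google- prefix convention.
print(prepare_text("<speak>Hello <break time='1s'/>world</speak>",
                   "Google-en-US-Chirp3-HD-Achernar"))
```

For any other voice, the sketch returns the text unchanged, because other Google models and Amazon Polly support SSML.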

Selecting voices

  1. Open the Menu applet or Announcer applet in your interaction plan.

  2. Click the dropdown menu next to the Voice field and select a voice.

Manual entry & fallback

  • If the interface uses a text box instead of a dropdown, you must include the Google- prefix for Google voices.

  • If a chosen neural voice is unavailable or does not have a neural version, the applet defaults to the standard version.
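The naming rules above (a Google- prefix for Google voices, unprefixed names for Amazon Polly) can be illustrated with a short sketch. The function name engine_for_voice and the returned labels are hypothetical, and matching the prefix without regard to case is an assumption rather than documented behavior.

```python
def engine_for_voice(voice: str) -> tuple[str, str]:
    """Return (engine, voice name without prefix) for a configured voice.

    Hypothetical helper: the engine labels "google" and "polly" are
    illustrative, and case-insensitive prefix matching is an assumption.
    """
    name = voice.strip()
    prefix = "google-"
    if name.lower().startswith(prefix):
        # Google voices carry the Google- prefix.
        return "google", name[len(prefix):]
    # Unprefixed voice names continue to render via Amazon Polly.
    return "polly", name


print(engine_for_voice("Google-en-GB-Wavenet-A"))
print(engine_for_voice("Burcu (Turkish, neural)"))
```

A check like this can be useful when you enter a voice name manually in a text box, where a missing Google- prefix means that the name is treated as an Amazon Polly voice.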

Speech Synthesis Markup Language (SSML)

If enabled for your account, you can use Speech Synthesis Markup Language (SSML), a W3C standard, in your text-to-speech applets. You can use SSML to control aspects of the speech synthesis, for example, to add a pause between sentences or paragraphs, or to spell out a specific word.

For your text to be recognized as SSML, it must begin with a <speak> tag and end with a </speak> tag. The following is an example of recognized SSML:

<speak> Here is a word spelled out: <say-as interpret-as='spell-out'>hello</say-as>. </speak>
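Before you paste SSML into an applet, you can check it locally. The following sketch is an illustration only: the function names are hypothetical, ignoring surrounding whitespace is an assumption, and the well-formedness check uses a general XML parser, so it does not tell you whether a given engine supports each tag.

```python
import xml.etree.ElementTree as ET


def is_recognized_ssml(text: str) -> bool:
    """True if the text begins with <speak> and ends with </speak>.

    Assumption: leading and trailing whitespace is ignored.
    """
    stripped = text.strip()
    return stripped.startswith("<speak>") and stripped.endswith("</speak>")


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML (every tag is closed and nested correctly)."""
    try:
        ET.fromstring(text.strip())
    except ET.ParseError:
        return False
    return True


example = ("<speak> Here is a word spelled out: "
           "<say-as interpret-as='spell-out'>hello</say-as>. </speak>")
print(is_recognized_ssml(example), is_well_formed(example))
```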

Unsupported SSML tags

Amazon Polly does not support all SSML tags. For information on supported and unsupported tags, see Supported SSML tags (Amazon help).

How do I use text-to-speech in announcements?

To use any dynamic elements in the Announcer and Menu applets, you must configure data sources. These data sources must contain the required values for the elements when Vonage Contact Center routes the call to the applet.

The applet sends your configured text to the text-to-speech service (Amazon Polly, or Google Cloud TTS for voices with the Google- prefix), which turns your text into speech in the specified voice. The service returns the speech, and the applet plays it.

To use text-to-speech in announcements, perform the following steps:

  1. In the same interaction plan as your data sources (if using data sources), create an Announcer or Menu applet.

  2. In the Announcement type list, click Text to speech. Depending on which applet you are creating, you can configure the voice that announces the menu options in different ways.

    Whether you use a static or dynamic value for the voice, it must match the voice ID provided by Amazon Polly. The voice is not case-sensitive. For a list of voices available in Amazon Polly and their IDs, see Voices in Amazon Polly (Amazon help).

  3. In Text, define the text you want to be announced.
    The Text field can contain text, digits, or a combination. You can include both static values and dynamic values from data sources.
    To use a dynamic value from a data source, type the dollar sign ($). A list of available data sources appears. Click the data source that will contain the required value at run time.
    For example, Balance is $(account(1)|balance)
    The Text field can contain a maximum of 1500 characters.

  4. Optionally, you can validate the audio.

  5. Optionally, enable barge-in.

  6. Configure the rest of your interaction plan as required and click Update.

Now, when Vonage Contact Center routes a call through the applet, the caller hears the configured text in the specified voice.
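To picture how a dynamic value in the Text field is filled in at run time, consider the following sketch. It is not the VCC implementation: the function name render_text and the values dictionary are hypothetical, and the sketch simply replaces each $(...) placeholder with a value looked up by the placeholder's content.

```python
def render_text(template: str, values: dict[str, str]) -> str:
    """Replace each $(...) placeholder with its run-time value.

    Hypothetical helper: placeholders may contain nested parentheses,
    as in $(account(1)|balance), so the scan tracks parenthesis depth.
    A placeholder with no matching value raises KeyError.
    """
    out = []
    i = 0
    while i < len(template):
        if template.startswith("$(", i):
            depth = 1
            j = i + 2
            while j < len(template) and depth > 0:
                if template[j] == "(":
                    depth += 1
                elif template[j] == ")":
                    depth -= 1
                j += 1
            if depth == 0:
                key = template[i + 2 : j - 1]
                out.append(values[key])
                i = j
                continue
        # Ordinary character, or a "$(" that is never closed: copy as-is.
        out.append(template[i])
        i += 1
    return "".join(out)


print(render_text("Balance is $(account(1)|balance)",
                  {"account(1)|balance": "42.50"}))
```

In this example, the caller would hear the static text followed by the value that the data source holds when the call reaches the applet.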

Support and documentation feedback

For general assistance, please contact Customer Support.

For help using this documentation, please send an email to docs_feedback@vonage.com. We're happy to hear from you. Your contribution helps everyone at Vonage! Please include the name of the page in your email.