A categorization profile contains categories and subcategories for the Conversation Analyzer feature. Conversation Analyzer uses the profile to categorize transcripts of call recordings. The profile also contains any substitution and redaction rules you provide. Using the substitution and redaction rules, Conversation Analyzer refines the transcribed text.

The categorization profile applies to the associated account. For information about where you can view the categorized recordings and refined transcripts, see Listening to, viewing, and commenting on interaction content.

Categorization profiles are written in JavaScript Object Notation (JSON). For information about JSON, see https://www.json.org/.

Categorization profile structure

A categorization profile consists of the following top-level elements:

...

skipCallsUnder (an integer)

...

rules (an array of one or more categorization rule objects).

Info

Not used.

...

  • party (a name/value pair)
  • expression (a name/value pair)

subcategories (an array of one or more nested subcategory objects).

Info

Not used.

...

  • party (a name/value pair)
  • find (a name/value pair)
  • replace (a name/value pair)

...

Supervisors and administrators can manage profiles using Category Editor.

Info

Supervisors and administrators can export to, restore from or create categorization profiles using JSON. However, we recommend using Category Editor for these tasks.


...

Note
Analyzing transcripts

Conversation Analyzer analyzes transcripts in several steps:

  1. Conversation Analyzer identifies characters in the transcripts. Characters are either word or nonword characters. Word characters are apostrophes and characters from the supported Unicode categories (see expression and find value validation), such as letters, spacing and non-spacing marks, connectors, and decimal numbers. All other characters are nonword characters and act as word separators. Nonword characters include !, £, $, %, ^, &, *, (, ), and -.
  2. Conversation Analyzer uses findings from step 1 to identify the individual words in the transcripts.
  3. Conversation Analyzer looks for words in the transcripts that match the rules in the categorization profile:
    1. Conversation Analyzer applies substitution rules first, replacing text if found.
    2. Conversation Analyzer tags the processed transcripts with the corresponding categories if found.
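The first two steps above can be illustrated with a minimal Python sketch. It is not Conversation Analyzer's implementation. The set of Unicode categories in WORD_CATEGORIES is an assumption inferred from the description in step 1 (letters, marks, connectors, and decimal numbers), and the function names are hypothetical.

```python
import unicodedata

# Assumed word-character categories, inferred from step 1:
# letters (Lu, Ll, Lt, Lm, Lo), non-spacing and spacing marks (Mn, Mc),
# connector punctuation (Pc), and decimal numbers (Nd).
WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Pc", "Nd"}


def is_word_char(ch):
    """Return True for apostrophes and characters in the assumed categories."""
    return ch == "'" or unicodedata.category(ch) in WORD_CATEGORIES


def split_words(text):
    """Split text into words, treating every nonword character as a separator."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("The cat-sat! It's £5"))
```

In this sketch the hyphen, exclamation mark, space, and pound sign all act as word separators, while the apostrophe in "It's" stays inside the word.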

Profile name and language

Conversation Analyzer uses a call's Language and ConversationAnalyzerProfile data source values to identify the categorization profile to use to categorize and refine the call recording. To set the profile's name and language in Category Editor, use the Profile name field and the Language drop-down when creating a categorization profile. Both the Language and Profile name field values must match the Language and ConversationAnalyzerProfile data source values for Conversation Analyzer to identify the profile. For information about how the Language and ConversationAnalyzerProfile data sources get their values, see Overview of Conversation Analyzer.
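As a rough illustration of the matching described above, the following Python sketch selects a profile only when both values agree. The function name and the exact, case-sensitive comparison are assumptions made for this example; the page does not state how Conversation Analyzer compares the values.

```python
def select_profile(profiles, language, profile_name):
    """Return the first profile whose language and name both match, else None.

    `language` and `profile_name` stand in for the call's Language and
    ConversationAnalyzerProfile data source values.
    """
    for profile in profiles:
        if profile.get("language") == language and profile.get("name") == profile_name:
            return profile
    return None


profiles = [
    {"name": "Cat_example", "language": "en-us"},
    {"name": "Subs_example", "language": "en-us"},
]
print(select_profile(profiles, "en-us", "Subs_example"))
print(select_profile(profiles, "en-gb", "Subs_example"))
```

The second call returns None because only the name matches, not the language.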

...

For more information about creating profiles, see Managing categorization profiles.

Skip calls under

You can configure Conversation Analyzer to not process short calls. Short calls are those that are shorter than your configured threshold. This threshold is in seconds and is defined in the skipCallsUnder parameter. The skipCallsUnder parameter is a single integer, in whole seconds only, and you can configure it for each categorization profile. By default, skipCallsUnder is 0, so Conversation Analyzer processes all calls.
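The threshold check can be pictured with the following Python sketch. The function name is hypothetical, and treating a call that lasts exactly the threshold as long enough to process is one reading of "shorter than your configured threshold", not confirmed behavior.

```python
def should_process(call_duration_seconds, profile):
    """Return False for calls shorter than the profile's skipCallsUnder value."""
    threshold = profile.get("skipCallsUnder", 0)  # default of 0 processes all calls
    return call_duration_seconds >= threshold


profile = {"name": "Cat_example", "language": "en-us", "skipCallsUnder": 30}
print(should_process(12, profile))
print(should_process(45, profile))
```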

customDictionary

The custom dictionary allows you to specify words that are not common for the language you're using, but are common for your use case. For example, you can specify your product or brand names, locations, and so on. The dictionary helps improve the quality of transcripts. The custom dictionary is currently available only in the US node and for the en-us language.

Note

The dictionary is only a hint for the transcription engine. There is no guarantee that the engine will use the hints during transcription.

Example customDictionary

...

Code Block
  "customDictionary": {
     "phrases": ["New Voice Media", "Elastix","Contact Pad","Connect","Conversation Analyzer","Spectrum"]
  }

Specify words in the phrases array. You can use as many hints as you want, but we recommend that you keep the list short. Words must not contain the following characters: !, (, ), and ` (grave accent, not apostrophe).
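Before uploading a profile, you could check the phrases array for the disallowed characters with a short Python sketch such as the following. The function name is hypothetical and the check covers only the four characters listed above.

```python
# Characters that dictionary entries must not contain, as listed above.
FORBIDDEN_CHARACTERS = set("!()`")


def find_invalid_phrases(profile):
    """Return the customDictionary phrases that contain a forbidden character."""
    phrases = profile.get("customDictionary", {}).get("phrases", [])
    return [phrase for phrase in phrases if FORBIDDEN_CHARACTERS & set(phrase)]


profile = {
    "customDictionary": {
        "phrases": ["New Voice Media", "Contact Pad (beta)", "Spectrum!", "Agent's desk"]
    }
}
print(find_invalid_phrases(profile))
```

The apostrophe in "Agent's desk" is accepted because only the grave accent is disallowed.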

Info

The following tips can help increase the success of your custom dictionary:

  • Where a brand or product is made up of multiple words, separate the words in the dictionary. For example, include "New Voice Media" instead of "NewVoiceMedia".
  • Use phrases that include a brand or product name as well as the name on its own. The transcription engine is more likely to transcribe longer phrases than a single word. For example, "Welcome to New Voice Media" is more effective than "New Voice Media". List phrases before the name on its own.
  • Limit the number of dictionary entries to the most important values. A large number of entries in the dictionary may conflict.

Categorization rules

As part of transcribing recordings, Conversation Analyzer categorizes the textual contents of the transcript by identifying specific words and phrases that correspond to defined categories. A category is a collection of subcategories, which in turn contain a series of rules. Each rule consists of a word or phrase and the party who said that word or phrase. If the transcript contains the word or phrase, and the specified party said it, the transcript matches the category.

For example, you may want to track how polite your agents are when speaking with customers. Create a category of 'Politeness' that contains subcategories that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. You would need to create a category for the product or service with subcategories identifying incidences of the agent using terms relating to the product or service.

Note
Conversation Analyzer applies categorization rules to processed transcripts—text that Conversation Analyzer has applied substitution rules to—rather than the original text. Keep this in mind when you create your categories.

Example categorization profile

In the following example, the categorization profile, Cat_example, contains one category, Cats. Cats contains two subcategories, Cat position and Cat details. Each of these subcategories contains three rules, one rule for each party. The substitution array contains no rules. If more than one rule applies to some text in the transcript, that text will appear in multiple categories.

...

Code Block
{
    "name": "Cat_example",
    "language": "en-us",
    "categories": [
        {
            "name": "Cats",
            "rules": [],
            "subcategories": [
                {
                    "name": "Cat position",
                    "rules": [
                        {
                            "party": "customer",
                            "expression": "cat * sat mat ~2"
                        },
                        {
                            "party": "agent",
                            "expression": "cat mat ~3"
                        },
                        {
                            "party": "either",
                            "expression": "cat * stairs"
                        }
                    ],
                    "subcategories": []
                },
                {
                    "name": "Cat details",
                    "rules": [
                        {
                            "party": "customer",
                            "expression": "cat is ## years old"
                        },
                        {
                            "party": "agent",
                            "expression": "your cat?"
                        },
                        {
                            "party": "either",
                            "expression": "cat has # legs"
                        }
                    ],
                    "subcategories": []
                }
            ]
        }
    ],
    "substitution": []
}
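To see how the nesting of categories, subcategories, and rules fits together, the following Python sketch walks a profile and lists every rule with the path of category names that leads to it. The function name is hypothetical, and the sketch only reads the JSON structure; it does not reproduce how Conversation Analyzer evaluates the rules.

```python
import json


def iter_rules(categories, path=()):
    """Yield (category path, party, expression) for every rule, depth first."""
    for category in categories:
        current = path + (category["name"],)
        for rule in category.get("rules", []):
            yield " > ".join(current), rule["party"], rule["expression"]
        # Subcategories have the same shape as categories, so recurse.
        yield from iter_rules(category.get("subcategories", []), current)


profile = json.loads("""
{
    "name": "Cat_example",
    "language": "en-us",
    "categories": [
        {
            "name": "Cats",
            "rules": [],
            "subcategories": [
                {
                    "name": "Cat position",
                    "rules": [{"party": "agent", "expression": "cat mat ~3"}],
                    "subcategories": []
                }
            ]
        }
    ],
    "substitution": []
}
""")

for path, party, expression in iter_rules(profile["categories"]):
    print(path, party, expression)
```

Because json.loads rejects malformed JSON, loading a profile this way also catches syntax slips such as a missing comma between top-level elements.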

The following sections describe the party and expression name/value pairs.

party

party defines the speaker who must say the word or phrase defined by the rule expression for the transcript to match the category. party can be customer, agent, or either. either means that the rule applies to what the agent, the customer or both parties said.

The format of party is "party": "value", where value can be:

  • customer
  • agent
  • either
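The meaning of the three values can be summarized in a small Python sketch. The function name is hypothetical, and raising an error for any other value is an assumption made for the example.

```python
VALID_PARTIES = {"customer", "agent", "either"}


def party_applies(rule_party, speaker):
    """Return True if a rule with the given party covers words said by speaker."""
    if rule_party not in VALID_PARTIES:
        raise ValueError(f"party must be one of {sorted(VALID_PARTIES)}")
    # "either" covers both speakers; otherwise the speaker must match exactly.
    return rule_party == "either" or rule_party == speaker


print(party_applies("either", "agent"))
print(party_applies("customer", "agent"))
```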

expression

The expression name/value pair in a rule defines the text that must appear in the transcript to match the category.

The categorization expression language describes the format of an expression. The language supports simple expressions where the presence of the exact word or phrase would result in a match. For information about the categorization expression language, see Categorization expression language.

Substitution and redaction rules

Along with applying categorization rules to a conversation transcript, Conversation Analyzer applies substitution and redaction rules to refine the output:

  • Substitution rules replace commonly mis-transcribed words and improve the spelling of words. You will most likely require these rules for proper nouns, such as place, company or product names. For example, Conversation Analyzer may transcribe 'Basingstoke' as 'Beijing spoke', or 'NewVoiceMedia' as 'new voice media'. Create rules that replace the incorrect word or words.
  • Redaction rules replace sensitive information such as credit card details. Redaction rules are a specific type of substitution rule: instead of using them to refine and clarify phrases in the transcript output, you use them to obscure the content. Use a redaction rule to replace specified text with text such as '(redacted)', '(removed)', or 'xxxxxxxxxxxxxx'.

Example categorization profile (substitution and redaction rules only)

In the following example, the categorization profile—Subs_example—contains three substitution rules. The categories array contains no rules.

...

Code Block
{
    "name": "Subs_example",
    "language": "en-us",
    "categories": [],
    "substitution": [
        {
            "party": "agent",
            "find": "new voice media",
            "replace": "NewVoiceMedia"
        },
        {
            "party": "either",
            "find": "Beijing spoke",
            "replace": "Basingstoke"
        },
        {
            "party": "customer",
            "find": "my card number is *",
            "replace": "xxxx xxxx xxxx xxxx"
        }
    ]
}

The following sections describe the find and replace name/value pairs. For information about the party name/value pair, see party.

find

The find name/value pair in a rule defines the text that must appear in the transcript to match the substitution rule.

The categorization expression language describes the format of the value in the find name/value pair. The language supports simple values where the presence of the exact word or phrase would result in a match. For information about the categorization expression language, see Categorization expression language.

replace

The replace name/value pair in a rule defines the text that will replace the found text.

Applying substitution and redaction rules results in Conversation Analyzer modifying transcript text. Because of this, you must take extra care when writing your rules. For more information about substitution rules, see Substitution and redaction rules continued.
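Because these rules change the transcript, it can help to sanity-check them before you import a profile. The following Python sketch is one possible pre-flight check, not a description of the validation that Conversation Analyzer or Category Editor performs. The function name and the messages are hypothetical, and the 100-character limit comes from the value validation section later on this page.

```python
VALID_PARTIES = {"customer", "agent", "either"}
MAX_FIND_LENGTH = 100  # limit stated under "expression and find value validation"


def check_substitution_rules(profile):
    """Return a list of human-readable problems found in the substitution array."""
    problems = []
    for index, rule in enumerate(profile.get("substitution", []), start=1):
        if rule.get("party") not in VALID_PARTIES:
            problems.append(f"rule {index}: party must be customer, agent, or either")
        find = rule.get("find")
        if not isinstance(find, str) or not find.strip():
            problems.append(f"rule {index}: find must be a non-empty string")
        elif len(find) > MAX_FIND_LENGTH:
            problems.append(f"rule {index}: find is longer than {MAX_FIND_LENGTH} characters")
        if not isinstance(rule.get("replace"), str):
            problems.append(f"rule {index}: replace must be a string")
    return problems


profile = {
    "substitution": [
        {"party": "either", "find": "Beijing spoke", "replace": "Basingstoke"},
        {"party": "caller", "find": "", "replace": "(redacted)"},
    ]
}
print(check_substitution_rules(profile))
```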

Categorization expression language

The categorization expression language describes the required format of the values you provide in the expression and find name/value pairs. Conversation Analyzer can then use these values to locate matching text in the transcripts.

Use the categorization expression language to define the categorization, substitution and redaction rules.

expression and find value validation

Valid expression and find values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:

Values can be no more than 100 characters long.

Wildcards in values

The categorization expression language supports the following wildcards within the values. Examples refer to the expression name/value pair, but exactly the same rules apply to find name/value pairs.

...

Wildcard

...

Description

...

Example expressions

...

The following words will match the example expression: "who" and "why". For an example of an expression using the ? wildcard, see Example 2. Expression using the ? character wildcard.

...

The following words will match the example expression: "sit", "sits", "sitting". For an example of an expression using the * wildcard, see Example 3. Expression using the * character wildcard.

Note

To use * to represent a character or characters, ensure that the * is contiguous with the characters in the containing word.

You can also use * to represent a word or words. For information, see Wildcard representing zero to many words.

...

Only digits will match the example expression, not text.

Text containing "123" will match the example expression but text containing "one two three" will not.

For an example of an expression using the # wildcard, see Example 4. Expression using the # character wildcard.

...

The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits happily on the mat".

For an example of an expression using the * wildcard, see Example 6. Expression using the * word wildcard.

Note

To use * to represent a word or words, type a space between the * and any other characters in the expression.

You can also use * to represent a character or characters. For information, see Wildcard representing zero to many characters.

...

A phrase that contains N or fewer words between the specified words will match the example expression.

The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits on the mat".

For an example of an expression using the ~N wildcard, see Example 7. Expression using the ~N wildcard.

Note

If used, ~N must appear at the end of the expression.

If the expression contains more than two words, ~N applies to the number of words between any of the specified words.

For an example of an expression using the ~N wildcard with more than two words, see Example 8. Expression using the ~N wildcard.
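The wildcard rules described above can be approximated with the following Python sketch, which may help when you reason about what an expression will match. It is an illustration under stated assumptions, not the Conversation Analyzer matching engine: the function names are hypothetical, matching is assumed to be case-insensitive, and words are approximated with the regular expression class \w plus the apostrophe.

```python
import re

WORD_RE = re.compile(r"[\w']+")  # approximation of "word characters"


def compile_expression(expression):
    """Turn an expression into (word pattern, allowed gap before the word) steps."""
    tokens = expression.split()
    max_gap = 0
    if tokens and re.fullmatch(r"~\d+", tokens[-1]):
        max_gap = int(tokens.pop()[1:])  # ~N: up to N words between words
    steps, unbounded = [], False
    for token in tokens:
        if token == "*":  # standalone *: zero to many words
            unbounded = True
            continue
        parts = []
        for ch in token:
            if ch == "?":
                parts.append(r"[\w']")    # exactly one character
            elif ch == "*":
                parts.append(r"[\w']*")   # zero to many characters
            elif ch == "#":
                parts.append(r"\d")       # exactly one digit
            else:
                parts.append(re.escape(ch))
        pattern = re.compile("".join(parts), re.IGNORECASE)
        steps.append((pattern, None if unbounded else max_gap))
        unbounded = False
    return steps


def expression_matches(expression, text):
    """Return True if the words of text satisfy the expression."""
    steps = compile_expression(expression)
    words = WORD_RE.findall(text)

    def search(index, start):
        if index == len(steps):
            return True
        pattern, gap = steps[index]
        if index == 0 or gap is None:
            stop = len(words)  # first word, or unbounded gap: look anywhere ahead
        else:
            stop = min(len(words), start + gap + 1)
        return any(
            pattern.fullmatch(words[pos]) and search(index + 1, pos + 1)
            for pos in range(start, stop)
        )

    return bool(steps) and search(0, 0)


print(expression_matches("cat mat ~3", "the cat sits on the mat"))
print(expression_matches("the cat? sat", "the cat sat"))
```

The sketch works on whole words, so punctuation between words is ignored in the same way that nonword characters act as separators in the transcript.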

expression examples 

Example 1. Simple expression

"expression": "the cat sat"

With a simple expression, only the exact word or phrase will satisfy the rule.

Example 2. Expression using the ? character wildcard

"expression": "the cat? sat"

The ? in the expression represents a single character that must appear after "cat" but before "sat" in matching text.

...

Text

...

Does it match?

...

Explanation

...

The ? in the expression represents the "s" in the text.

...

The expression does not allow any additional characters after "the".

Example 3. Expression using the * character wildcard

"expression": "sit*"

The * in the expression represents zero to many characters that can appear after "sit" in matching text.

...

Text

...

Does it match?

...

Explanation

...

The * in the expression represents the "s" in the text.

...

The * in the expression represents the "ting" in the text.

...

Example 4. Expression using the # character wildcard

"expression": "### ###"

Matching text must contain two sets of three digits, separated by a non-word character and no other characters.

...

Text

...

Does it match?

...

Explanation

...

Example 5. Expression using the ?? wildcard

"expression": "wh?? cat"

The ?? in the expression represents two characters that must appear after "wh" and before "cat" in matching text.

...

Text

...

Does it match?

...

Why

...

The ?? in the expression represents the "at" in the text.

...

The ?? in the expression represents only two characters after "wh", not three.

Example 6. Expression using the * word wildcard

"expression": "the cat sits * on the mat"

The text must contain the phrase "the cat sits on the mat" with zero to many words between "sits" and "on".

...

Text

...

Does it match?

...

Why

...

The * in the expression requires zero to many words in its place.

...

The * in the expression represents "happily" in the text.

...

The * in the expression appears after "sits", not before.

Example 7. Expression using the ~N wildcard

"expression": "cat mat ~3"

The text must contain the words "cat" and "mat" with up to three words between them.

...

Text

...

Does it match?

...

Why

...

Example 8. Expression using the ~N wildcard

"expression": "cat sat mat ~3"

The text must contain the words "cat", "sat" and "mat" with up to three words between each of them. In this example, matching text may contain three words between "cat" and "sat" and also three words between "sat" and "mat".

...

Text

...

Does it match?

...

Why

...

Example 9. Expression using the ~N and * word wildcards

"expression": "cat * sat mat ~2"

Even when used with a ~N wildcard in an expression, a * word wildcard can represent any number of words. In this example, matching text can contain any number of words between "cat" and "sat", but a maximum of two words between "sat" and "mat".

...

Text

...

Does it match?

...

Why

...

The text contains nine words between "cat" and "sat", and two words between "sat" and "mat".

...

Substitution and redaction rules continued

Overlapping substitution and redaction rules

Overlapping occurs when more than one rule matches the same transcript text. Because substitution and redaction rules actually modify the transcript text, overlapping rules can cause a conflict whereby multiple rules try to replace text with different values. To handle overlapping, Conversation Analyzer uses the following logic when applying the rules:

  • The order of the rules in the profile determines their priority; the first rule has the highest priority.
  • If rules overlap, the higher priority rule takes precedence over the lower priority. The lower priority rule is discarded.
  • A discarded rule does not block any other lower priority rules.
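The priority logic in the list above can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and each match is represented as a tuple of rule number, start word position, and end word position (exclusive), which is an assumption about how matches might be described. The sample spans correspond to the dog-walking example later in this section.

```python
def resolve_overlaps(matches):
    """Keep matches in priority order, discarding any that overlap a kept match.

    Each match is (rule_number, start, end) with end exclusive; a lower
    rule_number means the rule appears earlier in the profile.
    """
    accepted = []
    for match in sorted(matches, key=lambda m: m[0]):
        _, start, end = match
        # A discarded match is never added to `accepted`, so it cannot
        # block a later, lower-priority rule.
        if all(end <= kept_start or start >= kept_end for _, kept_start, kept_end in accepted):
            accepted.append(match)
    return accepted


# Words: I(0) have(1) a(2) big(3) hunting(4) dog(5)
matches = [
    (1, 3, 6),  # rule 1: "big hunting dog"
    (2, 0, 6),  # rule 2: "I have * dog"
    (3, 1, 2),  # rule 3: "have"
]
print(resolve_overlaps(matches))
```

Rule 2 is discarded because it overlaps rule 1, and rule 3 is kept because it overlaps only the discarded rule 2.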

...

Examples of overlapping rules

In all the examples, party has been removed for simplicity.

Info

Example 1. We want to replace "credit card" with "payment method" and remove the credit card number.

Transcription text:

"My credit card is 1234567890123456"

Substitution rules:

Rule 1:

"find": "credit card",
"replace": "payment method"

Rule 2:

"find": "credit card #* ~5",
"replace": "(credit card information redacted)"

Intended text:

"My (credit card information redacted)"

Processed text:

"My payment method is 1234567890123456"

Why:

Rules 1 and 2 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 2. The result is that the credit card number is still exposed.

Solution:

Write your redaction rules first, followed by your substitution rules.

Info

Example 2. We want to remove all strings of three or more numbers because they can contain sensitive information. However, we want to label PIN numbers differently to credit card numbers.

Transcription text:

"My PIN is 1234"

Substitution rules:

Rule 1:

"find": "###*",
"replace": "(redacted)"

Rule 2:

"find": "credit card ################ ~5",
"replace": "(credit card has been redacted)"

Rule 3:

"find": "PIN #### ~5",
"replace": "(PIN has been redacted)"

Intended text:

"My (PIN has been redacted)"

Processed text:

"My PIN is (redacted)"

Why: 

Rules 1 and 3 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 3. The result is that Conversation Analyzer applies the more general replacement, "(redacted)", instead of the more specific "(PIN has been redacted)".

Solution:

Write more specific rules first, followed by more general—catch-all—rules later.

Info

Example 3. Due to the highly sensitive nature of passwords, we want to remove user account names, and wipe out the whole text containing the password.

Transcription text:

"My account name is administrator and my password is Jupiter, with upper case J"

Substitution rules:

Rule 1:

"find": "account name is * ",
"replace": "(account name redacted)"

Rule 2:

"find": "* password *",
"replace": "(password redacted)"

Intended text:

"My (account name redacted) and (password redacted)"

Processed text:

"My (account name redacted)"

Why:

In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority than rule 2. In removing the account name, the whole of the password text is removed too. Rule 2 does not match the remaining text.

Solution: 

Write your rules in order of most sensitive to least sensitive. Avoid using operators like * and ~ as much as possible. 

Info

Example 4. For a dog-walking service, we want to improve the transcription with more accurate, business-related words.

Transcription text:

"I have a big hunting dog"

Substitution rules:

Rule 1:

"find": "big hunting dog",
"replace": "hound"

Rule 2:

"find": "I have * dog",
"replace": "I am a dog owner"

Rule 3:

"find": "have",
"replace": "look after"

Processed text:

"I look after a hound"

Why:

In this scenario, Conversation Analyzer applies rule 1. Rule 2 overlaps rule 1 so Conversation Analyzer discards rule 2. Rule 3 overlaps rule 2 only, but because Conversation Analyzer has discarded rule 2, rule 3 can be applied.

Solution:

Write your substitution rules in order of importance.

Chaining substitution and redaction rules

Chaining occurs when one rule matches the output of another rule. Chaining only occurs when you re-analyze a recording. For information about re-analyzing recordings, see Configuring Conversation Analyzer.

Each time Conversation Analyzer applies substitution rules to a transcript, Conversation Analyzer overwrites the original transcript with the processed text. Rerunning the substitution rules can therefore further refine the text.

...

Example of chaining rules

In the example, party has been removed for simplicity.

Info

Example: Simple case to illustrate chaining.

Original transcript text:

"I have a dog"

Substitution rules:

Rule 1:

"find": "dog",
"replace": "big cat"

Rule 2:

"find": "cat",
"replace": "mouse"

Processed text:

"I have a big cat"

Reprocessed text:

"I have a big mouse"

Why:

Rule 2 matches part of the output of rule 1. On the initial processing, Conversation Analyzer applies rule 1. Conversation Analyzer overwrites the original text with the replaced text. On reprocessing, Conversation Analyzer applies rule 2.

Solution:

Write rules so that they don't apply to the output of each other to avoid chaining.
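The chaining behavior in the example above can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, find values are treated as literal text with no wildcards, matching is assumed to be case-insensitive, and all rules are matched against the text as it stands at the start of the run before any replacement is made.

```python
import re


def apply_rules_once(text, rules):
    """Apply literal substitution rules in one pass over the current text."""
    accepted = []  # (start, end, replacement) spans, found in rule priority order
    for rule in rules:
        pattern = re.compile(r"\b" + re.escape(rule["find"]) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            # Skip a match that overlaps a span claimed by an earlier rule.
            if all(match.end() <= start or match.start() >= end for start, end, _ in accepted):
                accepted.append((match.start(), match.end(), rule["replace"]))
    # Replace from right to left so earlier offsets stay valid.
    for start, end, replacement in sorted(accepted, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


rules = [
    {"find": "dog", "replace": "big cat"},
    {"find": "cat", "replace": "mouse"},
]
processed = apply_rules_once("I have a dog", rules)
reprocessed = apply_rules_once(processed, rules)
print(processed)
print(reprocessed)
```

Rule 2 finds nothing on the first run because "cat" does not appear in the original text. It matches only on the second run, after the output of rule 1 has replaced the original.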

Highlighting replaced text

After Conversation Analyzer has processed a transcript, substituting or redacting text as your rules require, you are unable to see what has changed. If you want to see where in the transcript Conversation Analyzer changed text, for example where it removed text, create a category that highlights the replaced text.

Note

If you substitute text with characters that are not valid in expression values, you will not be able to create a categorization rule to highlight the text. For example, if you create a substitution rule that replaces account numbers with "*********", a categorization rule with "expression": "*********" will be invalid.

...

Example of highlighting replaced text

In the example, party has been removed for simplicity.

Example: We want to see where account numbers have been removed from the transcript.

Original transcript text:

"My account number is 1234567890123456"

Substitution rule:

"find": "################",
"replace": "**** **** **** ****"

Processed text:

"My account number is **** **** **** ****"

Categorization rule:

"name": "Replaced text"
[...]
"expression": "**** **** **** ****"

...

the Skip calls under field when creating a categorization profile. For more information about creating profiles, see Managing categorization profiles.

The Skip calls under parameter is a single integer—in whole seconds only—and you can configure it for each categorization profile. By default, the Skip calls under value is set to 0 resulting in Conversation Analyzer processing all calls.
