A categorization profile contains categories and subcategories for the Conversation Analyzer feature. Conversation Analyzer uses the profile to categorize transcripts of call recordings. The profile also contains any substitution and redaction rules you provide. Using the substitution and redaction rules, Conversation Analyzer refines the transcribed text.
The categorization profile applies to the associated account. For information about where you can view the categorized recordings and refined transcripts, see Listening to and commenting on an audio recording.
Categorization profiles are written in JavaScript Object Notation (JSON). For information about JSON, see https://www.json.org/.
Categorization profile structure
A categorization profile consists of the following top-level elements:
- name (a name/value pair)
- language (a name/value pair)
- skipCallsUnder (an integer)
- categories (an array of category objects). Each category consists of the following:
  - name (a name/value pair)
  - rules (an array of one or more categorization rule objects). Not used.
  - subcategories (an array of one or more subcategory objects). Each subcategory consists of the following:
    - name (a name/value pair)
    - rules (an array of one or more categorization rule objects). Each categorization rule object consists of the following:
      - party (a name/value pair)
      - expression (a name/value pair)
    - subcategories (an array of one or more nested subcategory objects). Not used.
- substitution (an array of substitution rule objects). Each substitution rule consists of the following:
  - party (a name/value pair)
  - find (a name/value pair)
  - replace (a name/value pair)
Analyzing transcripts
Conversation Analyzer analyzes transcripts in several steps:
1. Conversation Analyzer identifies characters in the transcripts. Characters are either word or non-word characters. Characters from the Unicode categories (see expression and find value validation), plus apostrophes, are word characters. Other characters are non-word characters and act as word separators. Non-word characters include !, £, $, %, ^, &, *, (, ), and -.
2. Conversation Analyzer uses findings from step 1 to identify the individual words in the transcripts.
3. Conversation Analyzer looks for words in the transcripts that match the rules in the categorization profile:
   - Conversation Analyzer applies substitution rules first, replacing text if found.
   - Conversation Analyzer tags the processed transcripts with the corresponding categories if found.
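The word-splitting steps above can be approximated in a few lines of Python. This is only an illustrative sketch, not Conversation Analyzer's implementation: the names is_word_char and split_words are hypothetical, and the set of word characters is taken from the Unicode categories listed under expression and find value validation, plus the apostrophe.

```python
import unicodedata

# Unicode general categories treated as word characters in this document;
# the apostrophe is handled separately.
WORD_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}


def is_word_char(ch):
    """Return True if ch counts as a word character (assumed rule)."""
    return ch == "'" or unicodedata.category(ch) in WORD_CATEGORIES


def split_words(text):
    """Split text into words, treating every non-word character as a separator."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("you're welcome - that's £25, thanks!"))
```

In this sketch the hyphen, pound sign, comma, exclamation mark and spaces all act as separators, while the apostrophes stay inside "you're" and "that's".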
name and language
Conversation Analyzer uses a call's Language and ConversationAnalyzerProfile data source values to identify the categorization profile to use to categorize and refine the call recording. Both language and name must match the Language and ConversationAnalyzerProfile data source values, respectively, for Conversation Analyzer to identify the profile. For information about how the Language and ConversationAnalyzerProfile data sources get their values, see Overview of Conversation Analyzer.
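As a rough illustration of this lookup, the following Python sketch selects a profile only when both values match. The function name select_profile, the language values and the profile names are hypothetical, and returning None when nothing matches is an assumption; this document does not say what Conversation Analyzer does in that case.

```python
def select_profile(profiles, call_language, call_profile_name):
    """Return the first profile whose language and name both match the call's
    Language and ConversationAnalyzerProfile data source values."""
    for profile in profiles:
        if (profile["language"] == call_language
                and profile["name"] == call_profile_name):
            return profile
    return None  # assumption: behavior without a match is not documented here


# Hypothetical profiles; only the keys come from the profile structure.
profiles = [
    {"name": "Sales", "language": "en-GB"},
    {"name": "Sales", "language": "fr-FR"},
]
print(select_profile(profiles, "fr-FR", "Sales"))
```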
skipCallsUnder
You can configure Conversation Analyzer to not process short calls. Short calls are those that are shorter than your configured threshold. This threshold is defined, in seconds, in the skipCallsUnder parameter. The skipCallsUnder parameter is a single integer, in whole seconds only, and you can configure skipCallsUnder for each categorization profile. By default, skipCallsUnder is 0, so Conversation Analyzer processes all calls.
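The threshold check can be pictured with a small Python sketch. The function name should_process is hypothetical. The handling of a call whose length exactly equals the threshold follows from the wording "shorter than" above.

```python
def should_process(call_seconds, profile):
    """Return True if a call of the given length would be processed."""
    # A missing skipCallsUnder is treated as the documented default of 0.
    threshold = profile.get("skipCallsUnder", 0)
    # Only calls strictly shorter than the threshold are skipped.
    return call_seconds >= threshold


print(should_process(12, {"skipCallsUnder": 30}))
print(should_process(30, {"skipCallsUnder": 30}))
print(should_process(5, {}))
```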
Categorization rules
As part of transcribing recordings, Conversation Analyzer categorizes the textual contents of the transcript by identifying specific words and phrases that correspond to defined categories. A category is a collection of subcategories, which in turn contain a series of rules. Each rule consists of a word or phrase and the party who said that word or phrase. If the transcript contains the word or phrase, and the specified party said it, the transcript matches the category.
For example, you may want to track how polite your agents are when speaking with customers. Create a category of 'Politeness' that contains subcategories that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. In that case, create a category for the product or service with subcategories that identify instances of the agent using terms relating to the product or service.
Example categorization profile
In the following example, the categorization profile, Cat_example, contains one category, Cats. Cats contains two subcategories, Cat details and Cat position. Each of these subcategories contains three rules, one rule for each party. The substitution array contains no rules. If more than one rule applies to some text in the transcript, that text will appear in multiple categories.
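A profile matching this description might look like the sketch below. The element names follow the profile structure described earlier, but the language value "en-GB" and every expression value are placeholders invented for illustration, not values taken from the product. The JSON is embedded in Python only so that it can be parsed and checked.

```python
import json

# Hypothetical profile: "en-GB" and all expressions are illustrative placeholders.
PROFILE_JSON = """
{
  "name": "Cat_example",
  "language": "en-GB",
  "skipCallsUnder": 0,
  "categories": [
    {
      "name": "Cats",
      "subcategories": [
        {
          "name": "Cat details",
          "rules": [
            {"party": "customer", "expression": "tabby cat"},
            {"party": "agent", "expression": "black cat"},
            {"party": "either", "expression": "kitten"}
          ]
        },
        {
          "name": "Cat position",
          "rules": [
            {"party": "customer", "expression": "cat sat"},
            {"party": "agent", "expression": "cat * mat"},
            {"party": "either", "expression": "cat mat ~4"}
          ]
        }
      ]
    }
  ],
  "substitution": []
}
"""

profile = json.loads(PROFILE_JSON)
for category in profile["categories"]:
    for sub in category["subcategories"]:
        parties = [rule["party"] for rule in sub["rules"]]
        print(category["name"], "/", sub["name"], "->", parties)
```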
The following sections describe the party and expression name/value pairs.
party
party defines the speaker who must say the word or phrase defined by the rule expression for the transcript to match the category. party can be customer, agent, or either. either means that the rule applies to what the agent, the customer, or both parties said.
The format of party is "party": "value", where value can be:
- customer
- agent
- either
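The effect of the three party values can be summarized in a short Python sketch. The function name party_applies is hypothetical, and rejecting unknown values with an error is an assumption made for the illustration.

```python
VALID_PARTIES = {"customer", "agent", "either"}


def party_applies(rule_party, speaker):
    """Return True if a rule with the given party value applies to text
    spoken by speaker ("customer" or "agent")."""
    if rule_party not in VALID_PARTIES:
        raise ValueError("unknown party value: " + repr(rule_party))
    # "either" applies to both speakers; otherwise the speaker must match.
    return rule_party == "either" or rule_party == speaker


print(party_applies("either", "agent"))
print(party_applies("customer", "agent"))
```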
expression
The expression name/value pair in a rule defines the text that must appear in the transcript to match the category.
The categorization expression language describes the format of an expression. The language supports simple expressions where the presence of the exact word or phrase would result in a match. For information about the categorization expression language, see Categorization expression language.
Substitution and redaction rules
Along with applying categorization rules to a conversation transcript, Conversation Analyzer applies substitution and redaction rules to refine the output:
- Substitution rules replace commonly mis-transcribed words and improve the spelling of words. You will most likely require these rules for proper nouns, such as place, company or product names. For example, Conversation Analyzer may transcribe 'Basingstoke' as 'Beijing spoke'. Create rules that replace the incorrect word or words.
- Redaction rules replace sensitive information such as credit card details. Redaction rules are a specific type of substitution rule: instead of using them to refine and clarify phrases in the transcript output, you use them to obscure the content. Use a redaction rule to replace specified text with text such as '(redacted)', '(removed)', or 'xxxxxxxxxxxxxx'.
Example categorization profile (substitution and redaction rules only)
In the following example, the categorization profile, Subs_example, contains three substitution rules. The categories array contains no rules.
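A profile of this shape might look like the sketch below. Only the element names come from the profile structure described earlier. The language value, the "acme core" product name and the find and replace values are placeholders invented for illustration. The JSON is embedded in Python only so that it can be parsed and checked.

```python
import json

# Hypothetical profile: two substitution rules and one redaction rule.
PROFILE_JSON = """
{
  "name": "Subs_example",
  "language": "en-GB",
  "skipCallsUnder": 0,
  "categories": [],
  "substitution": [
    {"party": "either", "find": "beijing spoke", "replace": "Basingstoke"},
    {"party": "agent", "find": "acme core", "replace": "AcmeCore"},
    {"party": "customer", "find": "#### #### #### ####", "replace": "(redacted)"}
  ]
}
"""

profile = json.loads(PROFILE_JSON)
for rule in profile["substitution"]:
    print(rule["party"], ":", rule["find"], "->", rule["replace"])
```

The third rule is a redaction rule: it uses the # wildcard, described later in this document, to find four groups of four digits and obscure them.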
The following sections describe the find and replace name/value pairs. For information about the party name/value pair, see party.
find
The find name/value pair in a rule defines the text that must appear in the transcript to match the substitution rule.
The categorization expression language describes the format of the value in the find name/value pair. The language supports simple values where the presence of the exact word or phrase would result in a match. For information about the categorization expression language, see Categorization expression language.
replace
The replace name/value pair in a rule defines the text that will replace the found text.
Applying substitution and redaction rules results in Conversation Analyzer modifying transcript text. Because of this, you must take extra care when writing your rules. For more information about substitution rules, see Substitution and redaction rules continued.
Categorization expression language
The categorization expression language describes the required format of the values you provide in the expression and find name/value pairs. Conversation Analyzer can then use these values to locate matching text in the transcripts.
Use the categorization expression language to define the categorization, substitution and redaction rules.
expression and find value validation
Valid expression and find values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:
Unicode Category Name | Description |
---|---|
Ll | Letter, Lowercase. For example, a-z, ᵯ, ḅ, ṥ, ở, ﬓ |
Lu | Letter, Uppercase. For example, A-Z, Ý, Ŧ, Ǣ, Щ, 𝕐 |
Lt | Letter, Titlecase. For example, Dž, ᾎ, ᾟ, ᾭ |
Lo | Letter, Other. For example, ª, ܗ, 爨. The Mongolian letter "Manchu Ali Gali Lha" (U+18AA) is not allowed within expression and find values. This character is used internally within the categorization engine. If the character appears within spoken text, Conversation Analyzer treats the character as an apostrophe. |
Lm | Letter, Modifier. For example, ʰ, ᵓ, 〲, ꟹ |
Mn | Mark, Nonspacing. For example, ុ, ᜴ |
Nd | Number, Decimal Digit. For example, 0-9, ۳, ૮, ๗ |
Pc | Punctuation, Connector. For example, _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿. This category includes ten characters; the most commonly used is the LOW LINE character (_), U+005F. |
Values can be no more than 100 characters long.
replace value validation
Values can be no more than 64 characters long.
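The character and length limits above can be checked with a sketch like the following. It is an approximation under stated assumptions: the function names are hypothetical, and the check covers only literal text, so wildcard characters (described in the next section) are deliberately out of scope and would be reported as invalid here.

```python
import unicodedata

ALLOWED_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}
RESERVED_CHAR = "\u18aa"  # Mongolian letter Manchu Ali Gali Lha, not allowed


def is_valid_literal_value(value, max_length=100):
    """Check a wildcard-free expression or find value against the
    documented character and length rules."""
    if len(value) > max_length:
        return False
    for ch in value:
        if ch == RESERVED_CHAR:
            return False
        if ch in (" ", "'"):
            continue
        if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
            return False
    return True


def is_valid_replace_value(value, max_length=64):
    """Only the documented length limit is checked for replace values."""
    return len(value) <= max_length


print(is_valid_literal_value("you're welcome"))
print(is_valid_literal_value("50% off"))
print(is_valid_replace_value("(redacted)"))
```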
Wildcards in values
The categorization expression language supports the following wildcards within the values. Examples refer to the expression name/value pair, but exactly the same rules apply to find name/value pairs.
Wildcard | Description | Example expressions | Details |
---|---|---|---|
? | Wildcard representing one character | wh? | Each ? represents one character. The following words will match the example expression: "who" and "why". For an example of an expression using the ? wildcard, see Example 2. Expression using the ? character wildcard. |
? | Wildcard representing one character | wh?? | The following words will match the example expression: "what", "when", "whom". For an example of an expression using the ?? wildcard, see Example 5. Expression using the ?? wildcard. |
* | Wildcard representing zero to many characters | sit* | The following words will match the example expression: "sit", "sits", "sitting". For an example of an expression using the * wildcard, see Example 3. Expression using the * character wildcard. You can also use * to represent a word or words. For information, see Wildcard representing zero to many words. |
# | Wildcard representing one numeric character | ### | Only digits will match the example expression, not text. Text containing "123" will match the example expression but text containing "one two three" will not. For an example of an expression using the # wildcard, see Example 4. Expression using the # character wildcard. |
* | Wildcard representing zero to many words | cat * mat | The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits happily on the mat". For an example of an expression using the * word wildcard, see Example 6. Expression using the * word wildcard. You can also use * to represent a character or characters. For information, see Wildcard representing zero to many characters. |
~N | Represents the number of words that can appear between the specified words in a phrase | cat mat ~4 | A phrase that contains N or fewer words between the specified words will match the example expression. The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits on the mat". If used, ~N must appear at the end of the phrase. For an example of an expression using the ~N wildcard, see Example 7. Expression using the ~N wildcard. If the expression contains more than two words, up to N words can appear between each of them. For an example, see Example 8. Expression using the ~N wildcard. |
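One way to picture the character wildcards is to translate a single-word pattern into a regular expression and match it against a whole word. The Python sketch below does that. It is an illustration only, not how Conversation Analyzer is implemented; the function names are hypothetical, and it assumes the text has already been split into words, so ? can safely stand for any character inside a word.

```python
import re


def word_pattern(term):
    """Translate one expression word containing ?, * or # into a regex."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # zero to many characters
        elif ch == "#":
            parts.append(r"\d")    # exactly one decimal digit
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def word_matches(term, word):
    """Return True if the whole word matches the pattern."""
    return word_pattern(term).fullmatch(word) is not None


for term, word in [("wh?", "who"), ("wh??", "what"), ("wh??", "who"),
                   ("sit*", "sitting"), ("###", "123"), ("###", "one")]:
    print(term, word, word_matches(term, word))
```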
expression examples
Example 1. Simple expression
"expression": "the cat sat"
With a simple expression, only the exact word or phrase will satisfy the rule.
Example 2. Expression using the ? character wildcard
"expression": "the cat? sat"
The ? in the expression represents a single character that must appear after "cat" but before "sat" in matching text.
Text | Does it match? | Explanation |
---|---|---|
the cat sat | No | The ? in the expression requires a character in its place. |
the cats sat | Yes | The ? in the expression represents the "s" in the text. |
their cats sat | No | The expression does not allow any additional characters after "the". |
Example 3. Expression using the * character wildcard
"expression": "sit*"
The * in the expression represents zero to many characters that can appear after "sit" in matching text.
Text | Does it match? | Explanation |
---|---|---|
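sit | Yes | The * in the expression allows zero to many characters in its place; here it represents no characters. |
sits | Yes | The * in the expression represents the "s" in the text. |
sitting | Yes | The * in the expression represents the "ting" in the text. |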
sat | No | The expression requires that "sit" appears in the text. |
Example 4. Expression using the # character wildcard
"expression": "### ###"
Matching text must contain two sets of three digits, separated by a non-word character and no other characters.
Text | Does it match? | Explanation |
---|---|---|
123 456 | Yes | The expression matches two sets of three digits. |
123-456 | Yes | The expression matches two sets of three digits. The hyphen is a non-word character and separates the two sets of three digits. |
123456 | No | The expression requires two sets of three digits, not one set of six. |
123 abc 456 | No | The expression requires two consecutive sets of three digits, not two sets separated by any other characters. |
Example 5. Expression using the ?? wildcard
"expression": "wh?? cat"
The ?? in the expression represents two characters that must appear after "wh" and before "cat" in matching text.
Text | Does it match? | Why |
---|---|---|
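what cat | Yes | The ?? in the expression represents the "at" in the text. |
when cat | Yes | The ?? in the expression represents the "en" in the text. |
who cat | No | The ?? in the expression requires two characters after "wh", not one. |
which cat | No | The ?? in the expression requires two characters after "wh", not three. |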
Example 6. Expression using the * word wildcard
"expression": "the cat sits * on the mat"
The text must contain the phrase "the cat sits on the mat" with zero to many words between "sits" and "on".
Text | Does it match? | Why |
---|---|---|
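the cat sits on the mat | Yes | The * in the expression represents zero words in the text. |
the cat sits happily on the mat | Yes | The * in the expression represents the word "happily" in the text. |
the cat always sits on the mat | No | The expression does not allow any words between "cat" and "sits". |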
Example 7. Expression using the ~N wildcard
"expression": "cat mat ~3"
The text must contain the words "cat" and "mat" with up to three words between them.
Text | Does it match? | Why |
---|---|---|
the cat mat | Yes | The text contains no words between "cat" and "mat" and the expression allows up to three. |
the cat likes mat | Yes | The text contains one word between "cat" and "mat", and the expression allows up to three. |
the cat sits on the mat | Yes | The text contains three words between "cat" and "mat", and the expression allows up to three. |
the cat always sits happily on the mat | No | The text contains five words between "cat" and "mat", but the expression only allows up to three. |
Example 8. Expression using the ~N wildcard
"expression": "cat sat mat ~3"
The text must contain the words "cat", "sat" and "mat" with up to three words between each of them. In this example, matching text may contain three words between "cat" and "sat" and also three words between "sat" and "mat".
Text | Does it match? | Why |
---|---|---|
the cat eagerly sat on the mat | Yes | The text contains one word between "cat" and "sat", and two words between "sat" and "mat"; the expression allows up to three. |
the cat eagerly and promptly sat on the green mat | Yes | The text contains three words between "cat" and "sat", and three words between "sat" and "mat"; the expression allows up to three. |
the cat sat on the green and blue mat | No | The text contains too many words (five) between "sat" and "mat". |
Example 9. Expression using the ~N and * word wildcards
"expression": "cat * sat mat ~2"
Even when used with a ~N wildcard in an expression, a * word wildcard can represent any number of words. In this example, matching text can contain any number of words between "cat" and "sat", but a maximum of two words between "sat" and "mat".
Text | Does it match? | Why |
---|---|---|
the cat sat on the mat | Yes | The text contains no words between "cat" and "sat", and two words between "sat" and "mat". |
the cat waited calmly whilst the mouse ran around and then sat on the mat | Yes | The text contains nine words between "cat" and "sat", and two words between "sat" and "mat". |
the cat always sat on the green mat | No | The text contains too many words (three) between "sat" and "mat". |
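The examples above can be reproduced with a small phrase matcher. The Python sketch below is an approximation written for illustration, not Conversation Analyzer's implementation: the function names are hypothetical, the text is split on spaces only (a simplification of the word-splitting rules), and matching is case-sensitive because this document does not state how case is handled.

```python
import re


def _word_regex(term):
    # ? -> one character, * -> zero or more characters, # -> one decimal digit
    table = {"?": ".", "*": ".*", "#": r"\d"}
    return re.compile("".join(table.get(ch, re.escape(ch)) for ch in term))


def phrase_matches(expression, text):
    """Return True if text contains a run of words matching expression."""
    words = text.split()  # simplification: words are separated by spaces
    terms = expression.split()
    gap = 0
    if terms and re.fullmatch(r"~\d+", terms[-1]):
        gap = int(terms.pop()[1:])  # ~N: up to N words between specified words
    steps, limit = [], gap
    for term in terms:
        if term == "*":
            limit = None  # word wildcard: any number of words may intervene
        else:
            steps.append((_word_regex(term), limit))
            limit = gap
    if not steps:
        return False

    def walk(step, pos):
        if step == len(steps):
            return True
        regex, max_gap = steps[step]
        if step == 0 or max_gap is None:
            stop = len(words)  # the first word may start anywhere in the text
        else:
            stop = min(len(words), pos + max_gap + 1)
        return any(regex.fullmatch(words[i]) and walk(step + 1, i + 1)
                   for i in range(pos, stop))

    return walk(0, 0)


print(phrase_matches("cat mat ~3", "the cat sits on the mat"))
print(phrase_matches("cat * sat mat ~2", "the cat always sat on the green mat"))
```

A backtracking search is used so that an early candidate word that leads nowhere does not prevent a later candidate from matching.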
Substitution and redaction rules continued
Overlapping substitution and redaction rules
Overlapping occurs when more than one rule matches the same transcript text. Because substitution and redaction rules actually modify the transcript text, overlapping rules can cause a conflict whereby multiple rules try to replace text with different values. To handle overlapping, Conversation Analyzer uses the following logic when applying the rules:
- The order of the rules in the profile determines their priority; the first rule has the highest priority.
- If rules overlap, the higher priority rule takes precedence over the lower priority. The lower priority rule is discarded.
- A discarded rule does not block any other lower priority rules.
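The priority logic above can be sketched in Python as follows. The function name resolve_overlaps and the representation of a match as (rule index, start word, end word) are assumptions made for this illustration; the document does not describe how Conversation Analyzer represents matches internally.

```python
def resolve_overlaps(matches):
    """Keep the highest-priority match wherever matches overlap.

    matches: list of (rule_index, start, end) tuples, where rule_index is the
    rule's position in the profile (0 = highest priority) and start/end are
    word positions, with end exclusive.
    """
    accepted = []
    for rule_index, start, end in sorted(matches, key=lambda m: m[0]):
        overlaps = any(start < a_end and a_start < end
                       for _, a_start, a_end in accepted)
        if not overlaps:
            accepted.append((rule_index, start, end))
        # A discarded match is not recorded, so it cannot block later rules.
    return sorted(accepted, key=lambda m: m[1])


# Rule 1 overlaps rule 0 and is discarded; rule 2 overlaps only rule 1,
# so it is still applied.
print(resolve_overlaps([(0, 2, 5), (1, 4, 7), (2, 6, 8)]))
```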
Chaining substitution and redaction rules
Chaining occurs when one rule matches the output of another rule. Chaining only occurs when you re-analyze a recording. For information about re-analyzing recordings, see Category Editor for Conversation Analyzer.
Each time Conversation Analyzer applies substitution rules to a transcript, Conversation Analyzer overwrites the original transcript with the processed text. Rerunning the substitution rules can therefore further refine the text.
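Chaining can be illustrated with a deliberately simplified Python sketch. It handles only literal find values (no wildcards, parties or priorities), and the function name apply_once and both rules are hypothetical. The point it shows is that a single pass does not rescan replaced text, whereas a second pass over the saved result does.

```python
import re


def apply_once(text, rules):
    """Apply literal (find, replace) rules in one pass over the text.

    Replaced text is not rescanned within the same pass.
    """
    lookup = dict(rules)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(find) + r"\b" for find, _ in rules))
    return pattern.sub(lambda m: lookup[m.group(0)], text)


rules = [
    ("beijing spoke", "Basingstoke"),  # fixes a mis-transcription
    ("Basingstoke", "redacted"),       # matches the output of the first rule
]

first_pass = apply_once("i live in beijing spoke", rules)
second_pass = apply_once(first_pass, rules)  # simulates re-analyzing
print(first_pass)
print(second_pass)
```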
Highlighting replaced text
After Conversation Analyzer has processed a transcript, substituting or redacting text as your rules require, you are unable to see what has changed. If you want to see where in the transcript Conversation Analyzer, for example, removed text, create a category that highlights the replaced text.
If you substitute text with characters that are not valid in expression values, you will not be able to create a categorization rule to highlight the text. For example, if you create a substitution rule that replaces account numbers with "*********", a categorization rule with "expression": "*********" will be invalid.