A categorization profile contains categories and subcategories for the Conversation Analyzer feature. Conversation Analyzer uses the profile to categorize transcripts of call recordings. The profile also contains any substitution and redaction rules you provide. Using the substitution and redaction rules, Conversation Analyzer refines the transcribed text.
The categorization profile applies to the associated account. For information about where you can view the categorized recordings and refined transcripts, see Listening to, viewing, and commenting on a call recording.
Categorization profiles are written in JavaScript Object Notation (JSON). For information about JSON, see https://www.json.org/.
Categorization profile structure
A categorization profile consists of the following top-level elements:
...
rules
(an array of one or more categorization rule objects).
Info |
---|
Not used. |
...
- party (a name/value pair)
- expression (a name/value pair)
subcategories
(an array of one or more nested subcategory objects).
Info |
---|
Not used. |
...
- party (a name/value pair)
- find (a name/value pair)
- replace (a name/value pair)
...
Supervisors and administrators can manage profiles using Category Editor.
Info |
---|
Supervisors and administrators can export to, restore from or create categorization profiles using JSON. However, we recommend using Category Editor for these tasks. |
...
Note | ||
---|---|---|
| ||
Conversation Analyzer analyzes transcripts in several steps:
|
Profile name and language
Conversation Analyzer uses a call's Language and ConversationAnalyzerProfile data source values to identify the categorization profile to use to categorize and refine the call recording. Both language and name are used to identify the profile. To set the profile's name and language in Category Editor, use the Profile name field and the Language drop-down when creating a categorization profile. Both the Language and Profile name field values must match the call's Language and ConversationAnalyzerProfile data source values for Conversation Analyzer to identify the profile. For information about how the Language and ConversationAnalyzerProfile data sources get their values, see Overview of Conversation Analyzer.
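The matching described above can be pictured as a lookup on two keys. The following Python sketch is illustrative only: the find_profile function, the in-memory list of profiles, and the exact, case-sensitive comparison are assumptions, not documented Conversation Analyzer behavior.

```python
# Hypothetical in-memory list of profiles; only name and language matter here.
profiles = [
    {"name": "Cat_example", "language": "en-us"},
    {"name": "Subs_example", "language": "en-us"},
]


def find_profile(profiles, language, profile_name):
    """Return the first profile whose language and name both match, else None.

    Exact string comparison is an assumption made for this sketch.
    """
    for profile in profiles:
        if profile["language"] == language and profile["name"] == profile_name:
            return profile
    return None


# Both values must match; a matching name with a different language finds nothing.
match = find_profile(profiles, "en-us", "Subs_example")
no_match = find_profile(profiles, "en-gb", "Cat_example")
```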
Categorization rules
As part of transcribing recordings, Conversation Analyzer categorizes the textual contents of the transcript by identifying specific words and phrases that correspond to defined categories. A category is a collection of subcategories, which in turn contain a series of rules. Each rule consists of a word or phrase and the party who said that word or phrase. If the transcript contains the word or phrase and it was spoken by the specified party, the transcript matches the category.
For example, you may want to track how polite your agents are when speaking with customers. Create a category of 'Politeness' that contains subcategories that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. You would need to create a category for the product or service with subcategories identifying instances of the agent using terms relating to the product or service.
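As a rough illustration of the 'Politeness' idea, the following Python sketch builds a profile with the same structure as the Cat_example profile in this document and prints it as JSON. The profile name Politeness_example, the subcategory names, the lowercase phrases, and the phrase_subcategory helper are all hypothetical choices made for this sketch.

```python
import json


def phrase_subcategory(name, phrases, party="agent"):
    """Build one subcategory with one rule per phrase for the given party."""
    return {
        "name": name,
        "rules": [{"party": party, "expression": phrase} for phrase in phrases],
        "subcategories": [],
    }


profile = {
    "name": "Politeness_example",  # hypothetical profile name
    "language": "en-us",
    "categories": [
        {
            "name": "Politeness",
            "rules": [],
            "subcategories": [
                phrase_subcategory("Please", ["please"]),
                phrase_subcategory("Thanks", ["thank you"]),
                phrase_subcategory("Welcome", ["you're welcome"]),
            ],
        }
    ],
    "substitution": [],
}

# Serialize to JSON text that could be reviewed before use.
profile_json = json.dumps(profile, indent=2)
print(profile_json)
```

Building the profile in code and serializing it avoids hand-written punctuation mistakes such as a missing comma between top-level elements.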
Note |
---|
Conversation Analyzer applies categorization rules to processed transcripts—text that Conversation Analyzer has applied substitution rules to—rather than the original text. Keep this in mind when you create your categories. |
Example categorization profile
In the following example, the categorization profile, Cat_example, contains one category, Cats. Cats contains two subcategories, Cat details and Cat position. Each of these subcategories contains two rules, one rule for each party. The substitution array contains no rules. If more than one rule applies to some text in the transcript, that text will appear in multiple categories.
...
Code Block | ||
---|---|---|
| ||
{
"name": "Cat_example",
"language": "en-us",
"categories": [
{
"name": "Cats",
"rules": [],
"subcategories": [
{
"name": "Cat position",
"rules": [
{
"party": "customer",
"expression": "cat * sat mat ~2"
},
{
"party": "agent",
"expression": "cat mat ~3"
}
],
"subcategories": []
},
{
"name": "Cat details",
"rules": [
{
"party": "customer",
"expression": "cat is ## years old"
},
{
"party": "agent",
"expression": "your cat?"
}
],
"subcategories": []
}
]
}
],
"substitution": []
} |
The following sections describe the party and expression name/value pairs.
party
party defines the party who must say the word or phrase defined by the rule expression for the transcript to match the category. Party can be customer or agent.
The format of party is "party": "value" where value can be:
- customer
- agent
expression
The expression name/value pair in a rule defines the text that must appear in the transcript to match the category.
The categorization expression language describes the format of an expression. The language supports simple expressions where the presence of the exact word or phrase would result in a match. For information about the categorization expression language, see Categorization expression language.
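Before importing a hand-edited profile, it can help to check that every rule carries a valid party and an expression. The following Python sketch is one possible check, based only on the structure shown in the example profile; the function names and error messages are hypothetical, and the real import may apply different or additional checks.

```python
import json

VALID_PARTIES = {"customer", "agent"}


def check_category(category, path=""):
    """Collect problems in one category and, recursively, its subcategories."""
    errors = []
    here = f"{path}/{category.get('name', '?')}"
    for index, rule in enumerate(category.get("rules", [])):
        if rule.get("party") not in VALID_PARTIES:
            errors.append(f"{here} rule {index}: party must be customer or agent")
        if not isinstance(rule.get("expression"), str):
            errors.append(f"{here} rule {index}: expression must be a string")
    for subcategory in category.get("subcategories", []):
        errors.extend(check_category(subcategory, here))
    return errors


def check_profile(text):
    """Parse profile JSON text and return a list of rule problems.

    json.loads raises json.JSONDecodeError if the text is not valid JSON,
    for example when a comma is missing between elements.
    """
    profile = json.loads(text)
    errors = []
    for category in profile.get("categories", []):
        errors.extend(check_category(category))
    return errors
```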
Substitution and redaction rules
Along with applying categorization rules to a conversation transcript, Conversation Analyzer applies substitution and redaction rules to refine the output:
- Substitution rules replace commonly mis-transcribed words and improve the spelling of words. You will most likely require these rules for proper nouns, such as place, company or product names. For example, Conversation Analyzer may transcribe 'Basingstoke' as 'Beijing spoke', or 'NewVoiceMedia' as 'new voice media'. Create rules that replace the incorrect word or words.
- Redaction rules replace sensitive information such as credit card details. Redaction rules are a specific type of substitution rule: instead of using them to refine and clarify phrases in the transcript output, you use them to obscure the content. Use a redaction rule to replace specified text with text such as '(redacted)', '(removed)', or 'xxxxxxxxxxxxxx'.
Example categorization profile (substitution and redaction rules only)
In the following example, the categorization profile, Subs_example, contains three substitution rules. The categories array is empty.
...
Code Block | ||
---|---|---|
| ||
{
"name": "Subs_example",
"language": "en-us",
"categories": [ ],
"substitution": [
{
"party": "agent",
"find": "new voice media",
"replace": "NewVoiceMedia"
},
{
"party": "customer",
"find": "Beijing spoke",
"replace": "Basingstoke"
},
{
"party": "customer",
"find": "my card number is *",
"replace": "xxxx xxxx xxxx xxxx"
}
]
} |
The following sections describe the find and replace name/value pairs. For information about the party name/value pair, see party.
find
The find name/value pair in a rule defines the text that must appear in the transcript to match the substitution rule.
The categorization expression language describes the format of the value in the find name/value pair. The language supports simple values where the presence of the exact word or phrase would result in a match. For information about the categorization expression language, see Categorization expression language.
replace
The replace name/value pair in a rule defines the text that will replace the found text.
Applying substitution and redaction rules results in Conversation Analyzer modifying transcript text. Because of this, you must take extra care when writing your rules. For more information about substitution rules, see Substitution and redaction rules continued.
Categorization expression language
The categorization expression language describes the required format of the values you provide in the expression and find name/value pairs. Conversation Analyzer can then use these values to locate matching text in the transcripts.
Use the categorization expression language to define the categorization, substitution and redaction rules.
expression and find value validation
Valid expression and find values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:
Values can be no more than 100 characters long.
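The following Python sketch approximates these validation rules so that values can be checked before they go into a profile. It rests on two assumptions that this section does not confirm: that Python's str.isalnum() is a close enough stand-in for the permitted Unicode categories, and that the wildcard characters ?, *, # and ~ described later in this document are also accepted. The is_valid_value name is hypothetical.

```python
MAX_LENGTH = 100          # documented limit on value length
WILDCARDS = set("?*#~")   # assumption: wildcard characters are also permitted


def is_valid_value(value: str) -> bool:
    """Approximate check for expression and find values.

    str.isalnum() is used as a stand-in for the permitted Unicode
    categories, which is an assumption made for this sketch.
    """
    if len(value) > MAX_LENGTH:
        return False
    return all(
        ch.isalnum() or ch in (" ", "'") or ch in WILDCARDS
        for ch in value
    )
```

For example, is_valid_value("cat is ## years old") passes, while a value containing "&" or a value of 101 characters does not.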
Wildcards in values
The categorization expression language supports the following wildcards within the values. Examples refer to the expression name/value pair, but exactly the same rules apply to find name/value pairs.
...
...
...
...
The following words will match the example expression: "who" and "why". For an example of an expression using the ? wildcard, see Example 2. Expression using the ? character wildcard.
...
The following words will match the example expression: "sit", "sits", "sitting". For an example of an expression using the * wildcard, see Example 3. Expression using the * character wildcard.
Note |
---|
You can also use * to represent a word or words. For information, see Wildcard representing zero to many words. |
...
Only digits will match the example expression, not text.
Text containing "123" will match the example expression but text containing "one two three" will not.
For an example of an expression using the # wildcard, see Example 4. Expression using the # character wildcard.
...
The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits happily on the mat".
For an example of an expression using the * wildcard, see Example 6. Expression using the * word wildcard.
Note |
---|
You can also use * to represent a character or characters. For information, see Wildcard representing zero to many characters. |
...
A phrase that contains N or fewer words between the specified words will match the example expression.
The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits on the mat".
For an example of an expression using the ~N wildcard, see Example 7. Expression using the ~N wildcard.
Note |
---|
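If the expression contains more than two words, up to N words can appear between each pair of adjacent words. For an example of an expression using the ~N wildcard with more than two words, see Example 8. Expression using the ~N wildcard. |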
expression examples
Example 1. Simple expression
"expression": "the cat sat"
With a simple expression, only the exact word or phrase will satisfy the rule.
Example 2. Expression using the ? character wildcard
"expression": "the cat? sat"
The ? in the expression represents a single character that must appear after "cat" but before "sat" in matching text.
...
...
...
...
The ? in the expression represents the "s" in the text.
...
The expression does not allow any additional characters after "the".
Example 3. Expression using the * character wildcard
"expression": "sit*"
The * in the expression represents zero to many characters that can appear after "sit" in matching text.
...
...
...
...
The * in the expression represents the "s" in the text.
...
The * in the expression represents the "ting" in the text.
...
Example 4. Expression using the # character wildcard
"expression": "### ###"
Matching text must contain two sets of three digits, separated by a non-word character and no other characters.
...
...
...
...
Example 5. Expression using the ?? wildcard
"expression": "wh?? cat"
The ?? in the expression represents two characters that must appear after "wh" and before "cat" in matching text.
...
...
...
...
The ?? in the expression represents the "at" in the text.
...
The ?? in the expression represents only two characters after "wh", not three.
Example 6. Expression using the * word wildcard
"expression": "the cat sits * on the mat"
The text must contain the phrase "the cat sits on the mat" with zero to many words between "sits" and "on".
...
...
...
...
The * in the expression requires zero to many words in its place.
...
The * in the expression represents "happily" in the text.
...
The * in the expression appears after "sits", not before.
Example 7. Expression using the ~N wildcard
"expression": "cat mat ~3"
The text must contain the words "cat" and "mat" with up to three words between them.
...
...
...
...
Example 8. Expression using the ~N wildcard
"expression": "cat sat mat ~3"
The text must contain the words "cat", "sat" and "mat" with up to three words between each of them. In this example, matching text may contain three words between "cat" and "sat" and also three words between "sat" and "mat".
...
...
...
...
Example 9. Expression using the ~N and * word wildcards
"expression": "cat * sat mat ~2"
Even when used with a ~N wildcard in an expression, a * word wildcard can represent any number of words. In this example, matching text can contain any number of words between "cat" and "sat", but a maximum of two words between "sat" and "mat".
...
...
...
...
The text contains nine words between "cat" and "sat", and two words between "sat" and "mat".
...
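To experiment with how the wildcards in Examples 1 to 9 behave, the expression language can be approximated with regular expressions. The following Python sketch is an approximation, not Conversation Analyzer's implementation. It assumes that a word is a run of letters, digits, underscores and apostrophes, that words are separated by any other characters, that matching is case-sensitive, and that ~N appears only at the end of an expression. The expression_to_regex name is hypothetical.

```python
import re

WORD = r"[\w']"    # assumed definition of a character inside a word
SEP = r"[^\w']+"   # assumed separator between words

# Character-level wildcards: ? is one character, * is zero to many, # is a digit.
CHAR_WILDCARDS = {"?": WORD, "*": WORD + "*", "#": r"\d"}


def expression_to_regex(expression: str) -> re.Pattern:
    """Translate an expression into an approximate regular expression."""
    tokens = expression.split()
    max_gap = None
    # A trailing ~N allows up to N extra words between adjacent words.
    if tokens and re.fullmatch(r"~\d+", tokens[-1]):
        max_gap = int(tokens.pop()[1:])

    parts = []
    any_words = False  # True when a standalone * precedes the next word
    for token in tokens:
        if token == "*":
            any_words = True
            continue
        word = "".join(CHAR_WILDCARDS.get(ch, re.escape(ch)) for ch in token)
        if parts:
            if any_words:
                parts.append(f"{SEP}(?:{WORD}+{SEP})*")
            elif max_gap is not None:
                parts.append(f"{SEP}(?:{WORD}+{SEP}){{0,{max_gap}}}")
            else:
                parts.append(SEP)
        parts.append(word)
        any_words = False

    # Lookarounds stop a match from starting or ending in the middle of a word.
    return re.compile(f"(?<!{WORD})" + "".join(parts) + f"(?!{WORD})")
```

Under these assumptions, expression_to_regex("cat mat ~3").search("the cat sat on the mat") finds a match, while the same expression does not match "the cat sat down on the mat", which has four words between "cat" and "mat".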
Substitution and redaction rules continued
Overlapping substitution and redaction rules
Overlapping occurs when more than one rule matches the same transcript text. Because substitution and redaction rules actually modify the transcript text, overlapping rules can cause a conflict whereby multiple rules try to replace text with different values. To handle overlapping, Conversation Analyzer uses the following logic when applying the rules:
- The order of the rules in the profile determines their priority; the first rule has the highest priority.
- If rules overlap, the higher priority rule takes precedence over the lower priority. The lower priority rule is discarded.
- A discarded rule does not block any other lower priority rules.
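The priority logic above can be sketched in a few lines of Python. This is a simplified model and not Conversation Analyzer's implementation: it uses plain regular expressions in place of the categorization expression language, it treats "discarded" as applying to each individual overlapping match, and the apply_rules function and the two rules are hypothetical.

```python
import re


def apply_rules(text, rules):
    """Apply (pattern, replacement) rules in priority order.

    A match is kept only if it does not overlap a match already kept for a
    higher priority rule. A discarded match blocks nothing else.
    """
    kept = []  # (start, end, replacement) for every accepted match
    for pattern, replacement in rules:
        for match in re.finditer(pattern, text):
            start, end = match.span()
            if all(end <= s or start >= e for s, e, _ in kept):
                kept.append((start, end, replacement))

    # Rebuild the text from left to right with the accepted replacements.
    pieces, position = [], 0
    for start, end, replacement in sorted(kept):
        pieces.append(text[position:start])
        pieces.append(replacement)
        position = end
    pieces.append(text[position:])
    return "".join(pieces)


text = "My credit card is 1234567890123456"
substitute = ("credit card", "payment method")  # hypothetical rule
redact = (r"credit card is \d{16}", "(credit card information redacted)")  # hypothetical rule

substitution_first = apply_rules(text, [substitute, redact])
redaction_first = apply_rules(text, [redact, substitute])
```

With the substitution rule first, the card number survives in the output; with the redaction rule first, it does not. This is why rule order matters.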
...
Examples of overlapping rules
In all the examples, party has been removed for simplicity.
Info |
---|
Example 1. We want to replace "credit card" with "payment method" and remove the credit card number.
Transcription text: "My credit card is 1234567890123456"
Substitution rules:
Rule 1:
Rule 2:
Intended text: "My (credit card information redacted)"
Processed text: "My payment method is 1234567890123456"
Why: Rules 1 and 2 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 2. The result is that the credit card number is still exposed.
Solution: Write your redaction rules first, followed by your substitution rules. |
Info |
---|
Example 2. We want to remove all strings of three or more numbers because they can contain sensitive information. However, we want to label PIN numbers differently to credit card numbers.
Transcription text: "My PIN is 1234"
Substitution rules:
Rule 1:
Rule 2:
Rule 3:
Intended text: "My (PIN has been redacted)"
Processed text: "My PIN is (redacted)"
Why: Rules 1 and 3 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 3. The result is that instead of applying the more specific rule "(PIN has been redacted)", Conversation Analyzer applies the more general one.
Solution: Write more specific rules first, followed by more general, catch-all rules. |
Info |
---|
Example 3. Due to the highly sensitive nature of passwords, we want to remove user account names, and wipe out the whole text containing the password.
Transcription text: "My account name is administrator and my password is Jupiter, with upper case J"
Substitution rules:
Rule 1:
Rule 2:
Intended text: "My (account name redacted) and (password redacted)"
Processed text: "My (account name redacted)"
Why: In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority than rule 2. In removing the account name, the whole of the password text is removed too. Rule 2 does not match the remaining text.
Solution: Write your rules in order of most sensitive to least sensitive. Avoid using operators like * and ~ as much as possible. |
Info |
---|
Example 4. For a dog-walking service, we want to improve the transcription with more accurate, business-related words.
Transcription text: "I have a big hunting dog"
Substitution rules:
Rule 1:
Rule 2:
Rule 3:
Processed text: "I look after a hound"
Why: In this scenario, Conversation Analyzer applies rule 1. Rule 2 overlaps rule 1, so Conversation Analyzer discards rule 2. Rule 3 overlaps rule 2 only, but because Conversation Analyzer has discarded rule 2, rule 3 can be applied.
Solution: Write your substitution rules in order of importance. |
Chaining substitution and redaction rules
Chaining occurs when one rule matches the output of another rule. Chaining only occurs when you re-analyze a recording. For information about re-analyzing recordings, see Configuring Conversation Analyzer.
Each time Conversation Analyzer applies substitution rules to a transcript, Conversation Analyzer overwrites the original transcript with the processed text. Rerunning the substitution rules can therefore further refine the text.
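The effect of re-analysis can be reproduced with a short Python sketch. It is a simplified model: each pass replaces literal text only, the single_pass function is hypothetical, and the two rules are illustrative values chosen to show one rule matching the output of another.

```python
import re


def single_pass(text, rules):
    """Apply every (find, replace) rule once to the text as it stands now."""
    pattern = re.compile("|".join(re.escape(find) for find, _ in rules))
    replacements = dict(rules)
    return pattern.sub(lambda match: replacements[match.group(0)], text)


# Illustrative rules: the second rule matches part of the first rule's output.
rules = [("dog", "big cat"), ("cat", "mouse")]

original = "I have a dog"
processed = single_pass(original, rules)      # first analysis
reprocessed = single_pass(processed, rules)   # re-analysis of the overwritten text
```

Because the second pass starts from the already processed text, "cat" is now present and the second rule applies, even though it never matched the original transcript.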
...
Example of chaining rules
In the example, party has been removed for simplicity.
Info |
---|
Example: Simple case to illustrate chaining.
Original transcript text: "I have a dog"
Substitution rules:
Rule 1:
Rule 2:
Processed text: "I have a big cat"
Reprocessed text: "I have a big mouse"
Why: Rule 2 matches part of the output of rule 1. On the initial processing, Conversation Analyzer applies rule 1. Conversation Analyzer overwrites the original text with the replaced text. On reprocessing, Conversation Analyzer applies rule 2.
Solution: To avoid chaining, write rules so that they don't apply to each other's output. |
Highlighting replaced text
After Conversation Analyzer has processed a transcript, substituting or redacting text as your rules require, you are unable to see what has changed. If you want to see where in the transcript Conversation Analyzer, for example, removed text, create a category that highlights the replaced text.
Note |
---|
If you substitute text with characters that are not valid in |
...
Example of highlighting replaced text
In the example, party has been removed for simplicity.
Example: We want to see where account numbers have been removed from the transcript.
Original transcript text:
"My account number is 1234567890123456"
Substitution rule:
"find": "################",
"replace": "**** **** **** ****"
Processed text:
"My account number is **** **** **** ****"
Categorization rule:
"name": "Replaced text"
"expression": "**** **** **** ****"
[...]
...
information about creating profiles, see Managing categorization profiles.
Skip calls under
You can configure Conversation Analyzer to not process short calls. Short calls are those that are shorter than your configured threshold. This threshold is in seconds and is defined in the Skip calls under field when creating a categorization profile. For more information about creating profiles, see Managing categorization profiles.
The Skip calls under parameter is a single integer—in whole seconds only—and you can configure it for each categorization profile. By default, the Skip calls under value is set to 0 resulting in Conversation Analyzer processing all calls.
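The threshold behaves like a simple comparison on call duration. The following Python sketch shows the boundary cases; the should_process function and its parameter names are hypothetical, and treating a call that is exactly as long as the threshold as processed follows from the "shorter than" wording above.

```python
def should_process(call_duration_seconds: int, skip_calls_under: int = 0) -> bool:
    """Return True unless the call is shorter than the configured threshold.

    With the default threshold of 0, no call is shorter than the threshold,
    so every call is processed.
    """
    return call_duration_seconds >= skip_calls_under


process_default = should_process(5)          # default threshold of 0
process_short = should_process(9, 10)        # shorter than the threshold
process_boundary = should_process(10, 10)    # exactly at the threshold
```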