A categorization profile contains categories and subcategories for the Conversation Analyzer feature. Conversation Analyzer uses the profile to categorize transcripts of call recordings. The profile also contains any substitution and redaction rules you provide. Using the substitution and redaction rules, Conversation Analyzer refines the transcribed text.

The categorization profile applies to the associated account. For information about where you can view the categorized recordings and refined transcripts, see Listening to and commenting on a call recording.

...

  • name (a name/value pair)
  • language (a name/value pair)
  • categories (an array of category objects).
    Each category consists of the following:
    • name (a name/value pair)
    • rules (an array of one or more categorization rule objects).

      Each categorization rule object consists of the following:
      • party (a name/value pair)
      • expression (a name/value pair)
      Info

      Not used.


    • subcategories (an array of one or more subcategory objects).
      Each subcategory consists of the following:
      • name (a name/value pair)
      • rules (an array of one or more categorization rule objects).
        Each categorization rule object consists of the following:
        • party (a name/value pair)
        • expression (a name/value pair)
      • subcategories (an array of one or more nested subcategory objects).

        Info

        Not used.


    For more information about categorization rules, see Categorization rules.
  • substitution (an array of substitution rule objects).
    Each substitution rule consists of the following:
    • party (a name/value pair)
    • find (a name/value pair)
    • replace (a name/value pair)
    For more information about substitution rules, see Substitution and redaction rules.
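
The structure listed above can be checked mechanically before a profile is used. The following is a minimal sketch, not part of the product: the function name check_profile and its messages are hypothetical, it treats all four top-level fields as required (an assumption), and it checks only the field names listed above.

```python
def check_profile(profile):
    """Return a list of problems found in a categorization profile dict.

    Hypothetical helper: it only checks the fields named in this section.
    """
    problems = []
    # Assumption: all four top-level fields are required.
    for key in ("name", "language", "categories", "substitution"):
        if key not in profile:
            problems.append(f"profile: missing '{key}'")

    def check_rules(rules, where):
        # Every categorization rule needs a party and an expression.
        for i, rule in enumerate(rules):
            for key in ("party", "expression"):
                if key not in rule:
                    problems.append(f"{where} rule {i}: missing '{key}'")

    for category in profile.get("categories", []):
        cname = category.get("name", "<unnamed>")
        if "name" not in category:
            problems.append("category: missing 'name'")
        check_rules(category.get("rules", []), f"category '{cname}'")
        for sub in category.get("subcategories", []):
            sname = sub.get("name", "<unnamed>")
            if "name" not in sub:
                problems.append(f"category '{cname}': subcategory missing 'name'")
            check_rules(sub.get("rules", []), f"subcategory '{sname}'")

    # Every substitution rule needs a party, a find value and a replace value.
    for i, rule in enumerate(profile.get("substitution", [])):
        for key in ("party", "find", "replace"):
            if key not in rule:
                problems.append(f"substitution rule {i}: missing '{key}'")
    return problems
```

An empty list means that no structural problems were found. It does not mean that Conversation Analyzer will accept the profile.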

...

As part of transcribing recordings, Conversation Analyzer categorizes the textual contents of the transcript by identifying specific words and phrases that correspond to defined categories. A category is a collection of subcategories, which in turn contain a series of rules. Each rule consists of a word or phrase and the party who said that word or phrase. If the transcript contains the word or phrase, and it was spoken by the specified party, the transcript matches the category.

For example, you may want to track how polite your agents are when speaking with customers. Create a category of 'Politeness' that contains subcategories that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. You would need to create a category for the product or service, with subcategories identifying incidences of the agent using terms relating to the product or service.
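
The 'Politeness' example can be sketched in a few lines. This is an illustration only, under stated assumptions: the transcript is represented as a list of (party, text) tuples, matching is a case-insensitive search for a literal phrase, and rule operators are not handled. The function name matching_subcategories is hypothetical and does not describe how Conversation Analyzer is implemented.

```python
import re

def matching_subcategories(category, transcript):
    """Return names of subcategories with at least one rule matching the transcript.

    transcript: list of (party, text) tuples, an assumed representation.
    Only literal phrases are handled; rule operators are ignored here.
    """
    matched = []
    for sub in category["subcategories"]:
        for rule in sub["rules"]:
            pattern = re.compile(r"\b" + re.escape(rule["expression"]) + r"\b",
                                 re.IGNORECASE)
            # The phrase must be present AND spoken by the rule's party.
            if any(party == rule["party"] and pattern.search(text)
                   for party, text in transcript):
                matched.append(sub["name"])
                break  # one matching rule is enough for this subcategory
    return matched

politeness = {
    "name": "Politeness",
    "rules": [],
    "subcategories": [
        {"name": "Please", "subcategories": [],
         "rules": [{"party": "agent", "expression": "please"}]},
        {"name": "Thank you", "subcategories": [],
         "rules": [{"party": "agent", "expression": "thank you"}]},
        {"name": "You're welcome", "subcategories": [],
         "rules": [{"party": "agent", "expression": "you're welcome"}]},
    ],
}

transcript = [
    ("agent", "Please hold while I check."),
    ("customer", "Thank you."),
    ("agent", "You're welcome."),
]

print(matching_subcategories(politeness, transcript))
```

In this sketch, 'Thank you' is not reported because the customer said the phrase, not the agent.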

Note
Conversation Analyzer applies categorization rules to processed transcripts—text that Conversation Analyzer has applied substitution rules to—rather than the original text. Keep this in mind when you create your categories.
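
The effect of this ordering can be illustrated with a small sketch. It assumes literal find and replace values with no operators, and it omits party for brevity. The helper names are hypothetical.

```python
def apply_substitutions(text, substitutions):
    """Apply literal find/replace pairs in order (operators are not modelled)."""
    for rule in substitutions:
        text = text.replace(rule["find"], rule["replace"])
    return text

def contains_phrase(text, phrase):
    """Case-insensitive literal phrase check, a stand-in for a categorization rule."""
    return phrase.lower() in text.lower()

original = "my credit card was declined"
processed = apply_substitutions(
    original, [{"find": "credit card", "replace": "payment method"}]
)

# Categorization sees the processed text, not the original text.
print(contains_phrase(processed, "credit card"))     # False
print(contains_phrase(processed, "payment method"))  # True
```

A categorization rule written against the original wording ("credit card") would not match here. The rule would need to target the substituted wording instead.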

Example categorization profile

...

In the following example, the categorization profile (Cat_example) contains one category, Cats. Cats contains two subcategories, Cat details and Cat position. Each of these subcategories contains two rules, one rule for each party. The substitution array contains no rules. If more than one rule applies to some text in the transcript, that text will appear in multiple categories.

{
    "name": "Cat_example",
    "language": "en-us",
    "categories": [
        {
            "name": "Cats",
            "rules": [],
            "subcategories": [
                {
                    "name": "Cat position",
                    "rules": [
                        {
                            "party": "customer",
                            "expression": "cat * sat mat ~2"
                        },
                        {
                            "party": "agent",
                            "expression": "cat mat ~3"
                        }
                    ],
                    "subcategories": []
                },
                {
                    "name": "Cat details",
                    "rules": [
                        {
                            "party": "customer",
                            "expression": "cat is ## years old"
                        },
                        {
                            "party": "agent",
                            "expression": "your cat?"
                        }
                    ],
                    "subcategories": []
                }
            ]
        }
    ],
    "substitution": []
}
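
To see which rules a profile defines for each party, the JSON can be loaded and flattened. The sketch below uses a compact copy of the Cat_example profile as described in the paragraph above. The function name list_rules is hypothetical, and the sketch assumes that rules are defined only at subcategory level, as in this example.

```python
import json

PROFILE_JSON = """
{
  "name": "Cat_example",
  "language": "en-us",
  "categories": [
    {
      "name": "Cats",
      "rules": [],
      "subcategories": [
        {"name": "Cat position",
         "rules": [{"party": "customer", "expression": "cat * sat mat ~2"},
                   {"party": "agent", "expression": "cat mat ~3"}],
         "subcategories": []},
        {"name": "Cat details",
         "rules": [{"party": "customer", "expression": "cat is ## years old"},
                   {"party": "agent", "expression": "your cat?"}],
         "subcategories": []}
      ]
    }
  ],
  "substitution": []
}
"""

def list_rules(profile):
    """Flatten a profile into (category, subcategory, party, expression) tuples."""
    rows = []
    for category in profile["categories"]:
        for sub in category["subcategories"]:
            for rule in sub["rules"]:
                rows.append((category["name"], sub["name"],
                             rule["party"], rule["expression"]))
    return rows

profile = json.loads(PROFILE_JSON)
for row in list_rules(profile):
    print(" / ".join(row))
```

Parsing the profile with a JSON library also catches syntax problems, such as a missing comma between the categories and substitution arrays, before the profile is used.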


...

Valid expression and find values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:

Values can be no more than 100 characters long.
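
These constraints can be approximated with a short check. This is a sketch under assumptions. Python's str.isalnum() stands in for the Unicode categories, which is only an approximation. The extra_allowed parameter is a hypothetical way to permit operator characters such as those used in the examples on this page, because the sentence above does not mention them.

```python
MAX_LENGTH = 100  # limit stated above

def value_problems(value, extra_allowed=""):
    """Return a list of reasons why an expression or find value looks invalid.

    str.isalnum() approximates the permitted Unicode categories.
    extra_allowed is a hypothetical allowance for operator characters.
    """
    problems = []
    if len(value) > MAX_LENGTH:
        problems.append(f"too long: {len(value)} characters")
    bad = sorted({ch for ch in value
                  if not (ch == " " or ch == "'" or ch.isalnum()
                          or ch in extra_allowed)})
    if bad:
        problems.append("unexpected characters: " + "".join(bad))
    return problems

print(value_problems("you're welcome"))                           # []
print(value_problems("cat * sat mat ~2"))                         # ['unexpected characters: *~']
print(value_problems("cat * sat mat ~2", extra_allowed="*~#"))    # []
```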

...

Examples of overlapping rules

In all the examples, party has been removed for simplicity.

Info

Example 1. We want to replace "credit card" with "payment method" and remove the credit card number.

Transcription text:

"My credit card is 1234567890123456"

Substitution rules:

Rule 1:

"find": "credit card",
"replace": "payment method"

Rule 2:

"find": "credit card #* ~5",
"replace": "(credit card information redacted)"

Intended text:

"My (credit card information redacted)"

Processed text:

"My payment method is 1234567890123456"

Why:

Rules 1 and 2 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 2. The result is that the credit card number is still exposed.

Solution:

Write your redaction rules first, followed by your substitution rules.
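
The effect of rule order in Example 1 can be reproduced with a small model. This is a sketch, not the product's algorithm. It assumes that rules are tried in priority order and that a match is discarded if it overlaps text already claimed by a higher-priority rule. The regular expressions are hypothetical stand-ins for the find values, because the operator syntax is not defined in this section.

```python
import re

def apply_rules(text, rules):
    """Apply (pattern, replacement) rules in priority order.

    Assumed model: a match is discarded if it overlaps a span already
    claimed by a higher-priority rule.
    """
    claimed = []  # (start, end, replacement)
    for pattern, replacement in rules:
        for m in re.finditer(pattern, text):
            if all(m.end() <= s or m.start() >= e for s, e, _ in claimed):
                claimed.append((m.start(), m.end(), replacement))
    # Replace from the end of the text so earlier offsets stay valid.
    for start, end, replacement in sorted(claimed, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text

text = "My credit card is 1234567890123456"
substitute = (r"credit card", "payment method")
# Hypothetical regex stand-in for "credit card #* ~5".
redact = (r"credit card(?:\s+\S+)*?\s+\d+", "(credit card information redacted)")

print(apply_rules(text, [substitute, redact]))  # number still exposed
print(apply_rules(text, [redact, substitute]))  # number redacted
```

With the substitution rule first, the sketch prints "My payment method is 1234567890123456". With the redaction rule first, it prints "My (credit card information redacted)".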


Info

Example 2. We want to remove all strings of three or more numbers because they can contain sensitive information. However, we want to label PIN numbers differently to credit card numbers.

Transcription text:

"My PIN is 1234"

Substitution rules:

Rule 1:

"find": "###*",
"replace": "(redacted)"

Rule 2:

"find": "credit card ################ ~5",
"replace": "(credit card has been redacted)"

Rule 3:

"find": "PIN #### ~5",
"replace": "(PIN has been redacted)"

Intended text:

"My (PIN has been redacted)"

Processed text:

"My PIN is (redacted)"

Why: 

Rules 1 and 3 overlap. In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority, and discards rule 3. The result is that the more general replacement, "(redacted)", is applied instead of the more specific "(PIN has been redacted)".

Solution:

Write more specific rules first, followed by more general (catch-all) rules.


Info

Example 3. Due to the highly sensitive nature of passwords, we want to remove user account names and wipe out the whole text containing the password.

Transcription text:

"My account name is administrator and my password is Jupiter, with upper case J"

Substitution rules:

Rule 1:

"find": "account name is * ",
"replace": "(account name redacted)"

Rule 2:

"find": "* password *",
"replace": "(password redacted)"

Intended text:

"My (account name redacted) and (password redacted)"

Processed text:

"My (account name redacted)"

Why:

In this scenario, Conversation Analyzer applies rule 1, because rule 1 has higher priority than rule 2. In removing the account name, the whole of the password text is removed too. Rule 2 does not match the remaining text.

Solution: 

Write your rules in order of most sensitive to least sensitive. Avoid using operators like * and ~ as much as possible. 


Info

Example 4. For a dog-walking service, we want to improve the transcription with more accurate, business-related words.

Transcription text:

"I have a big hunting dog"

Substitution rules:

Rule 1:

"find": "big hunting dog",
"replace": "hound"

Rule 2:

"find": "I have * dog",
"replace": "I am a dog owner"

Rule 3:

"find": "have",
"replace": "look after"

Processed text:

"I look after a hound"

Why:

In this scenario, Conversation Analyzer applies rule 1. Rule 2 overlaps rule 1 so Conversation Analyzer discards rule 2. Rule 3 overlaps rule 2 only, but because Conversation Analyzer has discarded rule 2, rule 3 can be applied.

Solution:

Write your substitution rules in order of importance.
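
The applied and discarded rules in Example 4 can be traced with a small model. This is a sketch under assumptions, not the product's algorithm. It assumes that rules are tried in priority order and that a match is discarded when it overlaps text claimed by an earlier rule. The regular expression for rule 2 is a hypothetical stand-in for the * operator.

```python
import re

def trace_rules(text, rules):
    """Return (processed_text, log) under an assumed 'discard on overlap' model."""
    claimed, log = [], []  # claimed holds (rule_number, start, end, replacement)
    for number, (pattern, replacement) in enumerate(rules, start=1):
        for m in re.finditer(pattern, text):
            # Rules whose claimed span overlaps this match.
            clash = [n for n, s, e, _ in claimed if m.start() < e and s < m.end()]
            if clash:
                log.append(f"rule {number} discarded (overlaps rule {clash[0]})")
            else:
                claimed.append((number, m.start(), m.end(), replacement))
                log.append(f"rule {number} applied")
    # Replace from the end of the text so earlier offsets stay valid.
    for _, start, end, replacement in sorted(claimed, key=lambda c: c[1], reverse=True):
        text = text[:start] + replacement + text[end:]
    return text, log

rules = [
    (r"big hunting dog", "hound"),
    (r"I have .* dog", "I am a dog owner"),  # hypothetical stand-in for "I have * dog"
    (r"have", "look after"),
]

processed, log = trace_rules("I have a big hunting dog", rules)
print(processed)
for line in log:
    print(line)
```

In this model, rule 3 is applied because it overlaps only rule 2, which was already discarded.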


...