Categorization rules

As part of transcribing recordings, Conversation Analyzer categorizes the text of each transcript: it identifies key phrases using the rules you define and records the category or subcategory that each matching rule belongs to. A category is a collection of subcategories, each of which contains a series of rules. Each rule consists of a word or phrase and the party who said it. If the transcript contains the word or phrase, spoken by the specified party, Conversation Analyzer matches it against the category.

For example, you may want to track how polite your agents are when speaking with customers. To do this, create a category named 'Politeness' that contains subcategories with rules that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. To do so, create a category for the product or service, with subcategories that identify instances of the agent using terms relating to that product or service. For information on how to create a categorization rule, see Managing categorization rules.
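The category, subcategory, and rule hierarchy described above can be pictured as nested records. The following Python sketch is illustrative only: the class names, the `find_matches` helper, the subcategory name, and the simple whole-word comparison are assumptions made for this example, not a description of how Conversation Analyzer stores or evaluates rules.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    phrase: str  # the word or phrase to look for
    party: str   # who must have said it, for example "agent" or "customer"


@dataclass
class Subcategory:
    name: str
    rules: List[Rule] = field(default_factory=list)


@dataclass
class Category:
    name: str
    subcategories: List[Subcategory] = field(default_factory=list)


def find_matches(category: Category,
                 utterances: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (category, subcategory, phrase) for each rule that matches.

    `utterances` is a list of (party, text) pairs. A rule matches only when
    the phrase appears as whole words AND was spoken by the rule's party.
    """
    hits = []
    for party, text in utterances:
        # Normalize whitespace and case, then pad so we compare whole words.
        padded = " " + " ".join(text.lower().split()) + " "
        for sub in category.subcategories:
            for rule in sub.rules:
                if rule.party == party and f" {rule.phrase.lower()} " in padded:
                    hits.append((category.name, sub.name, rule.phrase))
    return hits


# Hypothetical 'Politeness' category with a single subcategory.
politeness = Category("Politeness", [
    Subcategory("Courtesy phrases", [
        Rule("please", "agent"),
        Rule("thank you", "agent"),
        Rule("you're welcome", "agent"),
    ]),
])

transcript = [
    ("agent", "Thank you for calling"),
    ("customer", "Please help me with my bill"),
]
matches = find_matches(politeness, transcript)
```

In this sketch the customer's "Please" is not counted, because every rule in the subcategory specifies the agent as the speaking party.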

Categorization expression language

The categorization expression language describes the required format of the values you provide in the Expression and Find fields in Category Editor when creating categorization and substitution rules. Conversation Analyzer can then use these values to locate matching text in the transcripts. For more information, see Managing categorization rules and Managing substitution rules.

Expression and Find value validation

Valid Expression and Find field values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:

| Unicode Category Name | Description |
| --- | --- |
| Ll | Letter, Lowercase. For example, a-z, ᵯ, ḅ, ṥ, ở, ﬓ |
| Lu | Letter, Uppercase. For example, A-Z, Ý, Ŧ, Ǣ, Щ, 𝕐 |
| Lt | Letter, Titlecase. For example, Dž, ᾎ, ᾟ, ᾭ |
| Lo | Letter, Other. For example, ª, ܗ, 爨. The Mongolian letter "Manchu Ali Gali Lha" (U+18AA) is not allowed within Expression and Find values. This character is used internally within the categorization engine. If the character appears within spoken text, Conversation Analyzer treats the character as an apostrophe. |
| Lm | Letter, Modifier. For example, ʰ, ᵓ, 〲, ꟹ |
| Mn | Mark, Nonspacing. For example, ុ, ᜴ |
| Nd | Number, Decimal Digit. For example, 0-9, ۳, ૮, ๗ |
| Pc | Punctuation, Connector. For example, _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿. This category includes ten characters; the most commonly used is the LOW LINE character (_), U+005F. |

Values can be no more than 100 characters long.
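As a rough illustration of the validation rules above, the following Python sketch checks a value against the listed Unicode categories using the standard `unicodedata` module. The function names and the `extra_allowed` parameter are hypothetical. The character list above does not mention the wildcard characters described later on this page, so the sketch rejects them unless you pass them in explicitly; how Category Editor itself treats them during validation is not stated here.

```python
import unicodedata

# Categories and characters taken from the validation rules above.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}
ALLOWED_CHARACTERS = {" ", "'"}   # space (U+0020) and apostrophe (U+0027)
RESERVED_CHARACTER = "\u18aa"     # Manchu Ali Gali Lha, reserved for internal use
MAX_LENGTH = 100                  # limit for Expression and Find values


def invalid_characters(value: str, extra_allowed: str = "") -> list:
    """Return the characters in `value` that the rules above do not permit.

    `extra_allowed` is an assumption of this sketch: it lets the caller
    permit additional characters, such as the wildcards ?, * and #.
    """
    bad = []
    for ch in value:
        if ch == RESERVED_CHARACTER:
            bad.append(ch)
        elif ch in ALLOWED_CHARACTERS or ch in extra_allowed:
            continue
        elif unicodedata.category(ch) not in ALLOWED_CATEGORIES:
            bad.append(ch)
    return bad


def is_valid_value(value: str, extra_allowed: str = "") -> bool:
    """True if `value` is within the length limit and has no invalid characters."""
    return len(value) <= MAX_LENGTH and not invalid_characters(value, extra_allowed)


print(is_valid_value("you're welcome"))               # letters, apostrophe, space
print(invalid_characters("hello, world!"))            # comma and exclamation mark
print(is_valid_value("wh?", extra_allowed="?*#"))     # wildcard permitted explicitly
```

Note that `len` counts Unicode code points, which is one plausible reading of "characters" in the length limit.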

Replace by value validation

Values can be no more than 64 characters long.

Wildcards in values

The categorization expression language supports the following wildcards within the values. Examples refer to the Expression field you fill in when creating categorization rules, but exactly the same rules apply to the Search phrase field in substitution rules.

| Wildcard | Description | Example expressions | Details |
| --- | --- | --- | --- |
| ? | Wildcard representing one character. Each ? represents one character. | wh? | The following words will match the example expression: "who" and "why". For an example of an expression using the ? wildcard, see Example 2. Expression using the ? character wildcard. |
| | | wh?? | The following words will match the example expression: "what", "when", "whom". For an example of an expression using the ?? wildcard, see Example 5. Expression using the ?? wildcard. |
| * | Wildcard representing zero to many characters | sit* | The following words will match the example expression: "sit", "sits", "sitting". For an example of an expression using the * wildcard, see Example 3. Expression using the * character wildcard. To use * to represent a character or characters, ensure that the * is contiguous with the characters in the containing word. You can also use * to represent a word or words. For information, see Wildcard representing zero to many words. |
| # | Wildcard representing one numeric character | ### | Only digits will match the example expression, not text. Text containing "123" will match the example expression but text containing "one two three" will not. For an example of an expression using the # wildcard, see Example 4. Expression using the # character wildcard. |
| * | Wildcard representing zero to many words | cat * mat | The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits happily on the mat". For an example of an expression using the * wildcard, see Example 6. Expression using the * word wildcard. To use * to represent a word or words, type a space between the * and any other characters in the expression. You can also use * to represent a character or characters. For information, see Wildcard representing zero to many characters. |
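One way to reason about the wildcards is to translate an expression into a regular expression. The Python sketch below does this under several assumptions that are not confirmed by this page: transcript text is treated as words separated by whitespace, matching is case-insensitive, an expression must match whole words, and "one character" means any non-space character. The `wildcard_to_regex` function is hypothetical and is not part of Conversation Analyzer.

```python
import re

# Character-level wildcards, applied inside a word.
CHAR_WILDCARDS = {
    "?": r"\S",    # exactly one character
    "*": r"\S*",   # zero to many characters (when contiguous with a word)
    "#": r"\d",    # exactly one digit
}


def wildcard_to_regex(expression: str) -> "re.Pattern":
    """Translate a wildcard expression into a compiled regular expression."""
    parts = []
    gap = None  # separator to place before the next word
    for token in expression.split():
        if token == "*":
            # A * separated by spaces stands for zero to many whole words.
            if parts:
                gap = r"(?:\s+\S+)*\s+"
            continue
        word = "".join(CHAR_WILDCARDS.get(ch, re.escape(ch)) for ch in token)
        if parts:
            parts.append(gap or r"\s+")
        parts.append(word)
        gap = None
    # Lookarounds make the expression match whole words only.
    return re.compile(r"(?<!\S)" + "".join(parts) + r"(?!\S)", re.IGNORECASE)


def matches(expression: str, text: str) -> bool:
    return wildcard_to_regex(expression).search(text) is not None


print(matches("wh?", "who"))                     # True
print(matches("wh?", "what"))                    # False: needs wh??
print(matches("sit*", "sitting"))                # True
print(matches("###", "one two three"))           # False: digits only
print(matches("cat * mat", "cat sits on the mat"))  # True
```

The word wildcard and the character wildcard are told apart purely by spacing, which mirrors the guidance in the table: `sit*` extends a word, while `cat * mat` allows whole words between "cat" and "mat".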

Words between value

The Words between field is available when creating categorization rules or substitution rules. It represents the number of words that can appear between the specified words in a phrase. If you set it to a value other than 0, the ~N expression appears at the end of the rule name in the profile tree.

If the expression contains more than two words, the Words between value applies to the number of words between any of the specified words.
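The effect of the Words between value can be sketched with a regular expression. This example assumes that the value is a maximum (up to N intervening words are tolerated) and that it applies between each adjacent pair of words in the phrase; neither assumption is confirmed by this page. The `phrase_with_gap` function is hypothetical.

```python
import re


def phrase_with_gap(phrase: str, words_between: int) -> "re.Pattern":
    """Build a pattern that allows up to `words_between` extra words
    between each adjacent pair of words in `phrase` (an assumed reading)."""
    if words_between < 0:
        raise ValueError("words_between must be 0 or greater")
    words = [re.escape(word) for word in phrase.split()]
    # Up to N intervening words, then the whitespace before the next word.
    gap = r"(?:\s+\S+){0,%d}\s+" % words_between
    return re.compile(r"(?<!\S)" + gap.join(words) + r"(?!\S)", re.IGNORECASE)


rule = phrase_with_gap("cancel account", 2)
print(bool(rule.search("please cancel my old account")))   # True: two words between
print(bool(rule.search("cancel my very old account")))     # False: three words between
```

With a value of 0, the pattern requires the words to be adjacent, which corresponds to the default behavior described above.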

See below for examples.

Expression examples 
