Categorization rules

As part of transcribing recordings, Conversation Analyzer categorizes the text of each transcript by identifying key phrases based on the defined rules and recording the subcategory or category those rules belong to. A category is a collection of subcategories, each of which contains a series of rules. Each rule consists of a word or phrase and the party who said that word or phrase. If the transcript contains the word or phrase, and it was spoken by the specified party, Conversation Analyzer matches it against the category.

For example, you may want to track how polite your agents are when speaking with customers. Create a category of 'Politeness' that contains subcategories with rules that look for phrases such as 'Please', 'Thank you' and 'You're welcome'. You may also want to ensure that agents are promoting a new product or service. To do so, create a category for the product or service with subcategories that identify instances of the agent using terms relating to the product or service. For information on how to create a categorization rule, see Managing categorization rules.
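The category, subcategory, and rule hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names, the `party` labels ("agent", "customer"), and the `categorize` helper are hypothetical and are not part of Conversation Analyzer. Matching here is a plain case-insensitive whole-phrase search, which is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    expression: str  # word or phrase to look for
    party: str       # who must have said it (hypothetical labels: "agent", "customer")


@dataclass
class Subcategory:
    name: str
    rules: list = field(default_factory=list)


@dataclass
class Category:
    name: str
    subcategories: list = field(default_factory=list)


def rule_matches(rule, party, text):
    """True if `text` was spoken by the rule's party and contains the phrase."""
    if party != rule.party:
        return False
    pattern = r"\b" + re.escape(rule.expression) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def categorize(category, utterances):
    """Return (category, subcategory, expression) for every rule that matches.

    `utterances` is a list of (party, text) pairs.
    """
    hits = []
    for party, text in utterances:
        for sub in category.subcategories:
            for rule in sub.rules:
                if rule_matches(rule, party, text):
                    hits.append((category.name, sub.name, rule.expression))
    return hits


politeness = Category("Politeness", [
    Subcategory("Courtesy", [Rule("please", "agent"), Rule("thank you", "agent")]),
])
transcript = [("customer", "thank you"), ("agent", "Please hold the line")]
print(categorize(politeness, transcript))
```

Note that the customer's "thank you" is not counted, because both rules specify the agent as the speaking party.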

Categorization expression language

The categorization expression language describes the required format of the values you provide in the Expression and Find fields in Category Editor when creating categorization and substitution rules. Conversation Analyzer can then use these values to locate matching text in the transcripts. For more information, see Managing categorization rules and Managing substitution rules.

Expression and Find value validation

Valid Expression and Find field values contain only alphanumeric, apostrophe and space characters; that is, values can contain spaces (U+0020), apostrophes (U+0027), and characters from the following Unicode categories:

Unicode Category Name	Description
Ll	Letter, Lowercase. For example, a-z, ᵯ, ḅ, ṥ, ở, ﬓ
Lu	Letter, Uppercase. For example, A-Z, Ý, Ŧ, Ǣ, Щ, 𝕐
Lt	Letter, Titlecase. For example, ǅ, ᾎ, ᾟ, ᾭ
Lo	Letter, Other. For example, ª, ܗ, 爨. The Mongolian letter "Manchu Ali Gali Lha" (U+18AA) is not allowed within Expression and Find values. This character is used internally within the categorization engine. If the character appears within spoken text, Conversation Analyzer treats the character as an apostrophe.
Lm	Letter, Modifier. For example, ʰ, ᵓ, 〲, ꟹ
Mn	Mark, Nonspacing. For example, ុ, ᜴
Nd	Number, Decimal Digit. For example, 0-9, ۳, ૮, ๗
Pc	Punctuation, Connector. For example, _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿. This category includes ten characters; the most commonly used is the LOW LINE character (_), U+005F.

Values can be no more than 100 characters long.
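The validation rules above can be approximated with Python's standard `unicodedata` module. This is a minimal sketch, not the product's validator: the function name is hypothetical, the length is assumed to be counted in Unicode code points, and the wildcard characters (`?`, `*`, `#`) are assumed to be accepted in addition to the characters listed in the table.

```python
import unicodedata

ALLOWED_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Nd", "Pc"}
RESERVED_CHARACTER = "\u18aa"  # Mongolian letter Manchu Ali Gali Lha
WILDCARDS = {"?", "*", "#"}    # assumption: wildcards are valid in these fields
MAX_LENGTH = 100               # assumption: counted in code points


def is_valid_value(value, allow_wildcards=True):
    """Check an Expression or Find value against the documented character rules."""
    if len(value) > MAX_LENGTH:
        return False
    for ch in value:
        if ch == RESERVED_CHARACTER:
            return False  # reserved for internal use by the engine
        if ch in (" ", "'"):
            continue      # space (U+0020) and apostrophe (U+0027)
        if allow_wildcards and ch in WILDCARDS:
            continue
        if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
            return False
    return True


for candidate in ["you're welcome", "sit*", "hello!", "low_line", "\u18aa"]:
    print(repr(candidate), is_valid_value(candidate))
```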

Replace by value validation

Values can be no more than 64 characters long.

Wildcards in values

The categorization expression language supports the following wildcards within the values. Examples refer to the Expression field you fill in when creating categorization rules, but exactly the same rules apply to the Find field in substitution rules.

Wildcard	Description	Example expressions	Details
`?`	Wildcard representing one character		Each `?` represents one character.
		`wh?`	The following words will match the example expression: "who" and "why". For an example of an expression using the `?` wildcard, see Example 2. Expression using the ? character wildcard.
		`wh??`	The following words will match the example expression: "what", "when", "whom". For an example of an expression using the `??` wildcard, see Example 5. Expression using the ?? wildcard.
`*`	Wildcard representing zero to many characters	`sit*`	The following words will match the example expression: "sit", "sits", "sitting". For an example of an expression using the `*` character wildcard, see Example 3. Expression using the * character wildcard. To use `*` to represent a character or characters, ensure that the `*` is contiguous with the other characters in the word. You can also use `*` to represent a word or words. For information, see Wildcard representing zero to many words.
`#`	Wildcard representing one numeric character	`###`	Only digits will match the example expression, not text. Text containing "123" will match the example expression but text containing "one two three" will not. For an example of an expression using the `#` wildcard, see Example 4. Expression using the # character wildcard.
`*`	Wildcard representing zero to many words	`cat * mat`	The following phrases will match the example expression: "cat mat", "cat sits on the mat", and "cat always sits happily on the mat". For an example of an expression using the `*` word wildcard, see Example 6. Expression using the * word wildcard. To use `*` to represent a word or words, type a space between the `*` and any other characters in the expression. You can also use `*` to represent a character or characters. For information, see Wildcard representing zero to many characters.
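One way to reason about the wildcards in the table is to translate an expression into a regular expression. The sketch below is an approximation for experimenting with expressions, not a description of how Conversation Analyzer evaluates them. The function names are hypothetical, and it assumes that matching is case-insensitive, that words are separated by one or more non-word characters, and that a word wildcard appears between two words.

```python
import re


def _word_to_regex(token):
    """Translate one word of an expression, expanding character wildcards."""
    parts = []
    for ch in token:
        if ch == "?":
            parts.append(r"\w")    # exactly one character
        elif ch == "*":
            parts.append(r"\w*")   # zero to many characters
        elif ch == "#":
            parts.append(r"\d")    # exactly one digit
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def compile_expression(expression):
    """Build a regex for an expression; a standalone * stands for whole words."""
    tokens = expression.split()
    pieces = []
    for index, token in enumerate(tokens):
        if token == "*":
            pieces.append(r"(?:\w+\W+)*")  # zero to many words
        else:
            pieces.append(_word_to_regex(token))
            if index < len(tokens) - 1:
                pieces.append(r"\W+")      # separator before the next token
    return re.compile(r"\b" + "".join(pieces) + r"\b", re.IGNORECASE)


def matches(expression, text):
    """True if the expression is found anywhere in the text."""
    return compile_expression(expression).search(text) is not None


print(matches("wh??", "whom"))                  # character wildcard
print(matches("cat * mat", "cat sits on the mat"))  # word wildcard
```

Because the whole expression is anchored at word boundaries, `wh?` matches "who" but not "what", in line with the table above.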

Words between value

The Words between field is available when creating categorization rules or substitution rules. It represents the number of words that can appear between the specified words in a phrase. If it is set to a value other than 0, the ~N expression appears at the end of the rule name in the profile tree.

If the expression contains more than two words, the Words between value applies separately to each pair of consecutive specified words.

See below for examples.

Expression examples

Example 1. Simple expression

Expression: the cat sat

With a simple expression, only the exact word or phrase will satisfy the rule.

Example 2. Expression using the `?` character wildcard

Expression: the cat? sat

The ? in the expression represents a single character that must appear after "cat" but before "sat" in matching text.

Text	Does it match?	Explanation
the cat sat	No	The `?` in the expression requires a character in its place.
the cats sat	Yes	The `?` in the expression represents the "s" in the text.
their cats sat	No	The expression does not allow any additional characters after "the".

Example 3. Expression using the `*` character wildcard

Expression: sit*

The * in the expression represents zero to many characters that can appear after "sit" in matching text.

Text	Does it match?	Explanation
sit	Yes	The `*` in the expression allows zero characters in its place.
sits	Yes	The `*` in the expression represents the "s" in the text.
sitting	Yes	The `*` in the expression represents the "ting" in the text.
sat	No	The expression requires that "sit" appears in the text.

Example 4. Expression using the `#` character wildcard

Expression: ### ###

Matching text must contain two sets of three digits, separated by a non-word character and no other characters.

Text	Does it match?	Explanation
123 456	Yes	The expression matches two sets of three digits.
123-456	Yes	The expression matches two sets of three digits. The hyphen is a non-word character and separates the two sets of three digits.
123456	No	The expression requires two sets of three digits, not one set of six.
123 abc 456	No	The expression requires two consecutive sets of three digits, not two sets separated by any other characters.

Example 5. Expression using the `??` wildcard

Expression: wh?? cat

The ?? in the expression represents two characters that must appear after "wh" and before "cat" in matching text.

Text	Does it match?	Explanation
what cat	Yes	The `??` in the expression represents the "at" in the text.
when cat	Yes	The `??` in the expression represents the "en" in the text.
who cat	No	The `??` in the expression requires two characters after "wh" not one.
which cat	No	The `??` in the expression only represents two characters after "wh" not three.

Example 6. Expression using the `*` word wildcard

Expression: the cat sits * on the mat

The text must contain the phrase "the cat sits on the mat" with zero to many words between "sits" and "on".

Text	Does it match?	Explanation
the cat sits on the mat	Yes	The `*` in the expression allows zero words in its place.
the cat sits happily on the mat	Yes	The `*` in the expression represents "happily" in the text.
the cat always sits on the mat	No	The `*` in the expression appears after "sits", not before.

Example 7. Expression using the `Words between` field

Expression: cat mat

Words between: 3

The text must contain the words "cat" and "mat" with up to three words between them.

Text	Does it match?	Explanation
the cat mat	Yes	The text contains no words between "cat" and "mat" and the expression allows up to three.
the cat likes mat	Yes	The text contains one word between "cat" and "mat", and the expression allows up to three.
the cat sits on the mat	Yes	The text contains three words between "cat" and "mat", and the expression allows up to three.
the cat always sits happily on the mat	No	The text contains five words between "cat" and "mat", but the expression only allows up to three.

Example 8. Expression using the `Words between` field

Expression: cat sat mat

Words between: 3

The text must contain the words "cat", "sat" and "mat" with up to three words between each of them. In this example, matching text may contain three words between "cat" and "sat" and also three words between "sat" and "mat".

Text	Does it match?	Explanation
the cat eagerly sat on the mat	Yes	The text contains one word between "cat" and "sat", and two words between "sat" and "mat"; the expression allows up to three.
the cat eagerly and promptly sat on the green mat	Yes	The text contains three words between "cat" and "sat", and three words between "sat" and "mat"; the expression allows up to three.
the cat sat on the green and blue mat	No	The text contains too many words (five) between "sat" and "mat".
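The Words between behavior shown in Examples 7 and 8 can be sketched as a search over the words of the text. This is an illustrative approximation only: the function name is hypothetical, matching is assumed to be case-insensitive, the expression words are assumed to appear in the order given, and wildcards are not handled.

```python
import re


def words_between_match(expression, text, words_between):
    """True if the expression's words appear in order in `text`, with at most
    `words_between` other words between each consecutive pair."""
    targets = expression.lower().split()
    words = re.findall(r"[\w']+", text.lower())
    if not targets:
        return False

    def place(start, index):
        # Try to find targets[index] within the allowed window from `start`.
        if index == len(targets):
            return True
        limit = min(len(words), start + words_between + 1)
        for position in range(start, limit):
            if words[position] == targets[index] and place(position + 1, index + 1):
                return True
        return False

    # The first word may appear anywhere; later words are constrained by the gap.
    for first in range(len(words)):
        if words[first] == targets[0] and place(first + 1, 1):
            return True
    return False


print(words_between_match("cat mat", "the cat sits on the mat", 3))
print(words_between_match("cat sat mat", "the cat sat on the green and blue mat", 3))
```

With a Words between value of 0, the function reduces to the exact phrase match of Example 1.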
