The Alert Analyzer utility is a standalone process. It uses Natural Language Processing (NLP) techniques to analyze inbound event data. The Alert Analyzer divides text fields within the events into tokens. Based on the frequency of these tokens appearing in other events, it assigns an entropy value to the tokens and to the alerts in Moogsoft Enterprise.

See Entropy for more information on how Moogsoft Enterprise evaluates entropy and uses entropy thresholds to reduce the level of 'noise' from incoming event data.

See Configure Entropy Generation Schedule and Configure Entropy Thresholds with Alert Analyzer for information on how to use Alert Analyzer features in the Moogsoft Enterprise UI.

## Natural language processing analysis

The Alert Analyzer utility performs a number of linguistic analyses on events entering Moogsoft Enterprise. It uses the results to calculate an entropy value for each token and then for each alert. See Entropy for more information.

## Tokenization of text

The Alert Analyzer splits a text string into blocks at word boundaries, such as spaces or punctuation marks. Each block of text is known as a token. For example, the following description has five tokens:

Link down on port 2/32
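As a rough illustration of the five-token count above, the following sketch splits the description on whitespace only. This is a simplification and not the Alert Analyzer's actual boundary rules, which also take punctuation and token types into account; the `tokenize` function is a hypothetical name used only for this example.

```python
def tokenize(text: str) -> list[str]:
    """Split text into tokens at whitespace only (a deliberate simplification)."""
    return text.split()


tokens = tokenize("Link down on port 2/32")
print(tokens)       # ['Link', 'down', 'on', 'port', '2/32']
print(len(tokens))  # 5
```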

## Token type identification

Commonly used word boundaries are often integral to the meaning of a token, for example, the dots in IPv4 addresses. The Alert Analyzer identifies complete tokens of the following types within the structure of an event:

• IP addresses: v4 and v6.

• OIDs

• Dates: Most standard formats.

• Numbers: Integers and real numbers, with and without unit suffixes, for example, 99%, 12kb, 345ms.

• File paths: With forward slashes or backward slashes.

• GUIDs

• Hexadecimal numbers: With the 0x prefix.

• URLs

• Email addresses: Most standard formats.

Identifying token types in arbitrary text is not an exact science, so the algorithms may occasionally assign a token a type that seems incorrect to a human reader.

After the Alert Analyzer has identified the token types, it can use them for masking and to identify tokens with high variation in a given alert.
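To make the idea of token typing concrete, here is a minimal sketch that classifies a whole token as an IP address, GUID, hexadecimal number, number, or plain word. The patterns, the type names, and the `classify_token` function are assumptions made for illustration and cover only some of the types listed above; they are not the Alert Analyzer's own rules.

```python
import ipaddress
import re

# Hypothetical patterns, chosen for illustration only.
GUID_RE = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")
NUMBER_RE = re.compile(r"[+-]?\d+(\.\d+)?(%|[a-zA-Z]{1,3})?")


def classify_token(token: str) -> str:
    """Return a coarse type label for a complete token."""
    try:
        # The standard library parser keeps the dots or colons as part of the token.
        address = ipaddress.ip_address(token)
        return "ipv4" if address.version == 4 else "ipv6"
    except ValueError:
        pass
    if GUID_RE.fullmatch(token):
        return "guid"
    if HEX_RE.fullmatch(token):
        return "hex"
    if NUMBER_RE.fullmatch(token):
        return "number"
    return "word"


for sample in ["10.0.0.1", "::1", "0x1F", "99%", "12kb", "3.14", "port"]:
    print(sample, "->", classify_token(sample))
```

Simple patterns like these also show why typing is inexact: a token such as `12ab` would be labeled a number with a unit suffix, which a human reader might not agree with.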

Tokens that change between events for the same alert can cause that alert to be assigned an incorrectly high entropy value. The most obvious example involves dates and times. If the description of an event is to be analyzed but each event contains a different timestamp, that timestamp will have a high entropy and skew the entropy for that alert as a whole. For other token types that change frequently, such as URLs or IP addresses, it may be desirable to retain the higher entropy associated with that token type because the changing value is significant.

You can configure the Alert Analyzer to include or exclude specific token types in the entropy analysis for each event partition.

You should consider masking dates, times and numbers from the entropy calculation.
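The effect of masking can be sketched as follows. Two events for the same alert differ only in their timestamp and retry count; once numeric tokens are excluded, the remaining tokens are identical. The event text, the `NUMERIC_RE` pattern, and the `mask_tokens` function are hypothetical and stand in for whatever token types are excluded in a real configuration.

```python
import re

# Hypothetical pattern covering plain numbers and simple times or dates such as 02:15:07.
NUMERIC_RE = re.compile(r"\d+([.:/-]\d+)*")


def mask_tokens(tokens: list[str]) -> list[str]:
    """Drop tokens that are entirely numeric, time-like, or date-like."""
    return [token for token in tokens if not NUMERIC_RE.fullmatch(token)]


event_1 = "Backup failed at 02:15:07 after 3 retries".split()
event_2 = "Backup failed at 14:42:55 after 7 retries".split()

print(mask_tokens(event_1))  # ['Backup', 'failed', 'at', 'after', 'retries']
print(mask_tokens(event_1) == mask_tokens(event_2))  # True
```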

## Language processing techniques

The Alert Analyzer uses many standard techniques in language processing:

• Case folding

• Tokens that differ only by case, for example, 'WORD', 'Word' or 'word', are converted to the same case and considered equal.

• Case folding is applied to all token types.

• Stop words

• You can add common or meaningless words, such as 'a', 'be', 'not', to a stop words file so that they are removed from the entropy calculation.

• You can define a universal 'length' parameter so that any word at or below a certain length is treated as a stop word. For example, if set to '2', any words of one or two characters are ignored.

• Stop words are applied to all token types.

• Stemming

• Stemming reduces a word to its root to remove plurals and different verb tenses. Words with the same root are considered equal.

• Note that some words look unusual when stemmed. For example, 'priority', 'priorities' and 'prioritize' are all stemmed to 'priorit'.

• If stemming is enabled, the stemmed form is stored in the reference database.

• Stemming is only applied to tokens of type 'word', that is, it is not applied to numbers, GUIDs, IP addresses, etc.
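The case folding and stop word rules above can be sketched in a few lines. The stop word set, the length value of 2, the order of the steps, and the `normalize` function are assumptions for illustration. Stemming is left out because it needs a stemming algorithm that is beyond the scope of a short example.

```python
STOP_WORDS = {"a", "be", "not"}  # example stop words taken from the text above
STOP_LENGTH = 2                  # tokens of this length or shorter are ignored


def normalize(tokens: list[str]) -> list[str]:
    """Fold case, then drop stop words and tokens at or below the length limit."""
    kept = []
    for token in tokens:
        folded = token.casefold()
        if folded in STOP_WORDS or len(folded) <= STOP_LENGTH:
            continue
        kept.append(folded)
    return kept


print(normalize(["Link", "DOWN", "on", "a", "Port", "not", "OK"]))
# ['link', 'down', 'port']
```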

## Priority words

Priority words are similar in concept to stop words but, instead of being removed from the analysis, a priority word is assigned an entropy value of 1. For example, if ‘reboot’ is defined as a priority word, any token containing the word ‘reboot’ is given an entropy value of 1 regardless of how frequently the word appears in events.

• Priority words are analyzed after stop words. If a token satisfies the criteria of a stop word, it is removed from the analysis and so cannot subsequently be considered as a priority word.

• The reference database contains the calculated entropies for all tokens regardless of whether they are classed as priority words.
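The ordering of stop words and priority words can be sketched as below. The entropy values, the dictionary standing in for the reference database, and the `effective_entropy` function are all hypothetical; the sketch only illustrates the precedence described above.

```python
def effective_entropy(token, calculated, stop_words, priority_words):
    """Return the entropy used for a token, or None if it is a stop word."""
    folded = token.casefold()
    if folded in stop_words:
        # Stop words are removed first, so they can never become priority words.
        return None
    if any(word in folded for word in priority_words):
        # Tokens containing a priority word are given an entropy of 1.
        return 1.0
    return calculated.get(folded)


# Hypothetical calculated entropies, standing in for the reference database.
calculated = {"reboot": 0.12, "link": 0.40, "not": 0.05}

print(effective_entropy("Reboot", calculated, {"not"}, {"reboot"}))  # 1.0
print(effective_entropy("link", calculated, {"not"}, {"reboot"}))    # 0.4
print(effective_entropy("not", calculated, {"not"}, {"not"}))        # None
print(calculated["reboot"])  # 0.12, the stored value is unchanged
```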

## Partition-based analysis

You can configure the Alert Analyzer to calculate entropy values separately for different partitions of events. For example, you may want to run separate entropy calculations for different regions. In this type of configuration, the same token can be given multiple entropy values within the same Moogfarmd deployment, based on its frequency in the events within each partition. You can also set different configuration options for each partition. For example, IP addresses may be masked in one partition while masking is unnecessary in another. In general, if a deployment uses the “pre-partition” method in Moogfarmd, that deployment benefits from partition-based entropy calculations.
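The following sketch shows why the same token can end up with different values in different partitions. It computes, per partition, the share of events that contain each token. This is only the frequency input, not the Alert Analyzer's entropy calculation, and the partition names, event text, and function name are hypothetical.

```python
from collections import Counter, defaultdict


def token_frequency_by_partition(events):
    """Map each partition to {token: fraction of that partition's events containing it}."""
    event_counts = Counter()
    token_counts = defaultdict(Counter)
    for partition, description in events:
        event_counts[partition] += 1
        # Use a set so a token is counted once per event.
        token_counts[partition].update(set(description.casefold().split()))
    return {
        partition: {token: n / event_counts[partition] for token, n in counts.items()}
        for partition, counts in token_counts.items()
    }


events = [
    ("emea", "Link down on port 1"),
    ("emea", "Link down on port 2"),
    ("apac", "Link down on port 3"),
    ("apac", "Disk full on host 4"),
]

freq = token_frequency_by_partition(events)
print(freq["emea"]["link"])  # 1.0
print(freq["apac"]["link"])  # 0.5
```

Because 'link' appears in every event in one partition but only half of the events in the other, a frequency-based measure would treat it as less informative in the first partition than in the second.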

See Configure Entropy Generation Details for further information on the Graze API endpoints you can use to configure partitions in the Alert Analyzer.