Adjust Shingle Size to Define Similarity Across Multiple Attributes

What Is the Benefit of Attribute Similarity?

Attribute similarity lets you dictate the context of a Situation. You can combine multiple attributes that alerts must have in common to be clustered together, and each attribute can have its own configured similarity. Some attributes require a full match. For others, you can use fuzzy matching, which lets you configure how similar the unifying attribute must be between separate alerts.

When to Use It

As an example, consider an organization with multiple, functionally separate sites, currently located in Paris and London. They want to see alerts and Situations sorted by site so that teams can focus on issues specific to their own site. This means that alerts from one location must not be clustered with alerts from the other location. In the future they intend to set up additional sites while retaining this separation.

Moogsoft Enterprise can get the site from the CMDB; however, the entries are inconsistent. For example, London may be labeled "LONDON" or "LON". Some of the data was entered manually, so there are misspellings such as "LonDDon". Similarly, Paris can be "Par" or "PARIS". You might think you need to normalize the data, but you can rely on a Cookbook to handle these variances.

You can use shingle size similarity to differentiate the sites, because it tolerates variances in data entry.

You can use the Moogfarmd log to establish what similarity you should set. In the current example, the Moogfarmd log snippet below shows the output from a Cookbook that evaluates the similarity between three variants: 'LOND' and 'LONdres' are each evaluated against the 'LON' value from the reference alert. From this you can infer that the lowest score among the variants is 0.5, so a similarity setting of 0.5 allows all three variants to drive the clustering between the corresponding alerts, whereas a higher setting would exclude 'LONdres'.

DEBUG: [7:Default Cookbook][20190326 14:44:10.476 +0000] [] +|Text [LOND] matches [LON], similarity [0.8]|+
DEBUG: [9:Default Cookbook][20190326 14:45:41.621 +0000] [] +|Text [LONdres] matches [LON], similarity [0.5]|+
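
To make the scores in the log easier to reason about, here is a minimal sketch of one common shingle-based measure: the Sorensen-Dice coefficient over sets of 2-character shingles. This is an assumption, not a description of how Moogsoft Enterprise computes similarity internally; it is shown only because it happens to produce the same values as the two log lines above. The function names and the case folding are hypothetical choices made for illustration.

```python
def shingles(text: str, size: int = 2) -> set[str]:
    """Return the set of overlapping character shingles of the given size."""
    text = text.upper()  # assumption: the comparison ignores case
    if len(text) <= size:
        # A string no longer than the shingle size is treated as one shingle.
        return {text} if text else set()
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def dice_similarity(a: str, b: str, size: int = 2) -> float:
    """Sorensen-Dice coefficient of the two shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


# 'LOND' -> {LO, ON, ND}, 'LON' -> {LO, ON}: 2 shared out of 3 + 2 shingles
print(dice_similarity("LOND", "LON"))     # 0.8
# 'LONdres' has 6 shingles, 2 of them shared with 'LON'
print(dice_similarity("LONdres", "LON"))  # 0.5
```

With this measure, every extra character in a variant adds a shingle that the reference value does not share, which is why the longer 'LONdres' scores lower than 'LOND'.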

You may need to run several tests to discover the optimal similarity value. To identify the cutoff, first set the similarity very low so that differing values are still clustered together and you can observe their similarity scores. For the example above, we initially used a shingle size of 2 with a similarity of 0.0.
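
The search for a cutoff can also be approximated offline before testing in the product. The sketch below assumes a Sorensen-Dice score over 2-character shingles (an assumption that reproduces the log values above, not a confirmed description of the product's algorithm) and uses hypothetical helper names. It compares the lowest score among values that should cluster with the highest score among values that should not; any threshold between the two separates the sites.

```python
def shingles(text, size=2):
    """Set of overlapping character shingles; case folding is an assumption."""
    text = text.upper()
    if len(text) <= size:
        return {text} if text else set()
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def dice(a, b, size=2):
    """Sorensen-Dice coefficient of the two shingle sets."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


reference = "LON"
same_site = ["LOND", "LONdres"]  # variants from the log snippet above
other_site = ["PARIS", "Par"]    # Paris variants named in this section

lowest_wanted = min(dice(v, reference) for v in same_site)
highest_unwanted = max(dice(v, reference) for v in other_site)

print(lowest_wanted, highest_unwanted)  # 0.5 0.0
```

Under these assumptions, a similarity above 0.0 and no higher than 0.5 keeps the London variants together while keeping the Paris variants out. Scores observed in the Moogfarmd log remain the authoritative check.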

The smaller the shingle size, the more granular the similarity comparison. Use ‘Shingles’ when you need to compare single values or very short strings. To compare long descriptions, it is better to use ‘Words’ as the Language Processing option.
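
The effect of shingle size, and the difference between character shingles and word tokens, can be sketched as follows. As before, the Sorensen-Dice score, the case folding, the function names, and the two sample descriptions are assumptions made for illustration; they are not taken from the product.

```python
def char_shingles(text, size):
    """Overlapping character shingles, suited to short values such as site codes."""
    text = text.upper()
    if len(text) <= size:
        return {text} if text else set()
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def word_tokens(text):
    """Whitespace-separated words, suited to longer descriptions."""
    return set(text.upper().split())


def dice(x, y):
    """Sorensen-Dice coefficient of two token sets."""
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# Short values: the score for the same pair changes with the shingle size.
for size in (1, 2, 3):
    score = dice(char_shingles("LOND", size), char_shingles("LON", size))
    print(size, round(score, 2))
# 1 0.86
# 2 0.8
# 3 0.67

# Long descriptions (hypothetical text): compare whole words instead.
a = word_tokens("Link down on router A")
b = word_tokens("Link down on router B")
print(dice(a, b))  # 0.8
```

For very short strings, a larger shingle size leaves only one or two shingles per value, so a single differing character has a large effect on the score. For long descriptions, word tokens keep the comparison at the level of meaningful terms.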