Moogsoft Docs

Situation Design Workflow

Good Situation design is fundamental to getting the most out of Moogsoft AIOps for your organization. This topic covers strategies to help you design effective Situations.

Follow these steps to get the best results from your Situation design:

Before you begin

In order for you to fully understand the topics in this section, you need to have:

Conduct discovery sessions

The first steps of Situation design are to identify the teams that will be using Moogsoft AIOps, and interview the operators within these teams about the information they need to see in Situations and the corresponding operational workflow.

Your goals for the discovery sessions are to:

  • Identify the content and context necessary to design Situations:

    • Content: The alerts clustered into a Situation.

    • Context: The interpretation of that list of alerts, such as

    content-context.png
  • Consider how you might label the Situations and ask clarifying questions during the discovery sessions. The criteria for defining "good" descriptions, like the criteria for "good" data and "good" Situation design, are highly dependent on the specific needs of your organization and operators. Always consult with your operators and users when planning and maintaining your deployments.

    See Situation Manager Labeler for more information on labeling Situations.

  • Identify the enrichment requirements. This will help with diagnostic activities, such as ticketing and team assignment, which will in turn help Situation design.

    See Enrichment for more information.

Design the clustering models

Now that you know the types of Situations your operators want to see, and the data you need to produce them, step 2 of the process is to design the clustering models.

In Moogsoft AIOps you can cluster alerts based on the following:

  • Time

  • Event payload

  • Custom data added during enrichment

  • Training - topology, user-defined rules, alert classification, or other domain knowledge pushed back into the system

In the majority of cases you will be using Cookbook to cluster alerts. This is a context-driven method to cluster alerts based on certain attribute similarities between individual alerts. A Cookbook can contain multiple clustering Recipes, each of which can analyze the input stream and create Situations based on specific scenarios.

See Clustering Algorithm Guide for more information on the clustering algorithms in Moogsoft AIOps.

Configure the clustering algorithms

You have audited the operators to identify their needs, and designed a model to produce Situations that meet those needs. Now implement the designed solution so you can review the results with the stakeholders.

Watch the alert clustering section of the Implementer training below if you want a reminder of how to configure the clustering algorithm.

Moogsoft AIOps v7.1 Self Paced Implementer Training - Alert Clustering [55 min]

AlertClusteringVideo.png

See Configure Cookbooks and Recipes for details.

Use entropy to filter out unimportant alerts

You can use entropy to control clustering in Cookbooks.

Entropy is a measure of how unexpected or unpredictable an alert is. 

Moogsoft AIOps assigns every alert an entropy score between 0 and 1. An event that recurs frequently receives a low entropy score and is deemed operationally insignificant. A rarer event receives a high score and is considered operationally significant.

Entropy is a key noise reduction feature. By filtering out alerts with a low entropy score, you can keep the important alerts from getting buried under the flood of common alerts.

Moogsoft AIOps analyzes the textual aspects of the incoming event by tokenizing the value of the description field. Items such as numbers and timestamps are masked and therefore excluded from the entropy calculation, and the score is derived based on the aggregation of token entropies from within the string. The entropy score is calculated up to 16 decimal places. However, note that in the Cookbook UI, you can configure the entropy threshold value only up to 2 decimal places.
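The masking, tokenizing, and threshold filtering described above can be sketched roughly as follows. This is an illustration only, not the product's algorithm: the source does not say how token entropies are aggregated, so the scoring formula (mean normalized surprise of the tokens), the digit-based masking rule, and all function names here are assumptions.

```python
import math
import re
from collections import Counter

# Assumed masking rule: any token containing a digit (numbers, timestamps)
# is masked and excluded from the calculation.
_HAS_DIGIT = re.compile(r"\d")


def tokenize(description):
    """Split a description into lowercase tokens, dropping masked ones."""
    return [t.lower() for t in description.split() if not _HAS_DIGIT.search(t)]


def build_token_counts(descriptions):
    """Count how many descriptions each token appears in."""
    counts = Counter()
    for description in descriptions:
        counts.update(set(tokenize(description)))
    return counts


def entropy_score(description, counts, total):
    """Illustrative score in [0, 1]: mean normalized surprise of the tokens."""
    tokens = tokenize(description)
    if not tokens or total < 2:
        return 0.0
    max_surprise = math.log(total)  # surprise of a token seen exactly once
    surprises = []
    for token in tokens:
        probability = max(counts.get(token, 0), 1) / total
        surprises.append(-math.log(probability) / max_surprise)
    return sum(surprises) / len(surprises)


def filter_by_entropy(descriptions, threshold):
    """Keep only the descriptions scoring at or above the threshold."""
    counts = build_token_counts(descriptions)
    total = len(descriptions)
    threshold = round(threshold, 2)  # the Cookbook UI accepts 2 decimal places
    return [d for d in descriptions
            if entropy_score(d, counts, total) >= threshold]
```

With nine copies of a routine heartbeat message and one unusual database message, a threshold of 0.5 keeps only the unusual message in this sketch.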

See Entropy Overview for more information about entropy in Moogsoft AIOps.

Cluster alerts by combining multiple attributes

Attribute similarity allows you to dictate the context of the Situation. You can combine multiple attributes that alerts must have in common to be clustered together, and each attribute can have its own configured similarity. Some attributes require a full match. For others, you can use fuzzy matching, which finds the unifying attribute element between separate alerts up to a configurable extent.

As an example, consider an organization that has multiple sites, at the moment located in Paris and London, that are functionally separate. They want to be able to see the alerts and Situations sorted by site so teams can focus on issues specific to their site. This means that they don't want to have alerts from one location clustered with alerts from the other location. In the future they intend to set up additional sites but still retain the view separation.

We can get the site from the CMDB, but the entries are inconsistent. For example, London may be labelled "LONDON" or "LON", and Paris may be "Par" or "PARIS". Some of the data was entered manually, so there are misspellings such as "LonDDon". You might think you need to normalize the data, but you can rely on a Cookbook to handle these data variances.

We can use shingle size similarity to differentiate the sites as this helps account for variances in data entry.
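To see why shingles tolerate these variances, here is a minimal sketch that compares the site values from the example. It assumes character shingles of size 2 compared with Jaccard similarity. The source does not state which similarity measure Moogsoft AIOps applies to shingles, so treat the measure and the function names as illustrative assumptions.

```python
def shingles(value, size=2):
    """Return the set of character shingles of a case-folded string."""
    text = value.strip().lower()
    if len(text) <= size:
        return {text}
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def shingle_similarity(a, b, size=2):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    return len(sa & sb) / len(sa | sb)


# Compare each CMDB value against the two canonical site names.
for value in ["LONDON", "LON", "LonDDon", "PARIS", "Par"]:
    print(value,
          shingle_similarity("LONDON", value),
          shingle_similarity("PARIS", value))
```

In this sample every London variant scores at least 0.5 against "LONDON" and 0.0 against "PARIS", so a moderate similarity threshold keeps the two sites apart without normalizing the data first.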

See Recipe Types for more information on shingle size.

Decide between First Match and Closest Match in your Recipes

In a Cookbook or a Recipe, you can specify cluster matching by either First Match or Closest Match. If you select First Matching Cluster, Cookbook adds each alert to the first cluster in a Recipe that meets the clustering criteria. If you select Closest Matching Cluster, Cookbook adds each alert to the cluster with the highest similarity. The second option might be less efficient because it needs to compare alerts against each cluster in a Recipe.

If a Recipe has all of the attribute similarities set to 100% match, then when an alert matches a Recipe, there is no need to keep checking it against other Recipes. In this case you can set 'Cluster By' as First Matching Cluster. Otherwise, choose Closest Matching Cluster to evaluate an alert by all Recipes to determine the best match.
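The difference between the two options can be sketched as follows. The position-based `similarity` function, the cluster dictionaries with a `reference` key, and the threshold handling are hypothetical stand-ins chosen for the example, not the product's implementation.

```python
def similarity(a, b):
    """Hypothetical similarity: fraction of positions with equal characters."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def first_matching_cluster(alert, clusters, threshold):
    """Return the first cluster whose reference meets the threshold."""
    for cluster in clusters:
        if similarity(alert, cluster["reference"]) >= threshold:
            return cluster  # stop at the first acceptable cluster
    return None


def closest_matching_cluster(alert, clusters, threshold):
    """Return the cluster with the highest similarity at or above the threshold."""
    best, best_score = None, -1.0
    for cluster in clusters:  # every cluster is compared, hence the extra cost
        score = similarity(alert, cluster["reference"])
        if score >= threshold and score > best_score:
            best, best_score = cluster, score
    return best
```

When every similarity is a 100% match, both functions return the same cluster, which is why the cheaper first-match option is sufficient in that case.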

You can set this behavior at Recipe level or at Cookbook level. See Cluster By in Configure a Cookbook Recipe for setting cluster matching at Recipe level, and in Configure a Cookbook for setting it in a Cookbook. See also the Graze API EndPoint Reference for setting cluster matching using the Graze API.

Create a Situation if a key alert arrives

You can use a seed alert in a Recipe to disregard certain alerts until the associated key alert happens.

A seed alert is useful for creating Situations in cause-and-effect scenarios. You can ignore symptomatic alerts unless they arrive after a much more important and potentially causal alert, in which case you want to surface them as a Situation requiring operator attention. You may need to implement alert classification to identify which alerts qualify as seed alerts.

A seed alert filter lets you restrict which events can start a candidate cluster and become the reference event within it. Any subsequent events that match the in-scope filters can then join the cluster based on the attribute similarities. Only events arriving after the seed alert can join the same candidate cluster. If you need to look back and catch symptomatic events that occurred before the seed alert, you may need to consider other options, such as the Alert Rules Engine.

See Add a seed alert for information about setting up a seed alert in connection with Vertex Entropy. See also Seed Alert Filter in Configure a Cookbook Recipe for setting it up using the UI, or the endpoint addValueRecipe for how to do this using the Graze API.
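The ordering rule above can be illustrated with a small sketch. It keeps a single candidate cluster and ignores attribute similarity and the Cook For time. The function name and the two predicate parameters are hypothetical, introduced only for this example.

```python
def cluster_with_seed(events, is_seed, in_scope):
    """Build one candidate cluster that only a seed event can start.

    `events` must be ordered by arrival time.
    """
    cluster = []
    for event in events:
        if not cluster:
            if is_seed(event):
                cluster.append(event)  # the seed becomes the reference event
            # Any other event arrived before the seed and is disregarded.
        elif in_scope(event):
            cluster.append(event)  # in-scope events after the seed may join
    return cluster
```

A symptomatic alert that arrives before the seed is never added, which is the look-back limitation described above.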

Use the Cook For Auto-Extension to extend clustering

When you create a Cookbook or Recipe, you define a Cook For time. This can be described as the lifespan of a candidate cluster. It does not control how Cookbook clusters alerts but it dictates how long alerts should be considered in scope of the candidate cluster after its initial creation.

When deciding on a Cook For time, think about how long it takes for events relating to the same incident, as hinted by the Recipe, to occur. For example, when a database supporting an application fails, how long does it take for monitoring to report the failed database as well as the symptomatic application-related alerts? If it takes roughly 30 minutes, set your Cook For time to this value.

There are use cases where a longer Cook For time makes sense but you are unsure exactly how long it should be. For instance, in the example above, the system continues to report application and transaction failures until the underlying database issue is fixed, which can take hours or even days. Ideally, you want all of these alerts clustered together. However, with a Cook For time of only 30 minutes, any alerts arriving after this period must form a new cluster, even if they are in scope of the original cluster, because that cluster has already expired. This produces two or more Situations that actually relate to the same incident.

To address this you can enable the Cook For Auto-Extension feature.

If you add an extension time of 1 hour and an alert arrives during the extension time, Cookbook adds it to the existing Situation and extends the time by another hour in case further alerts come in. The Max Cook For time lets you cap the total length of time that Cookbook will continue to add alerts to the existing Situation.
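The effect of the extension and the cap can be sketched with a small simulation. The semantics here are an assumption made for illustration: a cluster expires `cook_for` minutes after it is created, each alert that joins before expiry pushes the expiry to at least `extension` minutes after that alert, and extensions never reach past `max_cook_for` minutes after creation. The function and parameter names are hypothetical.

```python
def assign_to_situations(arrivals, cook_for, extension=0, max_cook_for=None):
    """Group sorted alert arrival times (in minutes) into Situations."""
    situations = []
    created = expiry = None
    for t in arrivals:
        if expiry is None or t > expiry:
            # No live cluster: this alert starts a new one.
            created, expiry = t, t + cook_for
            situations.append([t])
            continue
        situations[-1].append(t)
        if extension:
            extended = t + extension
            if max_cook_for is not None:
                extended = min(extended, created + max_cook_for)  # apply cap
            expiry = max(expiry, extended)
    return situations


arrivals = [0, 20, 70, 120, 200]
print(assign_to_situations(arrivals, cook_for=30))
print(assign_to_situations(arrivals, cook_for=30, extension=60, max_cook_for=180))
```

Without the extension, the same incident is split across several Situations. With it, alerts keep joining the original Situation until the cap is reached.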

This feature is available at Cookbook and Recipe level. See Configure a Cookbook Recipe and Configure a Cookbook for more information on setting up this feature. See Cookbook and Recipe Examples for an example of the Cook For Auto-Extension feature.

Cluster by shared list items

You can use this technique to cluster alerts based on values within a list.

Here is a demonstration of how this works:

matchList.gif

For example, you would like to create Situations clustered around the same impacted business services because they relate directly to your teams' organization. The common factor in a Situation is the business service. However, a server can impact multiple business services, so we need to cluster on the intersection of these. Because an alert can be attached to multiple services, it can appear in multiple Situations. This is acceptable behavior: it is easy to see which other Situations an alert is part of, and it may be worth consulting the knowledge base of the parallel Situation as well.

The type of attribute used in list matching must be an array.
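As a rough sketch of the idea, the following groups alerts by each list item they share, so an alert attached to several services lands in several clusters. The alert structure, the `alert_id` and `services` field names, and the one-cluster-per-item behavior are assumptions for illustration and may differ from how Cookbook evaluates list matches.

```python
from collections import defaultdict


def cluster_by_shared_items(alerts, list_attribute):
    """Map each list item to the IDs of the alerts whose list contains it."""
    clusters = defaultdict(list)
    for alert in alerts:
        items = alert[list_attribute]
        if not isinstance(items, (list, tuple)):
            raise TypeError(f"{list_attribute} must be an array")
        for item in dict.fromkeys(items):  # de-duplicate, keep order
            clusters[item].append(alert["alert_id"])
    return dict(clusters)


alerts = [
    {"alert_id": 1, "services": ["payments", "checkout"]},
    {"alert_id": 2, "services": ["checkout"]},
    {"alert_id": 3, "services": ["search"]},
]
print(cluster_by_shared_items(alerts, "services"))
```

Alert 1 appears under both "payments" and "checkout", mirroring the case where one server impacts multiple business services.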

See Match List Items in Recipes for how to configure list-based matching.

Prioritize Recipes in a Cookbook

You can set a priority order for Recipes in a Cookbook. If you select First Recipe Match Only, Cookbook evaluates Recipes in priority order and adds each alert to a cluster created by the highest-priority Recipe that meets the clustering criteria. The priority order is defined by the order in which you have ranked the Recipes in the list. If you do not select this option, Cookbook adds each alert to clusters in all the Recipes in the Cookbook that meet the clustering criteria.
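The two behaviors can be contrasted in a short sketch. Recipes are modeled here as hypothetical (name, predicate) pairs listed from highest to lowest priority. This is a simplification for illustration, not the Cookbook implementation.

```python
def matching_recipes(alert, recipes, first_recipe_match_only):
    """Return the names of the Recipes that should cluster this alert.

    `recipes` is a list of (name, predicate) pairs in priority order,
    highest priority first.
    """
    matched = [name for name, accepts in recipes if accepts(alert)]
    return matched[:1] if first_recipe_match_only else matched


recipes = [
    ("by_service", lambda alert: "service" in alert),
    ("by_host", lambda alert: "host" in alert),
]
alert = {"service": "payments", "host": "db01"}
print(matching_recipes(alert, recipes, first_recipe_match_only=True))
print(matching_recipes(alert, recipes, first_recipe_match_only=False))
```

With the option enabled, only the highest-priority matching Recipe clusters the alert. With it disabled, the alert can end up in clusters from both Recipes.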

SingleRecipeMatchingv2.gif

See First Recipe Match Only in Configure a Cookbook for more information on prioritizing Recipes in a Cookbook via the UI, or in the endpoint addCookbook for how to do this using the Graze API.