Concept explainer: Alert correlation - how it works

This video explains how alerts are correlated into incidents in APEX AIOps Incident Management, as well as the behavior of the correlation time window.

*Please note: Moogsoft is now part of Dell's IT Operations solution, APEX AIOps, and has been renamed APEX AIOps Incident Management. The UI in this video may differ slightly, but the content covered is still relevant.

In Incident Management, related alerts are grouped into an incident.

Alert_Correlation_Method_new.jpg

In this video, you will learn how to correlate your alerts into incidents in Incident Management.

Specifically, you will be able to explain the default correlation settings and how alerts are correlated into incidents.  Also, you will be able to configure new clustering settings in the correlation engine.

The correlation engine is where you manage all alert clustering configurations.

You start out with one out-of-the-box clustering setting, so your alerts will be grouped into incidents without any configuration on your end.

Image2.png

This is the default correlation setting. The incident created by this correlation will have a dynamically composed description. The default description shows how many sources are affected, 'unique_count(source)', and the top three sources 'unique(source,3)', services 'unique(service,3)', and event classes 'unique(class,3)' involved in the incident.
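To make the description macros concrete, here is a minimal Python sketch of what they might compute. It is an illustration only: the helper functions, the dictionary-based alert records, the frequency-based ordering of the "top three" values, and the output wording are all assumptions, not the product's actual implementation or description format.

```python
from collections import Counter


def unique_count(alerts, field):
    """Count the distinct non-empty values of `field` across the alerts."""
    return len({a[field] for a in alerts if a.get(field)})


def unique(alerts, field, limit):
    """Return up to `limit` distinct values of `field`.

    Assumption: values are ordered by how often they occur, most frequent first.
    """
    counts = Counter(a[field] for a in alerts if a.get(field))
    return [value for value, _ in counts.most_common(limit)]


def describe(alerts):
    """Compose a hypothetical incident description from the member alerts."""
    return (
        f"{unique_count(alerts, 'source')} source(s) affected: "
        f"{', '.join(unique(alerts, 'source', 3))} | "
        f"services: {', '.join(unique(alerts, 'service', 3))} | "
        f"classes: {', '.join(unique(alerts, 'class', 3))}"
    )


# Made-up sample alerts, for illustration only.
sample_alerts = [
    {"source": "web-01", "service": "checkout", "class": "cpu"},
    {"source": "web-01", "service": "checkout", "class": "cpu"},
    {"source": "web-02", "service": "checkout", "class": "memory"},
    {"source": "db-01", "service": "orders", "class": "cpu"},
    {"source": "db-02", "service": "orders", "class": "disk"},
]
print(describe(sample_alerts))
```

Because the description is recomputed from the member alerts, it changes as alerts join the incident, which is what "dynamically composed" suggests.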

Image3.png

Scope defines which alerts are evaluated by this correlation definition. Think of it as an entry filter. Since there is only one correlation definition right now, it evaluates ALL alerts. It compares the source field values, and alerts whose source values are more than 45% similar are clustered into an incident.
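The idea of a similarity threshold can be sketched in a few lines of Python. The video does not say how similarity is measured, so this sketch substitutes the standard library's `difflib.SequenceMatcher` ratio as a stand-in. The function names and the rule of comparing each alert against the first alert of an incident are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher

# "More than 45%" comes from the default setting described above.
SIMILARITY_THRESHOLD = 0.45


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0.0 and 1.0 (not the product's metric)."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_by_source(sources):
    """Group source values into incidents using the threshold.

    Assumption: each new alert is compared with the first alert of every
    existing incident and joins the first incident it matches.
    """
    incidents = []  # each incident is a list of source values
    for source in sources:
        for incident in incidents:
            if similarity(source, incident[0]) > SIMILARITY_THRESHOLD:
                incident.append(source)
                break
        else:
            # No existing incident was similar enough: start a new one.
            incidents.append([source])
    return incidents


# Made-up host names, for illustration only.
for group in cluster_by_source(["web-01", "web-02", "xyz", "web-03"]):
    print(group)
```

Raising the threshold should produce more, smaller incidents, and lowering it should produce fewer, broader ones.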

Image4.png

The correlation time window is set automatically, with a default of 15 minutes.

You can change the correlation time window to any value up to 24 hours. Let me show you how it works.

When an incident is created, the correlation engine starts a timer. Let’s say we keep the time window at 15 minutes.

If more qualifying alerts come in, they are added to the incident.

But we don’t want to keep adding alerts to the same incident forever. If you keep the incident open to new alerts indefinitely, you’ll end up mixing multiple separate issues.

Alert_Correlation_Method_1.png

So, here’s how Incident Management handles the time window.

The key concept here is 50%. 50% of the default time window of 15 minutes is 7 minutes 30 seconds.

Alert_Correlation_Method_2.png

From the moment the incident is formed, qualifying alerts keep getting added to this incident throughout the 15-minute time window.

But the second half, from 7 minutes 30 seconds to 15 minutes, determines when the time window for this incident actually closes.

If no qualifying alert arrives during the second half, then the window closes at the default 15-minute mark.

Alert_Correlation_Method_3.png

But if a qualifying alert comes in during the second half, say, at the 14-minute mark, it triggers Incident Management to extend the correlation window. For how long?

Alert_Correlation_Method_4.png

Again, the key is 50%. 50% of 15 minutes, or 7 minutes 30 seconds, is added from the point this alert arrived. So now the time window is set to close at 21 minutes and 30 seconds.

Alert_Correlation_Method_5.png

Suppose another alert arrives within the new correlation window, at 16 minutes. The window extends again from the arrival time.

Alert_Correlation_Method_6.png

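By how much? Again, 50% of the default window! Adding 7 minutes 30 seconds to the 16-minute arrival time, the window is now set to close at 23 minutes and 30 seconds.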
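The extension rule can be traced with a short Python sketch. It is only a model of the behavior described in this video, and the function name and the boundary handling are assumptions: an alert arriving exactly at the halfway point counts as the second half, and an alert arriving exactly at the closing time still counts as inside the window.

```python
def closing_time(arrivals, window=15.0):
    """Return when the correlation window closes, in minutes after the
    incident was created.

    arrivals: arrival times (in minutes) of qualifying alerts.
    window:   configured correlation time window in minutes.
    """
    half = window / 2   # the "50%" from the video: 7.5 minutes for 15
    close = window      # with no extension, the window closes at `window`
    for t in sorted(arrivals):
        if t > close:
            break       # the window already closed; later alerts cannot extend it
        if t >= half:
            # An alert in the second half (or in an extension) moves the
            # closing time to 50% of the window after its arrival.
            close = max(close, t + half)
    return close


print(closing_time([]))        # no alerts in the second half
print(closing_time([14]))      # one alert at the 14-minute mark
print(closing_time([14, 16]))  # a second alert inside the extended window
```

With the default 15-minute window, the three calls correspond to the cases walked through above: no extension, one extension from the 14-minute alert, and a second extension from the 16-minute alert.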

Alert_Correlation_Method_7.png

You can also choose how many similar alerts are needed to create an incident. The default is one alert, so every alert that arrives will either form a new incident or be added to an existing incident.

Alert_Correlation_Method_9.png

Now you know how Incident Management correlates alerts into incidents.  Thanks for watching!