
Understand how correlation works


In real-world terms, an incident is a cluster of alerts that all relate to the same issue. For example, an incident might consist of the following alerts:

  1. webServB23, 09:00, spike in incoming requests to myApp

  2. webServB23, 09:01, spike in CPU load

  3. webServB23, 09:01, dip in available memory

  4. webServB23, 09:02, spike in response times for myApp

The Correlation Engine clusters alerts into incidents by identifying correlations between their data fields. You can correlate on source node, time, alert type, custom tags, and other relevant information, and you can define targeted correlations that produce the clustering behavior you want. Each correlation definition has the following:

  • A filter that specifies the set of alerts to consider for correlation.

  • A similarity test that lists the alert fields to compare and the required degree of similarity for each field. Two alerts are considered similar only if every field in the test meets its similarity criterion.

  • A correlation time period: the maximum time between two alerts for them to be considered correlated.

  • A description built from the fields and other data of interest in the component alerts.
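The four parts of a definition can be pictured with a small Python sketch. This is only an illustration of the concept, not the Correlation Engine's actual configuration format or API: the class name, the field names (`source`, `time`, `severity`), the use of `difflib` string similarity as the "degree of similarity", and the date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher
from typing import Callable, Dict

Alert = Dict[str, object]  # hypothetical alert shape: a plain dict of fields


@dataclass
class CorrelationDefinition:
    """Illustrative model of a correlation definition (all names are assumptions)."""
    alert_filter: Callable[[Alert], bool]  # which alerts to consider
    similarity: Dict[str, float]           # field -> minimum similarity, 0.0 to 1.0
    window: timedelta                      # maximum time between two alerts
    description: str                       # template filled from alert fields

    def fields_similar(self, a: Alert, b: Alert) -> bool:
        # Every listed field must meet its own threshold.
        return all(
            SequenceMatcher(None, str(a[f]), str(b[f])).ratio() >= threshold
            for f, threshold in self.similarity.items()
        )

    def correlated(self, a: Alert, b: Alert) -> bool:
        if not (self.alert_filter(a) and self.alert_filter(b)):
            return False
        if abs(a["time"] - b["time"]) > self.window:
            return False
        return self.fields_similar(a, b)

    def describe(self, alert: Alert) -> str:
        return self.description.format(**alert)


# A definition that correlates alerts from exactly the same source node.
by_node = CorrelationDefinition(
    alert_filter=lambda alert: alert["severity"] != "clear",
    similarity={"source": 1.0},  # 1.0 means the values must match exactly
    window=timedelta(minutes=15),
    description="Issues on {source}",
)

# The date and the severity values are placeholders; the source gives only times.
a = {"source": "webServB23", "time": datetime(2024, 1, 1, 9, 0), "severity": "major"}
b = {"source": "webServB23", "time": datetime(2024, 1, 1, 9, 1), "severity": "major"}
c = {"source": "dbServA07", "time": datetime(2024, 1, 1, 9, 1), "severity": "major"}

print(by_node.correlated(a, b))  # same node, one minute apart
print(by_node.correlated(a, c))  # different node
print(by_node.describe(a))
```

Lowering a threshold below 1.0 in this sketch would let near-identical values (for example, two similar host names) count as a match, which is one way to read "degree of similarity".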

A common practice is to create a profile that clusters alerts by source node, so that each resulting incident relates to a specific node and spans a time window that stays within the maximum correlation time period. You can also create more customized profiles that cluster alerts by alert type, application, service, location, ops team, and so on. Any data field that your alerts have in common can serve as the basis for clustering.
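As a rough sketch of clustering by source node, the Python below groups a stream of alerts into incidents. It is not the product's algorithm: the function name, the dict-based alert shape, the extra `dbServA07` and 10:30 alerts, the date, and the choice to measure the time period from the most recent alert in an incident are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def cluster_by_source(alerts, window):
    """Group alerts into incidents: same source node, each alert no more than
    `window` after the previous alert in that incident (an assumed rule)."""
    incidents = []
    open_incident = {}  # source node -> most recent incident for that node
    for alert in sorted(alerts, key=lambda a: a["time"]):
        incident = open_incident.get(alert["source"])
        if incident is not None and alert["time"] - incident[-1]["time"] <= window:
            incident.append(alert)
        else:
            # No open incident for this node, or the gap is too long: start a new one.
            incident = [alert]
            incidents.append(incident)
            open_incident[alert["source"]] = incident
    return incidents


def at(hour, minute):
    # Placeholder date; the example alerts above give only times of day.
    return datetime(2024, 1, 1, hour, minute)


alerts = [
    {"source": "webServB23", "time": at(9, 0), "type": "spike in incoming requests to myApp"},
    {"source": "webServB23", "time": at(9, 1), "type": "spike in CPU load"},
    {"source": "webServB23", "time": at(9, 1), "type": "dip in available memory"},
    {"source": "webServB23", "time": at(9, 2), "type": "spike in response times for myApp"},
    {"source": "dbServA07", "time": at(9, 1), "type": "spike in CPU load"},   # hypothetical
    {"source": "webServB23", "time": at(10, 30), "type": "spike in CPU load"},  # hypothetical
]

incidents = cluster_by_source(alerts, window=timedelta(minutes=15))
for incident in incidents:
    print(incident[0]["source"], len(incident), "alert(s)")
```

With a 15-minute period, the four `webServB23` alerts from 09:00 to 09:02 fall into one incident, the alert from the other node forms its own incident, and the 10:30 alert starts a new incident because it arrives long after the earlier cluster.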