Understand alert similarity
You can define the set of alert fields and tags to evaluate for correlation, and the degree of similarity required for a match to occur between two alerts, or between an alert and an incident. Two alerts are considered similar, and part of the same incident, if all the fields and tags in the definition meet the specified degree of similarity. For specific guidance, see Create a new correlation definition and Best practices for defining similarity in correlation definitions.
The following sections explain the methodologies involved in making this similarity determination.
Alert field similarity
Alert field similarity is the required degree of similarity between the same field in a new alert and in an open alert. APEX AIOps Incident Management uses the bag-of-words and shingling natural-language processing methods to calculate the text similarity between two fields.
The correlation engine calculates the similarity differently depending on the field type:
Shingle similarity for the source field and all custom tag fields.
List similarity for the services and tags.labels fields.
Multi-word similarity for the agent, class, description, and manager fields.
Shingle similarity
This section describes shingle similarity for the source field and all custom tag fields. The following example illustrates how the correlation engine determines whether two fields are similar.
A correlation definition specifies source as the one field to correlate, with a similarity threshold of 80%.
The correlation engine receives an alert with source = clst1sql4 (cluster 1, SQL server 4). An open alert has source = clst1sql5 (cluster 1, SQL server 5).
To determine if the two fields are similar, the engine does the following:
Splits each string into a set of shingles based on the default shingle size, which is 3.
Compares the set of 3-character shingles from the new-alert field with the set from the open-alert field:
cls lst st1 t1s 1sq sql ql4
cls lst st1 t1s 1sq sql ql5
Calculates the similarity score between the two sources using the Sørensen–Dice coefficient.
Compares the similarity score with the similarity threshold for this field. The similarity score is 85%, which meets the required similarity threshold of 80%.
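To make the calculation concrete, here is a minimal Python sketch of the shingle comparison described in these steps. It illustrates the documented method (3-character shingles plus the Sørensen–Dice coefficient over the two shingle sets); it is not the actual APEX AIOps implementation.

    # Illustrative sketch only: 3-character shingles and the Sorensen-Dice
    # coefficient, as described in the steps above. Not the product's code.
    def shingles(text, size=3):
        """Split a string into its set of overlapping character shingles."""
        return {text[i:i + size] for i in range(len(text) - size + 1)}

    def dice_similarity(a, b, size=3):
        """Sorensen-Dice coefficient of the two shingle sets."""
        sa, sb = shingles(a, size), shingles(b, size)
        if not sa and not sb:
            return 1.0
        return 2 * len(sa & sb) / (len(sa) + len(sb))

    score = dice_similarity("clst1sql4", "clst1sql5")
    print(round(score * 100, 1))   # 85.7 -> meets the 80% threshold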
If your source fields are based on a common naming convention, you can tune the similarity threshold to the degree of correlation you want. In this example, you could specify the following thresholds:
100% similarity => Same cluster and same SQL server: clst1sql3
80% similarity => Any SQL server in the same cluster: clst1sql3, clst1sql4, ...
40% similarity => Any SQL server in any cluster: clst1sql3, clst2sql4, clst7sql9, ...
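Using the hypothetical dice_similarity helper from the sketch above, you can check how example source values score against these thresholds (treat the figures as approximations of the engine's behavior):

    print(round(dice_similarity("clst1sql3", "clst1sql3") * 100))  # 100 -> same cluster, same server
    print(round(dice_similarity("clst1sql3", "clst1sql4") * 100))  # 86  -> same cluster, any server
    print(round(dice_similarity("clst1sql3", "clst2sql4") * 100))  # 43  -> any cluster, any server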
List similarity
Correlation evaluates the services and tags fields (the list fields supported by Incident Management) in a specific way: if an item in an alert's list field is a 100% match for at least one item in the corresponding list field of another alert, those alerts are considered part of the same incident. The alert lists are then combined in the incident, and subsequent incoming alerts are compared to the combined list.
In the following example, Alert #1 and Alert #2 do not match on the services field because they have no common members:
Alert #1 services field = ["A", "B", "C", "D"]
Alert #2 services field = ["E", "F", "G", "H"]
In the following example, Alert #1 and Alert #3 are a match:
Alert #1 services field = ["A", "B", "C", "D"]
Alert #3 services field = ["A", "E", "I", "K"]
Alert #1 and Alert #3 match on list member A. Therefore, the two alerts are grouped together as part of the same incident (referred to as Incident #1), and their services field lists are combined. Subsequent alert services fields are compared to the Incident #1 services field, which includes the members of both alerts: ["A", "B", "C", "D", "E", "I", "K"].
After the lists from the alerts are combined in Incident #1, Alert #2 from the first example also matches the incident:
Incident #1 services list = ["A", "B", "C", "D", "E", "I", "K"]
Alert #2 services list = ["E", "F", "G", "H"]
While Alert #1 and Alert #2 in the first example are not a match, Alert #2 is a match for the services field in Incident #1 (a combination of Alert #1 and Alert #3), due to list member E, which was contributed by Alert #3.
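The following Python sketch is a hypothetical illustration of this behavior, not the product's implementation: an exact match on any member pulls an alert into an incident, and the incident's combined list is what later alerts are compared against.

    # Hypothetical sketch of list-field correlation as described above.
    def matches(alert_services, incident_services):
        """True if at least one list member is an exact (100%) match."""
        return bool(set(alert_services) & set(incident_services))

    def merge(alert_services, incident_services):
        """Absorb the alert's members into the incident's combined list."""
        return list(dict.fromkeys(incident_services + alert_services))

    alert1 = ["A", "B", "C", "D"]
    alert2 = ["E", "F", "G", "H"]
    alert3 = ["A", "E", "I", "K"]

    print(matches(alert2, alert1))       # False: no common member
    incident1 = merge(alert3, alert1)    # Alert #1 and Alert #3 match on "A"
    print(incident1)                     # ['A', 'B', 'C', 'D', 'E', 'I', 'K']
    print(matches(alert2, incident1))    # True: Alert #2 now matches on "E"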
Here is another example showing how this works, using more realistic values for the services field:
Alert #4 services list = ["logging", "database", "http"]
Alert #5 services list = ["snmp", "database"]
Alerts #4 and #5 match on the database list member and form an incident with the following services list: ["logging", "database", "snmp", "http"].
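Applying the same hypothetical helpers from the sketch above to this example:

    print(matches(["snmp", "database"], ["logging", "database", "http"]))  # True: "database" matches
    print(merge(["snmp", "database"], ["logging", "database", "http"]))
    # ['logging', 'database', 'http', 'snmp'] (member order is not significant)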
Multi-word similarity
This section describes similarity calculations for the agent, class, description, and manager fields, which can contain multiple words.
agent, class, description, and manager are string fields that might consist of multiple words separated by spaces. Instead of splitting the string into shingles, the correlation engine splits each string into words, using the space character as the delimiter. Then it applies the Sørensen–Dice coefficient to calculate the similarity.
The following example illustrates how the correlation engine calculates similarity between two multi-word strings.
A correlation definition specifies class as the one field to correlate, with a similarity threshold of 70%.
Two open alerts have the following classes:
Alert 1: "class" : "HTTP 5xx% c1n05 login1.0"
Alert 2: "class" : "HTTP 5xx% c1n04 login1.1"
Alert 3 arrives with "class" : "HTTP 5xx% c1n03 login1.1".
Comparing Alert 3 with Alert 1, the engine calculates a similarity score of 50%. Both fields capture HTTP 5xx responses, but for different nodes and service versions. These fields do not meet the similarity threshold of 70%.
Comparing Alert 3 with Alert 2, the engine calculates a similarity score of 75%. Both fields capture HTTP 5xx responses for the same service version but on different nodes. These fields meet the similarity threshold of 70%.
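The same Sørensen–Dice idea, applied to whole words instead of shingles, reproduces the scores in this example. The following is an illustrative Python sketch of the described method, not the product's implementation:

    # Illustrative multi-word similarity: split on spaces, then apply the
    # Sorensen-Dice coefficient to the two sets of words.
    def word_similarity(a, b):
        wa, wb = set(a.split()), set(b.split())
        if not wa and not wb:
            return 1.0
        return 2 * len(wa & wb) / (len(wa) + len(wb))

    class1 = "HTTP 5xx% c1n05 login1.0"
    class2 = "HTTP 5xx% c1n04 login1.1"
    class3 = "HTTP 5xx% c1n03 login1.1"

    print(word_similarity(class3, class1))   # 0.5  -> below the 70% threshold
    print(word_similarity(class3, class2))   # 0.75 -> meets the 70% threshold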
Depending on your alert fields, you can tune the similarity threshold to specify the degree of similarity you want. This is easiest to do when an alert field always uses the same convention with the same number of words. In this example, in which the class field always uses the same four-word convention, you could specify the following thresholds:
100% similarity => Same response type, same node, and same service version
70% similarity => Any three of the four words match (for example, the response type plus the node or the service version)
50% similarity => Any two of the four words match (for example, the response type only, or the same node and service version)
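With the hypothetical word_similarity helper above, the four-word convention makes these thresholds easy to check:

    print(word_similarity("HTTP 5xx% c1n05 login1.0", "HTTP 5xx% c1n05 login1.0"))  # 1.0  -> all four words match
    print(word_similarity("HTTP 5xx% c1n05 login1.0", "HTTP 5xx% c1n04 login1.0"))  # 0.75 -> three words match
    print(word_similarity("HTTP 5xx% c1n05 login1.0", "HTTP 5xx% c1n04 login1.1"))  # 0.5  -> two words match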