Understand alert similarity
You can define the set of alert fields and tags to evaluate for correlation, and the degree of similarity required for a match between two alerts or between an alert and an incident. Two alerts are considered similar and part of the same incident if all the fields and tags in the definition meet the specified degree of similarity. For specific guidance, see Create a new correlation definition and Best practices for defining similarity in correlation definitions.
The following sections explain the methodologies involved in making this similarity determination.
Alert field similarity
Alert field similarity is the required degree of similarity between the same field in a new alert and in an open alert. APEX AIOps Incident Management uses the bag-of-words model and shingling, two natural-language processing methods, to calculate the text similarity between two fields.
The correlation engine calculates the similarity differently depending on the field type:
Shingle similarity for the source field and all custom tag fields.
List similarity for the services and tags.labels fields.
Multi-word similarity for the agent, class, description, and manager fields.
This section describes similarity for the source field and all custom tag fields. The following example illustrates how the correlation engine determines if two fields are similar.
A correlation definition specifies source as the one field to correlate, with a similarity threshold of 80%.
The correlation engine receives an alert with source = clst1sql4 (cluster 1, SQL server 4). An open alert has source = clst1sql5 (cluster 1, SQL server 5).
To determine if the two fields are similar, the engine does the following:
Splits each string into a set of shingles based on the default shingle size, which is 3.
Compares each 3-character sequence in the new-alert field with the corresponding sequence in the open-alert field:
cls lst st1 t1s 1sq sql ql4
cls lst st1 t1s 1sq sql ql5
Calculates the similarity score between the two sources using the Sørensen–Dice coefficient.
Compares the similarity score with the similarity threshold for this field. The similarity score is 85%, which meets the required similarity threshold of 80%.
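The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names are hypothetical, and treating shingles as a set (rather than a multiset) and returning the whole string when it is shorter than the shingle size are assumptions.

```python
def shingles(text: str, size: int = 3) -> set[str]:
    """Return the set of overlapping character sequences of length `size`."""
    if len(text) <= size:
        # Assumption: a string no longer than the shingle size is one shingle.
        return {text}
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def dice(a: set[str], b: set[str]) -> float:
    """Sørensen–Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def shingle_similarity(left: str, right: str, size: int = 3) -> float:
    """Dice similarity between the shingle sets of two field values."""
    return dice(shingles(left, size), shingles(right, size))


score = shingle_similarity("clst1sql4", "clst1sql5")
print(f"{score:.3f}")   # 0.857
print(score >= 0.80)    # True
```

Each string yields seven shingles and six of them are shared, so the coefficient is 2 × 6 / (7 + 7) ≈ 0.857, in line with the 85% quoted in the example.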
If your source fields are based on a common naming convention, you can tune the similarity threshold based on the degree of correlation you want. In this example, you could specify the following thresholds:
100% similarity => Same cluster and same SQL server: clst1sql3
80% similarity => Any SQL server in the same cluster: clst1sql3, clst1sql4 ...
40% similarity => Any SQL server in any cluster: clst1sql3, clst2sql4, clst7sql9 ...
Correlation evaluates the services and tags.labels fields (the list fields supported by Incident Management) in a specific way: if one or more items in the list for an incoming alert match one or more items in the list of the initial alert for the incident, then the list fields are considered similar and the alerts are a match. There is no similarity threshold configuration for lists.
The following examples show different ways that lists may or may not match. Initial refers to the initial alert in an incident. Incoming refers to an alert that could be added to the incident and is compared to the initial alert to make this determination.
In this example, Alerts #1 and #2 do not match on the services field as they have no common members:
Initial: Alert #1 service field = ["A", "B", "C", "D"]
Incoming: Alert #2 service field = ["E", "F", "G", "H"]
In this example, Alert #3 matches Alert #1:
Initial: Alert #1 service field = ["A", "B", "C", "D"]
Incoming: Alert #3 service field = ["A", "D"]
Alert #3 matches Alert #1 for items A and D.
Here is another example showing how this works, using more realistic values for the services field:
Initial: Alert #4 service list = ["logging", "database", "http"]
Incoming: Alert #5 service list = ["snmp", "database"]
Alerts #4 and #5 match on the database item. Because only one item in the list must match, these two alerts are considered similar based on the service list.
Individual list values must match at 100% similarity. In this example, the alerts do not match at all:
Initial: Alert #4 service list = ["logging", "database", "http"]
Incoming: Alert #6 service list = ["datacenter"]
While datacenter and database contain the similar string "data", the values in lists must match exactly. The value database only matches the value database, and the value datacenter only matches the value datacenter.
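The list rule described above amounts to checking whether two lists share at least one identical item. The following sketch reproduces the four examples; the function name is hypothetical, and case-sensitive comparison is an assumption.

```python
def lists_match(initial: list[str], incoming: list[str]) -> bool:
    """Return True when the two lists share at least one identical item.

    Items are compared for exact equality; there is no partial or
    threshold-based matching. Case sensitivity is an assumption.
    """
    return not set(initial).isdisjoint(incoming)


print(lists_match(["A", "B", "C", "D"], ["E", "F", "G", "H"]))              # False
print(lists_match(["A", "B", "C", "D"], ["A", "D"]))                        # True
print(lists_match(["logging", "database", "http"], ["snmp", "database"]))   # True
print(lists_match(["logging", "database", "http"], ["datacenter"]))         # False
```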
Alerts can potentially cause the creation of additional incidents when alert service lists overlap with other alerts in the incident services list. Incoming alerts can match items in an incident list and become part of an additional incident when they do not share a common list value with all of the other alerts.
In the following examples the Minimum Alerts Count (alert threshold) is set to 2.
Initial: Alert #7 service list = ["A", "B", "C"]
Incoming: Alert #8 service list = ["C", "D", "E"]
Incident #1 forms with Alert #7 and Alert #8 and combines the service lists, so the services list = ["A", "B", "C", "D", "E"] for the incident.
Then, another alert arrives and matches the incident for item E:
Initial: Incident #1 services list = ["A", "B", "C", "D", "E"]
Incoming: Alert #9 service list = ["E", "F", "G"]
The two alerts that include the matching list item become an additional incident. Incident #2 is created with the following alerts:
Alert #8 service list = ["C", "D", "E"]
Alert #9 service list = ["E", "F", "G"]
The services list for Incident #2 includes the following items: ["C", "D", "E", "F", "G"].
Alert #8 is included in both Incident #1 and Incident #2.
When incoming alerts share a common list value with other alerts in the incident, the incident is updated rather than creating a new incident.
In the following examples the Incident Creation Threshold (alert threshold) is set to 2.
Initial: Alert #10 service list = ["A", "B", "C"]
Incoming: Alert #11 service list = ["C", "D", "E"]
Incident #3 forms with Alert #10 and Alert #11 as members because they match on item C. The services list for Incident #3 includes ["A", "B", "C", "D", "E"].
A third alert arrives, matching the incident services list for item C:
Initial: Incident #3 services list = ["A", "B", "C", "D", "E"]
Incoming: Alert #12 service list = ["C", "E", "F"]
Incident #3 updates with the information from Alert #12. It then contains the following alerts:
Alert #10 service list = ["A", "B", "C"]
Alert #11 service list = ["C", "D", "E"]
Alert #12 service list = ["C", "E", "F"]
The services list for Incident #3 includes the following items: ["A", "B", "C", "D", "E", "F"].
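The two incident scenarios above can be approximated with the sketch below. It is only one reading of the described behavior: it assumes the incident is updated when the incoming alert overlaps with every member alert, and that an additional incident is formed from the overlapping members otherwise. The function names and return labels are hypothetical, and the alert threshold is not modeled.

```python
def merged_services(alert_lists: list[list[str]]) -> list[str]:
    """Combine member service lists, keeping first-seen order without duplicates."""
    merged: list[str] = []
    for services in alert_lists:
        for item in services:
            if item not in merged:
                merged.append(item)
    return merged


def place_alert(members: dict[str, list[str]], name: str,
                services: list[str]) -> tuple[str, list[str]]:
    """Decide how an incoming alert relates to an existing incident.

    Assumption: overlap with every member updates the incident; overlap with
    only some members creates an additional incident from those members.
    """
    incoming = set(services)
    overlapping = [m for m, s in members.items() if incoming & set(s)]
    if not overlapping:
        return "no_match", []
    if len(overlapping) == len(members):
        return "update", list(members) + [name]
    return "additional", overlapping + [name]


incident_1 = {"Alert #7": ["A", "B", "C"], "Alert #8": ["C", "D", "E"]}
print(place_alert(incident_1, "Alert #9", ["E", "F", "G"]))
# ('additional', ['Alert #8', 'Alert #9'])
print(merged_services([["C", "D", "E"], ["E", "F", "G"]]))
# ['C', 'D', 'E', 'F', 'G']

incident_3 = {"Alert #10": ["A", "B", "C"], "Alert #11": ["C", "D", "E"]}
print(place_alert(incident_3, "Alert #12", ["C", "E", "F"]))
# ('update', ['Alert #10', 'Alert #11', 'Alert #12'])
print(merged_services([["A", "B", "C"], ["C", "D", "E"], ["C", "E", "F"]]))
# ['A', 'B', 'C', 'D', 'E', 'F']
```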
This section describes similarity calculations for agent, class, description, and manager fields that contain multiple words.
The agent, class, description, and manager fields are string fields that might consist of multiple words separated by spaces. Instead of splitting the string into shingles, the correlation engine splits each string into words with space characters as the delimiters. Then it applies the Sørensen–Dice coefficient to calculate the similarity.
The following example illustrates how the correlation engine calculates similarity between two multi-word strings.
A correlation definition specifies class as the one field to correlate, with a similarity threshold of 70%.
Two open alerts have the following classes:
Alert 1: "class" : "HTTP 5xx% c1n05 login1.0"
Alert 2: "class" : "HTTP 5xx% c1n04 login1.1"
Alert 3 arrives with "class" : "HTTP 5xx% c1n03 login1.1".
Comparing alert 3 with alert 1, the engine calculates a similarity score of 50%. Both fields capture HTTP 5xx responses, but for different nodes and service versions. These fields do not meet the similarity threshold of 70%.
Comparing alert 3 with alert 2, the engine calculates a similarity score of 75%. Both fields capture HTTP 5xx responses for the same service and version but on different nodes. These fields meet the similarity threshold of 70%.
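The two scores in this example can be reproduced with a word-level Sørensen–Dice calculation. This is a minimal sketch with a hypothetical function name; it assumes words are compared as a set and that runs of whitespace count as a single delimiter.

```python
def word_similarity(left: str, right: str) -> float:
    """Sørensen–Dice coefficient over the whitespace-separated words of two strings."""
    a, b = set(left.split()), set(right.split())
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


alert_1 = "HTTP 5xx% c1n05 login1.0"
alert_2 = "HTTP 5xx% c1n04 login1.1"
alert_3 = "HTTP 5xx% c1n03 login1.1"

print(word_similarity(alert_3, alert_1))  # 0.5
print(word_similarity(alert_3, alert_2))  # 0.75
```

With four words on each side, m shared words give a score of 2m / (4 + 4) = m / 4, so two, three, and four shared words score 50%, 75%, and 100%.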
Depending on your alert fields, you can tune the similarity threshold to specify the degree of similarity you want. This is easiest to do when an alert field always uses the same convention with the same number of words. In this example, in which the class field always uses the same four-word convention, you could specify the following thresholds:
100% similarity => Same response type, same node, same service
70% similarity => Three matches (response type, node, and/or service)
50% similarity => Two matches (response type, or same node and service)