Understand alert similarity
You can define the set of alert fields and tags to evaluate for correlation, and the degree of similarity required for a match to occur between two alerts, or between an alert and an incident. Two alerts are considered similar, and part of the same incident, if all the fields and tags in the definition meet the specified degree of similarity. For specific guidance, see Create a new correlation definition and Best practices for defining similarity in correlation definitions.
The following sections explain the methodologies involved in making this similarity determination.
Alert field similarity
Alert field similarity is the required degree of similarity between the same field in a new alert and in an open alert. APEX AIOps Incident Management uses the bag-of-words and shingling natural-language processing methods to calculate the text similarity between two fields.
The correlation engine calculates the similarity differently depending on the field type:
Shingle similarity for the source field and all custom tag fields.
List similarity for the services and tags.labels fields.
Multi-word similarity for the agent, class, description, and manager fields.
Shingle similarity
This section describes shingle similarity for the source field and all custom tag fields. The following example illustrates how the correlation engine determines whether two fields are similar.
A correlation definition specifies source as the one field to correlate, with a similarity threshold of 80%.
The correlation engine receives an alert with source = clst1sql4 (cluster 1, SQL server 4). An open alert has source = clst1sql5 (cluster 1, SQL server 5).
To determine if the two fields are similar, the engine does the following:
Splits each string into a set of shingles based on the default shingle size, which is 3.
Compares the set of 3-character shingles from the new-alert field with the set from the open-alert field:
cls lst st1 t1s 1sq sql ql4
cls lst st1 t1s 1sq sql ql5
Calculates the similarity score between the two sources using the Sørensen–Dice coefficient.
Compares the similarity score with the similarity threshold for this field. The similarity score is 85%, which meets the required similarity threshold of 80%.
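To make the calculation concrete, here is a minimal Python sketch of the shingle comparison described in these steps. It illustrates the documented method (3-character shingles plus the Sørensen–Dice coefficient over the two shingle sets); it is not the actual APEX AIOps implementation.

    # Illustrative sketch only: 3-character shingles and the Sorensen-Dice
    # coefficient, as described in the steps above. Not the product's code.
    def shingles(text, size=3):
        """Split a string into its set of overlapping character shingles."""
        return {text[i:i + size] for i in range(len(text) - size + 1)}

    def dice_similarity(a, b, size=3):
        """Sorensen-Dice coefficient of the two shingle sets."""
        sa, sb = shingles(a, size), shingles(b, size)
        if not sa and not sb:
            return 1.0
        return 2 * len(sa & sb) / (len(sa) + len(sb))

    score = dice_similarity("clst1sql4", "clst1sql5")
    print(round(score * 100, 1))   # 85.7 -> meets the 80% threshold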
If your source fields are based on a common naming convention, you can tune the similarity threshold to the degree of correlation you want. In this example, you could specify the following thresholds:
100% similarity => Same cluster and same SQL server: clst1sql3
80% similarity => Any SQL server in the same cluster: clst1sql3, clst1sql4, ...
40% similarity => Any SQL server in any cluster: clst1sql3, clst2sql4, clst7sql9, ...
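Using the hypothetical dice_similarity helper from the sketch above, you can check how example source values score against these thresholds (treat the figures as approximations of the engine's behavior):

    print(round(dice_similarity("clst1sql3", "clst1sql3") * 100))  # 100 -> same cluster, same server
    print(round(dice_similarity("clst1sql3", "clst1sql4") * 100))  # 86  -> same cluster, any server
    print(round(dice_similarity("clst1sql3", "clst2sql4") * 100))  # 43  -> any cluster, any server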
List similarity
Correlation evaluates the services and tags fields (the list fields supported by Incident Management) in a specific way: if an item in an alert's list field is a 100% match for at least one item in the corresponding list field of another alert, those alerts are considered part of the same incident. The alert lists are then combined in the incident, and subsequent incoming alerts are compared to the combined list.
In the following example, Alert #1 and Alert #2 do not match on the services field because they have no common members:
Alert #1 services field = ["A", "B", "C", "D"]
Alert #2 services field = ["E", "F", "G", "H"]
In the following example, Alert #1 and Alert #3 are a match:
Alert #1 services field = ["A", "B", "C", "D"]
Alert #3 services field = ["A", "E", "I", "K"]
Alert #1 and Alert #3 match on list member A. Therefore, the two alerts are grouped together as part of the same incident (referred to as Incident #1), and their services field lists are combined. Subsequent alert services fields are compared to the Incident #1 services field, which includes the members of both alerts: ["A", "B", "C", "D", "E", "I", "K"].
After the lists from the alerts are combined in Incident #1, Alert #2 from the first example also matches the incident:
Incident #1 services list = ["A", "B", "C", "D", "E", "I", "K"]
Alert #2 services list = ["E", "F", "G", "H"]
While Alert #1 and Alert #2 in the first example are not a match, Alert #2 is a match for the services field in Incident #1 (a combination of Alert #1 and Alert #3), due to list member E, which was contributed by Alert #3.
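The following Python sketch is a hypothetical illustration of this behavior, not the product's implementation: an exact match on any member pulls an alert into an incident, and the incident's combined list is what later alerts are compared against.

    # Hypothetical sketch of list-field correlation as described above.
    def matches(alert_services, incident_services):
        """True if at least one list member is an exact (100%) match."""
        return bool(set(alert_services) & set(incident_services))

    def merge(alert_services, incident_services):
        """Absorb the alert's members into the incident's combined list."""
        return list(dict.fromkeys(incident_services + alert_services))

    alert1 = ["A", "B", "C", "D"]
    alert2 = ["E", "F", "G", "H"]
    alert3 = ["A", "E", "I", "K"]

    print(matches(alert2, alert1))       # False: no common member
    incident1 = merge(alert3, alert1)    # Alert #1 and Alert #3 match on "A"
    print(incident1)                     # ['A', 'B', 'C', 'D', 'E', 'I', 'K']
    print(matches(alert2, incident1))    # True: Alert #2 now matches on "E"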
Here is another example showing how this works, using more realistic values for the services field:
Alert #4 services list = ["logging", "database", "http"]
Alert #5 services list = ["snmp", "database"]
Alerts #4 and #5 match on the database list member and form an incident with the following services list: ["logging", "database", "snmp", "http"].
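Applying the same hypothetical helpers from the sketch above to this example:

    print(matches(["snmp", "database"], ["logging", "database", "http"]))  # True: "database" matches
    print(merge(["snmp", "database"], ["logging", "database", "http"]))
    # ['logging', 'database', 'http', 'snmp'] (member order is not significant)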
Multi-word similarity
This section describes similarity calculations for the agent, class, description, and manager fields, which can contain multiple words.
agent, class, description, and manager are string fields that might consist of multiple words separated by spaces. Instead of splitting the string into shingles, the correlation engine splits each string into words, using the space character as the delimiter. Then it applies the Sørensen–Dice coefficient to calculate the similarity.
The following example illustrates how the correlation engine calculates similarity between two multi-word strings.
A correlation definition specifies class as the one field to correlate, with a similarity threshold of 70%.
Two open alerts have the following classes:
Alert 1: "class" : "HTTP 5xx% c1n05 login1.0"
Alert 2: "class" : "HTTP 5xx% c1n04 login1.1"
Alert 3 arrives with "class" : "HTTP 5xx% c1n03 login1.1".
Comparing Alert 3 with Alert 1, the engine calculates a similarity score of 50%. Both fields capture HTTP 5xx responses, but for different nodes and service versions. These fields do not meet the similarity threshold of 70%.
Comparing Alert 3 with Alert 2, the engine calculates a similarity score of 75%. Both fields capture HTTP 5xx responses for the same service version but on different nodes. These fields meet the similarity threshold of 70%.
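The same Sørensen–Dice idea, applied to whole words instead of shingles, reproduces the scores in this example. The following is an illustrative Python sketch of the described method, not the product's implementation:

    # Illustrative multi-word similarity: split on spaces, then apply the
    # Sorensen-Dice coefficient to the two sets of words.
    def word_similarity(a, b):
        wa, wb = set(a.split()), set(b.split())
        if not wa and not wb:
            return 1.0
        return 2 * len(wa & wb) / (len(wa) + len(wb))

    class1 = "HTTP 5xx% c1n05 login1.0"
    class2 = "HTTP 5xx% c1n04 login1.1"
    class3 = "HTTP 5xx% c1n03 login1.1"

    print(word_similarity(class3, class1))   # 0.5  -> below the 70% threshold
    print(word_similarity(class3, class2))   # 0.75 -> meets the 70% threshold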
Depending on your alert fields, you can tune the similarity threshold to specify the degree of similarity you want. This is easiest to do when an alert field always uses the same convention with the same number of words. In this example, in which the class field always uses the same four-word convention, you could specify the following thresholds:
100% similarity => Same response type, same node, and same service version
70% similarity => Any three of the four words match (for example, the response type plus the node or the service version)
50% similarity => Any two of the four words match (for example, the response type only, or the same node and service version)
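With the hypothetical word_similarity helper above, the four-word convention makes these thresholds easy to check:

    print(word_similarity("HTTP 5xx% c1n05 login1.0", "HTTP 5xx% c1n05 login1.0"))  # 1.0  -> all four words match
    print(word_similarity("HTTP 5xx% c1n05 login1.0", "HTTP 5xx% c1n04 login1.0"))  # 0.75 -> three words match
    print(word_similarity("HTTP 5xx% c1n05 login1.0", "HTTP 5xx% c1n04 login1.1"))  # 0.5  -> two words match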