Understand alert similarity

You can define the set of alert fields and tags to evaluate for correlation, and the degree of similarity required for a match between two alerts, or between an alert and an incident. Two alerts are considered similar and part of the same incident if all the fields and tags in the definition meet the specified degree of similarity. For specific guidance, see Define a custom correlation definition and Best practices for defining correlations.

The following sections explain the methodologies involved in making this similarity determination.

Alert field similarity

Alert field similarity is the required degree of similarity between the same field in a new alert and in an open alert. Moogsoft Cloud uses the bag-of-words model and the shingling natural-language processing methods to calculate the text similarity between two fields.

The correlation engine calculates similarity differently depending on the field type.

This section describes similarity for the source field and all custom tag fields. The following example illustrates how the correlation engine determines if two fields are similar.

  1. A correlation definition specifies source as the one field to correlate, with a similarity threshold of 80%.

  2. The correlation engine receives an alert with source = clst1sql4 (cluster 1, SQL server 4). An open alert has source = clst1sql5 (cluster 1, SQL server 5).

  3. To determine if the two fields are similar, the engine does the following:

    1. Splits each string into a set of shingles based on the default shingle size, which is 3.

    2. Compares each 3-character sequence in the new-alert field with the corresponding sequence in the open-alert field:

      cls lst st1 t1s 1sq sql ql4

      cls lst st1 t1s 1sq sql ql5

    3. Calculates the similarity score between the two sources using the Sørensen–Dice coefficient.

    4. Compares the similarity score with the similarity threshold for this field. The similarity score is 85%, which meets the required similarity threshold of 80%.
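The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the correlation engine's actual implementation: the function names shingles and dice_similarity are hypothetical, shingles are treated as a set (duplicates ignored), and the handling of strings shorter than the shingle size is an assumption. The Sørensen–Dice coefficient is computed as 2 * |A ∩ B| / (|A| + |B|).

```python
def shingles(text: str, size: int = 3) -> set[str]:
    """Return the set of overlapping character sequences of length `size`."""
    if len(text) < size:
        # Assumption: a string shorter than the shingle size is one shingle.
        return {text} if text else set()
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def dice_similarity(a: set[str], b: set[str]) -> float:
    """Sørensen–Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        # Assumption: two empty fields are treated as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


new_source = shingles("clst1sql4")   # cls lst st1 t1s 1sq sql ql4
open_source = shingles("clst1sql5")  # cls lst st1 t1s 1sq sql ql5

# 6 shared shingles, 7 shingles per string: 2 * 6 / (7 + 7)
score = dice_similarity(new_source, open_source)
meets_threshold = score >= 0.80

print(f"score = {score:.3f}, meets 80% threshold: {meets_threshold}")
```

Six of the seven shingles are shared, so the sketch yields 12/14, or about 0.857, which is in line with the 85% score quoted in the example and clears the 80% threshold.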

If your source fields are based on a common naming convention, you can tune the similarity threshold based on the degree of correlation you want. In this example, you could specify the following thresholds:

  • 100% similarity => Same cluster and same SQL server: clst1sql3

  • 80% similarity => Any SQL server in the same cluster: clst1sql3, clst1sql4...

  • 40% similarity => Any SQL server in any cluster: clst1sql3, clst2sql4, clst7sql9...

Correlation evaluates the services and tags fields (the list fields supported by Moogsoft Cloud) in a specific way: if an item in an alert list field matches at least one item in the corresponding list field of another alert at 100%, those alerts are considered part of the same incident. The alert lists are then combined in the incident, and subsequent incoming alerts are compared to the combined list.

In the following example, Alerts #1 and #2 do not match on the services field as they have no common members:

Example 1:
  • Alert #1 services field = ["A", "B", "C", "D"]

  • Alert #2 services field = ["E", "F", "G", "H"]

In the following example, Alert #1 and Alert #3 are matches:

Example 2:
  • Alert #1 services field = ["A", "B", "C", "D"]

  • Alert #3 services field = ["A", "E", "I", "K"]

Alert #1 and Alert #3 match on list member A. Therefore, the two alerts are grouped together as part of the same incident (referred to as Incident #1), and their services lists are combined. The services fields of subsequent alerts are compared to the Incident #1 services field, which includes the members of both alerts: ["A", "B", "C", "D", "E", "I", "K"].

After the lists from the alerts are combined in Incident #1, Alert #2 from the first example also matches the incident:

Example 3:
  • Incident #1 services list = ["A", "B", "C", "D", "E", "I", "K"]

  • Alert #2 services list = ["E", "F", "G", "H"]

While Alert #1 and Alert #2 in the first example are not a match, Alert #2 is a match for the services field in Incident #1 (a combination of Alert #1 and Alert #3), due to the list members contributed by Alert #3.

Here is another example showing how this works, using more realistic values for the services field:

Example 4:
  • Alert #4 services list = ["logging", "database", "http"]

  • Alert #5 services list = ["snmp", "database"]

Alerts #4 and #5 match on the database list member and form an incident with the following services list: ["logging", "database", "snmp", "http"].
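The list-field behavior in Examples 1 through 4 can be sketched as follows. This is an illustrative model only: lists_match and merge_lists are hypothetical helper names, a 100% item match is modeled as exact string equality, and the order of the combined list is an assumption (the sketch keeps first-seen order, so compare results as sets).

```python
def lists_match(incident_items: list[str], alert_items: list[str]) -> bool:
    """True if at least one item appears in both lists (exact match)."""
    return not set(incident_items).isdisjoint(alert_items)


def merge_lists(incident_items: list[str], alert_items: list[str]) -> list[str]:
    """Combine two lists, dropping duplicates and keeping first-seen order."""
    merged = list(incident_items)
    for item in alert_items:
        if item not in merged:
            merged.append(item)
    return merged


alert_1 = ["A", "B", "C", "D"]
alert_2 = ["E", "F", "G", "H"]
alert_3 = ["A", "E", "I", "K"]

# Example 1: no common members, so no match.
print(lists_match(alert_1, alert_2))      # False

# Example 2: match on "A"; the lists are combined in the incident.
incident_1 = merge_lists(alert_1, alert_3)
print(incident_1)                         # ['A', 'B', 'C', 'D', 'E', 'I', 'K']

# Example 3: Alert #2 now matches the incident on "E".
print(lists_match(incident_1, alert_2))   # True

# Example 4: match on "database".
alert_4 = ["logging", "database", "http"]
alert_5 = ["snmp", "database"]
incident_2 = merge_lists(alert_4, alert_5)
```

Because each incoming alert is compared against the combined incident list rather than against individual alerts, an incident can absorb alerts that share no members with its first alert.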

This section describes similarity calculations for agent, class, description, and manager fields that contain multiple words.

agent, class, description, and manager are string fields that might consist of multiple words separated by spaces. Instead of splitting the string into shingles, the correlation engine splits each string into words with space characters as the delimiters. Then it applies the Sørensen–Dice coefficient to calculate the similarity.

The following example illustrates how the correlation engine calculates similarity between two multi-word strings.

  1. A correlation definition specifies class as the one field to correlate, with a similarity threshold of 70%.

  2. Two open alerts have the following classes:

    • Alert 1: "class" : "HTTP 5xx% c1n05 login1.0"

    • Alert 2: "class" : "HTTP 5xx% c1n04 login1.1"

  3. Alert 3 arrives with "class" : "HTTP 5xx% c1n03 login1.1".

  4. Comparing alert 3 with alert 1, the engine calculates a similarity score of 50%. Both fields capture HTTP 5xx responses, but for different nodes and service versions. These fields do not meet the similarity threshold of 70%.

  5. Comparing alert 3 with alert 2, the engine calculates a similarity score of 75%. Both fields capture HTTP 5xx responses for the same service and version but on different nodes. These fields meet the similarity threshold of 70%.
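The word-based calculation in this example can be reproduced with a short sketch. It is an approximation under stated assumptions: word_similarity is a hypothetical name, words are compared as a set (repeated words count once), and matching is exact and case-sensitive.

```python
def word_similarity(field_a: str, field_b: str) -> float:
    """Sørensen–Dice coefficient over space-delimited words."""
    words_a = set(field_a.split())
    words_b = set(field_b.split())
    if not words_a and not words_b:
        # Assumption: two empty fields are treated as identical.
        return 1.0
    return 2 * len(words_a & words_b) / (len(words_a) + len(words_b))


alert_1 = "HTTP 5xx% c1n05 login1.0"
alert_2 = "HTTP 5xx% c1n04 login1.1"
alert_3 = "HTTP 5xx% c1n03 login1.1"
threshold = 0.70

score_3_vs_1 = word_similarity(alert_3, alert_1)  # 2 shared words: 2 * 2 / 8
score_3_vs_2 = word_similarity(alert_3, alert_2)  # 3 shared words: 2 * 3 / 8

print(score_3_vs_1, score_3_vs_1 >= threshold)    # 0.5 False
print(score_3_vs_2, score_3_vs_2 >= threshold)    # 0.75 True
```

With four words per field, each additional matching word raises the score by 25 percentage points, which is why a fixed word count makes the thresholds easy to reason about.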

Depending on your alert fields, you can tune the similarity threshold to specify the degree of similarity you want. This is easiest to do when an alert field always uses the same convention with the same number of words. In this example, in which the class field always uses the same four-word convention, you could specify the following thresholds:

  • 100% similarity => Same response type, same node, same service

  • 70% similarity => Three of the four words match (response type plus either the node or the service)

  • 50% similarity => Two of the four words match (response type only, or node and service)