Deduplicate events to reduce noise

Deduplication is the process of identifying repeated events and combining them into alerts. This reduces operational noise by limiting the number of alerts in the system.

How it works: the dedupe key

When Moogsoft ingests a raw event, it compares the new event with all open alerts. Does an open alert already describe the same issue, on the same node and service?

  • If there is a matching alert, Moogsoft flags the new event as a duplicate and updates the alert.

  • If there is no matching alert, Moogsoft creates a new alert.

Each incoming event has a dedupe_key field. By default, Moogsoft autogenerates this key based on the source, service, and check fields in the event itself. (The dedupe key also includes class if an event includes this field.) The dedupe key defines the context shared by all events that belong to the same alert.
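To make the idea concrete, here is a minimal sketch of how a key could be derived from these fields. It is an illustration only: the function name make_dedupe_key, the "|" separator, and the sample field values are assumptions, not Moogsoft's actual key format or API.

```python
from typing import Mapping

# Fields that define the shared context of an alert, per the text above.
KEY_FIELDS = ("source", "service", "check")


def make_dedupe_key(event: Mapping[str, str]) -> str:
    """Build an illustrative dedupe key from an event's context fields.

    Hypothetical helper: the separator and layout are assumptions.
    """
    parts = [event.get(field, "") for field in KEY_FIELDS]
    # class is only part of the key when the event carries it.
    if event.get("class"):
        parts.append(event["class"])
    return "|".join(parts)


# Two events that differ only in severity and description share a key.
first = {"source": "host-a", "service": "checkout", "check": "latency",
         "severity": "minor", "description": "latency > 400 ms"}
second = {"source": "host-a", "service": "checkout", "check": "latency",
          "severity": "major", "description": "latency > 600 ms"}
same_key = make_dedupe_key(first) == make_dedupe_key(second)
```

Fields such as the timestamp, description, and severity are deliberately left out of the key, which is why events that differ only in those fields land on the same alert.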

An example

Suppose a series of events all describe response times for the same microservice on the same host. Although their timestamps, descriptions, and severities differ, all these events have the same key: source = server 23, service = db-query-svc, check = response-time.

The following sequence illustrates how Moogsoft deduplicates these events:

  1. The first event arrives with description = “db-query-response-time > 400 ms” and severity = minor.

  2. Moogsoft compares this event against all open alerts. There is no open alert with the same key. Moogsoft creates a new alert based on the new event.

  3. The second event arrives with description = “db-query-response-time > 600 ms” and severity = major.

  4. Moogsoft compares the new event with the alert it just created. The event and alert have the same key, so Moogsoft updates the alert fields with the new event information:

    • Event count = 2 (was 1)

    • Last event time = new event time (was previous event time)

    • Severity = major (was previous event severity)

    • Description = “db-query-response-time > 600 ms” (was previous event description)

  5. The response time remains high and several more events arrive with varying levels of severity. With each event, Moogsoft updates the alert as described previously.

  6. Finally, the response time falls to within acceptable levels. An event arrives with description = “db-query-response-time < 200 ms” and severity = clear.

  7. Moogsoft updates the alert as described previously. Because the status of this alert is now clear, any new events with the same dedupe key get added to a new alert.
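The sequence above can be sketched in a few lines of code. This is a simplified model under stated assumptions, not Moogsoft's implementation: the Alert and Deduplicator classes, the event dictionary layout, the integer timestamps, and the key string are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    dedupe_key: str
    description: str
    severity: str
    last_event_time: int
    event_count: int = 1


class Deduplicator:
    """Hypothetical model of the matching logic described above."""

    def __init__(self) -> None:
        self.open_alerts = {}  # dedupe_key -> open Alert
        self.all_alerts = []   # every alert ever created, open or cleared

    def ingest(self, event: dict) -> Alert:
        key = event["dedupe_key"]
        alert = self.open_alerts.get(key)
        if alert is None:
            # No open alert with this key: create a new alert.
            alert = Alert(key, event["description"],
                          event["severity"], event["time"])
            self.open_alerts[key] = alert
            self.all_alerts.append(alert)
        else:
            # Duplicate: update the existing alert with the new event.
            alert.event_count += 1
            alert.last_event_time = event["time"]
            alert.severity = event["severity"]
            alert.description = event["description"]
        if alert.severity == "clear":
            # A cleared alert no longer absorbs events with the same key.
            del self.open_alerts[key]
        return alert


KEY = "server 23|db-query-svc|response-time"  # assumed key format
dedup = Deduplicator()
events = [
    {"dedupe_key": KEY, "time": 1, "severity": "minor",
     "description": "db-query-response-time > 400 ms"},
    {"dedupe_key": KEY, "time": 2, "severity": "major",
     "description": "db-query-response-time > 600 ms"},
    {"dedupe_key": KEY, "time": 3, "severity": "clear",
     "description": "db-query-response-time < 200 ms"},
]
for event in events:
    dedup.ingest(event)

# A later event with the same key now opens a second alert.
later = dedup.ingest({"dedupe_key": KEY, "time": 4, "severity": "minor",
                      "description": "db-query-response-time > 400 ms"})
```

Keeping only open alerts in the lookup table is one simple way to model the rule in step 7: once an alert is clear, the next event with the same key finds no match and starts a new alert.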