Understand incidents

Moogsoft Cloud uses correlation definitions to cluster related alerts into actionable incidents. By examining incidents, their constituent alerts, and similar past incidents, you can troubleshoot multiple related aspects of an outage or performance problem, monitor impacted services, and deduce a root cause. You can also share incident information with ServiceNow, Slack, PagerDuty, Microsoft Teams, and other third-party applications.

Navigate to Incidents > Incidents to view and interact with incidents. In this view you can review a list or a dashboard of incidents, drill down to the details for an individual incident, and examine an incident’s member alerts and metrics. You can search and filter incidents and save and share custom views. You can also focus on a single incident by entering its Situation Room.

[Figure: Incident details view (IncidentDetails.png)]

Using the tools in the Incidents view, you can assign an incident to one or more user groups, yourself, or another user; update its priority, status, and description; collaborate using comments; and capture information about how the incident was resolved.

Incident attributes

The following table lists incident attributes available in Moogsoft.

Notes

  • Moogsoft stores all timestamps in UTC. The UI displays dates and times in your browser's local time zone.

  • Field names that contain underscores ( _ ) in the code are shown with spaces in this view.
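
The two notes above can be illustrated with a minimal Python sketch. It assumes that timestamps arrive as ISO 8601 strings in UTC, which this page does not state, and the helper names `to_local` and `display_name` are hypothetical. It does not describe how the Moogsoft UI is implemented.

```python
from datetime import datetime, timezone


def to_local(utc_iso: str) -> datetime:
    """Convert a UTC timestamp string to the local time zone of this machine.

    Assumption: the timestamp is an ISO 8601 string such as
    "2024-01-15T12:00:00Z". The actual stored format may differ.
    """
    parsed = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Treat a timestamp without an offset as UTC, per the note above.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # astimezone() with no argument converts to the system local time zone.
    return parsed.astimezone()


def display_name(field_name: str) -> str:
    """Map a field name from the code to the label shown in this view."""
    return field_name.replace("_", " ")


if __name__ == "__main__":
    print(display_name("first_event_time"))
    print(to_local("2024-01-15T12:00:00Z").isoformat())
```

The converted value represents the same instant as the stored UTC value; only the offset used for display changes.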

| Attribute | Description |
|---|---|
| alerts | A list of the alerts in this incident. |
| assigned groups | User groups assigned to this incident. |
| changes | The last change (or changes, as multiple changes are possible) to the incident. |
| classes | A list of the classifications (class field) of all alerts in the incident. The class field categorizes the events and metric anomalies that make up an alert. For example, an alert with a "WebServerMonitor" class might include a "web-server-down" event and a "http-requests-failed-rate" anomaly. |
| closed on | Timestamp when this incident was closed. |
| correlation definition | The name of the correlation definition that created the incident. The name links to the correlation definition. |
| created at | Timestamp when the Correlation Engine created this incident. |
| description | Auto-generated description of the incident, based on the description field in the correlation definition that generated the incident. If you manually edit the description in the incident details view, it remains static unless you revert the edit. |
| external names | If the incident triggered an external notification based on an outbound webhook, the object (such as a ticket number) in the external system. |
| first event time | Timestamp of the earliest event in this incident. |
| integration id | The outbound integration ID, if the incident triggered an external notification based on an outbound webhook. |
| integration name | The outbound integration name, if the incident triggered an external notification based on an outbound webhook. |
| id | Moogsoft auto-generates this ID when it creates the incident. |
| in maintenance | Whether the incident includes any alerts that are in an active maintenance window (true) or not (false). |
| in progress on | Timestamp when the incident status was set to "In Progress." |
| last event time | Timestamp of the most recent event in this incident. |
| last state change | The last time a user updated the incident status or severity. |
| maintenance windows | A list of maintenance windows that were active when some of the alerts in the incident were updated with new events. |
| manual description set | Whether the description was created automatically by the system (false) or updated by a user (true). |
| originator | The user or process that initiated the last change. For updates from external systems, this is the email address of the user who sends the updates to Moogsoft. |
| priority | A user-selectable value from P1 to P5. Unlike severity, incident priority does not change when an incident is resolved or closed. |
| resolved on | Timestamp when the incident was resolved. |
| resolving steps | The number of resolving steps in the comments associated with the incident. |
| services | A list of all services that generated the events and metrics included in this incident. This list is derived from the service field in the member alerts in this incident. |
| severity | The highest severity of any alert in the incident. |
| severity high water | The highest severity the incident has reached. |
| severity numeric | A number representing the incident severity level. |
| status | Status of the incident. |
| status numeric | A number representing the incident status. |
| superseded by | An incident that was created after this one and that includes all of the alerts in this incident. An incident is superseded when the alerts it initially included combine to form a more comprehensive and descriptive incident (for example, incidents indicating several system failures combining into a single overarching switch failure incident). Refer to the superseding incident for the most recent information. |
| total alerts | The total number of alerts in the incident. |
| types | The list of types from alerts in this incident. |
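
Several attributes in the table above (severity, severity high water, services, classes, total alerts, first and last event time, and superseded by) are described as derived from an incident's member alerts. The following Python sketch shows one way those relationships could be computed. It is an illustration under stated assumptions, not Moogsoft's implementation: the dictionary keys, the function names, the `SEVERITY_RANK` ordering and its numbers, the use of integer timestamps, and the de-duplication of services and classes are all hypothetical.

```python
# Hypothetical severity ordering for illustration only. These ranks are
# placeholders and are not Moogsoft's actual "severity numeric" values.
SEVERITY_RANK = {
    "clear": 0, "unknown": 1, "warning": 2,
    "minor": 3, "major": 4, "critical": 5,
}


def summarize_incident(alerts):
    """Derive incident-level attributes from a list of member alert dicts."""
    if not alerts:
        raise ValueError("an incident needs at least one alert")
    return {
        "total_alerts": len(alerts),
        # Incident severity is the highest severity of any member alert.
        "severity": max((a["severity"] for a in alerts),
                        key=SEVERITY_RANK.__getitem__),
        # Assumption: lists are de-duplicated and sorted for readability.
        "services": sorted({a["service"] for a in alerts}),
        "classes": sorted({a["class"] for a in alerts}),
        "first_event_time": min(a["first_event_time"] for a in alerts),
        "last_event_time": max(a["last_event_time"] for a in alerts),
    }


def update_high_water(previous_high_water, current_severity):
    """Return the highest severity reached so far."""
    return max(previous_high_water, current_severity,
               key=SEVERITY_RANK.__getitem__)


def is_superseded_by(older, newer):
    """True if `newer` was created later and contains every alert in `older`."""
    return (newer["created_at"] > older["created_at"]
            and set(older["alerts"]) <= set(newer["alerts"]))


if __name__ == "__main__":
    alerts = [
        {"severity": "warning", "service": "web", "class": "WebServerMonitor",
         "first_event_time": 100, "last_event_time": 150},
        {"severity": "critical", "service": "db", "class": "DbMonitor",
         "first_event_time": 90, "last_event_time": 120},
    ]
    print(summarize_incident(alerts))
```

Keeping the high-water mark separate from the current severity reflects the distinction in the table: severity follows the member alerts as they change, while severity high water records the peak.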