Understand incidents
Moogsoft Cloud uses correlation definitions to cluster groups of related alerts into actionable incidents. By examining incidents, their constituent alerts, and similar past incidents, you can troubleshoot multiple related aspects of an outage or performance problem, monitor impacted services, and deduce a root cause. You can also share incident information with ServiceNow, Slack, PagerDuty, Microsoft Teams, and other third-party applications.
Navigate to Incidents > Incidents to view and interact with incidents. In this view you can review a list or a dashboard of incidents, drill down to the details for an individual incident, and examine an incident’s member alerts and metrics. You can search and filter incidents and save and share custom views. You can also focus on a single incident by entering its Situation Room.
Using the tools in the Incidents view, you can assign an incident to one or more user groups, yourself, or another user; update its priority, status, and description; collaborate using comments; and capture information about how the incident was resolved.
Incident attributes
The following table lists incident attributes available in Moogsoft.
Notes
Moogsoft stores all timestamps in UTC. The UI displays dates and times in your browser's local time zone.
Underscores ( _ ) in field names, as they appear in code, are replaced with spaces in this view.
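As a rough illustration of these two notes, the following Python sketch converts a UTC timestamp to a local time zone and turns a field name into the label shown in the view. The function names, the ISO 8601 input format, and the example time zone offset are assumptions made for illustration; they are not part of any Moogsoft API.

```python
from datetime import datetime, timedelta, timezone


def display_label(field_name: str) -> str:
    """Replace underscores with spaces, mirroring how the view shows field names."""
    return field_name.replace("_", " ")


def to_local(utc_iso: str, local_tz: timezone) -> datetime:
    """Parse a UTC timestamp (ISO 8601 without offset is an assumed format)
    and convert it to the given local time zone."""
    utc_time = datetime.fromisoformat(utc_iso).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(local_tz)


# Hypothetical example: a browser whose local time is five hours behind UTC.
browser_tz = timezone(timedelta(hours=-5))
print(display_label("first_event_time"))
print(to_local("2024-03-01T14:30:00", browser_tz).isoformat())
```

The stored value never changes; only the rendering depends on the viewer's time zone, so two users in different locations can see different clock times for the same incident.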
Attribute | Description |
---|---|
alerts | A list of the alerts in this incident. |
assigned groups | User groups assigned to this incident. |
changes | The most recent change or changes to the incident. |
classes | A list of the classifications ( |
closed on | Timestamp when this incident was closed. |
correlation definition | The name of the correlation definition that created the incident. The name links to the correlation definition. |
created at | Timestamp when the Correlation Engine created this incident. |
description | Auto-generated description of the incident, based on the |
external names | If the incident triggered an external notification based on an outbound webhook, this indicates the object (such as a ticket number) in the external system. |
first event time | Timestamp of the earliest event in this incident. |
integration id | The outbound integration ID, if the incident triggered an external notification based on an outbound webhook. |
integration name | The outbound integration name, if the incident triggered an external notification based on an outbound webhook. |
id | Moogsoft auto-generates this ID when it creates the incident. |
in maintenance | Whether the incident includes any alerts that are in an active maintenance window (true) or not (false). |
in progress on | The time when the incident status was set to "In Progress." |
last event time | Timestamp of the most recent event in this incident. |
last state change | The last time a user updated the incident status or severity. |
maintenance windows | A list of maintenance windows that were active when some of the alerts in the incident updated with new events. |
manual description set | Whether the description was created automatically by the system (false) or updated by a user (true). |
originator | The user or process that initiated the last change. For updates from external systems, this is the email address of the user who sends the updates to Moogsoft. |
priority | A user-selectable value from P1 to P5. Unlike severity, incident priority does not change when an incident is resolved or closed. |
resolved on | Time when the incident was resolved. |
resolving steps | The number of resolving steps in the comments associated with the incident. |
services | A list of all services that generated the events and metrics included in this incident. This list is derived from the |
severity | The incident severity equals the highest severity of any alert in that incident. |
severity high water | The highest severity an incident has reached. |
severity numeric | A number representing the incident severity level. |
status | Status of the incident. |
status numeric | A number representing the incident status. |
superseded by | An incident created after this one that includes all of the alerts in this incident. One incident supersedes another when alerts initially included in one incident combine to form a more comprehensive and descriptive incident (for example, incidents indicating several system failures that combine into a single overarching switch failure incident). Refer to the superseding incident for the most recent information. |
total alerts | The total number of alerts in the incident. |
types | The list of types from alerts in this incident. |
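
Several attributes in the table are derived from an incident's member alerts. The Python sketch below shows one way to read those definitions: severity as the highest alert severity, in maintenance as true when any alert is in an active window, total alerts as a count, and superseded by as a subset check on alert IDs. The dictionary keys, the function names, and the severity ranking are hypothetical and chosen only for illustration; the table above does not specify the numeric values or the payload shape.

```python
# Assumed ranking for illustration only; the source does not list severity values.
SEVERITY_RANK = {
    "clear": 0,
    "unknown": 1,
    "warning": 2,
    "minor": 3,
    "major": 4,
    "critical": 5,
}


def incident_severity(alerts):
    """Return the highest severity among the alerts (expects at least one alert)."""
    return max((a["severity"] for a in alerts), key=SEVERITY_RANK.__getitem__)


def in_maintenance(alerts):
    """True if any alert is in an active maintenance window."""
    return any(a.get("in_maintenance", False) for a in alerts)


def supersedes(newer_alert_ids, older_alert_ids):
    """True if the newer incident contains every alert of the older incident."""
    return set(older_alert_ids) <= set(newer_alert_ids)


# Hypothetical alerts belonging to one incident.
alerts = [
    {"id": 1, "severity": "minor", "in_maintenance": False},
    {"id": 2, "severity": "critical", "in_maintenance": True},
]

summary = {
    "severity": incident_severity(alerts),
    "in maintenance": in_maintenance(alerts),
    "total alerts": len(alerts),
}
print(summary)
print(supersedes([1, 2, 3], [a["id"] for a in alerts]))
```

In this reading, an incident's severity can rise or fall as member alerts change, which is why the table tracks severity high water separately.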