Understand incidents

APEX AIOps Incident Management uses correlation definitions to cluster groups of related alerts into actionable incidents. By examining incidents, their constituent alerts, and similar past incidents, you can troubleshoot multiple related aspects of an outage or performance problem, monitor impacted services, and deduce a root cause. You can also share incident information with ServiceNow, Slack, PagerDuty, Microsoft Teams, and other third-party applications.

Navigate to Incidents > Incidents to view and interact with incidents. In this view you can review a list or a dashboard of incidents, drill down to the details for an individual incident, and examine an incident’s member alerts and metrics. You can search and filter incidents and save and share custom views. You can also focus on a single incident by entering its Situation Room.

IncidentDetails.png

Using the tools in the Incidents view, you can assign an incident to one or more user groups, yourself, or another user; update its priority, status, and description; collaborate using comments; and capture information about how the incident was resolved.

Incident fields

The following table describes the incident fields that appear as columns in the top pane and in the Details area on the Alerts page.

Notes

  • Incident Management stores all timestamps in UTC format. The dates and times displayed in the UI are based on your browser's local time.

  • Underscores ( _ ) in field names viewable in the code are replaced with spaces in this view.
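The two notes above can be sketched in a few lines of Python. This is an illustration only: it assumes a timestamp is available as epoch seconds (the stored format is not specified here beyond being UTC), and the helper names `utc_from_epoch`, `to_local`, and `display_name` are hypothetical, not part of Incident Management.

```python
from datetime import datetime, timezone


def utc_from_epoch(epoch_seconds: float) -> datetime:
    """Interpret epoch seconds as a timezone-aware UTC datetime (assumed input format)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def to_local(dt_utc: datetime) -> datetime:
    """Convert an aware UTC datetime to the local time zone of the machine running this code."""
    return dt_utc.astimezone()


def display_name(field_name: str) -> str:
    """Replace underscores with spaces, mirroring how field names appear in the view."""
    return field_name.replace("_", " ")


stored = utc_from_epoch(1_700_000_000)
print(stored.isoformat())            # UTC value as stored
print(to_local(stored).isoformat())  # same instant, shown in local time
print(display_name("first_event_time"))
```

The local rendering depends on the time zone of the machine running the code, in the same way that the UI display depends on the browser; the underlying instant does not change.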

Column name

Details field name (if different)

Description

alerts

A list of the alerts in this incident.

assigned groups

User groups assigned to this incident.

assignee

User assigned to this incident.

none

auto closed on

The time when the incident was closed, if it was closed by automation.

none

changes

The last change (or changes, as multiple changes are possible) to the incident.

classes

A list of the classifications (class field) of all alerts in the incident. The class field is used to categorize the events and metric anomalies that make up an alert. For example, an alert with a "WebServerMonitor" class might include a "web-server-down" event and a "http-requests-failed-rate" anomaly.

closed on

Timestamp when this incident was closed.

none

comments

The number of comments included with this incident.

correlation definition

The identifier for the correlation definition that resulted in the creation of the incident. The name is linked to the correlation definition in the bottom grid for the selected incident on the Incidents page.

correlation definition name

source correlation name

The user-friendly name of the correlation definition that resulted in the creation of the selected incident. The value for correlation definition name always displays as an easily readable name, while the correlation definition field displays as a hexadecimal ID in some situations.

created at

Timestamp when the Correlation Engine created this incident.

description

Auto-generated description of the incident, based on the description field in the correlation definition that generated the incident. If the description is manually edited in the incident details view, then it will remain static unless the edit is reverted.

none

external details

The number of sets of external details (objects in the array) in the incident. For example, if three integrations sent outbound notifications for this incident, the value for this field would be 3.

external IDs

external details.external id

A list of identifiers for the incident on external systems.

external integration IDs

external details.integration id

A list of identifiers in Incident Management for the integrations which have sent notifications for this incident.

external integration names

external details.integration name

A list of user-friendly names in Incident Management for the integrations which have sent notifications for this incident.

external integration types

external details.integration type

A list of the types (webhook or category, such as PagerDuty or ServiceNow) of integrations that have sent notifications for this incident.

external links

external details.external link

One or more HTML links to systems outside of Incident Management, usually pointing to the equivalent incident on the external system.

external names

A list of user-friendly object names (such as ticket numbers) on external systems which are the equivalent of this incident.

first event time

Timestamp of the earliest event in this incident.

id

Incident Management auto-generates this ID when it creates the incident.

in maintenance

Whether the incident includes any alerts that are currently in an active maintenance window (true) or not (false).

in progress on

The time when the incident status was set to "In Progress." 

last event time

Timestamp of the most recent event in this incident.

last state change

The last time a user updated the incident status or severity.

maintenance window occurrence IDs

maintenance windows.occurrence id

The IDs of the specific maintenance window occurrences (such as one instance of a recurring scheduled window) affecting one or more alerts in the incident.

none

maintenance windows

A list of maintenance window names, linked to the corresponding maintenance window summary page, which potentially impacted alerts in this incident.

none

manual description set

This value is true for incidents where a user has manually updated the description, and false for incidents that have an automatically generated description.

merged into incident

The ID of the incident that replaced a superseded incident.

none

originator

The user or process initiating the last change. For updates from external systems, this is the email address of the user who sends the updates to Incident Management.

policies

The metric policies responsible for identifying the anomalies which led to the creation of this incident.

priority

A user-selectable value from P1 (most urgent) to P5 (least urgent). Unlike severity, incident priority does not change when an incident is resolved or closed.

none

priority numeric

A value from 1 (P1) to 5 (P5) representing the priority assigned to the incident.

resolve time

The length of time required to resolve or close the incident.

resolved on

Time when the incident was resolved.

none

resolving steps

The number of comments associated with the incident that are resolving steps.

services

A list of all services that generated the events and metrics included in this incident. This list is derived from the service field in the member alerts in this incident. 

severity

The incident severity equals the highest severity of any alert in that incident.

severity high water

The highest severity an incident has reached.

severity high water numeric

A number representing the highest severity an incident has reached.

severity numeric

A number representing the incident severity level.

sources

A list of the originating hosts of the alerts in the incident.

status

Status of the incident.

status numeric

A number representing the incident status.

superseded by

In superseded incidents, the ID of the most recent incident that includes all of the alerts in this incident. The value is usually the same as the value for merged into incident, but it can differ if multiple superseding events occurred involving the same alerts: if the incident that superseded this one was itself superseded by another incident, the final superseding incident ID displays for superseded by.

For more, see Understand superseded incidents.

superseded on

The time when a superseded incident was replaced by another incident.

tags.<tag_name>

The optional tags included in this incident. Tags may move into incidents from the component alerts, or incidents themselves can have tags added through automated processes like workflows.

total alerts

The total number of alerts in the incident.

types

The list of types from alerts in this incident.
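Several fields in the table above are derived from other values: priority numeric from priority, severity from the member alerts, services from the alerts' service field, and superseded by from a chain of merged into incident values. The Python sketch below illustrates those relationships as described in the table. It is not the product's implementation. All function names and the dictionary shapes are hypothetical, and it assumes that a larger severity number means a more severe alert, which this page does not state.

```python
from typing import Dict, Iterable, List, Optional


def priority_numeric(priority: str) -> int:
    """Map a priority label 'P1'..'P5' to the number 1..5."""
    label = priority.strip().upper()
    if len(label) != 2 or label[0] != "P" or label[1] not in "12345":
        raise ValueError(f"unexpected priority label: {priority!r}")
    return int(label[1])


def incident_severity(alert_severities: Iterable[int]) -> int:
    """Incident severity equals the highest severity of any member alert.

    Assumption: larger numbers are more severe. Raises ValueError if empty.
    """
    return max(alert_severities)


def incident_services(alerts: List[dict]) -> List[str]:
    """Collect the distinct 'service' values from member alerts (hypothetical alert shape)."""
    return sorted({a["service"] for a in alerts if a.get("service")})


def superseded_by(incident_id: str, merged_into: Dict[str, str]) -> Optional[str]:
    """Follow 'merged into incident' links to the final superseding incident.

    merged_into maps a superseded incident ID to the ID that replaced it.
    Returns None if the incident was never superseded.
    """
    seen = {incident_id}  # guards against accidental cycles in the input
    final = None
    current = merged_into.get(incident_id)
    while current is not None and current not in seen:
        final = current
        seen.add(current)
        current = merged_into.get(current)
    return final


# Incident A was merged into B, and B was later merged into C.
links = {"A": "B", "B": "C"}
print(priority_numeric("P2"))
print(incident_severity([2, 5, 3]))
print(incident_services([{"service": "web"}, {"service": "db"}, {"service": "web"}]))
print(superseded_by("A", links))
```

In the sample data, merged into incident for A is B, while superseded by resolves to C, which matches the case described for superseded by where the two values differ.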