Events, alerts, and incidents
Before moving on to more advanced features, it's helpful to understand the three basic informational elements of APEX AIOps Incident Management: events, alerts, and incidents.
Events
Events are occurrences on monitored systems outside of Incident Management that are passed into Incident Management. They are identified by user-defined configurations, which specify the characteristics of an occurrence of special interest.
Events are ingested by Incident Management through:
Events in originating systems have vendor-specific (or user-defined) event payload fields, but Incident Management recognizes the following values for severity:
Unknown
Clear
Warning
Minor
Major
Critical
Severity levels are typically mapped to the corresponding payload values in incoming events as a part of integration setup.
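The mapping step described above can be sketched as a simple lookup. This is a minimal illustration, not the product's integration API: the `VENDOR_SEVERITY_MAP` values (`"ok"`, `"warn"`, and so on) and the `map_severity` helper are hypothetical names chosen for the example. The fallback to Unknown mirrors the behavior this page describes for payload values that are not mapped to a recognized severity.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The six severities recognized by Incident Management, lowest to highest."""
    UNKNOWN = 0
    CLEAR = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


# Hypothetical vendor payload values; a real integration defines its own mapping.
VENDOR_SEVERITY_MAP = {
    "ok": Severity.CLEAR,
    "warn": Severity.WARNING,
    "low": Severity.MINOR,
    "high": Severity.MAJOR,
    "fatal": Severity.CRITICAL,
}


def map_severity(payload_value):
    """Return the recognized severity for a payload value, or UNKNOWN if unmapped."""
    if payload_value is None:
        return Severity.UNKNOWN
    normalized = str(payload_value).strip().lower()
    return VENDOR_SEVERITY_MAP.get(normalized, Severity.UNKNOWN)
```

For example, `map_severity("WARN")` yields `Severity.WARNING`, while an unmapped value such as `"panic"` falls back to `Severity.UNKNOWN`.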
Alert and incident creation from events
There are two paths for creating alerts and incidents in Incident Management from events:
Starting with metrics:
Starting with events:
Alerts
An alert is a unique event as identified by Incident Management.
Alerts are created from:
Ingested events
Events derived from metrics that deviate from expected values
Incident Management deduplicates ingested events and creates an alert to uniquely represent a specific group of repeated events or metric data points.
Like the triggering events they are derived from, alerts have a variety of associated parameters. They include identifying fields such as source, time, description, and other fields which provide useful information for determining event uniqueness when reducing noise by consolidating repeated events. For more information, see Deduplication: events to alerts.
Two fields--severity and status--are of particular interest because they directly impact Incident Management processes and alert and incident visibility. For more information about alert fields, see Alerts and alert details reference.
Severity
All alerts have an associated severity, assigned either by Incident Management (for metrics) or by the originating external system (for integrations).
Alert severity | Meaning for alerts derived from events | Meaning for alerts derived from metrics |
---|---|---|
Unknown | Displays when Incident Management cannot identify the severity in the event payload; occurs when an event severity is not mapped to a recognized severity in Incident Management. | Incident Management has not yet ingested enough data to accurately assign a severity. An Unknown severity typically displays when the system is still gathering enough data to determine the normal values for a metric. |
Clear | An outside system has sent an event through an integration or the Events API with a payload field mapping to the Clear severity in Incident Management; Clear indicates that an alert is resolved. | Values for a metric which was producing anomalous values are now within expected thresholds. |
Warning, Minor, Major, Critical | Assigned by payload fields in originating systems. | Determined by the degree to which a metric value is above or below the configured metric thresholds; controlled by metrics configurations and automatic system learning. |
The severity of an alert is determined by the severity of the most recently received event matching the existing alert. It is not user editable.
Example: Incident Management receives three events and determines they are similar enough to represent with a single alert with Minor severity. A fourth event with Critical severity is ingested and Incident Management determines that this event represents the same condition which generated the other three events. Incident Management removes the duplicate event and changes the severity for the representative alert to Critical.
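The example above can be sketched in a few lines. This is an illustrative model only: the `Alert` class, the `ingest` function, and the choice of `source` plus `description` as the deduplication key are assumptions made for the sketch, not the product's actual deduplication configuration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    key: tuple
    severity: str
    event_count: int = 0


def dedup_key(event):
    # Assumption: source and description together identify the condition.
    return (event["source"], event["description"])


def ingest(alerts, event):
    """Fold an event into its representative alert, creating the alert if needed."""
    key = dedup_key(event)
    alert = alerts.get(key)
    if alert is None:
        alert = Alert(key=key, severity=event["severity"])
        alerts[key] = alert
    # The most recently received matching event determines the alert severity.
    alert.severity = event["severity"]
    alert.event_count += 1
    return alert


alerts = {}
for severity in ["Minor", "Minor", "Minor", "Critical"]:
    ingest(alerts, {
        "source": "db01",
        "description": "disk latency high",
        "severity": severity,
    })
```

After the loop, `alerts` holds a single alert that represents all four events, and its severity is Critical because the last matching event was Critical.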
Status
Unlike severity, alert statuses are user editable. Alert statuses can also affect the status of incidents.
NOTE: You can customize system default settings using Auto Close.
Alert status | An alert changes to this status when: | Impact to incidents | System defaults |
---|---|---|---|
Open | The alert is first generated from received events. NOTE: The age of an alert is based on the time the alert has existed in Incident Management since it was opened. | Incidents are created from correlated open alerts. An alert can appear in multiple incidents because the occurrence represented by an alert can be a factor in multiple issues. | All alerts initially have a status of Open. |
In Progress | A user manually assigns this alert to an Incident Management user. | When the alerts in an incident have a mixed set of statuses, the incident status changes to In Progress. | NA |
Superseded | A newer alert replaces the initial alert. This can happen as part of the alert deduplication process when the polling interval for data collection is low. | The initial alert no longer appears as part of the incident. The alert which replaced it appears instead. | NA |
Resolved | A user manually sets the status of this alert to Resolved, or Incident Management ingests an event with a severity of Clear that matches this alert. | When all of the alerts in an incident have a status of Resolved, the incident status changes to Resolved. | NA |
Closed | An Incident Management user manually closes the alert, or the default length of time elapses to automatically close a resolved alert. | Incidents have a status of Closed when a user manually closes them or when they exceed the configured age to close automatically. | Status = Closed 30 minutes after Status = Resolved; Status = Closed from any state after 72 hours |
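
The system defaults in the Closed row can be expressed as a small time check. The sketch below is a hedged illustration: the `should_auto_close` function and its parameters are hypothetical, the 72-hour age is assumed to be measured from when the alert was opened (per the NOTE in the Open row), and whether the product treats the boundaries as inclusive is also an assumption.

```python
from datetime import datetime, timedelta

# System defaults from the table above; both can be customized with Auto Close.
RESOLVED_TO_CLOSED = timedelta(minutes=30)
MAX_ALERT_AGE = timedelta(hours=72)


def should_auto_close(status, opened_at, resolved_at, now):
    """Return True if the default auto-close rules would close the alert at `now`."""
    if status == "Closed":
        return False  # Already in the final status.
    # Assumption: alert age counts from the time the alert was opened.
    if now - opened_at >= MAX_ALERT_AGE:
        return True
    if status == "Resolved" and resolved_at is not None:
        return now - resolved_at >= RESOLVED_TO_CLOSED
    return False
```

An alert resolved 35 minutes ago qualifies for closing under these defaults, while an open alert that is one hour old does not.
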
Alert statuses can change to other statuses in the following ways:
From | To |
---|---|
Open | In Progress, Resolved, Closed |
In Progress | Resolved, Closed |
Superseded | NA. An alert becomes irrelevant when it is superseded and it is not updated further. All updates apply to the alert which replaced it. |
Resolved | Open, Closed |
Closed | NA. Closed is a final status and cannot revert to any other status. |
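
The transition table above maps naturally onto a lookup of allowed target statuses. This is a minimal sketch for reasoning about the table, with hypothetical names (`ALERT_TRANSITIONS`, `can_transition`). It covers only the transitions the table lists and does not model how the system moves an alert to Superseded.

```python
# Allowed alert status changes, transcribed from the From/To table.
ALERT_TRANSITIONS = {
    "Open": {"In Progress", "Resolved", "Closed"},
    "In Progress": {"Resolved", "Closed"},
    "Superseded": set(),  # Not updated further once superseded.
    "Resolved": {"Open", "Closed"},
    "Closed": set(),      # Final status.
}


def can_transition(current, target):
    """Return True if the table lists `target` as reachable from `current`."""
    return target in ALERT_TRANSITIONS.get(current, set())
```

For instance, `can_transition("Resolved", "Open")` is true, while `can_transition("Closed", "Open")` is false because Closed is final.
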
Incidents
When Incident Management creates an alert, it also creates an incident from that alert. Incident Management determines whether alerts are similar and, if they are similar enough, adds them as components of a single incident. Any matching incoming alerts are added to the incident as long as the incident has a status other than Closed.
For more information on incident fields, see Incidents and incident details reference.
Severity
Like alerts, incidents have an assigned severity. Incident severity is derived from the alerts comprising it.
Incident severity | This incident severity means: |
---|---|
Unknown | All alerts in the incident have a severity of Unknown. If a matching alert with a different severity is added to the incident, that severity replaces Unknown. |
Clear | All alerts in the incident have a severity of Clear. |
Warning, Minor, Major, Critical | The severity of an incident (Warning through Critical) is determined by the highest severity in the included alerts. Example: Incident Management creates an incident from three alerts with the following severity values: Minor, Major, Warning. The severity of the incident is Major. |
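
The rules in the table above reduce to taking the highest-ranked severity among the included alerts. The sketch below is illustrative: `SEVERITY_RANK` and `incident_severity` are hypothetical names, and ranking Unknown below Clear is an assumption chosen so that any other severity replaces Unknown, as the table describes.

```python
# Rank order inferred from the table: Unknown < Clear < Warning < Minor < Major < Critical.
SEVERITY_RANK = {
    "Unknown": 0,
    "Clear": 1,
    "Warning": 2,
    "Minor": 3,
    "Major": 4,
    "Critical": 5,
}


def incident_severity(alert_severities):
    """Derive an incident severity from the severities of its alerts."""
    if not alert_severities:
        raise ValueError("an incident contains at least one alert")
    return max(alert_severities, key=lambda severity: SEVERITY_RANK[severity])
```

With the alert severities from the example in the table (Minor, Major, Warning), the function returns Major. It returns Clear or Unknown only when no included alert has a higher-ranked severity.
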
Status
Incident statuses are user editable and are subject to automatic changes due to system settings.
Just as alert status changes can affect incidents, changes in incident statuses can affect alerts.
NOTE: You can customize system default settings using Auto Close.
Incident status | An incident changes to this status when: | Impacts to alerts | System defaults |
---|---|---|---|
Open | The incident is created. Open is the default status for new incidents. A resolved incident status returns to Open if an alert with a severity higher than Clear is added. | None. An Open status for an incident indicates that at least one included alert is still unresolved. | All incidents initially have an Open status. |
In Progress | A user manually assigns this incident to an Incident Management user, or the alerts in the incident have mixed statuses. | When an incident status changes to In Progress, the statuses of all included alerts also change to In Progress. Incident Management assigns all alerts included in the incident to the same user the incident is assigned to. If a different user is already assigned to an alert, that assignment does not change. | NA |
Superseded | A new incident that better represents an issue (usually when it is more encompassing) replaces the initial incident. An incident is more likely to be superseded by another when a system has lengthy correlation time windows. | Alerts which were part of the initial incident display as part of the superseding incident instead. | NA |
Resolved | A user manually sets the status of the incident to Resolved, or all alerts included in the incident have a severity of Clear. | The status for all included alerts changes to Resolved. The severity for all included alerts changes to Clear. | Incident Status = Closed 60 minutes after all alerts in the incident are closed |
Closed | An Incident Management user manually closes the incident, or the incident automatically closes based on system settings. | The status of the alert with the highest severity in the incident changes to Closed and the alert severity changes to Clear. No further alerts are added to the incident; no further changes occur to the incident or included alerts. | Incidents older than seven days automatically close. |
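
Two of the automatic status changes described on this page depend on the statuses of the included alerts: mixed alert statuses move the incident to In Progress, and all-Resolved alerts move it to Resolved. The sketch below models only those two rules. The `rollup_status` function is hypothetical, and returning `None` for every other case is an assumption meaning that these rules do not apply.

```python
def rollup_status(alert_statuses):
    """Return the incident status implied by its alerts' statuses, or None."""
    distinct = set(alert_statuses)
    if not distinct:
        return None
    if distinct == {"Resolved"}:
        return "Resolved"      # All included alerts are Resolved.
    if len(distinct) > 1:
        return "In Progress"   # The alerts have a mixed set of statuses.
    return None                # Uniform, non-Resolved statuses: no rule modeled here.
```

For example, an incident whose alerts are Open and Resolved rolls up to In Progress, while one whose alerts are all Resolved rolls up to Resolved.
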
Incident statuses can change to other statuses in the following ways:
From | To |
---|---|
Open | In Progress, Resolved, Closed |
In Progress | Resolved, Closed |
Superseded | NA. An incident becomes irrelevant when it is superseded and it is not updated further. All updates apply to the incident which replaced it. |
Resolved | Open, In Progress, Closed |
Closed | NA. Closed is a final status and cannot revert to any other status. |