Express Glossary

Alert

A set of one or more unique events that all relate to a specific performance measure. Examples include:

  • 4 events that show CPU utilization at 10%, 47%, 63%, and 20% on a specific server

  • 3 events that show a short spike in a key performance metric (1 second, 9 seconds, 1 second) over a 3-minute time window

The Correlation Engine clusters similar alerts into incidents. You can view individual alerts in the Alerts page. You can also drill down into the member alerts of an incident in the Incidents page.

Note

In some monitoring tools, "alert" refers to external notifications of a rule violation or other performance-impacting event. This is different from the usage of "alert" in Express.

Anomaly

The first observed data point after a time series metric switches from normal to anomalous performance. The ingestion engine treats each anomaly as a performance-impacting event.
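
As a rough illustration of "first observed data point after a switch", the sketch below keeps only the point that starts each anomalous run. The function name `first_anomalies` and the threshold-based `is_anomalous` test are hypothetical stand-ins; the actual detection algorithm is whatever detector is associated with the metric.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def first_anomalies(points: Iterable[Point],
                    is_anomalous: Callable[[float], bool]) -> List[Point]:
    """Return the first data point of each anomalous run in a time series."""
    result = []
    previously_anomalous = False
    for ts, value in points:
        anomalous = is_anomalous(value)
        # Only the transition from normal to anomalous counts as an anomaly.
        if anomalous and not previously_anomalous:
            result.append((ts, value))
        previously_anomalous = anomalous
    return result


# Hypothetical CPU series; "above 90" is an assumed stand-in for a real detector.
series = [(0, 40.0), (1, 42.0), (2, 95.0), (3, 97.0), (4, 41.0), (5, 99.0)]
anomalies = first_anomalies(series, lambda v: v > 90.0)
```

Here the points at timestamps 2 and 5 are reported, while the point at timestamp 3 is not, because the metric was already anomalous when it was observed.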

Collector

A Java-based agent running on a remote server that does the following:

  • Observes time series metrics – either actively, at the source, or by ingesting a stream passively

  • Detects anomalies in each performance metric locally

  • Sends detected anomalies and raw metrics to Express. The ingestion engine treats anomalies as performance-impacting events and aggregates them into alerts, which you can view in the Alerts page.

Correlation

The process of finding relationships between alerts, based on similarities between data fields of interest, and clustering the related alerts into actionable incidents.

The Settings > Correlation Engine page includes a simple UI to define the correlation logic that makes sense for your organization. Express then uses fuzzy matching, natural-language processing, and your correlation profiles to correlate new alerts with previous ones.
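
To make the idea of clustering concrete, here is a minimal sketch that groups alerts when a chosen field is similar and the alerts arrive close together in time. It is not the Express implementation: the `host` field, the 300-second window, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy string comparison; the threshold is an assumed value."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def correlate(alerts, field="host", window_seconds=300):
    """Cluster alerts into incidents (lists of alerts)."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        for incident in incidents:
            last = incident[-1]
            close_in_time = alert["timestamp"] - last["timestamp"] <= window_seconds
            if close_in_time and similar(alert[field], last[field]):
                incident.append(alert)
                break
        else:
            # No existing incident matched, so this alert starts a new one.
            incidents.append([alert])
    return incidents


# Hypothetical alerts for illustration only.
alerts = [
    {"id": 1, "host": "web-01.prod", "timestamp": 0},
    {"id": 2, "host": "web-01.prod", "timestamp": 60},
    {"id": 3, "host": "db-07.staging", "timestamp": 90},
    {"id": 4, "host": "web-01.prod", "timestamp": 1000},
]
incidents = correlate(alerts)
```

In this sketch alerts 1 and 2 land in one incident, alert 3 starts its own because the host differs, and alert 4 starts another because it falls outside the time window.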

Dedupe Key

An auto-generated signature that Express creates for each new event and uses to determine whether that event is a duplicate. By default, the dedupe key is based on the source, service, and check fields.
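
One way to picture such a signature is a hash over the key fields. The sketch below is an assumption for illustration: the source does not say how Express computes the key, and the function name `dedupe_key`, the SHA-256 hash, and the separator are hypothetical choices.

```python
import hashlib


def dedupe_key(event: dict, fields=("source", "service", "check")) -> str:
    """Build a signature from the selected fields (defaults taken from the text)."""
    parts = [str(event.get(name, "")) for name in fields]
    # A separator that is unlikely to occur in field values keeps
    # ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical events: a and b differ only in a field outside the key.
a = {"source": "nagios", "service": "web", "check": "cpu", "description": "first"}
b = {"source": "nagios", "service": "web", "check": "cpu", "description": "second"}
c = {"source": "nagios", "service": "db", "check": "cpu", "description": "first"}
```

With these inputs, `dedupe_key(a)` equals `dedupe_key(b)` because fields outside the key do not affect it, while `dedupe_key(c)` differs because the service differs.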

Deduplication

A stage in the ingestion process where the ingestion engine eliminates any event that is identical to a previously seen event.

Deduplication eliminates noise and ensures that each ingested event is unique.

Detector

The algorithm that a Managed Object uses to detect anomalies in a metric. Every metric observed by a Managed Object has an associated detector.

Enrichment

The process of adding user-defined data to alerts during the ingestion process.

Enrichment is useful when you want to customize how Express correlates alerts and clusters them into incidents. You might also want to enrich your alerts to make the resulting incidents more informative and readable.
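
As a simple illustration, enrichment can be thought of as a lookup that copies user-defined fields onto an alert. The sketch below assumes a hypothetical mapping table keyed by a `host` field; the function name `enrich`, the field names, and the table contents are invented for the example.

```python
def enrich(alert: dict, mapping: dict, key_field: str = "host") -> dict:
    """Return a copy of the alert with user-defined fields added."""
    enriched = dict(alert)
    extra = mapping.get(alert.get(key_field), {})
    for name, value in extra.items():
        # Do not overwrite fields that the alert already carries.
        enriched.setdefault(name, value)
    return enriched


# Hypothetical user-defined data.
HOST_INFO = {"web-01": {"team": "storefront", "datacenter": "us-east"}}

alert = {"host": "web-01", "check": "cpu"}
enriched = enrich(alert, HOST_INFO)
```

The added `team` and `datacenter` fields could then serve as data fields of interest for correlation, or simply make the resulting incident easier to read.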

Event

A data object that describes an occurrence of operational interest. An event might be based on an event notification from an external tool, or a metric anomaly from a Collector or AWS CloudWatch. Examples include:

  • A network switch went down 35 seconds ago

  • Average free memory on a server was 10% over the past minute

  • A collector detected an anomaly in a key performance metric 43 seconds ago

Events form the initial raw data for Express, which does the following:

  1. Converts each ingested notification and anomaly into a generic event object.

  2. Deletes duplicate events.

  3. Aggregates similar events into alerts.
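
The three steps above can be sketched as a small pipeline. This is a simplified model under stated assumptions, not the Express ingestion engine: the generic event schema, the treatment of "duplicate" as "all normalized fields equal", and the grouping of events into alerts by source, service, and check are hypothetical choices for illustration.

```python
import json


def normalize(raw: dict) -> dict:
    """Step 1: convert a raw notification or anomaly into a generic event."""
    return {
        "source": raw.get("source", "unknown"),
        "service": raw.get("service", ""),
        "check": raw.get("check", ""),
        "value": raw.get("value"),
        "timestamp": raw.get("timestamp"),
    }


def ingest(raw_events):
    """Run normalize, dedupe, and aggregate; return alerts as lists of events."""
    seen = set()
    alerts = {}
    for raw in raw_events:
        event = normalize(raw)
        # Step 2: drop events identical to one already seen.
        fingerprint = json.dumps(event, sort_keys=True)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Step 3: aggregate similar events into one alert (assumed grouping).
        alert_key = (event["source"], event["service"], event["check"])
        alerts.setdefault(alert_key, []).append(event)
    return alerts


# Hypothetical input: the third entry is an exact duplicate of the second.
raw = [
    {"source": "srv1", "service": "os", "check": "cpu", "value": 10, "timestamp": 1},
    {"source": "srv1", "service": "os", "check": "cpu", "value": 47, "timestamp": 2},
    {"source": "srv1", "service": "os", "check": "cpu", "value": 47, "timestamp": 2},
    {"source": "srv2", "service": "os", "check": "mem", "value": 10, "timestamp": 3},
]
alerts = ingest(raw)
```

The four raw inputs yield two alerts: one for CPU on `srv1` with two unique events, and one for memory on `srv2` with a single event.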

Incident

A cluster of alerts that all relate to the same actionable issue. Express clusters alerts based on the similarity of their time stamps and data fields. The Settings > Correlation Engine page has a simple UI where you can define the correlation behavior that makes sense for your organization.

Managed Object

A set of collector policies for observing metrics from a specific data source such as Linux OS, AWS, Docker, Logstash, etc. Each Managed Object defines the set of metrics to observe and the configuration settings for each metric.

Metric

A set of data points, each with its own timestamp, that measures a specific aspect of performance such as response time or utilization. Collectors can monitor performance on remote servers, detect performance anomalies locally at the source, and send anomalies and raw metrics directly to Express.

Severity

Each anomaly, alert, and incident has an associated severity that indicates the degree of difference between the observed performance and normal performance. The severity generally indicates how urgently the performance issue requires corrective action. The degrees of severity are:

  • Critical (red)

  • Major (orange)

  • Minor (yellow) 

  • Warning (blue)

  • Unknown (purple)

  • Clear (green)

Express calculates severities as follows:

  • Metric anomalies: Express evaluates each new anomaly against the distribution of all previous anomalies for that metric.

  • Events from external tools: Express maps the severities from the external tool's schema to the Express event schema.

  • Alerts: The alert severity is the severity of the most recent event used to update the alert.

  • Incidents: The incident severity is the highest current severity of any member alert.
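
The alert and incident rules can be sketched in a few lines. The ranking below assumes that the severity list above is ordered from highest to lowest, which the source does not state explicitly; the function names and the sample data are hypothetical.

```python
# Assumed ordering: higher number means more severe.
SEVERITY_RANK = {
    "clear": 0, "unknown": 1, "warning": 2,
    "minor": 3, "major": 4, "critical": 5,
}


def alert_severity(events):
    """An alert takes the severity of its most recent event."""
    latest = max(events, key=lambda e: e["timestamp"])
    return latest["severity"]


def incident_severity(alerts):
    """An incident takes the highest current severity among its member alerts."""
    current = [alert_severity(events) for events in alerts]
    return max(current, key=SEVERITY_RANK.__getitem__)


# Hypothetical member alerts, each a list of events.
alert_a = [
    {"timestamp": 1, "severity": "critical"},
    {"timestamp": 2, "severity": "minor"},
]
alert_b = [
    {"timestamp": 1, "severity": "warning"},
    {"timestamp": 5, "severity": "major"},
]
result = incident_severity([alert_a, alert_b])
```

Although `alert_a` was once critical, its latest event is minor, so the incident severity here is major, taken from `alert_b`.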

Superseded

An incident that has been merged into, and replaced by, another incident.