Tempus
Tempus is a time-based algorithm in Moogsoft AIOps which clusters alerts into Situations based on the similarity of their timestamps.
The underlying premise of Tempus is that when things go wrong, they go wrong together. For example, if a core element of your network infrastructure, such as a switch, fails and disconnects, it affects many other interconnected elements, which then send events at a similar time.
Tempus uses the Jaccard index to calculate the similarity of different alerts. It also uses community detection methods to identify which alerts with similar arrival patterns it should cluster into Situations.
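To make the Jaccard index concrete, here is a minimal sketch in Python. It assumes each alert is represented by the set of time buckets in which it produced events; that representation and the alert names are illustrative assumptions, not Tempus internals.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical alerts, each described by the set of time buckets
# in which it produced events.
switch_down = {0, 1, 2, 7}
link_flap = {0, 1, 2, 8}
disk_full = {40, 41}

print(jaccard_index(switch_down, link_flap))  # 3 shared / 5 total = 0.6
print(jaccard_index(switch_down, disk_full))  # no shared buckets = 0.0
```

A score of 1.0 means two alerts fired in exactly the same buckets, and 0.0 means they never fired together.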
As Tempus is time-based, you should not use it to detect events relating to the slow or gradual degradation of a service, for example from disks filling up or rising CPU usage.
Note
One advantage of Tempus is that it uses only event timestamps for clustering, so no alert enrichment is required.
Time-based clustering
Moogsoft AIOps applies Tempus incrementally to alerts as it ingests them so that it can create Situations in real-time.
The diagrams below show how Tempus sorts and then groups alerts with similar timestamps into Situations.
Raw alerts from the Moolet chain, for example, the Alert Builder or Alert Workflows, arrive over a period of time. These are shown as gray dots in the diagram below:
Tempus identifies and sorts which alerts have similar arrival patterns:
Tempus clusters alerts with similar arrival patterns into Situations:
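The grouping step can be approximated with a short Python sketch. It links every pair of alerts whose arrival patterns are similar enough and then takes connected components of the resulting graph. Connected components are a much simpler stand-in for the community detection methods Tempus uses, and the alert ids, bucket sets and threshold here are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_alerts(patterns, min_similarity):
    """Group alert ids whose arrival patterns are linked by similarity.

    Uses connected components as a simple stand-in for community detection.
    """
    # Build an undirected graph: one edge per sufficiently similar pair.
    neighbours = {alert_id: set() for alert_id in patterns}
    for a, b in combinations(patterns, 2):
        if jaccard(patterns[a], patterns[b]) >= min_similarity:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in patterns:
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical arrival patterns: alert id -> set of bucket indices.
patterns = {
    "alert-1": {0, 1, 2},
    "alert-2": {0, 1, 2, 3},
    "alert-3": {1, 2, 3},
    "alert-4": {90, 91},
}
print(cluster_alerts(patterns, min_similarity=0.5))
```

In this toy data the first three alerts end up in one group and the fourth stays on its own, because it shares no buckets with the others.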
Configure Tempus
You can configure and tune Tempus in $MOOGSOFT_HOME/config/moolets/tempus.conf. The Moolet parameters configure general information about each Sigaliser. The Output parameters control which Moolet supplies the alerts that Tempus processes. The Trigger and Sigalising parameters control when the Sigaliser executes and the span of time it analyzes.
Moolet Parameters
The parameters that relate to the Tempus Moolet are as follows:
run_on_startup
Determines whether Tempus runs when Moogsoft AIOps starts. If enabled, Tempus captures all alerts from the moment the system starts, without you having to configure or start it manually.
Type: Boolean
Default: false
metric_path_moolet
Determines whether Tempus is included in the metric for Self Monitoring or not.
Type: Boolean
Default: true
description
Describes the Situation produced by the Tempus clustering algorithm.
Type: String
Default: A Tempus (a.k.a. Sigaliser V2) Situation
The default Tempus parameters are as follows:
    name               : "Tempus",
    classname          : "com.moogsoft.farmd.moolet.tempus.CTempus",
    run_on_startup     : false,
    metric_path_moolet : true,
    #process_output_of : "AlertRulesEngine",
    process_output_of  : "AlertBuilder",
    description        : "A Tempus (a.k.a. Sigaliser V2) Situation",
Note
name and classname are hard coded and should not be changed.
Output Parameters
These parameters control the output processed by the Tempus clustering algorithm:
process_output_of
Defines the Moolet source of the alerts that Tempus processes. By default, the Sigaliser connects directly to the Alert Builder. Use the Alert Rules Engine as the source only if you want automations to run prior to Situation resolution.
Type: List
One of: AlertBuilder, AlertRulesEngine, MaintenanceWindowManager, EmptyMoolet
Default: AlertBuilder
entropy_threshold
Sets the minimum entropy value for an alert to be clustered into a Situation. Tempus does not include any alerts with an entropy value below the threshold in Situations. Set to a value between 0.0 and 1.0. The default of 0.0 means all alerts are processed.
Type: Float
Default: 0.0
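The effect of the threshold can be sketched as a simple filter in Python. The alert dictionaries and the `entropy` and `alert_id` field names are hypothetical, chosen only to illustrate the rule that alerts below the threshold are left out.

```python
def filter_by_entropy(alerts, entropy_threshold=0.0):
    """Keep only alerts whose entropy is at or above the threshold."""
    return [a for a in alerts if a["entropy"] >= entropy_threshold]


# Hypothetical alerts; field names are assumptions for this sketch.
alerts = [
    {"alert_id": 1, "entropy": 0.05},
    {"alert_id": 2, "entropy": 0.40},
    {"alert_id": 3, "entropy": 0.90},
]

# With the default of 0.0, every alert is kept.
print([a["alert_id"] for a in filter_by_entropy(alerts)])        # [1, 2, 3]
# Raising the threshold drops the low-entropy alert.
print([a["alert_id"] for a in filter_by_entropy(alerts, 0.25)])  # [2, 3]
```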
The default output parameters are as follows:
    # process_output_of : "AlertRulesEngine",
    process_output_of   : "AlertBuilder",
    description         : "A Tempus (a.k.a. Sigaliser V2) Situation",

    # Algorithm
    entropy_threshold   : 0.0,
Trigger and Sigalising Window Parameters
The execution and duration of Tempus are controlled by the trigger, window and bucket parameters:
- The sig_interval trigger determines when Tempus starts to run.
- The window is the total span of time, in seconds, in which alerts are analyzed each time Tempus runs.
- Time buckets are small five-second subdivisions of the window in which the alerts are captured.
sig_interval
Executes the Tempus algorithm after a defined number of seconds. By default, the Sigaliser runs every 120 seconds (two minutes).
Type: Integer
Default: 120
window_size
Determines the length of time of the window in which alerts are analyzed and a Situation develops each time the Sigaliser is run. By default the Sigalising window is 1200 seconds (20 minutes).
Type: Integer
Default: 1200
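Because the window (1200 seconds) is longer than the run interval (120 seconds), consecutive windows overlap. The Python sketch below illustrates this under one explicit assumption: that each window ends at the moment Tempus runs. The function names are hypothetical.

```python
SIG_INTERVAL = 120   # seconds between runs (default sig_interval)
WINDOW_SIZE = 1200   # seconds analysed per run (default window_size)


def window_for_run(run_time, window_size=WINDOW_SIZE):
    """Return (start, end) of the window, ASSUMING it ends at run_time."""
    return (run_time - window_size, run_time)


def runs_covering(alert_time, first_run, n_runs,
                  sig_interval=SIG_INTERVAL, window_size=WINDOW_SIZE):
    """List the run times whose window contains alert_time."""
    covering = []
    for i in range(n_runs):
        run_time = first_run + i * sig_interval
        start, end = window_for_run(run_time, window_size)
        if start <= alert_time < end:
            covering.append(run_time)
    return covering


print(window_for_run(3600))              # (2400, 3600)
print(len(runs_covering(3000, 0, 100)))  # 10
```

Under that assumption, an alert stays inside the analysis window for 1200 / 120 = 10 consecutive runs with the default settings.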
bucket_size
Determines the time span, in seconds, of each bucket in which alerts are captured. By default each bucket is five seconds long, so there are 240 buckets per window.
Type: Integer
Default: 5
Warning
Moogsoft does not recommend that you change the bucket size. If you do want to change bucket_size, do so with caution because Tempus is designed to use small bucket sizes.
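The relationship between the window and its buckets is simple arithmetic, sketched here in Python. The helper names and the idea of indexing buckets from the start of the window are assumptions for illustration.

```python
def bucket_count(window_size=1200, bucket_size=5):
    """Number of whole buckets that fit in one window."""
    return window_size // bucket_size


def bucket_index(event_time, window_start, bucket_size=5):
    """Index of the bucket that an event timestamp falls into."""
    return int((event_time - window_start) // bucket_size)


print(bucket_count())             # 240 with the default settings
print(bucket_index(1012, 1000))   # 12 seconds into the window -> bucket 2
```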
arrival_spread
Sets the acceptable latency or arrival window, in seconds, for each alert. This can be used to reduce the impact of multiple alerts arriving over a small amount of time and landing in separate buckets.
Type: Integer
Default: 15
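One way to picture the effect of arrival_spread is to let each event occupy several consecutive buckets instead of one. The Python sketch below assumes the spread extends forward from the bucket containing the event and covers arrival_spread / bucket_size buckets; how Tempus actually applies the spread is not described here, so treat this purely as an illustration.

```python
import math


def spread_buckets(event_time, window_start, bucket_size=5, arrival_spread=15):
    """Buckets an event is counted in, ASSUMING a forward spread.

    The event's own bucket plus enough following buckets to cover
    arrival_spread seconds (at least one bucket in total).
    """
    first = int((event_time - window_start) // bucket_size)
    width = max(1, math.ceil(arrival_spread / bucket_size))
    return set(range(first, first + width))


# Two hypothetical events 8 seconds apart land in different buckets...
a = spread_buckets(1003, 1000, arrival_spread=0)
b = spread_buckets(1011, 1000, arrival_spread=0)
print(a & b)  # set(): no overlap without a spread

# ...but overlap once each event is spread over 15 seconds.
a = spread_buckets(1003, 1000)
b = spread_buckets(1011, 1000)
print(a & b)  # {2}
```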
min_arrival_similarity
Determines how similar alerts must be to be considered for clustering. This is a useful way to determine what proportion of events two alerts need to share to have a similar pattern of arrival. By default this is 0.6667, which means Tempus disregards any alerts with less than two-thirds similarity.
Type: Float
Default: 0.6667
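As a rough illustration of the threshold, the following Python sketch scores every pair of alerts with the Jaccard index and keeps only the pairs at or above min_arrival_similarity. The alert names and bucket sets are hypothetical, and treating the comparison as "greater than or equal" is an assumption.

```python
from itertools import combinations

MIN_ARRIVAL_SIMILARITY = 0.6667  # default from tempus.conf


def similar_pairs(patterns, threshold=MIN_ARRIVAL_SIMILARITY):
    """Return (alert_a, alert_b, score) for pairs that meet the threshold."""
    pairs = []
    for a, b in combinations(sorted(patterns), 2):
        union = patterns[a] | patterns[b]
        shared = patterns[a] & patterns[b]
        score = len(shared) / len(union) if union else 0.0
        if score >= threshold:
            pairs.append((a, b, round(score, 4)))
    return pairs


# Hypothetical arrival patterns: alert name -> set of bucket indices.
patterns = {
    "A": {0, 1, 2, 3},
    "B": {0, 1, 2},
    "C": {2, 3, 9, 10},
}
print(similar_pairs(patterns))  # [('A', 'B', 0.75)]
```

Here A and B share three of four buckets (0.75) and pass, while C shares too few buckets with either of them to be considered.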
The default trigger and sigalising window parameters are as follows:
    # Triggers
    sig_interval   : 120,   # seconds => sigalise every 2 minutes

    # Sigalising Window
    window_size    : 1200,  # seconds => 20 minutes
    bucket_size    : 5,     # seconds : Take Care if changing - Tempus is designed to use small bucket sizes
    arrival_spread : 15,    # seconds : acceptable latency/arrival window for each event
Partitioning
Partitioning is set to 'null' by default. There are two methods to partition data into Situations. The first is 'partition_by' which splits the clusters according to the parameters specified. The second is 'pre_partition', which splits the incoming event stream before clustering.
Note
Pre-partitioning is recommended as it does not interfere with the results of the clustering algorithms.
partition_by
After clustering has taken place and before you enter merging and resolution, you can split clusters into sub-clusters based on a component of the events. For example, you can use the manager parameter to ensure the Situations only contain events from the same manager. In general, and by default, you should comment out the partition_by parameter.
Warning
Partitioning by components is not recommended.
pre_partition
An alternative way of partitioning is to use pre_partition, which allows you to specify a component field (from the list of specified components) around which the event stream is partitioned before clustering occurs. All alerts in a resulting Situation share a single value for the chosen component field.
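The difference between the two methods is where the split happens relative to clustering. The Python sketch below shows both orders using a deliberately trivial placeholder for the clustering step (alerts in the same bucket form one cluster). The field names, the `toy_cluster` function and the sample alerts are all hypothetical.

```python
from collections import defaultdict


def group_by_field(alerts, field):
    """Split a list of alerts into lists that share one value of `field`."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert[field]].append(alert)
    return list(groups.values())


def toy_cluster(alerts):
    """Placeholder clustering: alerts in the same bucket form one cluster."""
    return group_by_field(alerts, "bucket")


def with_pre_partition(alerts, field):
    """Split the stream first, then cluster each stream separately."""
    clusters = []
    for stream in group_by_field(alerts, field):
        clusters.extend(toy_cluster(stream))
    return clusters


def with_partition_by(alerts, field):
    """Cluster everything first, then split each cluster by field."""
    clusters = []
    for cluster in toy_cluster(alerts):
        clusters.extend(group_by_field(cluster, field))
    return clusters


alerts = [
    {"alert_id": 1, "manager": "SNMP", "bucket": 0},
    {"alert_id": 2, "manager": "SYSLOG", "bucket": 0},
    {"alert_id": 3, "manager": "SNMP", "bucket": 0},
    {"alert_id": 4, "manager": "SYSLOG", "bucket": 7},
]
for cluster in with_pre_partition(alerts, "manager"):
    print([a["alert_id"] for a in cluster])
```

With this placeholder clustering both orders produce the same groups. With a similarity-based algorithm the order matters, because pre-partitioning changes which alerts are compared with each other in the first place.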
Significance
You can configure Tempus to create Situations only from alerts that meet a certain level of significance, based on Poisson distribution calculations.
significance_test
Calculation that determines how significant a cluster of alerts or potential Situation must be for Tempus to detect it. The default, Poisson1, looks at the data of a single alert cluster to calculate how significant it is. The default is more likely to detect all significant alert clusters but with a higher risk of creating insignificant alert clusters. Use this option when your alerts originate from different networks. Poisson2 is a more thorough test that looks at an alert cluster and all alerts outside the cluster with a similar event rate. It is more likely to exclude all insignificant alert clusters but with a high risk of excluding significant alert clusters. Use this option if you expect all of your alerts to come from the same connected network.
Type: String
One of: Poisson1, Poisson2
Default: Poisson1
significance_threshold
Sets the maximum significance score that an alert cluster can have for Tempus to create a Situation from it. The score is proportional to the probability that the alert cluster or potential Situation was a coincidence. The lower the score, the more significant the cluster and the less likely it was a coincidence. The significance_threshold score ranges from 0 to 100.
Type: Integer
Default: 1
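To give a feel for how a Poisson-based score might behave, the Python sketch below computes the probability of seeing at least a given number of events by chance and scales it to 0-100. This is a generic Poisson tail calculation, not the Poisson1 or Poisson2 test as implemented in Tempus. The function names, the `expected` event count and the scaling are assumptions.

```python
import math


def poisson_tail(k, expected):
    """P(X >= k) for X ~ Poisson(expected)."""
    if k <= 0:
        return 1.0
    # 1 - P(X <= k - 1), summing the Poisson probability mass function.
    cdf = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


def significance_score(observed, expected):
    """Hypothetical 0-100 score: chance probability as a percentage."""
    return 100.0 * poisson_tail(observed, expected)


def is_significant(observed, expected, significance_threshold=1):
    """A cluster qualifies when its score does not exceed the threshold."""
    return significance_score(observed, expected) <= significance_threshold


# 20 events where about 2 are expected is very unlikely to be chance.
print(is_significant(observed=20, expected=2.0))  # True
# 2 events where about 2 are expected is unremarkable.
print(is_significant(observed=2, expected=2.0))   # False
```

A lower threshold demands stronger evidence, so fewer clusters qualify as Situations.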
Tempus Example
Tempus appears in $MOOGSOFT_HOME/config/moolets/tempus.conf as follows:
    {
        # Moolet
        name                   : "Tempus",
        classname              : "com.moogsoft.farmd.moolet.tempus.CTempus",
        run_on_startup         : false,
        metric_path_moolet     : true,
        #process_output_of     : "AlertRulesEngine",
        process_output_of      : "AlertBuilder",
        description            : "A Tempus (a.k.a. Sigaliser V2) Situation",

        # Algorithm
        entropy_threshold      : 0.0,

        # Triggers
        sig_interval           : 120,   # seconds => sigalise every 2 minutes

        # Sigalising Window
        window_size            : 1200,  # seconds => 20 minutes
        bucket_size            : 5,     # seconds : Take Care if changing - Tempus is designed to use small bucket sizes
        arrival_spread         : 15,    # seconds : acceptable latency/arrival window for each event

        # How similar must alerts be to be considered for clustering?
        min_arrival_similarity : 0.6667,

        pre_partition          : null,
        partition_by           : null,

        significance_test      : "Poisson1",
        significance_threshold : 1
    }