Tempus

Tempus is a time-based algorithm in Moogsoft AIOps that clusters alerts into Situations based on the similarity of their timestamps.

The underlying premise of Tempus is that when things go wrong, they go wrong together. For example, if a core element of your network infrastructure, such as a switch, fails and disconnects, it affects many other interconnected elements, which then send events at around the same time.

Tempus uses the Jaccard index to calculate the similarity of different alerts. It also uses community detection methods to identify which alerts with similar arrival patterns it should cluster into Situations.
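The Jaccard index of two sets is the size of their intersection divided by the size of their union, so it runs from 0.0 (no overlap) to 1.0 (identical). The sketch below is a minimal illustration, assuming each alert is represented by the set of time-bucket indices in which it received events; this representation and the sample alert names are hypothetical, not Tempus's internal data model.

```python
def jaccard_index(a: set[int], b: set[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of time-bucket indices."""
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty patterns are not similar.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical arrival patterns: the buckets in which each alert received events.
switch_alert = {10, 11, 40}
router_alert = {10, 11, 41}
print(jaccard_index(switch_alert, router_alert))  # 2 shared / 4 total = 0.5
```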

As Tempus is time-based, you should not use it to detect the slow or gradual degradation of a service, such as disks filling up or steadily rising CPU usage.

Note

One advantage of Tempus is that it uses only event timestamps for clustering, so no alert enrichment is required.

Time-based clustering

Moogsoft AIOps applies Tempus incrementally to alerts as it ingests them so that it can create Situations in real-time.

The diagrams below show how Tempus sorts and then groups alerts with similar timestamps into Situations.

Raw alerts from the Moolet chain (for example, from the Alert Builder or Alert Workflows) arrive over a period of time. These are shown as gray dots in the diagram below:

[Diagram: raw alerts arriving over time, shown as gray dots]

Tempus identifies and sorts which alerts have similar arrival patterns:

[Diagram: alerts sorted by similar arrival patterns]

Tempus clusters alerts with similar arrival patterns into Situations:

[Diagram: alerts with similar arrival patterns clustered into Situations]
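The sort-then-cluster steps above can be sketched in a few lines. This is a hypothetical illustration: it swaps Tempus's community detection for a simpler technique, connected components of a graph whose edges join alert pairs with a Jaccard index at or above a threshold, and it assumes each alert is represented by the set of time buckets in which it received events. Treating only multi-alert groups as candidate Situations is also an assumption of this sketch.

```python
from itertools import combinations


def jaccard_index(a: set[int], b: set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_alerts(patterns: dict[str, set[int]], min_similarity: float) -> list[set[str]]:
    """Group alerts with similar arrival patterns (connected components, not community detection)."""
    parent = {alert: alert for alert in patterns}  # union-find forest over alert IDs

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Join every pair of alerts whose arrival patterns are similar enough.
    for a, b in combinations(patterns, 2):
        if jaccard_index(patterns[a], patterns[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for alert in patterns:
        groups.setdefault(find(alert), set()).add(alert)
    # In this sketch, only groups of two or more alerts become candidate Situations.
    return [g for g in groups.values() if len(g) > 1]


patterns = {
    "switch-down": {10, 11, 12},
    "host-a-unreachable": {10, 11, 12},
    "host-b-unreachable": {10, 11, 13},
    "disk-full": {90},
}
print(cluster_alerts(patterns, min_similarity=0.5))
```

Connected components are easy to follow but can chain loosely related alerts together through a shared neighbor; community detection methods look for more densely connected groups instead.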

Configure Tempus

You can configure and tune Tempus in $MOOGSOFT_HOME/config/moolets/tempus.conf. The Moolet parameters configure general information about the Sigaliser. The Output parameters control which Moolet's alerts Tempus processes and which of those alerts it includes in Situations. The Trigger and Sigalising Window parameters control how often Tempus runs and the span of time it analyzes on each run.

Moolet Parameters

The parameters that relate to the Tempus Moolet are as follows:

run_on_startup

Determines whether Tempus runs when Moogsoft AIOps starts. If enabled, Tempus captures all alerts from the moment the system starts, without you having to configure or start it manually.

Type: Boolean

Default: false

metric_path_moolet

Determines whether Tempus is included in the metric for Self Monitoring or not.

Type: Boolean

Default: false

description

Describes the Situation produced by the Tempus clustering algorithm.

Type: String

Default: A Tempus (a.k.a. Sigaliser V2) Situation

The default Tempus parameters are as follows:

    name               : "Tempus",
    classname          : "com.moogsoft.farmd.moolet.tempus.CTempus",
    run_on_startup     : false,
    metric_path_moolet : true,
    #process_output_of : "AlertRulesEngine",
    process_output_of  : "AlertBuilder",
    description        : "A Tempus (a.k.a. Sigaliser V2) Situation",

Note

name and classname are hard-coded and should not be changed.

Output Parameters

These parameters control which alerts the Tempus clustering algorithm processes:

process_output_of

Defines the Moolet source of the alerts that Tempus processes. By default, the Sigaliser connects directly to the Alert Builder. Use the Alert Rules Engine only if you want automations to run before Situation resolution.

Type: List

One of: AlertBuilder, AlertRulesEngine, MaintenanceWindowManager, EmptyMoolet

Default: AlertBuilder

entropy_threshold

Sets the minimum entropy value for an alert to be clustered into a Situation. Tempus does not include any alerts with an entropy value below the threshold in Situations. Set to a value between 0.0 and 1.0. The default of 0.0 means all alerts are processed.

Type: Float

Default: 0.0
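The entropy_threshold rule amounts to a simple filter. A minimal sketch, assuming each alert's entropy has already been calculated; the helper name, alert names, and entropy values below are hypothetical.

```python
def filter_by_entropy(alert_entropy: dict[str, float], entropy_threshold: float = 0.0) -> list[str]:
    """Return the alerts eligible for clustering: entropy at or above the threshold."""
    if not 0.0 <= entropy_threshold <= 1.0:
        raise ValueError("entropy_threshold must be between 0.0 and 1.0")
    return [alert for alert, entropy in alert_entropy.items() if entropy >= entropy_threshold]


alerts = {"link-down": 0.9, "heartbeat": 0.2, "cpu-high": 0.5}
print(filter_by_entropy(alerts))       # default 0.0 keeps every alert
print(filter_by_entropy(alerts, 0.5))  # drops "heartbeat"
```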

The default output parameters are as follows:

    # process_output_of  : "AlertRulesEngine",
    process_output_of    : "AlertBuilder",
    description          : "A Tempus (a.k.a. Sigaliser V2) Situation",
 
    # Algorithm
    entropy_threshold    : 0.0,

Trigger and Sigalising Window Parameters

The execution and duration of Tempus are controlled by the trigger, window and bucket parameters:

  • The sig_interval trigger determines how often Tempus runs.

  • The window is the total span of time, in seconds, in which alerts are analyzed each time Tempus runs.

  • Time buckets are small subdivisions of the window, five seconds long by default, in which alerts are captured.

sig_interval

Sets how often, in seconds, Tempus runs. With the default of 120, the Sigaliser runs every 120 seconds (two minutes).

Type: Integer

Default: 120

window_size

Sets the length, in seconds, of the window in which alerts are analyzed and Situations develop each time the Sigaliser runs. By default the Sigalising window is 1200 seconds (20 minutes).

Type: Integer

Default: 1200
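The relationship between sig_interval and window_size can be sketched as a sequence of runs. This sketch assumes each run analyzes the window_size seconds that end at the run time; the helper names are hypothetical. Under that assumption consecutive windows overlap, and with the defaults an alert can appear in up to 1200 / 120 = 10 consecutive runs.

```python
from collections.abc import Iterator


def analysis_windows(start: int, end: int, sig_interval: int = 120,
                     window_size: int = 1200) -> Iterator[tuple[int, int, int]]:
    """Yield (run_time, window_start, window_end) for each run between start and end.

    Assumption: each run analyzes [run_time - window_size, run_time), clipped at start.
    """
    run_time = start + sig_interval
    while run_time <= end:
        yield run_time, max(start, run_time - window_size), run_time
        run_time += sig_interval


def alerts_in_window(timestamps: dict[str, int], window_start: int, window_end: int) -> list[str]:
    """Alerts whose timestamps fall inside the half-open window."""
    return [alert for alert, ts in timestamps.items() if window_start <= ts < window_end]


windows = list(analysis_windows(0, 1440))
print(windows[0], windows[-1])  # first run at 120 s, twelfth run at 1440 s
```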

bucket_size

Sets the length, in seconds, of each bucket in which alerts are captured. By default each bucket is five seconds long, so the default 1200-second window contains 240 buckets.

Type: Integer

Default: 5

Warning

Moogsoft does not recommend changing the bucket size. If you do want to change bucket_size, do so with caution, because Tempus is designed to use small bucket sizes.
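The bucket arithmetic is straightforward. A minimal sketch, assuming bucket indices are counted from the start of the window; the helper names are hypothetical.

```python
def buckets_per_window(window_size: int = 1200, bucket_size: int = 5) -> int:
    """Number of whole buckets in one Sigalising window."""
    return window_size // bucket_size


def bucket_index(timestamp: float, window_start: float, bucket_size: int = 5) -> int:
    """0-based index of the bucket that a timestamp falls into within a window."""
    return int((timestamp - window_start) // bucket_size)


print(buckets_per_window())          # 240
print(bucket_index(1007.0, 1000.0))  # 1: the second bucket, covering [1005, 1010)
```

The same arithmetic shows the problem that arrival_spread addresses: alerts arriving at 1004 and 1006 seconds are only two seconds apart, yet they land in buckets 0 and 1.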

arrival_spread

Sets the acceptable latency, or arrival window, for each alert in seconds. Use it to reduce the impact of related alerts that arrive within a short time of each other but land in separate buckets.

Type: Integer

Default: 15
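Exactly how Tempus applies arrival_spread internally is not described here. One hypothetical way to model it, used only for illustration, is to count each arrival in every bucket within plus or minus arrival_spread seconds of its timestamp, so that near-simultaneous alerts in neighboring buckets still overlap. The helper names are hypothetical.

```python
def spread_buckets(timestamp: float, window_start: float,
                   bucket_size: int = 5, arrival_spread: int = 15) -> set[int]:
    """Buckets an arrival counts toward in this hypothetical +/- arrival_spread model.

    Indices are clamped at the start of the window only.
    """
    first = int((timestamp - arrival_spread - window_start) // bucket_size)
    last = int((timestamp + arrival_spread - window_start) // bucket_size)
    return set(range(max(first, 0), last + 1))


def jaccard_index(a: set[int], b: set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two alerts two seconds apart, straddling the boundary between buckets 0 and 1.
no_spread = jaccard_index(spread_buckets(1004, 1000, arrival_spread=0),
                          spread_buckets(1006, 1000, arrival_spread=0))
with_spread = jaccard_index(spread_buckets(1004, 1000), spread_buckets(1006, 1000))
print(no_spread, with_spread)  # 0.0 0.8
```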

min_arrival_similarity

Determines how similar alerts must be to be considered for clustering, expressed as the proportion of events two alerts need to share to have a similar pattern of arrival. The default of 0.6667 means Tempus disregards any alerts with less than two-thirds similarity.

Type: Float

Default: 0.6667
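Applying min_arrival_similarity can be sketched as a filter over alert pairs. This sketch assumes the Jaccard index of the alerts' bucket sets is the similarity measure, consistent with the description of Tempus above; the helper name and alert names are hypothetical.

```python
from itertools import combinations


def similar_pairs(patterns: dict[str, set[int]],
                  min_arrival_similarity: float = 0.6667) -> list[tuple[str, str, float]]:
    """Return (alert, alert, similarity) for every pair similar enough to cluster."""
    pairs = []
    for a, b in combinations(sorted(patterns), 2):
        union = patterns[a] | patterns[b]
        similarity = len(patterns[a] & patterns[b]) / len(union) if union else 0.0
        if similarity >= min_arrival_similarity:
            pairs.append((a, b, round(similarity, 4)))
    return pairs


patterns = {
    "alert-a": {1, 2, 3, 4},
    "alert-b": {1, 2, 3},  # shares 3 of 4 buckets with alert-a: 0.75
    "alert-c": {1, 7},     # shares little with either
}
print(similar_pairs(patterns))
```

Because this sketch compares with >=, a pair sharing exactly two-thirds of its buckets (2/3 ≈ 0.66667) falls just below the 0.6667 default.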

The default trigger and sigalising window parameters are as follows:

    # Triggers
    sig_interval           : 120,    # seconds => sigalise every 2 minutes

    # Sigalising Window
    window_size            : 1200,   # seconds => 20 minutes
    bucket_size            : 5,      # seconds : Take care if changing - Tempus is designed to use small bucket sizes
    arrival_spread         : 15,     # seconds : acceptable latency/arrival window for each event

    # How similar must alerts be to be considered for clustering?
    min_arrival_similarity : 0.6667,

Partitioning

Partitioning is set to null by default. There are two methods to partition data into Situations. The first is partition_by, which splits clusters after clustering according to the component you specify. The second is pre_partition, which splits the incoming event stream before clustering.

Note

Pre-partitioning is recommended as it does not interfere with the results of the clustering algorithms.

partition_by

After clustering has taken place, and before merging and resolution, partition_by lets you split clusters into sub-clusters based on a component of the events. For example, you can partition by manager to ensure that each Situation contains only events from the same manager. In general, you should leave partition_by commented out or set to null, as it is by default.

Warning

Partitioning by components is not recommended.
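As an illustration of what partition_by does to a finished cluster, the sketch below splits one cluster into sub-clusters that share a field value. The helper name, the alert dictionaries, and the manager values are hypothetical; Tempus's own data structures are not shown here.

```python
from collections import defaultdict


def partition_cluster(cluster: list[dict], field: str) -> list[list[dict]]:
    """Split one cluster into sub-clusters whose alerts share the same value of `field`."""
    groups: defaultdict = defaultdict(list)
    for alert in cluster:
        groups[alert.get(field)].append(alert)
    return list(groups.values())  # sub-clusters in order of first appearance


cluster = [
    {"id": 1, "manager": "nagios"},
    {"id": 2, "manager": "nagios"},
    {"id": 3, "manager": "splunk"},
]
sub_clusters = partition_cluster(cluster, "manager")
print([[alert["id"] for alert in sub] for sub in sub_clusters])  # [[1, 2], [3]]
```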

pre_partition

An alternative way of partitioning is pre_partition, which lets you specify a component field (from the list of specified components) on which the event stream is partitioned before clustering occurs. All alerts in each resulting Situation share a single value for the chosen component field.
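By contrast, pre_partition splits the stream first and clusters each partition separately. In the sketch below, cluster_by_bucket is a hypothetical toy stand-in for Tempus's clustering that simply groups alerts arriving in the same bucket; the helper names, the location field, and its values are also hypothetical.

```python
from collections import defaultdict
from collections.abc import Callable


def pre_partition(alerts: list[dict], field: str) -> dict:
    """Split the alert stream by the value of `field` before any clustering."""
    partitions: defaultdict = defaultdict(list)
    for alert in alerts:
        partitions[alert.get(field)].append(alert)
    return dict(partitions)


def cluster_by_bucket(alerts: list[dict]) -> list[list[int]]:
    """Toy clustering: group alert IDs that arrived in the same bucket."""
    groups: defaultdict = defaultdict(list)
    for alert in alerts:
        groups[alert["bucket"]].append(alert["id"])
    return list(groups.values())


def cluster_each(partitions: dict, cluster_fn: Callable) -> list[list[int]]:
    """Run the clustering function independently inside each partition."""
    situations = []
    for partition_alerts in partitions.values():
        situations.extend(cluster_fn(partition_alerts))
    return situations


alerts = [
    {"id": 1, "location": "london", "bucket": 10},
    {"id": 2, "location": "paris", "bucket": 10},
    {"id": 3, "location": "london", "bucket": 10},
    {"id": 4, "location": "paris", "bucket": 80},
]
situations = cluster_each(pre_partition(alerts, "location"), cluster_by_bucket)
print(situations)  # [[1, 3], [2], [4]]
```

Without pre-partitioning, alerts 1, 2 and 3 would form one group because they share bucket 10; with the stream pre-partitioned on location, alert 2 is clustered only with other paris alerts.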

Significance

You can configure Tempus to create Situations only from alerts that meet a minimum degree of statistical significance, based on Poisson distribution calculations.

significance_test

Determines how significant a cluster of alerts, or potential Situation, must be for Tempus to detect it. The default, Poisson1, looks only at the data of a single alert cluster to calculate how significant it is. It is more likely to detect all significant alert clusters, but with a higher risk of also creating insignificant ones. Use this option when your alerts originate from different networks. Poisson2 is a more thorough test that looks at an alert cluster together with all alerts outside the cluster that have a similar event rate. It is more likely to exclude all insignificant alert clusters, but with a higher risk of also excluding significant ones. Use this option if you expect all of your alerts to come from the same connected network.

Type: String

One of: Poisson1, Poisson2

Default: Poisson1

significance_threshold

Sets the maximum significance score an alert cluster can have for Tempus to create a Situation from it. The score is proportional to the probability that the alert cluster, or potential Situation, was a coincidence: the lower the score, the more significant the cluster and the less likely it was a coincidence. The significance_threshold score ranges from 0 to 100.

Type: Integer

Default: 1
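Moogsoft's Poisson1 and Poisson2 calculations are not documented here, so the following is a hypothetical illustration of the general idea rather than either test. It assumes that, by chance alone, a cluster's alerts would be expected to share `expected` buckets, models the observed overlap as Poisson-distributed, and assumes the 0-100 score is 100 times the probability of seeing at least that much overlap by coincidence. The helper names are hypothetical.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_significant(observed: int, expected: float, significance_threshold: float = 1) -> bool:
    """Assumed scoring: score = 100 * P(coincidence); create a Situation if score <= threshold."""
    score = 100 * poisson_tail(observed, expected)
    return score <= significance_threshold


# Chance alone predicts 0.5 shared buckets on average.
print(is_significant(observed=4, expected=0.5))  # True: score is about 0.18
print(is_significant(observed=2, expected=0.5))  # False: score is about 9.0
```

Raising significance_threshold in this model admits clusters that are more plausibly coincidental, which mirrors the trade-off the parameter description gives.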

Tempus Example

Tempus appears in $MOOGSOFT_HOME/config/moolets/tempus.conf as follows:

{
        # Moolet
        name                    : "Tempus",
        classname               : "com.moogsoft.farmd.moolet.tempus.CTempus",
        run_on_startup          : false,
        metric_path_moolet      : true,
        #process_output_of      : "AlertRulesEngine",
        process_output_of       : "AlertBuilder",
        description             : "A Tempus (a.k.a. Sigaliser V2) Situation",

        # Algorithm
        entropy_threshold       : 0.0,

        # Triggers
        sig_interval            : 120,    # seconds => sigalise every 2 minutes

        # Sigalising Window
        window_size             : 1200,   # seconds => 20 minutes
        bucket_size             : 5,      # seconds : Take care if changing - Tempus is designed to use small bucket sizes
        arrival_spread          : 15,     # seconds : acceptable latency/arrival window for each event

        # How similar must alerts be to be considered for clustering?
        min_arrival_similarity  : 0.6667,

        pre_partition           : null,
        partition_by            : null,

        significance_test       : "Poisson1",
        significance_threshold  : 1
}