Classic
Warning
Support for Sigaliser Classic is deprecated as of the release of Moogsoft AIOps v7.3.0. If you want to use time-based clustering, see Tempus. For other clustering, see Cookbook.
The Sigaliser Moolet, also known as Sigaliser Classic, is the stage of event processing where alert streams from the Alert Builder or the Alert Rules Engine are converted into Situations.
The Sigaliser is self-contained and has no Moobot. It takes every occurrence of an event in an alert stream and uses matrix factorisation algorithms to identify clusters of temporally correlated alerts, which indicate underlying service outages, or Situations. The Sigaliser then updates its own internal store of Situations and the Moogsoft AIOps database before publishing updates on the Message Bus.
Basic Concepts
You can configure and tune Sigaliser Classic by editing the parameters in the $MOOGSOFT_HOME/config/moolets/sigaliser.conf configuration file. Generally, the types of Situations created for a given set of alerts depend on the rate at which alerts occur. You respond by adjusting the Sigaliser resolution and window parameters to match the activity.
The algorithms work by spotting the signature scatter pattern of alerts within a time period. First, they determine the optimal number of clusters, which should correspond to the number of current, active, service-threatening outages in the window that the Sigaliser operates on. Second, they factorise the alerts into individual groups, which Moogsoft AIOps calls Situations. When a Situation is identified, a Situation Room is created in the Moogsoft AIOps database, and you are notified through the Situation View in the user interface.
The algorithm is run in semi real-time and is triggered by either:
- A fixed polled time period.
- A single time slice being filled up, the width of which is set by the resolution parameter in the configuration. For example, the first alert that arrives after the current slice has been filled will trigger the Sigaliser to run its algorithms.
Sigaliser Configuration
You can define the Sigaliser behavior in the Sigaliser section of the Moogfarmd configuration file. In general, you can configure the following parameters to produce either more Situations with fewer alerts, or fewer Situations with more alerts. The consequence of having more Situations with fewer alerts is that the same underlying outage could be split across multiple Situations. Having fewer Situations with more alerts can result in the same Situation containing alerts from multiple service outages. Tuning the Sigaliser parameters leads towards an optimal configuration, where Situations sharply reflect the state of the managed systems. Moogsoft refers to Situations as being “sharp” and well “resolved” when the parameters give you the best fit of Situations to service outages.
Sigaliser contains a number of properties. The name, classname, and run_on_startup properties are shared with other Moolets.
{
    name              : "Sigaliser",
    classname         : "CSigaliser",
    run_on_startup    : false,
    process_output_of : "AlertBuilder"
}
name
The name is hardcoded and should never be changed from Sigaliser.
classname
The classname, CSigaliser, is hardcoded and should never be changed.
run_on_startup
By default, run_on_startup is set to false, so that when Moogfarmd starts, it does not automatically create an instance of the Sigaliser. In this case you can start it using farmd_ctrl.
Undertaking the sigalising
These properties in the Moolet direct which output should be processed:
process_output_of
Instructs the Moolet to process the output of the Alert Builder or Alert Rules Engine. Usually the Sigaliser connects directly to the Alert Builder, and the Alert Rules Engine is only used if automations are desired prior to Situation resolution. The Sigaliser can have only one input.
Algorithmics
The Sigaliser runs the matrix factorization algorithms, the properties for which are as follows:
# Algorithm
time_compression     : true,
alert_threshold      : 2,
membership_limit     : 3,
sig_similarity_limit : 0.7,
sig_alert_horizon    : 0.5,
scale_by_severity    : false,
entropy_threshold    : 0.0,
time_compression
If set to true, the algorithm ignores any empty time buckets in the Sigaliser calculation. If set to false, it includes the empty time buckets. We recommend that you set time_compression to true for low data rates and false for normal data rates.
You only require time_compression in scenarios where the data rate is very low compared to the values of window and resolution. In certain low data-rate scenarios it is possible for a window or resolution to contain no alerts. For example, if the data rate is two alerts per hour and the window is 15 minutes, on average, some of the time buckets in any Situation calculation will be empty. When time_compression is true, empty time buckets are removed from the calculation, but the total number of buckets used in the calculation remains the same.
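The effect of time_compression on which buckets enter a calculation can be illustrated with a minimal Python sketch. It is not the Sigaliser implementation: the function name select_buckets and the representation of history as a list of per-bucket alert counts are assumptions made for illustration, and the sketch assumes that compression reaches further back in time to keep the bucket count constant.

```python
def select_buckets(history, window, time_compression):
    """Pick the buckets used in one calculation (illustrative only).

    history: per-bucket alert counts, oldest first (hypothetical input).
    window:  number of buckets to include in the calculation.
    """
    if window <= 0:
        return []
    if time_compression:
        # Assumption: empty buckets are skipped, so the selection reaches
        # further back in time to keep the bucket count at `window`.
        history = [count for count in history if count > 0]
    return history[-window:]


counts = [3, 0, 1, 0, 0, 2, 0, 4]
print(select_buckets(counts, 3, time_compression=False))  # [2, 0, 4]
print(select_buckets(counts, 3, time_compression=True))   # [1, 2, 4]
```

With compression off, one of the three buckets is empty. With compression on, all three buckets carry alerts.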
alert_threshold
Defines the minimum number of alerts that a Situation can contain. Increasing alert_threshold therefore reduces the total number of Situations. We recommend an alert_threshold of 2.
You can use alert_threshold in conjunction with small values of membership_limit to produce a smaller number of Situations, each of which has more alerts.
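As a rough illustration of the threshold, the following Python sketch drops candidate Situations that contain too few alerts. The function name drop_small_situations and the representation of a Situation as a set of alert IDs are hypothetical, not part of the product.

```python
def drop_small_situations(situations, alert_threshold=2):
    """Keep only candidate Situations with at least alert_threshold alerts.

    situations: list of sets of alert IDs (hypothetical representation).
    """
    return [alerts for alerts in situations if len(alerts) >= alert_threshold]


candidates = [{"a1"}, {"a1", "a2"}, {"a3", "a4", "a5"}]
print(len(drop_small_situations(candidates, alert_threshold=2)))  # 2
print(len(drop_small_situations(candidates, alert_threshold=3)))  # 1
```

Raising the threshold from 2 to 3 removes one more candidate, which matches the statement that a higher alert_threshold reduces the total number of Situations.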
membership_limit
The Situation creation process contains multiple steps, including a resolution step and a merging step. During the merging phase, the raw Situations from the factorization calculation are compared and merged with the currently active Situations. This determines whether a detected Situation is novel or an evolution in time of an existing Situation.
The membership_limit property restricts the number of Situations in which an alert can appear. As Situations become merged with each other over time, it is possible for an alert to appear in more Situations than are defined by membership_limit. Changing the value of membership_limit does not have a large impact on the total number of Situations but does change the distribution of the number of alerts in each Situation.
Decreasing membership_limit results in fewer Situations with more alerts and more Situations containing a small number of alerts. Increasing membership_limit results in more Situations with a greater number of alerts and fewer Situations containing a small number of alerts. The optimal value appears to be between one and five, with a recommended membership_limit of three.
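One way to picture the restriction is the Python sketch below, which keeps each alert only in the Situations where it is most strongly associated. How the Sigaliser actually chooses which memberships to keep is not described here, so the per-Situation weights, the function name limit_membership, and the "keep the strongest" rule are all assumptions.

```python
def limit_membership(memberships, membership_limit=3):
    """Restrict each alert to at most membership_limit Situations.

    memberships: {alert_id: {situation_id: weight}} where weight is a
    hypothetical strength of association (an assumption of this sketch).
    """
    limited = {}
    for alert_id, weights in memberships.items():
        # Rank the Situations for this alert by weight, strongest first.
        ranked = sorted(weights, key=weights.get, reverse=True)
        limited[alert_id] = set(ranked[:membership_limit])
    return limited


raw = {"a1": {"s1": 0.9, "s2": 0.1, "s3": 0.5, "s4": 0.7}}
print(limit_membership(raw, membership_limit=3))
```

In this example alert a1 is kept in s1, s4 and s3, and dropped from s2.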
sig_similarity_limit (Jaccard Similarity Coefficient)
A measure of the similarity between two Situations before they are merged together. The value is the Jaccard Similarity Coefficient (JSC) defined as the ratio of shared alerts between two Situations to total unique alerts in both Situations.
For example, if Situation 1 and Situation 2 share two common alerts and each Situation has one unique alert:
JSC = 2 (common alerts) / [1 (unique to Situation 1) + 2 (common to both) + 1 (unique to Situation 2)] = 2/(1+2+1) = 2/4 = 0.5.
Reducing sig_similarity_limit reduces the total number of Situations. Smaller values increase the likelihood of Situations being merged together, as they need to share fewer alerts to be viewed as the same Situation. Conceptually, JSC values less than 0.5 are hard to justify as grounds for merging, so use them with care. We recommend a sig_similarity_limit of 0.7.
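The JSC calculation can be reproduced with a few lines of Python. This is a sketch of the standard Jaccard coefficient on sets of alert IDs. The helper names jaccard and should_merge are hypothetical, and the choice of "greater than or equal to" for the merge comparison is an assumption.

```python
def jaccard(alerts_a, alerts_b):
    """Jaccard Similarity Coefficient of two sets of alert IDs."""
    a, b = set(alerts_a), set(alerts_b)
    union = a | b
    if not union:
        return 0.0  # two empty Situations share nothing
    return len(a & b) / len(union)


def should_merge(alerts_a, alerts_b, sig_similarity_limit=0.7):
    # Assumption: Situations merge when the JSC reaches the limit.
    return jaccard(alerts_a, alerts_b) >= sig_similarity_limit


situation_1 = {"a1", "a2", "a3"}  # a1 is unique, a2 and a3 are shared
situation_2 = {"a2", "a3", "a4"}  # a4 is unique
print(jaccard(situation_1, situation_2))            # 0.5
print(should_merge(situation_1, situation_2))       # False
print(should_merge(situation_1, situation_2, 0.5))  # True
```

The two Situations from the worked example are left separate at the recommended limit of 0.7 and would be merged only if the limit were lowered to 0.5.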
sig_alert_horizon
When the Sigaliser algorithm initially identifies a Situation, some of the alerts it contains are more representative of the Situation than others. This parameter, which takes a value between 0.0 and 1.0, allows you to set a cut-off for membership relative to the most significant alert in the cluster. If you set this value to 0.5, for example, only alerts whose “significance” for the Situation is more than half that of the most significant alert in the Situation are included. The default value is 0.5.
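The cut-off can be sketched in Python as follows. The significance scores are hypothetical inputs, since the way the Sigaliser computes them is not covered here, and apply_alert_horizon is an illustrative name rather than a product function.

```python
def apply_alert_horizon(significance, sig_alert_horizon=0.5):
    """Keep alerts whose significance exceeds the horizon cut-off.

    significance: {alert_id: score} for one candidate Situation
    (hypothetical scores; only their relative size matters here).
    """
    if not significance:
        return {}
    cutoff = sig_alert_horizon * max(significance.values())
    # "More than" the cut-off, so an alert exactly on it is excluded.
    return {a: s for a, s in significance.items() if s > cutoff}


scores = {"a1": 1.0, "a2": 0.6, "a3": 0.5, "a4": 0.2}
print(sorted(apply_alert_horizon(scores, 0.5)))  # ['a1', 'a2']
```

With a horizon of 0.5, only a1 and a2 are more than half as significant as the strongest alert, so a3 and a4 are left out.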
entropy_threshold
The value of this parameter is the minimum entropy that an alert must possess to be included in the Sigaliser calculation. Any alert that arrives at the Sigaliser with entropy below this value will never be included in a Situation. It has a value between 0.0 and 1.0 and has a default of 0.0 which means every alert will be processed.
scale_by_severity
scale_by_severity allows you to bias Moogsoft AIOps so that high severity alerts are treated as having higher entropy. If the same alert arrives with a critical severity and with a minor severity, the critical alert is given higher entropy than the minor one. The scaling factor is the numeric severity constant divided by the maximum severity (5). So a critical alert keeps all of its entropy, and a minor alert keeps three fifths of its entropy. A clear alert gets an entropy value of 0.0.
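The scaling arithmetic can be checked with a short Python sketch. The function name scaled_entropy is hypothetical. The severity values used (critical = 5, minor = 3, clear = 0) follow from the fractions quoted above.

```python
MAX_SEVERITY = 5  # critical


def scaled_entropy(entropy, severity, scale_by_severity=True):
    """Scale an alert's entropy by severity / MAX_SEVERITY."""
    if not scale_by_severity:
        return entropy
    return entropy * severity / MAX_SEVERITY


print(scaled_entropy(1.0, 5))  # critical keeps all of its entropy: 1.0
print(scaled_entropy(1.0, 3))  # minor keeps three fifths: 0.6
print(scaled_entropy(1.0, 0))  # clear: 0.0
```

With scale_by_severity set to false, the function returns the entropy unchanged.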
Triggers and Time Buckets
The algorithm runs incrementally as events are ingested, so Situations are produced and updated in real time. There are two ways to trigger the algorithm: using a time interval or using the rate of the event stream.
# Triggers
sig_on_bucket : true,
sig_interval  : 100,
max_backlog   : 1000000,
# Time Buckets
resolution    : 120,
window        : 90
The optimal trigger for production is sig_on_bucket = true, provided this gives satisfactory Situation accuracy and Situations are updated regularly. You can also use sig_on_bucket to simulate real-time behavior using historical data.
When Situations are not being updated regularly enough, set sig_on_bucket = false and set sig_interval to a value no more than half of the real-time size of the window.
In a production environment, set max_backlog to a high value to avoid triggering the Sigaliser between timed executions. This parameter causes the algorithms to run if the number of events that arrive before either a scheduled execution or a bucket being filled exceeds this value. Use it with care, and only in an environment where the event rate is highly variable.
sig_on_bucket
If set to true, the Sigaliser runs whenever a new time bucket occurs. Depending upon the data rate, this has the effect of executing the Sigaliser every “resolution” seconds.
sig_on_bucket = true deactivates both the sig_interval and max_backlog triggers.
sig_interval
Executes the Sigaliser algorithm at a fixed interval in seconds. In the example above, it runs every 100 seconds.
sig_interval and max_backlog do not override each other; consequently, the Sigaliser can be executed more frequently than the sig_interval value.
max_backlog
Executes the Sigaliser once the defined number of alerts has been received since the last execution. In the example above, the Sigaliser is executed after 1,000,000 alerts are received.
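The interaction of the three triggers can be summarised in a small Python sketch. It follows the statement that sig_on_bucket = true deactivates the other two triggers. The function name should_run, its arguments, and the use of "greater than or equal to" comparisons are assumptions for illustration only.

```python
def should_run(sig_on_bucket, new_bucket_started,
               seconds_since_last_run, alerts_since_last_run,
               sig_interval=100, max_backlog=1_000_000):
    """Decide whether the Sigaliser algorithm should execute now."""
    if sig_on_bucket:
        # Bucket trigger only: interval and backlog are deactivated.
        return new_bucket_started
    # Interval and backlog do not override each other: either can fire.
    return (seconds_since_last_run >= sig_interval
            or alerts_since_last_run >= max_backlog)


print(should_run(True, True, 5, 10))           # True: new bucket
print(should_run(True, False, 500, 10))        # False: interval ignored
print(should_run(False, False, 100, 10))       # True: interval reached
print(should_run(False, False, 5, 1_000_000))  # True: backlog reached
```

The last two calls show why a low max_backlog can cause extra executions between the timed ones.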
resolution
The duration, in seconds, of each time bucket that the event stream is divided into. A high value for the resolution results in Situations that are less “sharp” in time, because the wider the bucket, the more likely it is that alerts from unrelated outages fall in the same bucket, and potentially in the same Situation.
window
The number of time buckets to include in the calculation. Choose the width of the window to match the average time period over which outages typically evolve. The total amount of time considered in any Sigaliser calculation is window multiplied by resolution.
In general, for a high data rate you would use a smaller resolution and window than for a low data rate. For a fixed data rate, a smaller resolution generally results in more Situations.
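The relationship between resolution and window is simple arithmetic, shown here as a Python sketch using the example values resolution = 120 and window = 90. The helper names are hypothetical, and bucket_index assumes that buckets are aligned to multiples of the resolution, which the product may not do.

```python
def total_span_seconds(resolution, window):
    """Total time considered in one calculation: window x resolution."""
    return resolution * window


def bucket_index(timestamp, resolution):
    """Index of the bucket an alert falls in (assumes aligned buckets)."""
    return int(timestamp // resolution)


span = total_span_seconds(resolution=120, window=90)
print(span)                   # 10800 seconds
print(span / 3600)            # 3.0 hours
print(bucket_index(250, 120)) # 2: seconds 240 to 359 form bucket 2
```

Halving the resolution to 60 while keeping the window at 90 halves the time covered to 1.5 hours, so the two parameters are best tuned together.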
Diagrams
The diagram below illustrates how a Sigaliser can be triggered every 180 seconds if 'sig_on_bucket' is set to 'true', the time bucket resolution is set to '60' and the window is '3':
The diagram below illustrates how a Sigaliser can be triggered if 'sig_interval' is set to 120 seconds and if 'max_backlog' is set to 50,000 events: