Speedbird
Speedbird groups Events related to an actionable outage into clusters of related Alerts. These clusters are service-impacting, and each group of clustered Alerts provides operational value to someone using the system.
Speedbird lets you configure a set of Event parameters that drive the clustering in addition to time. For example, you may want to group Alerts and Events that coincide not only in time but also in another value of the Event, such as the hostname. Speedbird lets you create clusters of Alerts that have a similar hostname and that also occurred at a similar time.
Speedbird's Algorithm
The algorithmic technique used by Speedbird is based on k-means, a well-understood, traditional clustering algorithm and a form of unsupervised machine learning. For the Speedbird Moolet, Moogsoft AIOps uses some of the same algorithmic tool chain as the Sigaliser, together with the k-means algorithm. For instance, AIOps still uses the same time-based determination of how many real clusters there are in the data at a given point in time. In the limit, non-negative matrix factorisation collapses into a k-means calculation, but k-means is more computationally efficient.
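For readers unfamiliar with k-means, the core loop can be sketched as follows. This is a generic, minimal Lloyd-style k-means in Python, assuming numeric points and caller-supplied initial centroids; it is not Speedbird's implementation and does not include the time-based determination of how many clusters exist.

```python
import math
from statistics import fmean

Point = tuple[float, ...]


def nearest(point: Point, centroids: list[Point]) -> int:
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: list[Point], centroids: list[Point], max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Generic Lloyd-style k-means (not Speedbird's code): assign, then update."""
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the solution has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated: list[Point] = []
        for i, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            updated.append(tuple(fmean(dim) for dim in zip(*members)) if members else c)
        centroids = updated
    return centroids, labels


points = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,)]
centroids, labels = kmeans(points, [(0.0,), (1.0,)])
print(labels)     # [0, 0, 0, 1, 1]
print(centroids)  # [(1.0,), (100.5,)]
```

Even with two poorly placed initial centroids, the alternating steps move the second centroid out to the distant group.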
Configuration
To configure Speedbird, read the following in conjunction with the Tuning guidelines to produce optimal results.
sig_resolution
In moog_farmd.conf, there is a general sig_resolution parameter grouping before the Moolet definitions, with the following parameters:
sig_resolution : { alert_threshold : 1, sig_similarity_limit : 0.7 },

These parameters apply to all Sigalisers running in a given farmd, whether Speedbird or the traditional Sigaliser. The sig_resolution parameters allow a candidate Situation to be compared with pre-existing Situations to determine whether it is an evolution of an existing Situation or a new Situation.
Moolet and Algorithm
The parameter groups Moolet and Algorithm function in the same way as those in the existing Sigaliser.
# Moolet
name : "Speedbird",
classname : "CSpeedbird",
run_on_startup : false,
process_output_of : "AlertBuilder",
# Algorithm
time_compression : true,
scale_by_severity : true,
entropy_threshold : 0.35,
For further information on these parameters, see the table below:
Parameter | Input | Description | Example
--- | --- | --- | ---
name | String | The name of the Sigaliser. | Speedbird
classname | String | The classname of the Sigaliser. Warning: this is hardcoded and should never be changed. | CSpeedbird
run_on_startup | Boolean | If enabled, an instance of the Sigaliser is created when moog_farmd starts. This is disabled by default. | false
process_output_of | String | Sets whether the Sigaliser processes the output of the Alert Builder or the Alert Rules Engine. The Alert Rules Engine can only be used if automations are required before Situation resolution. Note: the Sigaliser can only have one input. | AlertBuilder
time_compression | Boolean | If enabled, the Sigaliser ignores empty time buckets; if disabled, it includes them. Note: for low data rates set this to true; for normal data rates set it to false. | true
scale_by_severity | Boolean | If enabled, high-severity Alerts are treated as having higher entropy. The scaling factor is the severity value divided by the maximum severity (5). | true
entropy_threshold | Float | The minimum entropy an Alert must have to be included in the Sigaliser calculation. Any Alert that arrives at the Sigaliser with a lower entropy than this value is not included in Situations. Note: the default value of 0.0 means every Alert is processed by the Sigaliser. | 0.35
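The combined effect of scale_by_severity, entropy_threshold and time_compression on the Sigaliser input can be sketched as below. This is a hypothetical Python illustration: the Alert class and function names are invented, and the assumption that severity scaling multiplies an Alert's entropy by severity / 5 is an interpretation of the table, not confirmed Speedbird behaviour.

```python
from dataclasses import dataclass

MAX_SEVERITY = 5  # maximum severity, as stated in the table


@dataclass
class Alert:  # hypothetical minimal Alert representation
    entropy: float
    severity: int


def effective_entropy(alert: Alert, scale_by_severity: bool) -> float:
    # ASSUMPTION: scaling multiplies entropy by severity / MAX_SEVERITY.
    if scale_by_severity:
        return alert.entropy * alert.severity / MAX_SEVERITY
    return alert.entropy


def select_alerts(alerts: list[Alert], entropy_threshold: float, scale_by_severity: bool) -> list[Alert]:
    """Keep only Alerts whose (possibly scaled) entropy reaches the threshold."""
    return [a for a in alerts if effective_entropy(a, scale_by_severity) >= entropy_threshold]


def compress_buckets(buckets: list[list[Alert]], time_compression: bool) -> list[list[Alert]]:
    """With time_compression enabled, empty time buckets are dropped."""
    return [b for b in buckets if b] if time_compression else list(buckets)


alerts = [Alert(0.8, 5), Alert(0.8, 1), Alert(0.2, 5)]
print(len(select_alerts(alerts, 0.35, scale_by_severity=True)))   # 1
print(len(select_alerts(alerts, 0.35, scale_by_severity=False)))  # 2
```

Under this assumption, a low-severity Alert with otherwise high entropy can fall below the threshold once scaling is enabled.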
sig_alert_horizon
The sig_alert_horizon parameter allows you to prune clusters by controlling when outlying Events are removed from a cluster:
- If the value is less than 0.0, no pruning is undertaken.
- At 0.0, members that are further than one standard deviation from the centroid of the cluster are eliminated.
- At more than 0.0, the standard deviation is multiplied by sig_alert_horizon, and members further than mean + sig_alert_horizon * std_dev from the centroid are eliminated, where mean is the average distance of the members from the centroid.
sig_alert_horizon : 0.0,
Every cluster has a centroid, which is the average point in the middle of a cluster.
In the diagram above there are three points in a defined cluster (X) and the centroid (C), which is not a real point in the phase space but represents the center of the cluster. Computing the distance of each point from the centroid gives an average distance and a standard deviation, and the standard deviation describes the spread of the cluster. A standard deviation of 0 means all of the points are the same distance from the centroid, whereas a high standard deviation means their distances from the centroid vary widely, indicating a random cluster.
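The pruning rules for sig_alert_horizon can be sketched in Python as follows. This is a minimal illustration, not Speedbird's code: the function names are hypothetical, and it assumes the 0.0 case means "more than one standard deviation beyond the mean distance", which is one reading of the rules above.

```python
import math
from statistics import fmean, pstdev

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Mean position of the cluster members (not necessarily a real Event)."""
    return tuple(fmean(dim) for dim in zip(*points))


def prune(points: list[Point], sig_alert_horizon: float) -> list[Point]:
    """Hypothetical sketch: remove outlying members per the sig_alert_horizon rules."""
    if sig_alert_horizon < 0.0:
        return list(points)  # negative value: no pruning
    c = centroid(points)
    distances = [math.dist(p, c) for p in points]
    mean_d, std_d = fmean(distances), pstdev(distances)
    # ASSUMPTION: at 0.0 the cut-off is read as one standard deviation beyond
    # the mean distance; above 0.0 it is mean + sig_alert_horizon * std_dev.
    factor = 1.0 if sig_alert_horizon == 0.0 else sig_alert_horizon
    limit = mean_d + factor * std_d
    return [p for p, d in zip(points, distances) if d <= limit]


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(prune(pts, 0.0))  # [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```

The distant point (10, 10) is dropped at 0.0, kept when pruning is disabled with a negative value, and kept again at 2.0 because the wider limit reaches it.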
components
You can choose which parameters of an Event are used by the clustering algorithm. In the following example, "source", "source_id" and "description" are declared:
components : [ "source","source_id","description" ]
Additionally, the system always takes into consideration the time that the Event arrives in the system (event_time, or last_occurred for an Alert). You can have as many components as you like, but the more components you select, the more numerical complexity you introduce into the system, and the more likely you are to get fewer Alerts per cluster and less correlation.
Partitioning
There are two methods of partitioning the data into Situations. The first, partition_by, splits the clusters according to the parameters specified. The second, pre_partition, splits the incoming Event stream before clustering.
Note
Please note: Pre-partitioning is recommended because it does not interfere with the results of the clustering algorithms.
partition_by
After clustering has taken place, and before merging and resolution, you can split clusters into sub-clusters based on a component of the Events. For example, you can partition on manager to ensure that Situations only contain Events from the same manager. In general, and by default, you should comment out the partition_by parameter.
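Splitting clusters after clustering, as partition_by does, amounts to grouping each cluster's Events by one field. A hypothetical Python sketch, assuming Events are plain dictionaries (the function name is illustrative only):

```python
from collections import defaultdict

Event = dict


def partition_clusters(clusters: list[list[Event]], field: str) -> list[list[Event]]:
    """Hypothetical partition_by: split each cluster by one value of `field`."""
    result: list[list[Event]] = []
    for cluster in clusters:
        groups: dict = defaultdict(list)
        for event in cluster:
            groups[event[field]].append(event)
        # Each sub-cluster now holds a single value of `field`.
        result.extend(groups.values())
    return result


cluster = [{"manager": "Alan", "id": 1}, {"manager": "Andrew", "id": 2}, {"manager": "Alan", "id": 3}]
subs = partition_clusters([cluster], "manager")
print([[e["id"] for e in s] for s in subs])  # [[1, 3], [2]]
```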
pre_partition
An alternative way of partitioning is to use pre_partition, which allows you to specify a component field (from the list of specified components) around which the Event stream is partitioned before k-means clustering occurs. All the Alerts in each resulting Situation share a single value for the chosen component field.
For example, suppose the Speedbird components option is set to:
components : [ "source","manager","description" ],
In the metric below, the description component is weighted more heavily than source and manager. Please note that the metric always contains one more value than the number of components specified, and that the first value always corresponds to time.
default : [ 1,1,1,1000000],
This results in Situations containing Alerts with more similar description fields and a variety of source and manager fields.
Adding the following property ensures that Situations contain Alerts with very similar description fields and a variety of source fields, but only a single distinct manager field.
pre_partition : "manager"
Like partition_by, pre_partition defaults to false in moog_farmd.conf and so has no effect. If pre_partition is not required, there is no need to modify existing moog_farmd.conf files to include the property.
It is possible to configure pre_partition and partition_by at the same time, but partition_by only has an effect if it is applied to a different component.
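The order of operations when both options are set can be sketched as a small pipeline. This is a hypothetical Python illustration: run_pipeline and group_by are invented names, and cluster_fn stands in for the real k-means clustering step.

```python
from collections import defaultdict
from typing import Callable, Optional

Event = dict
Clusterer = Callable[[list[Event]], list[list[Event]]]


def group_by(events: list[Event], field: str) -> list[list[Event]]:
    groups: dict = defaultdict(list)
    for event in events:
        groups[event[field]].append(event)
    return list(groups.values())


def run_pipeline(events: list[Event], cluster_fn: Clusterer,
                 pre_partition: Optional[str] = None,
                 partition_by: Optional[str] = None) -> list[list[Event]]:
    # pre_partition: split the Event stream first, then cluster each stream.
    streams = group_by(events, pre_partition) if pre_partition else [events]
    clusters = [c for stream in streams for c in cluster_fn(stream)]
    # partition_by: split the resulting clusters. Splitting again on the
    # pre_partition field would change nothing, so that case is skipped.
    if partition_by and partition_by != pre_partition:
        clusters = [sub for c in clusters for sub in group_by(c, partition_by)]
    return clusters


def one_cluster(stream: list[Event]) -> list[list[Event]]:
    return [stream]  # hypothetical stand-in for k-means: one cluster per stream


events = [{"manager": "Alan", "source": "s1"},
          {"manager": "Andrew", "source": "s1"},
          {"manager": "Alan", "source": "s2"}]
print(len(run_pipeline(events, one_cluster, pre_partition="manager", partition_by="source")))  # 3
```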
A note on time_compression and pre_partition
pre_partition splits the Events into separate streams based on the component you have specified, whereas partition_by lets the algorithms work on the whole Event stream and then splits up the results.
Partitioning the Event stream using pre_partition can make time_compression less effective. Many of the tuning parameters and behaviours of the Sigalisers depend on the event rate: if the overall event rate is X and you split it into many streams, each stream has an event rate lower than X. This can affect whether the tuning parameters you are using are appropriate, so be careful with or without time_compression. With time_compression you expect to avoid silent periods in the Event stream, but this may not hold for each individual stream, because pre_partition splits the stream.
For example, if you pre_partition on manager, set time_compression to true, and set window to 10 and resolution to 60, you will store up to ten one-minute-wide buckets of Events for clustering.
The Events could arrive as follows:
Bucket | Minute | Manager
--- | --- | ---
1 | 1 | Andrew, Alan
2 | 2 | Alan
3 | 17 | Alan
4 | 18 | Alan
5 | 20 | Andrew
6 | 35 | Alan
7 | 37 | Alan
8 | 38 | Alan
9 | 57 | Alan
10 | 59 | Alan
11 | 60 | Alan
Note that the minute 1 bucket is dropped from the Sigaliser window because AIOps keeps only the last ten live buckets. Clustering for Events with manager Alan therefore uses only nine buckets, and clustering for Events with manager Andrew uses only one bucket.
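The bucket counts in the example can be checked with a short Python calculation over the table. The helper name is hypothetical; it models the window only as "keep the last ten live buckets", as described above.

```python
from collections import Counter

# (bucket, minute, managers) taken from the table above
BUCKETS = [
    (1, 1, {"Andrew", "Alan"}),
    (2, 2, {"Alan"}),
    (3, 17, {"Alan"}),
    (4, 18, {"Alan"}),
    (5, 20, {"Andrew"}),
    (6, 35, {"Alan"}),
    (7, 37, {"Alan"}),
    (8, 38, {"Alan"}),
    (9, 57, {"Alan"}),
    (10, 59, {"Alan"}),
    (11, 60, {"Alan"}),
]


def buckets_per_manager(buckets, window: int) -> Counter:
    """Count how many of the last `window` live buckets contain each manager."""
    live = buckets[-window:]  # older buckets drop out of the Sigaliser window
    counts: Counter = Counter()
    for _bucket, _minute, managers in live:
        counts.update(managers)
    return counts


counts = buckets_per_manager(BUCKETS, window=10)
print(counts["Alan"], counts["Andrew"])  # 9 1
```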
metric
metric : { default : [ 1,1,1,1], categoryField: "agent", "DBMON" : [ 100,1,1000,1000000], "NETMON" : [ 1,100000000,1,0] },
The metric is a technical and detailed area of configuration that relates to how Moogsoft measures the distance between two events in the phase space used for clustering. Euclidean distance is easy to compute: you square the difference in each component and add the squares together, which gives the square of the distance (in two dimensions the distance is the hypotenuse of a right-angled triangle, in three dimensions it is the diagonal of a cuboid, and so on). This description is a simplification.
For instance, if x, y and z are the components of a vector, the square of the distance between two events is:
$$d^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2$$
You can put a weight in front of each of these squared terms:
$$d^2 = a_1 (x_1 - x_2)^2 + a_2 (y_1 - y_2)^2 + a_3 (z_1 - z_2)^2$$
These weights are more correctly known as the diagonal metric tensor values. Moogsoft assumes that you should only ever consider the diagonal metric tensor values; in general coordinate geometry, however, other terms can also contribute to the distance, for example $(y - z)^2$. Comparing different attributes of an event for similarity is not considered useful.
This approach allows you to weight the distance between two events by component. For example, if x represents time, y represents source and z represents manager, and you make $a_2$ much bigger than $a_1$, then any distance in source creates much more distance between the events than the same distance in time. This is how you weight the importance of each component, and it is why each metric in the example above has four values: one for time and one for each of the three components. The default is [1,1,1,1]. You can also select a categoryField, which is a field of the event, for example categoryField: "agent".
In the example configuration above, if one of the events has the value DBMON in its agent field, the metric [100,1,1000,1000000] is used to weight the distance; if it has the value NETMON, the alternate metric [1,100000000,1,0] is used instead. If neither value is present, the default metric is used. This allows you to configure different metric weightings for different sources of events.
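The weighted distance and the categoryField lookup can be sketched together in Python. This is a minimal illustration under stated assumptions: every component has already been converted to a number (how Speedbird maps strings into phase space is not shown), and when the two events carry different category values, the first event with a configured value decides. select_metric and weighted_distance are hypothetical names.

```python
import math

METRIC_CONFIG = {
    "default": [1, 1, 1, 1],
    "categoryField": "agent",
    "DBMON": [100, 1, 1000, 1000000],
    "NETMON": [1, 100000000, 1, 0],
}


def select_metric(config: dict, event_a: dict, event_b: dict) -> list:
    """Hypothetical: pick the metric for a pair of events from their categoryField."""
    field = config.get("categoryField")
    if field is not None:
        for event in (event_a, event_b):
            value = event.get(field)
            # ASSUMPTION: the first event with a configured category value decides.
            if value in config and value not in ("default", "categoryField"):
                return config[value]
    return config["default"]


def weighted_distance(a: list[float], b: list[float], metric: list[float]) -> float:
    """Diagonal-metric distance: sqrt(sum(w * (x - y) ** 2)); first entry is time."""
    if not len(a) == len(b) == len(metric):
        raise ValueError("vectors and metric must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(metric, a, b)))


e1 = [0.0, 0.0, 0.0, 0.0]
e2 = [1.0, 0.0, 0.0, 0.0]  # differs by 1 in time
e3 = [0.0, 1.0, 0.0, 0.0]  # differs by 1 in the second component
print(weighted_distance(e1, e2, select_metric(METRIC_CONFIG, {"agent": "other"}, {})))  # 1.0
print(weighted_distance(e1, e3, [1, 100, 1, 1]))  # 10.0
```

A weight of 100 on the second component turns a unit difference there into a distance of 10, ten times the same difference in time.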
string_len_cutoff
This determines the maximum number of characters of a component to use in the distance calculation described in the previous section. The cutoff applies to all string components being used.
For example, if there are occasionally very long descriptions, you can specify a 64-character cutoff to avoid excessive computation:
string_len_cutoff : 64
spread_cutoff
Whereas sig_alert_horizon is used to take events out of clusters, spread_cutoff determines whether or not a cluster is worth processing.
spread_cutoff : 5.0
0.0 means all clusters have to be one hundred percent tight, i.e., all members are the same distance from the center with no variation; any other cluster is discarded. A higher number allows for looser clusters, i.e., more variation within the cluster.
The spread cutoff uses the cluster standard deviation, calculated after any outliers have been pruned according to the sig_alert_horizon parameter, to determine which clusters should be rejected. 0.0 means that all clusters have to be one hundred percent tight, i.e., with all members matching the cluster centroid. A higher number allows for more loosely correlated clusters. Note that the metrics chosen for weighting the components can have a direct impact on the standard deviation of the clusters generated, so it may be necessary to increase the spread_cutoff value to reflect this.
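A minimal Python sketch of the spread check, assuming outliers have already been pruned and that the cluster's spread is the standard deviation of its members' distances from the centroid (one reading of the text above). The function names are hypothetical.

```python
import math
from statistics import fmean, pstdev

Point = tuple[float, ...]


def cluster_spread(points: list[Point]) -> float:
    """ASSUMPTION: spread = standard deviation of member distances from the centroid."""
    c = tuple(fmean(dim) for dim in zip(*points))
    return pstdev([math.dist(p, c) for p in points])


def keep_clusters(clusters: list[list[Point]], spread_cutoff: float) -> list[list[Point]]:
    """Hypothetical spread_cutoff: discard clusters whose spread exceeds the cutoff."""
    return [c for c in clusters if cluster_spread(c) <= spread_cutoff]


tight = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]  # all 1.0 from centroid
loose = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]                # distances 2, 1, 3
print(len(keep_clusters([tight, loose], 0.0)))  # 1
print(len(keep_clusters([tight, loose], 5.0)))  # 2
```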
ignore_case
When comparing strings, this determines whether the translation of strings into numbers in phase space is case-sensitive. In general, case should be ignored:
ignore_case : true,
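How string_len_cutoff and ignore_case might prepare a string component before it is mapped into phase space can be sketched as below. normalise_component is a hypothetical helper; truncating before lower-casing is an assumed order (for ASCII text the result is the same either way), and the later string-to-number mapping is not shown.

```python
from typing import Optional


def normalise_component(value: str, string_len_cutoff: Optional[int] = None,
                        ignore_case: bool = True) -> str:
    """Hypothetical: prepare a string component for the distance calculation."""
    if string_len_cutoff is not None:
        value = value[:string_len_cutoff]  # only the first N characters count
    return value.lower() if ignore_case else value


print(normalise_component("DiskFull on HOST-01", 8))  # diskfull
```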
iterations
Unless 'Entropy' seeding is specified, the initial seeds for k-means clustering include a random element that leads to different solutions on different iterations. If more than one iteration is chosen, Speedbird selects the best of the solutions returned for Situation processing. With more iterations, k-means clustering tends to converge on an optimal solution, which in turn leads to lower variance from one Speedbird run to another. However, iterations take both time and CPU resources, so a sensible compromise between speed and the optimal solution is needed.
iterations : 5,
Moogsoft recommends a value of 5.
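Running k-means several times from random seeds and keeping the best result can be sketched in Python as follows. This is an illustration only: it works on 1-D data for brevity, the names are hypothetical, and it assumes "best" means the lowest within-cluster sum of squares, which may not be Speedbird's actual selection criterion.

```python
import random
from statistics import fmean


def lloyd_1d(points: list[float], seeds: list[float], max_iter: int = 100) -> tuple[list[int], float]:
    """One k-means run on 1-D data; returns labels and within-cluster sum of squares."""
    centroids = list(seeds)
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda i: abs(p - centroids[i])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Empty clusters keep their previous centroid.
        centroids = [fmean([p for p, lab in zip(points, labels) if lab == i] or [c])
                     for i, c in enumerate(centroids)]
    sse = sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


def best_of(points: list[float], k: int, iterations: int, rng: random.Random) -> tuple[list[int], float]:
    """ASSUMPTION: keep the run with the lowest sum of squares as the 'best' solution."""
    best = None
    for _ in range(iterations):
        seeds = rng.sample(points, k)  # random seeds drawn from the data
        labels, sse = lloyd_1d(points, seeds)
        if best is None or sse < best[1]:
            best = (labels, sse)
    return best


points = [0.0, 1.0, 2.0, 100.0, 101.0]
labels, sse = best_of(points, k=2, iterations=5, rng=random.Random(7))
print(sse)  # 2.5
```

More iterations raise the chance that at least one random start finds a good solution, at the cost of one full k-means run each.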
seeding
Seeding can be set to 'Kmpp', 'Lloyd', or 'Entropy'. Both 'Kmpp' (recommended) and 'Lloyd' use random elements to select the seeds that initialise the clustering process, and therefore have the advantage of finding different cluster solutions over multiple iterations.
Alternatively, 'Entropy' selects the highest-entropy Events to seed clusters and therefore returns the same results every time. Note that this is not necessarily an optimal result.
seeding : "Kmpp",
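The difference between random and deterministic seeding can be sketched in Python. The mapping of 'Kmpp' to k-means++ seeding and of 'Lloyd' to uniform random seeding is an assumption based on the option names; the function names are hypothetical and the data is 1-D for brevity.

```python
import random


def kmpp_seeds(points: list[float], k: int, rng: random.Random) -> list[float]:
    """ASSUMED 'Kmpp': k-means++ seeding. Each later seed is drawn with
    probability proportional to its squared distance from the nearest seed."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min((p - s) ** 2 for s in seeds) for p in points]
        if sum(d2) == 0:
            break  # every point coincides with a seed; no new seed is possible
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


def lloyd_seeds(points: list[float], k: int, rng: random.Random) -> list[float]:
    """ASSUMED 'Lloyd': seeds chosen uniformly at random from the data."""
    return rng.sample(points, k)


def entropy_seeds(events: list[dict], k: int) -> list[dict]:
    """'Entropy': the k highest-entropy Events, so the result never varies."""
    return sorted(events, key=lambda e: e["entropy"], reverse=True)[:k]


events = [{"id": 1, "entropy": 0.2}, {"id": 2, "entropy": 0.9}, {"id": 3, "entropy": 0.5}]
print([e["id"] for e in entropy_seeds(events, 2)])  # [2, 3]
```

The random strategies can return different seeds on each call, which is what lets multiple iterations explore different solutions; entropy seeding always returns the same seeds.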
force_causal
Setting force_causal to true ensures that Events which are part of causal Alerts are preserved. They are never discarded during the k-means clustering process, but are always returned as a member of a cluster.
The entropy range for causal Alerts is defined in the moogdb.significance table.
force_causal : true
generate_stats
generate_stats provides detailed logging useful for tuning purposes. The logging is written at log level WARN to the moogfarmd.log file and contains detailed information about event clustering, including information about partitioning.
generate_stats : "true"

If generate_stats is not required, there is no need to modify existing moog_farmd.conf files to include the property.
Tuning guidelines
To ensure you produce useful results, read the following in conjunction with the descriptions of the configuration parameters:

Disable the parameters that remove Alerts from Situations and discard poorly correlated Situations. Start with sig_alert_horizon set to -0.1 (to prevent any outliers from being pruned) and spread_cutoff set to a high value (to prevent any clusters from being discarded). Then modify these parameters to reduce the size and number of Situations.

When tuning the system, consider using 'Entropy' seeding and only switching to 'Kmpp' when you are happy with the results. Unlike 'Kmpp' or 'Lloyd', 'Entropy' seeding always produces the same Situations, although often not the most appropriate ones. With 'Entropy' seeding you can normally run a dataset once to see whether the parameters you have used have had the desired effect. 'Kmpp' seeding usually produces the best Situations with a moderate number of iterations.

The k in k-means is the number of seeds that AIOps clusters around, and hence the number of Situations produced. It is calculated using a technique that analyses the dataset to establish the number of independent clusters of events. The calculation depends on the number of time slices (window) and on the effective event rate (after entropy thresholds and similar filtering), which together determine the number of unique signatures received in resolution*window seconds.
Note
Please note: Moogsoft advises that you start by asking how many tickets or Situations you expect in a day, and then adjust the resolution and window parameters to produce that number of Situations from a day's worth of data. The value of k is never greater than either the window or the number of unique alerts in the total window, and is often about 80% of this value.
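The upper bound on k described in the note can be sketched in Python. This is a hypothetical illustration: alerts are modelled as (timestamp, signature) pairs, the window boundary handling is assumed, and the technique AIOps actually uses to determine k is not reproduced; the roughly 80% figure above is only a rule of thumb.

```python
def unique_signatures(alerts: list[tuple[float, str]], now: float,
                      resolution: int, window: int) -> int:
    """Hypothetical: distinct signatures seen in the last resolution * window seconds."""
    horizon = now - resolution * window  # ASSUMPTION: events exactly at the horizon are excluded
    return len({signature for t, signature in alerts if t > horizon})


def k_upper_bound(window: int, unique_alerts: int) -> int:
    """k is never greater than the window or the number of unique alerts."""
    return min(window, unique_alerts)


alerts = [(0.0, "a"), (100.0, "b"), (590.0, "a"), (599.0, "c")]
unique = unique_signatures(alerts, now=600.0, resolution=60, window=10)
print(unique, k_upper_bound(10, unique))  # 3 3
```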
If you are using time and one other component, be prepared to significantly reduce the time metric as well as increase the metric for the other component. For example, assume that you have the following configuration and that you are interested in a series of events that occur over 10 minutes:
components : ["source"],
metric : { default : [1,1000000] }
The time spread of the cluster you are interested in is 600 seconds. If you have increased the metric on the source component by a large amount, your cluster may contain a single value for source (or very closely related values), so the cluster spread value will be generated largely or entirely by the event time component. In that case, k-means solutions that split this set of events into more than one cluster are preferred over those that keep them in a single cluster. If you use the default metrics [1,1], the clustering will be driven primarily by time.
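To see why a time-dominated cluster tends to be split, here is a small Python calculation of the within-cluster sum of squares that k-means minimises. It assumes the heavy source metric has collapsed source to a single value, so only time contributes; the event times (one every 60 seconds over 600 seconds) are illustrative, and this shows the generic k-means objective rather than Speedbird's exact spread calculation.

```python
from statistics import fmean


def sse(values: list[float]) -> float:
    """Within-cluster sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


times = [60.0 * i for i in range(11)]    # 0, 60, ..., 600 seconds
single = sse(times)                       # one cluster spanning 600 s
split = sse(times[:6]) + sse(times[6:])   # the same events in two clusters
print(single, split)  # 396000.0 99000.0
```

Splitting the 600-second cluster in two cuts the objective by a factor of four, so a k-means solution with the events split across clusters scores better.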

The metrics that you use may affect the spread_cutoff. If you increase a metric, it may be necessary to increase the spread cutoff by quite a large amount (up to the square root of the increase in the metric).

It is the square root of the metric that is applied to a component. If you increase a metric for a component from 1 to 100, you emphasise the effect of that component on the resulting clusters by a factor of 10.
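The square-root relationship can be checked with a short Python calculation. The helper names are hypothetical, and reading "up to the square root of the increase" as multiplying spread_cutoff by that factor is an assumption.

```python
import math


def emphasis_factor(old_metric: float, new_metric: float) -> float:
    """The square root of the metric scales a component's contribution to distance."""
    return math.sqrt(new_metric / old_metric)


def scaled_spread_cutoff_upper(spread_cutoff: float, old_metric: float, new_metric: float) -> float:
    """ASSUMPTION: the largest suggested spread_cutoff scales by the emphasis factor."""
    return spread_cutoff * emphasis_factor(old_metric, new_metric)


print(emphasis_factor(1, 100))                   # 10.0
print(scaled_spread_cutoff_upper(5.0, 1, 100))   # 50.0
```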

Do not vary more than one configuration parameter at a time.
Start with small data sets and a limited number of components (for example, time plus one other) before increasing the size and/or complexity of your solution.