# Moogsoft Docs

## Speedbird

Speedbird groups events related to an actionable outage into clusters of their related Alerts. These clusters are service impacting, with the group of ‘clustered’ Alerts providing operational value to someone using the system.

Speedbird allows you to configure a set of Event parameters, in addition to time, to drive the clustering. For example, you may want to group Alerts and Events that coincide in time and also share another Event value, such as the hostname. Speedbird allows you to create clusters of Alerts with a similar hostname that have also occurred at a similar time.

## Speedbird's Algorithm

The algorithmic technique used by Speedbird is based on K-means, a well-understood, traditional clustering algorithm and a form of unsupervised machine learning. For the Speedbird Moolet, Moogsoft AIOps combines the K-means algorithm with some of the same algorithmic tool chain that is used in the Sigaliser. For instance, AIOps still uses the same time-based determination of how many real clusters there are in the data at a given point in time. In the limit, non-negative matrix factorisation collapses into a K-means calculation, but is more computationally efficient.
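The following is a minimal sketch of standard Lloyd-style K-means, included only to illustrate the assign-and-update loop that K-means performs; it is not Speedbird's implementation. It assumes each Event has already been turned into a numeric point, and it takes k as an argument, whereas Speedbird determines k itself using the time-based technique described above.

```python
import math
import random


def kmeans(points, k, seed=None, max_iter=100):
    """Lloyd-style K-means on lists of floats; returns (centroids, labels).

    Illustrative sketch only, not Moogsoft code.
    """
    rng = random.Random(seed)
    # Seed with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```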

## Configuration

To produce optimal results, read the following in conjunction with the Tuning guidelines when configuring Speedbird.

#### sig_resolution

In moog_farmd.conf, there is a general sig_resolution parameter grouping before the Moolet definitions, with the following parameters:

    sig_resolution :
    {
        sig_similarity_limit : 0.7
    },

• These parameters apply to all Sigalisers running in a given moog_farmd, whether Speedbird or the traditional Sigaliser. The sig_resolution parameters allow you to compare new clusters with pre-existing Situations and determine whether each is an evolution of an existing Situation or a new Situation.

#### Moolet and Algorithm

The parameter groups Moolet and Algorithm function in the same way as those in the existing Sigaliser.

    # Moolet
    name : "Speedbird",
    classname : "CSpeedbird",
    run_on_startup : false,

    # Algorithm
    time_compression : true,
    scale_by_severity : true,
    entropy_threshold : 0.35,

For further information on these parameters, see the table below:

| Parameter | Input | Description | Example |
| --- | --- | --- | --- |
| name | - | The name of the Sigaliser. | Speedbird |
| classname | - | The classname of the Sigaliser. Warning: This is hardcoded and should never be changed. | CSpeedbird |
| run_on_startup | Boolean | If enabled, an instance of the Sigaliser is created when moog_farmd starts. This is disabled by default. | false |
| process_output_of | | Sets whether the Sigaliser processes the output of the Alert Builder or the Alert Rules Engine. Use the latter only if automations are required before Situation resolution. Note: The Sigaliser can only have one input. | AlertBuilder |
| time_compression | Boolean | If enabled, the Sigaliser ignores empty time buckets; if disabled, it includes them. Note: For low data rates set this to true; for normal data rates set it to false. | true |
| scale_by_severity | Boolean | If enabled, high-severity Alerts are treated as having higher entropy. The scaling factor is the Alert's severity number divided by the maximum severity (5). | true |
| entropy_threshold | Float | The minimum entropy an Alert must have to be included in the Sigaliser calculation. Any Alert that arrives at the Sigaliser with a lower entropy than this value is not included in Situations. Note: The default value of 0.0 means every Alert is processed by the Sigaliser. | 0.35 |
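A minimal sketch of how scale_by_severity and entropy_threshold could combine. It assumes that the severity factor (severity / 5) multiplies the Alert's entropy and that the threshold is applied to the scaled value; the table does not state either detail. Alerts are hypothetical dicts with entropy and severity keys.

```python
MAX_SEVERITY = 5  # maximum severity, per the table above


def effective_entropy(entropy, severity, scale_by_severity=True):
    # Assumption: the severity factor multiplies the Alert's entropy.
    if scale_by_severity:
        return entropy * severity / MAX_SEVERITY
    return entropy


def alerts_for_clustering(alerts, entropy_threshold=0.35, scale_by_severity=True):
    """Keep Alerts whose (possibly scaled) entropy meets the threshold.

    `alerts` is a list of dicts with hypothetical 'entropy' and 'severity' keys.
    """
    return [
        a for a in alerts
        if effective_entropy(a["entropy"], a["severity"], scale_by_severity)
        >= entropy_threshold
    ]
```

With a threshold of 0.0 every Alert with non-negative entropy passes, which matches the note on the default value.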

#### sig_alert_horizon

The sig_alert_horizon parameter allows you to prune clusters. Its value controls when outlying Events are removed from a cluster:

• If the value is less than 0.0, no pruning is undertaken.
• At 0.0, members that are further than one standard deviation from the centroid of the cluster are eliminated.
• At values greater than 0.0, the standard deviation is multiplied by sig_alert_horizon, and members further than mean + sig_alert_horizon*std_dev from the centroid (where mean is the average member distance) are eliminated.

    sig_alert_horizon : 0.0,

Every cluster has a centroid, which is the average point in the middle of a cluster.

In the diagram above there are three points in a defined cluster (X) and the centroid (C), which is not a real point in this phase space but represents the center of the cluster. Computing the distance of each point from the centroid gives an average distance and a standard deviation, and the standard deviation describes the spread of the cluster. A standard deviation of 0 means all of the points are the same distance from the centroid, whereas a high standard deviation means the points are at highly variable distances from the centroid, indicating a random cluster.
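The centroid, spread and pruning rules described above can be sketched as follows. Points are assumed to be numeric vectors (the translation of Event fields into phase space is not shown), the standard deviation is assumed to be the population standard deviation of member distances, and the 0.0 branch follows the wording of the sig_alert_horizon bullets literally, so treat it as an interpretation rather than Moogsoft's exact rule.

```python
import math
import statistics


def centroid(points):
    # The centroid is the mean position of the cluster's members.
    return [sum(col) / len(points) for col in zip(*points)]


def spread(points):
    """Return (mean, std_dev) of member distances from the centroid."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    return statistics.fmean(dists), statistics.pstdev(dists)


def prune(points, sig_alert_horizon):
    """Remove outlying members according to sig_alert_horizon."""
    if sig_alert_horizon < 0.0:
        return list(points)  # negative value: no pruning
    c = centroid(points)
    mean, std_dev = spread(points)
    if sig_alert_horizon == 0.0:
        # Literal reading of the 0.0 rule: one standard deviation from the centroid.
        limit = std_dev
    else:
        limit = mean + sig_alert_horizon * std_dev
    return [p for p in points if math.dist(p, c) <= limit]
```

For example, the four points of a square around the origin all sit at distance 1 from their centroid, so their standard deviation is 0, the "perfectly tight" case.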

#### components

You can choose which parameters of an Event the clustering algorithm uses. The following example declares the source, source_id and description components:

    components : [ "source","source_id","description" ]

Additionally, the system always takes into account the time at which the Event arrives in the system (event_time, or last_occurred for an Alert). You can have as many components as you like, but the more components you select, the more numerical complexity is introduced into the system, and the more likely you are to get fewer Alerts per cluster and less correlation.

#### Partitioning

There are two methods of partitioning the data into Situations. The first is 'partition_by' which splits the clusters according to the parameters specified. The second is 'pre_partition', which splits the incoming Event stream before clustering.

### Note

Please note: Pre-partitioning is recommended as it does not interfere with the results of the clustering algorithms.

##### partition_by

After clustering has taken place, and before merging and resolution, you can split clusters into sub-clusters based on a component of the Events. For example, you can use the manager parameter to ensure that Situations only contain Events from the same manager. The partition_by parameter is commented out by default, and in general it should stay that way.

##### pre_partition

An alternative way of partitioning is to use pre_partition, which lets you specify a component field (from the list of specified components) on which the Event stream is partitioned before K-means clustering occurs. The Alerts in each resulting Situation all share a single value for the chosen component field.

For example, if the Speedbird components option is set to:

    components : [ "source","manager","description" ],

In the metric below, the description component is weighted more heavily than source and manager. The metric always contains one more value than the number of components specified, and the first value always corresponds to time.

    default : [ 1,1,1,1000000],

This results in Situations containing Alerts with more similar description fields and a variety of source and manager fields.

Adding the following property ensures that Situations contain Alerts with very similar description fields, a variety of source fields, but only a single distinct manager field.

    pre_partition : "manager"

pre_partition, like partition_by, defaults to false in moog_farmd.conf and so has no effect. If pre_partition is not required, there is no need to modify existing moog_farmd.conf files to include the property.

It is possible to configure pre_partition and partition_by at the same time, but partition_by only has an effect if it is applied to a different component.
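The difference between the two partitioning methods can be sketched as two group-by steps: pre_partition splits the Event stream before clustering, and partition_by splits finished clusters afterwards. Events are hypothetical dicts, and the clustering step that would run between the two is omitted.

```python
from collections import defaultdict


def pre_partition(events, field):
    """Split the Event stream by `field` BEFORE clustering (pre_partition)."""
    streams = defaultdict(list)
    for e in events:
        streams[e[field]].append(e)
    return dict(streams)  # one sub-stream per distinct field value


def partition_by(clusters, field):
    """Split each finished cluster into sub-clusters by `field` (partition_by)."""
    out = []
    for cluster in clusters:
        out.extend(pre_partition(cluster, field).values())
    return out
```

With pre_partition, each sub-stream would be clustered independently, which is why it cannot disturb how the algorithm groups Events within one manager, while partition_by only cuts up clusters the algorithm has already formed.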

##### A note on time_compression and pre_partition

pre_partition splits the Events into separate streams based on the component you have specified, whereas partition_by allows the algorithms to work on the whole Event stream and then splits up the results.

Partitioning the Event stream using pre_partition can make time_compression less effective. Many of the tuning parameters and behaviours of the Sigalisers depend on the event rate. If you split a stream with an event rate of X into many streams, each of those streams has an event rate lower than X. This can affect whether the tuning parameters you are using are appropriate, so take care with or without time_compression. With time_compression you expect to avoid silent periods in the Event stream, but this may not be the case because pre_partition splits the stream.

For example, if you pre_partition on manager, set time_compression to true, set window to 10 and set resolution to 60, you store up to 10 one-minute-wide buckets of Events for clustering.

The Events could arrive as follows:

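| Bucket | Minute | Manager |
| --- | --- | --- |
| 1 | 1 | Andrew, Alan |
| 2 | 2 | Alan |
| 3 | 17 | Alan |
| 4 | 18 | Alan |
| 5 | 20 | Andrew |
| 6 | 35 | Alan |
| 7 | 37 | Alan |
| 8 | 38 | Alan |
| 9 | 57 | Alan |
| 10 | 59 | Alan |
| 11 | 60 | Alan |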

Note that the minute 1 bucket is dropped from the Sigaliser window because AIOps only keeps the last ten live buckets. Clustering for Events with Manager Alan therefore uses only nine buckets, and clustering for Events with Manager Andrew uses only one bucket.
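The bucket arithmetic in the example above can be reproduced with a short script. It assumes, as the described outcome implies, that the ten-bucket window is applied to the combined Event stream and that, with time_compression enabled, only non-empty minutes count as buckets.

```python
from collections import Counter

WINDOW = 10  # number of live buckets kept (window)

# (minute, managers seen in that one-minute bucket), from the example table
buckets = [
    (1, {"Andrew", "Alan"}), (2, {"Alan"}), (17, {"Alan"}), (18, {"Alan"}),
    (20, {"Andrew"}), (35, {"Alan"}), (37, {"Alan"}), (38, {"Alan"}),
    (57, {"Alan"}), (59, {"Alan"}), (60, {"Alan"}),
]


def live_buckets(buckets, window):
    # With time_compression, empty minutes are skipped, so the window is
    # simply the last `window` non-empty buckets.
    return buckets[-window:]


def buckets_per_partition(buckets, window):
    # Count how many live buckets each pre_partition stream (manager) uses.
    counts = Counter()
    for _, managers in live_buckets(buckets, window):
        counts.update(managers)
    return counts
```

Running buckets_per_partition(buckets, WINDOW) gives Alan 9 buckets and Andrew 1, because the minute 1 bucket falls outside the window.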

#### metric

    metric : {
        default : [ 1,1,1,1],
        categoryField: "agent",
        "DBMON" : [ 100,1,1000,1000000],
        "NETMON" : [ 1,100000000,1,0]
    },

The metric is a technical, detailed area of configuration that determines how Moogsoft measures the distance between two events in the phase space used for clustering. Euclidean distance is easy to compute: you calculate the square of the difference in each component, and the sum of these squares is the square of the distance. In two dimensions the distance is the hypotenuse of a right-angled triangle, in three dimensions it is the diagonal of a cuboid, and so on. This example is a simplification.

For instance, if x, y and z are the components of each event's vector, the square of the distance between events 1 and 2 is:

$$d^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2$$

You can put a weight in front of each of these squared terms; the weights are more correctly known as the diagonal metric tensor values:

$$d^2 = a_1 (x_1 - x_2)^2 + a_2 (y_1 - y_2)^2 + a_3 (z_1 - z_2)^2$$

Moogsoft assumes that you only ever need the diagonal metric tensor values. In general coordinate geometry you could also contribute to the distance by adding, for example, a $(y - z)^2$ term, but comparing different attributes of an event for similarity is not considered useful.

This approach allows you to weight the distance between two events by component. For example, if x represents time, y represents source and z represents manager, and you make $a_2$ much bigger than $a_1$, any difference in source creates much more distance between the events than the same difference in time. This is why every metric has four values: one for time plus one per component. The default is [1,1,1,1]. You can also select a categoryField, which is a field in the event used to choose between metrics, for example categoryField: "agent".

In the example configuration above, if one of the events has the value DBMON, the metric [100,1,1000,1000000] is used to weight the distance; if it has NETMON, the alternate metric [1,100000000,1,0] is used. If neither value is present, the default metric is used. This allows you to configure different metric weightings for different sources of events.
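A minimal sketch of the weighted distance and categoryField lookup described above. It assumes each event's components have already been translated into numbers (that translation is not documented here) and stores them in a hypothetical vector field with time first. The docs do not say which metric applies when the two events carry different categories, so this sketch assumes the first event with a matching category wins.

```python
import math

# Metric configuration mirroring the example above: one weight for time
# followed by one weight per component.
METRIC = {
    "default": [1, 1, 1, 1],
    "categoryField": "agent",
    "DBMON": [100, 1, 1000, 1000000],
    "NETMON": [1, 100000000, 1, 0],
}


def choose_weights(event_a, event_b, metric):
    # Assumption: the first event whose category has a metric wins.
    field = metric["categoryField"]
    for event in (event_a, event_b):
        weights = metric.get(event.get(field))
        if isinstance(weights, list):
            return weights
    return metric["default"]


def distance(event_a, event_b, metric=METRIC):
    """Weighted Euclidean distance: d^2 = sum(a_i * (x_i - y_i)^2)."""
    weights = choose_weights(event_a, event_b, metric)
    va, vb = event_a["vector"], event_b["vector"]  # [time, comp1, comp2, comp3]
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, va, vb)))
```

Note that a weight of 0, as in the NETMON metric's last position, removes that component from the distance entirely.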

#### string_len_cutoff

This determines the maximum number of characters in a component to use in the distance calculation described in the previous section. This cutoff will apply to all string components being used.

For example, if there are occasionally very long descriptions, you can specify a 64-character cutoff which will avoid excessive computation. See example below:

    string_len_cutoff : 64
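A minimal sketch of the cutoff, assuming it simply truncates each string component before comparison. One consequence worth noting is that two descriptions that differ only after the 64th character are treated as identical.

```python
def apply_cutoff(value, string_len_cutoff=64):
    # Only the first `string_len_cutoff` characters take part in the
    # distance calculation; shorter strings are unchanged.
    return value[:string_len_cutoff]
```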

#### spread_cutoff

Whereas sig_alert_horizon is used to take events out of clusters, spread_cutoff determines whether a cluster is worth processing.

    spread_cutoff : 5.0

• 0 means all clusters have to be one hundred percent tight, with every member the same distance from the center and no variation; otherwise, the cluster is discarded. A higher number allows for looser clusters, i.e. more variation within the cluster.

The spread cutoff uses the cluster standard deviation, after any outliers have been pruned according to the sig_alert_horizon parameter, to determine which clusters should be rejected. The metrics chosen for weighting the components can have a direct impact on the standard deviation of the clusters generated, so it may be necessary to increase the spread_cutoff value to reflect this.
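A sketch of the spread_cutoff check. It assumes, as described above, that the spread is the standard deviation of member distances from the centroid and that each cluster has already been pruned with sig_alert_horizon; using the population standard deviation is an assumption.

```python
import math
import statistics


def cluster_std_dev(points):
    # Population standard deviation of member distances from the centroid.
    c = [sum(col) / len(points) for col in zip(*points)]
    return statistics.pstdev([math.dist(p, c) for p in points])


def keep_clusters(clusters, spread_cutoff=5.0):
    """Keep only clusters whose spread does not exceed spread_cutoff.

    Assumes each cluster has already been pruned with sig_alert_horizon.
    """
    return [c for c in clusters if cluster_std_dev(c) <= spread_cutoff]
```

With spread_cutoff set to 0.0 only clusters whose members are all equidistant from the centroid survive.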

#### ignore_case

Determines whether the translation of strings into numbers in phase space is case-sensitive when comparing strings. In general, case should be ignored. See below:

    ignore_case : true,
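A sketch assuming ignore_case amounts to case-folding strings before they are converted into phase-space numbers; the conversion itself is not documented here.

```python
def normalise(value, ignore_case=True):
    # With ignore_case enabled, strings that differ only in case are
    # treated as identical before conversion to phase space.
    return value.casefold() if ignore_case else value
```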

#### iterations

Unless 'Entropy' seeding is specified, the initial seeds for K-means clustering include a random element that leads to different solutions on different iterations. If more than one iteration is chosen, Speedbird selects the best of the returned solutions for Situation processing. With higher numbers of iterations, K-means clustering tends to converge on an optimal solution, which in turn leads to lower variance from one Speedbird run to another. However, iterations take both time and CPU resources, so a sensible compromise between speed and the optimal solution is needed.

    iterations : 5,

• Moogsoft recommends a value of 5.
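The following sketch shows the idea of running K-means several times with random seeds and keeping the best result. It assumes "best" means the lowest total squared distance of members from their centroids (often called inertia); the section does not say which criterion Speedbird uses.

```python
import math
import random


def lloyd(points, k, rng, max_iter=100):
    # One K-means run from randomly chosen seeds.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == labels:
            break  # converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def inertia(points, centroids, labels):
    # Total squared distance of points from their centroids (lower is tighter).
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of(points, k, iterations=5, seed=None):
    """Run K-means `iterations` times and keep the tightest solution."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(iterations)]
    return min(runs, key=lambda run: inertia(points, *run))
```

Each extra iteration is a full clustering run, which is where the time and CPU cost mentioned above comes from.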

#### seeding

Seeding can be set to 'Kmpp', 'Lloyd' or 'Entropy'. Both 'Kmpp' (recommended) and 'Lloyd' use random elements to select the seeds that initialise the clustering process, and therefore have the advantage of finding different cluster solutions over multiple iterations.

Alternatively, 'Entropy' selects the highest-entropy Events to seed clusters and therefore returns the same results every time. Note that this is not necessarily an optimal result.

    seeding : "Kmpp",
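A sketch of two seeding strategies, assuming 'Kmpp' refers to standard k-means++ seeding (each new seed is drawn with probability proportional to its squared distance from the nearest existing seed) and that 'Entropy' seeding takes the k highest-entropy Events. Points and entropies are hypothetical inputs, and the k-means++ function assumes there are at least k distinct points.

```python
import math
import random


def kmpp_seeds(points, k, rng):
    """k-means++ seeding: later seeds favour points far from existing seeds."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Weight each point by squared distance to its nearest chosen seed,
        # so points already chosen (distance 0) cannot be picked again.
        weights = [min(math.dist(p, s) for s in seeds) ** 2 for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def entropy_seeds(points, entropies, k):
    """'Entropy' seeding: the k highest-entropy Events, with no randomness."""
    ranked = sorted(range(len(points)), key=lambda i: entropies[i], reverse=True)
    return [points[i] for i in ranked[:k]]
```

Because entropy_seeds involves no random choice, repeated calls return the same seeds, which is the reproducibility the Tuning guidelines rely on.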

#### force_causal

Setting force_causal to true ensures that Events which are part of causal Alerts are preserved. They are never discarded during the K-means clustering process, but are always returned as a member of a cluster.

The entropy range for causal alerts is defined in the moogdb.significance table.

    force_causal : true
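A sketch of the effect of force_causal on an outlier-pruning step. The causal entropy range used here is a hypothetical placeholder for the values held in the moogdb.significance table, and the sketch models only pruning, not every stage of clustering.

```python
import math

# Hypothetical causal entropy range; the real range is defined in moogdb.significance.
CAUSAL_RANGE = (0.8, 1.0)


def is_causal(entropy, causal_range=CAUSAL_RANGE):
    low, high = causal_range
    return low <= entropy <= high


def prune_keeping_causal(members, centroid, limit, force_causal=True):
    """Drop members further than `limit` from the centroid.

    `members` is a list of (point, entropy) pairs. With force_causal
    enabled, causal members are kept regardless of distance.
    """
    kept = []
    for point, entropy in members:
        near = math.dist(point, centroid) <= limit
        if near or (force_causal and is_causal(entropy)):
            kept.append((point, entropy))
    return kept
```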

#### generate_stats

generate_stats provides detailed logging for tuning purposes. The logging is written at log level WARN to the moogfarmd.log file and contains detailed information about event clustering, including partitioning.

    generate_stats : "true"

• If generate_stats is not required, there is no need to modify existing moog_farmd.conf files to include the property.

## Tuning guidelines

To ensure you produce useful results, read the following in conjunction with the description of the configuration parameters:

1. Disable parameters which remove Alerts from Situations and discard Situations which are poorly correlated. Start with  sig_alert_horizon  set to -0.1 (to prevent any outliers from being pruned) and  spread_cutoff  set to a high value (to prevent any clusters from being discarded). Subsequently modify these parameters to reduce Situation size and numbers.

2. When tuning the system, consider using 'Entropy' seeding, and switch to 'Kmpp' only when you are happy with the results. Unlike 'Kmpp' or 'Lloyd', 'Entropy' seeding always produces the same Situations, though often not the most appropriate ones. Because 'Entropy' seeding is deterministic, you can normally run a dataset once to see whether the parameters you have used have had the desired effect. 'Kmpp' seeding usually produces the best Situations with a moderate number of iterations.

3. The K in K-means is the number of seeds AIOps clusters around and the number of Situations that are produced. It is calculated using a technique that analyses the dataset to establish the number of independent clusters of events. The calculation depends on the number of time slices (window) and on the effective event rate (after entropy thresholds and similar filters), which determines the number of unique signatures received in resolution*window seconds.

### Note

Please note: Moogsoft advises that you start by asking how many tickets or Situations you expect in a day, and then adjust the resolution and window parameters to achieve the same number of Situations in a day's worth of data. The value of k is never greater than either the window or the number of unique alerts in the total window, and is often about 80% of the smaller of these two values.

4. If you are using time and one other component, be prepared to reduce the time metric significantly, as well as to increase the metric for the other component. For example, assume that you have the following configuration and that you are interested in a series of events that occur over 10 minutes:

       components : ["source"]
       metric : {
           default : [1,1000000]
       }

   The time spread of the cluster you are interested in is 600 seconds. If you have increased the metric a lot on the source component, your cluster may contain a single value for source (or very closely related values). Therefore, the cluster spread value is generated largely or entirely by the event time component, and K-means solutions that split this set of events into more than one cluster are preferred over those that keep them in a single cluster. If you use the default metrics [1,1], the clustering is driven primarily by time.

5. The metrics that you use may affect the  spread_cutoff  . If you increase a metric it may be necessary to increase the spread cut-off by quite a large amount (up to the square root of the increase of the metric).

6. It is the square root of the metric that is applied to a component. If you increase a metric for a component from 1 to 100, you emphasise the effect of that component on the resulting clusters by a factor of 10.

7. Do not vary more than one configuration parameter at a time.

8. Start with small data sets and a limited number of components (i.e., time plus one other) before increasing the size or complexity of your solution.
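The arithmetic in the tuning guidelines can be checked with a few helper functions. The 80% figure is the rule of thumb from the note on k, not a formula Speedbird applies, and all function names here are illustrative.

```python
import math


def window_seconds(window, resolution):
    # Span of data the Sigaliser considers: resolution * window seconds.
    return window * resolution


def k_upper_bound(window, unique_alerts):
    # k is never greater than the window or the number of unique alerts in it.
    return min(window, unique_alerts)


def rough_k(window, unique_alerts):
    # Rule of thumb: k is often about 80% of that bound.
    return round(0.8 * k_upper_bound(window, unique_alerts))


def component_emphasis(old_metric, new_metric):
    # The square root of the metric is applied to a component, so raising a
    # metric from old to new scales that component's influence by sqrt(new/old).
    return math.sqrt(new_metric / old_metric)
```

For example, a window of 10 with a resolution of 60 covers 600 seconds, and raising a metric from 1 to 100 emphasises that component by a factor of 10, which is also the upper end of the spread_cutoff increase suggested in guideline 5.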