Moogsoft Docs

Self Monitoring

Administrators can use Self Monitoring to view the status, health and processing metrics of the Moogsoft AIOps processes.

The different tabs show the state of Processing Metrics, Event Processing, Web Services, Event Ingestion and Message Bus.

Note

Please note: All data displayed on this screen is live and updates continually.

Note

Important: Heartbeats are one of the key concepts in Self Monitoring. A heartbeat is an internal message sent by a process every 10 seconds to inform Self Monitoring that it is still running.
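
The 10 second interval lends itself to a simple calculation. As a minimal sketch (this is not Moogsoft's implementation; the function name and the handling of boundary values are assumptions), the number of heartbeats that were due but not received can be estimated from the time of the last heartbeat:

```python
HEARTBEAT_INTERVAL_S = 10  # interval stated in the documentation above


def missed_heartbeats(last_heartbeat_ts, now_ts, interval_s=HEARTBEAT_INTERVAL_S):
    """Return how many heartbeats were due since the last one was received.

    Timestamps are in seconds. A heartbeat is treated as due once a full
    interval has elapsed (an assumption made for this sketch).
    """
    elapsed = now_ts - last_heartbeat_ts
    if elapsed < 0:
        raise ValueError("last heartbeat is later than the current time")
    return int(elapsed // interval_s)


# Last heartbeat at t=100s, checked at t=125s: heartbeats were due at 110s and 120s.
print(missed_heartbeats(100, 125))
```

A result of 0 means the process is reporting on schedule; how many missed heartbeats trigger each icon in Self Monitoring is not stated here, so the sketch stops at the count.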

States

The states of the different tabs and their components are indicated by the icons shown in the table below:

Icon Description

The process is running (reserved or unreserved*)

The reserved process has missed some heartbeats. This could indicate a potential problem and should be investigated.

The reserved process is either not running or has missed its last heartbeat. This could indicate that the process has failed, has not started, or that AIOps is not working properly.

The unreserved process is not running.

The process is in passive mode.

Warning

Passive mode applies to High Availability deployments only. See High Availability.

Note

* Please note: Processes can be set to reserved or unreserved in the system.conf file ($MOOGSOFT_HOME/config/system.conf). If the 'reserved' setting is 'true' for a process, a warning appears in Self Monitoring when that process is not running. Unreserved processes do not produce a warning when they are not running.

Controls

A number of controls in Self Monitoring can be used to stop, start and restart moogfarmd and the LAM services. These are as follows:

Button Description

Restart

Stop* (see note below)

Start

These can be configured by users with Super User permissions.

Warning

*Please note: Stop will only work if moogfarmd is started and running as a process rather than a service.

Self Monitoring Tabs

The Self Monitoring screen is divided into five tabs. Each tab displays the states of its processes, indicating which are running and which have issues:

Processing Metrics

This tab, which is open by default when Self Monitoring is launched, displays Event processing times and other metrics.


The icon in the top left corner indicates the overall state of Event processing, which is determined by the Current Maximum Event Processing Time in seconds.

This time is indicated by the position of the gray bar on the colored bullet graph shown below. The Current Maximum Event Processing Time is 1.917s in this example:

The default bullet chart color values are as follows:

GREEN (0 - 10 seconds) Good performance

YELLOW (10 - 15 seconds) Marginal performance

RED (15 - 20 seconds) Poor performance

The time values are configurable in the web.conf file.
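
The default bands can be expressed as a small lookup. This is a sketch under stated assumptions: the function and parameter names are hypothetical, each boundary value is assigned to the higher band, and values above 20 seconds are treated as RED. None of these details are confirmed by the product documentation.

```python
def processing_band(max_event_processing_s, yellow_from=10.0, red_from=15.0):
    """Map a Current Maximum Event Processing Time (seconds) to a chart colour.

    Defaults follow the documented bands: GREEN 0-10, YELLOW 10-15, RED 15-20.
    Boundary handling is an assumption of this sketch.
    """
    if max_event_processing_s < 0:
        raise ValueError("processing time cannot be negative")
    if max_event_processing_s < yellow_from:
        return "GREEN"
    if max_event_processing_s < red_from:
        return "YELLOW"
    return "RED"


# The 1.917s value from the example above falls in the GREEN band.
print(processing_band(1.917))
```

If the thresholds are changed in web.conf, the same function can be called with the new values, for example processing_band(7.0, yellow_from=5.0, red_from=8.0).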

Using Processing Metrics

To use the Processing Metrics tab, open the LAMs and moog_farmd folders and look for deviations from normal values.

The numeric value itself may not be an absolute measurement of health, so as a general rule, look for unusual or sudden changes in the values or behaviour. See the examples below:

  • If a particular LAM becomes a data flow bottleneck, expect to see substantial increases in the values of the Message Queue Size and/or Socket Backlog metrics for that LAM. This leads to an increasing Event Processing Time for the corresponding moog_farmd (which is expecting data from the LAM).
  • If an AlertRulesEngine in a moog_farmd instance becomes a data flow bottleneck, expect to see a substantial increase in the Message Backlog, and possibly a decrease in Messages Processed, for that AlertRulesEngine. This also leads to an increasing Event Processing Time for the moog_farmd.

Both of these result in the bullet chart (at the top) showing an increasing Current Maximum Event Processing Time, moving from green to yellow to red.
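
The advice to look for sudden changes rather than absolute values can be illustrated with a simple heuristic. The sketch below is not part of Moogsoft AIOps: the function name, the window size and the factor are all assumptions, and the sample values are invented for illustration. It flags any sample that is several times larger than the average of the samples just before it.

```python
from statistics import mean


def sudden_increases(samples, window=5, factor=3.0):
    """Return the indices of samples that exceed factor x the mean of the
    preceding `window` samples. A baseline of zero is skipped because a
    ratio against zero is not meaningful."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])
        if baseline > 0 and samples[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Hypothetical Message Queue Size readings for one LAM, sampled at regular intervals.
queue_sizes = [10, 12, 11, 9, 10, 11, 80, 95]
print(sudden_increases(queue_sizes))
```

A relative comparison is used here because, as noted above, the numeric value itself may not be an absolute measurement of health.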

Event Processing

This tab contains a process group that includes moog_farmd (the core Moogsoft AIOps application) and the Moolets (AlertBuilder, AlertRulesEngine, Sigalisers etc.).


The icon in the top left corner indicates the overall state (running normally in the example above).

The group and cluster names are displayed in the top right corner. The time and date of the last heartbeat are displayed above the list of Moolet processes.

Web Services

This tab contains all processes related to Tomcat web applications: moogsvr, moogpoller, toolrunner and Graze.

Each row will display the following information:

Column Description

Click this button to expand or collapse the row for further information, for example 'No reported problems'
State This will show an indicator icon showing whether the process is running as normal or not
Process The name of the Moogsoft AIOps component
*Instance The name of the instance (in High Availability there are multiple instances of Moogsoft AIOps)
*Group The name of the Process Group the component belongs to
*Cluster The name of the Cluster the component's Process Group belongs to

Last Heartbeat The time of the last received heartbeat. A heartbeat indicates a healthy component

Warning

*These only apply to High Availability deployments where there is more than one instance of Moogsoft AIOps and its component processes.

Event Ingestion

This tab displays information about the state of all processes relating to the LAMs and the individual processes which process raw data and create Events.


The controls in the far right column can be used to stop and restart active LAM processes or to start inactive LAMs.

Message Bus

The final tab provides a link to the Message Bus Console, also known as MooMs (Moogsoft Messaging System), which is hosted by the message-queueing software RabbitMQ.


Click the link to proceed to the RabbitMQ management console.

Warning

Please note: The username and password used to log in are specified in $MOOGSOFT_HOME/config/system.conf (under mooms.username and mooms.password in the JSON) and must be configured correspondingly in RabbitMQ. For more information see Message Bus.

Once logged in, RabbitMQ will display information about message rates, connections, channels and queued messages etc.

Configuration

The 'Restart/Stop/Start' feature uses the moogfarmd/LAM service scripts under /etc/init.d (e.g. /etc/init.d/moogfarmd and /etc/init.d/logfilelamd) in combination with the apache-tomcat 'toolrunner'.

Note

Please note: You will need Super User role permissions to configure this feature of Moogsoft AIOps.

A user that belongs to the 'moogsoft' group must exist on the system. Both the toolrunner and the services must use this user in order to start and stop services via the UI, e.g.:

  • /etc/init.d/moogfarmd - PROCESS_OWNER set to 'controluser'
  • $APPSERVER_HOME/webapps/toolrunner/WEB-INF/web.xml - toolrunneruser set to 'controluser' (toolrunnerpassword needs to be the password for that user)

Warning

We recommend that you do not use the default 'moogsoft' user, because it is a system user and cannot log in with a password.

Update the /etc/init.d/ service scripts to have the correct:

  • SERVICE_NAME (to make the services unique)
  • PROCESS_OWNER (must be the same user as the toolrunner user)
  • INSTANCE/CLUSTER/GROUP (unless already configured via the relevant LAM/farmd/system.conf config file). These need to be provided to the 'daemon' lines as command line parameters, e.g.: --instance MY_INSTANCE --group MY_GROUP --cluster MY_CLUSTER
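
To make the command line parameters in the last item concrete, the sketch below assembles them from optional values. Only the three flag names come from the text above; the function name is hypothetical, and the assumption that an unset value is simply omitted (because it is configured elsewhere) is ours.

```python
def identity_args(instance=None, group=None, cluster=None):
    """Build the --instance/--group/--cluster parameters for a 'daemon' line.

    Values left as None or empty are omitted, on the assumption that they
    are already set in the relevant LAM/farmd/system.conf config file.
    """
    args = []
    for flag, value in (("--instance", instance),
                        ("--group", group),
                        ("--cluster", cluster)):
        if value:
            args.extend([flag, value])
    return args


print(" ".join(identity_args("MY_INSTANCE", "MY_GROUP", "MY_CLUSTER")))
```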

Add the name of the service script into the 'service_name' field in $MOOGSOFT_HOME/config/system.conf for that Moogsoft AIOps process. To ensure the service appears in the right Self Monitoring tab, the process_type field must be set. See the default system.conf file for examples.

Note

Please note: If a moogfarmd service or LAM service is run that does not match a configuration block in system.conf/'processes', it will still appear in the UI 'Self Monitoring' dialog, but it will not be possible to start/stop/restart the service.

The 'toolrunner' is used to control the services (this requires configuring $APPSERVER_HOME/webapps/toolrunner/WEB-INF/web.xml):

  • The 'toolrunneruser' must match the PROCESS_OWNER specified within the relevant service script. This is because only root can run services as a different user.
  • The 'toolrunnerpassword' must be the password of the 'toolrunneruser'.
  • The 'toolrunnerhost' value must match the host of the machine which contains the moogfarmd/LAM services and the PROCESS_OWNER user.
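
The matching rules above can be checked before attempting to control a service from the UI. The following is a rough sketch rather than a supported tool: the function names are hypothetical, it assumes PROCESS_OWNER appears in the service script as a shell-style assignment (PROCESS_OWNER=name), and it takes the toolrunner values as plain arguments because the layout of web.xml is not described here.

```python
import re


def process_owner(service_script_text):
    """Return the PROCESS_OWNER value from a service script, or None.

    Assumes a shell-style assignment such as PROCESS_OWNER=controluser,
    optionally quoted.
    """
    match = re.search(r'^\s*PROCESS_OWNER=["\']?([^"\'\s#]+)',
                      service_script_text, re.MULTILINE)
    return match.group(1) if match else None


def toolrunner_problems(service_script_text, toolrunner_user,
                        toolrunner_host, service_host):
    """List mismatches between the toolrunner settings and a service script."""
    problems = []
    owner = process_owner(service_script_text)
    if owner is None:
        problems.append("PROCESS_OWNER not found in service script")
    elif owner != toolrunner_user:
        problems.append("toolrunneruser does not match PROCESS_OWNER")
    if toolrunner_user == "moogsoft":
        problems.append("default 'moogsoft' system user is not recommended")
    if toolrunner_host != service_host:
        problems.append("toolrunnerhost does not match the service host")
    return problems


# Hypothetical service script contents and host names.
script = "SERVICE_NAME=moogfarmd\nPROCESS_OWNER=controluser\n"
print(toolrunner_problems(script, "controluser", "aiops-host", "aiops-host"))
```

An empty list means the three settings are consistent with each other; it does not verify the password or that the user is in the 'moogsoft' group.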

In upgrade scenarios, it is likely that an existing LAM/farmd service has already been run. If that service needs to be controlled via the UI, ownership of its log file and PID file (if present) must be changed ('chowned') to the PROCESS_OWNER/toolrunner user of the new service script before control will work, e.g.:

chown toolrunneruser /var/log/moogsoft/moogfarmd.log

See the example of a $MOOGSOFT_HOME/config/system.conf file below:

# group          REQUIRED - The group name for the process
# instance       REQUIRED - The instance name for the process
# display_name   OPTIONAL - An identification label used in the UI, if no
#                           display name is configured, group will be used
# cluster        OPTIONAL - Overrides the default cluster for a process, if
#                           not configured the default cluster will be used
# service_name   OPTIONAL - The name of the service script used to
#                           control this process, if not configured this will
#                           be guessed using the group name by removing underscores
#                           and appending a 'd'
# process_type   OPTIONAL - The type of the process, possible values:
#                               moog_farmd,
#                               servlet,
#                               LAM
#                           If not configured, this will be guessed from the group
#                           name based on our knowledge of which process names are
#                           used for which types of process
# reserved       OPTIONAL - If a process is reserved, we will produce a
#                           warning in the UI when it is not running.
#                           Unreserved processes will not produce a warning
#                           if they are not running. Defaults to true if not configured
# subcomponents  OPTIONAL - Specifies which moolets should be reserved for
#                           a moog_farmd process, if left unconfigured no moolets will be
#                           reserved
#
# Moog_farmd
{
    "group" : "moog_farmd",
    "instance" : "",
    "service_name" : "moogfarmd",
    "process_type" : "moog_farmd",
    "reserved" : true,
    "subcomponents" : [
        "AlertBuilder",
        "Sigaliser",
        "Default Cookbook",
        "Journaller",
        "TeamsMgr"
        #"AlertRulesEngine",
        #"SituationMgr",
        #"Notifier"
    ]
},
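
The defaults described in the comments above can be sketched in a few lines. This is an illustration only, not how Moogsoft AIOps reads its configuration: the function names are hypothetical, the comment stripping handles only whole-line '#' comments, and the process_type guess is left out because the mapping of names to types is not documented here. The 'logfile_lam' group name is an assumed example chosen to match the /etc/init.d/logfilelamd script mentioned earlier.

```python
import json


def strip_hash_comments(text):
    """Drop lines whose first non-blank character is '#', so the rest parses as JSON."""
    return "\n".join(line for line in text.splitlines()
                     if not line.lstrip().startswith("#"))


def guess_service_name(group):
    """Documented fallback: remove underscores from the group name and append 'd'."""
    return group.replace("_", "") + "d"


def with_defaults(entry):
    """Fill in the optional fields using the defaults stated in the comments."""
    resolved = dict(entry)
    resolved.setdefault("display_name", entry["group"])
    resolved.setdefault("service_name", guess_service_name(entry["group"]))
    resolved.setdefault("reserved", True)
    resolved.setdefault("subcomponents", [])  # no moolets reserved
    return resolved


# A shortened copy of the moog_farmd entry, without the trailing comma.
ENTRY = """
# Moog_farmd
{
    "group" : "moog_farmd",
    "instance" : "",
    "service_name" : "moogfarmd",
    "subcomponents" : [
        "AlertBuilder",
        "TeamsMgr"
        #"Notifier"
    ]
}
"""

process = with_defaults(json.loads(strip_hash_comments(ENTRY)))
print(process["service_name"], process["reserved"])
print(guess_service_name("logfile_lam"))
```

Because the entry sets service_name explicitly, the guess is not applied to it; the fallback is only used when the field is missing.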