Troubleshoot Slow Alert/Situation Creation

If alert or Situation creation is slow, the problem most likely lies with Moogfarmd, the database, or both. The following diagnostic steps will help you track down the cause:

Step

Description

Possible Cause and Resolution

1

Check the Moogfarmd log for any obvious errors or warnings.

Cause may be evident from any warnings or errors.
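A quick way to scan the log is to filter it for warning and error lines. The following is a minimal sketch: the helper name farmd_log_issues is hypothetical, the log path is the one used elsewhere in this section, and it assumes that log lines carry the literal level names WARN or ERROR.

```shell
# Hypothetical helper: show the most recent WARN/ERROR lines from a Moogfarmd log.
# Assumption: log lines contain the literal level names WARN or ERROR.
farmd_log_issues() {
    local logfile="${1:-/var/log/moogsoft/moogfarmd.log}"
    local count="${2:-50}"
    grep -E 'WARN|ERROR' "$logfile" | tail -n "$count"
}

# Usage (run manually on the server):
#   farmd_log_issues                                      # last 50 matches, default log
#   farmd_log_issues /var/log/moogsoft/moogfarmd.log 200
```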

2

Check the Self Monitoring > Processing Metrics page.

If the event_process_metric is large and/or increasing then something is backing up.

Also check the Moogfarmd health logging for signs of message_queue build-up in any of the Moolets.

3

Check the CPU/memory usage of the server itself.

If the server as a whole is running close to its CPU or memory limit and no other issues can be found (e.g. rogue processes or memory leaks in the Moogsoft AIOps components), consider adding more resources to the server or distributing the Moogsoft AIOps components.

4

Check whether the Moogfarmd java process is showing constant high CPU/memory usage.

Moogfarmd may be processing an event or Situation storm.

Also check the Moogfarmd health logging for signs of message_queue build-up in any of the Moolets. The backlog should clear once the storm subsides.

5

Has the memory of the Moogfarmd java process reached a plateau?

Moogfarmd may have reached its Java heap limit. Check the -Xmx setting of Moogfarmd. If -Xmx is not specified, check whether Moogfarmd memory has reached approximately a quarter of the RAM on the server, which is the usual JVM default maximum heap size. Increase the -Xmx setting as appropriate and restart the Moogfarmd service.
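The heap check can be sketched with two small helpers. The function names are hypothetical; the process filter in the usage comment is the one used by the heap dump commands in this section, and the RAM calculation assumes a Linux server that exposes /proc/meminfo.

```shell
# Hypothetical helper: print any -Xmx option found in a Java command line on stdin.
# Prints nothing if no -Xmx option is present.
extract_xmx() {
    grep -o -- '-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}'
}

# Hypothetical helper: print a quarter of the physical RAM in MB, read from a
# /proc/meminfo-style file (Linux only; MemTotal is reported in kB).
quarter_ram_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 4 }' "${1:-/proc/meminfo}"
}

# Usage (run manually on the server):
#   ps -ef | grep java | grep moog_farmd | extract_xmx
#   quarter_ram_mb
```

If extract_xmx prints nothing, compare the resident memory of the Moogfarmd process with the value printed by quarter_ram_mb.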

6

Is the database tuned?

Check the innodb-buffer-pool-size and innodb_buffer_pool_instances settings in /etc/my.cnf, as described in the Tuning section above. Ensure they are set appropriately and restart MySQL if you make changes.
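To review the two settings without opening the file in an editor, something like the following can be used. This is a sketch only: the helper name is hypothetical, and the commented mysql command assumes a local client with suitable credentials.

```shell
# Hypothetical helper: list the InnoDB buffer pool settings in a MySQL option file.
# MySQL accepts dashes or underscores in option names, so the pattern allows both.
# Commented-out lines (starting with #) are not matched.
show_buffer_pool_settings() {
    grep -E '^[[:space:]]*innodb[-_]buffer[-_]pool[-_](size|instances)' "${1:-/etc/my.cnf}"
}

# To compare against the values the running server is using (assumes a local
# mysql client with suitable credentials):
#   mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool%'"
```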

7

Check the server for any other processes with high CPU or memory usage, or anything else that might be impacting the database.

Something may be hogging CPU/memory on the server and starving Moogfarmd of resources.

The events_analyser utility may be running or a sudden burst of UI or Graze activity may be putting pressure on the database and affecting Moogfarmd.
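One way to spot such processes is to rank ps output by CPU usage. The sketch below uses a hypothetical helper, top_cpu, which expects the column layout produced by "ps -eo pid,pcpu,pmem,comm"; interactive tools such as top give the same information.

```shell
# Hypothetical helper: rank "ps -eo pid,pcpu,pmem,comm" output (read from stdin)
# by CPU usage, highest first. The first input line is the ps header and is skipped.
top_cpu() {
    awk 'NR > 1' | sort -k2,2 -nr | head -n "${1:-5}"
}

# Usage (run manually on the server):
#   ps -eo pid,pcpu,pmem,comm | top_cpu 10
```

To rank by memory instead, sort on the third column (-k3,3) rather than the second.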

8

Run DBPool Diagnostics (see previous section) several times to assess the current state of Moogfarmd's database connections.

Moogfarmd database connections may be maxed out with long-running connections, which may indicate a processing deadlock. Run kill -3 <pid> against the Moogfarmd java process to generate a thread dump (written to the Moogfarmd log) and send it to Moogsoft Support.

Alternatively, Moogfarmd may be very busy with lots of short but frequent connections to the database. Consider increasing the number of DBPool connections for Moogfarmd by increasing the top-level "threads" property in the Moogfarmd configuration file and restarting the Moogfarmd service.
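The thread dump step can be sketched as follows. The function names are hypothetical; the process filter is the same one used by the heap dump commands in this section. kill -3 sends SIGQUIT, which a JVM handles by printing a thread dump while continuing to run.

```shell
# Hypothetical helper: print the PIDs of Moogfarmd java processes from
# "ps -ef" output supplied on stdin (the PID is the second column).
farmd_pids() {
    grep java | grep moog_farmd | awk '{print $2}'
}

# Hypothetical helper: send SIGQUIT (kill -3) to each Moogfarmd JVM.
# The JVM keeps running and writes a thread dump to its standard output.
farmd_thread_dump() {
    for pid in $(ps -ef | farmd_pids); do
        kill -3 "$pid"
    done
}

# Usage (run manually on the server, as a user allowed to signal the process):
#   farmd_thread_dump
```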

9

Turn on MySQL slow query logging (see the earlier section for how to do this).

Slow queries from a Moobot in Moogfarmd may be causing problems and they should be reviewed for efficiency.

Alternatively, slow queries from other parts of the system may be causing problems (e.g. inefficient UI filters).

Slow queries may also be down to the sheer amount of data in the system. Consider enabling Database Split to move old data and/or using the Archiver to remove old data.

10

Check Moogfarmd Situation resolution logging using:

grep "Resolve has been running for" /var/log/moogsoft/moogfarmd.log

If this logging shows a non-zero, upward trend in "Resolve" time then Moogfarmd is struggling with the number of "in memory" Situations for its calculations.

Check the Moogfarmd health logging for the current count of "in memory" Situations and consider reducing the retention_period setting in the Moogfarmd configuration file (this requires a Moogfarmd restart) and/or closing more old Situations.

11

Is Moogfarmd memory constantly growing over time, so that a memory leak is suspected?

Note that Moogfarmd memory typically increases for periods of time and is then trimmed back by Java garbage collection and the Sigaliser memory purge (controlled by the retention_period property).

Take periodic heap dumps from the Moogfarmd java process and send them to Moogsoft Support so they can analyse the growth. Use the following commands:

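# Timestamped dump file name so repeated dumps do not overwrite each other
DUMPFILE=/tmp/farmd-heapdump-$(date +%s).bin
# Dump the Moogfarmd heap in binary format, running jmap as the moogsoft user
sudo -u moogsoft jmap -dump:format=b,file="$DUMPFILE" $(ps -ef|grep java|grep moog_farmd|awk '{print $2}')
# Compress the dump before sending it
bzip2 "$DUMPFILE"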

Notes:

  • jmap requires a Java JDK to be installed. "yum install jdk" should suffice to install it.

  • generating a heap dump is likely to make the target process very busy for a period of time. It also triggers a garbage collection, so the memory usage of the process may well reduce.

  • heapdump files may be very large.