Troubleshoot the UI

The following sections outline potential problems and solutions related to the Moogsoft AIOps user interface.

Slow UI

If the system shows signs of slow UI performance, such as long login times or spinning summary counters, the problem is likely with Apache Tomcat, the database, or both. The following diagnostic steps will help you track down the cause:

| Step | Description | Possible cause and resolution |
| --- | --- | --- |
| 1 | Check catalina.out for any obvious errors or warnings. | The cause may be evident from any warnings or errors. |
| 2 | Check the browser console for any errors or timed-out requests. | Possibly a bug or, more likely, the database query associated with the request is taking longer than 30 seconds (the default browser timeout). Investigate the root cause. |
| 3 | Check network latency between the browser client machine and the server using ping. | Latency of 100 ms or more can make login noticeably slower. |
| 4 | Check the CPU/memory usage of the server itself. | If the server as a whole is running close to its CPU or memory limit and no other issues can be found (e.g. rogue processes or memory leaks in the Moogsoft AIOps components), consider adding more resources to the server or distributing the Moogsoft AIOps components. |
| 5 | Check the MoogSvr/Moogpoller/Graze counter logging in catalina.out. | Tomcat may be processing a high number of requests or Message Bus updates. If the Moogpoller count is zero, something may be wrong with the Tomcat > RabbitMQ connection. Check the RabbitMQ admin UI for signs of message queue build-up. |
| 6 | Check whether the Tomcat java process is showing constant high CPU/memory usage. | Tomcat may be processing the updates from an event or Situation storm. The backlog should clear, assuming the storm subsides. |
| 7 | Has the memory of the Tomcat java process reached a plateau? | Tomcat may have reached its java heap limit. Check the -Xmx setting in /etc/init.d/apache-tomcat. Increase the -Xmx setting as appropriate and restart the apache-tomcat service. |
| 8 | Is the database tuned? | Check the innodb-buffer-pool-size and innodb_buffer_pool_instances settings in /etc/my.cnf as per the Tuning section above. Ensure they are set appropriately and restart mysql if changes are made. |
| 9 | Check the server for any other processes that are using high CPU or memory, or that might be impacting the database. | Something may be hogging CPU/memory on the server and starving Tomcat of resources. The Events Analyser utility may be running, or a sudden burst of Moogfarmd or Graze activity may be putting pressure on the database and affecting the UI. |
| 10 | Run DBPool Diagnostics (see previous section) several times to assess the current state of the Tomcat > Database connections. | Tomcat database connections may be maxed out with long-running connections, which may indicate a processing deadlock. Perform a kill -3 <pid> on the Tomcat java process to generate a thread dump (in catalina.out) and send it to Moogsoft AIOps Support. Alternatively, Tomcat may be very busy with lots of short but frequent connections to the database. A Graze request bombardment is another possibility (Graze does not currently have a separate DB Pool). Consider increasing the number of DBPool connections for Tomcat by increasing the related properties in servlets.conf and restarting the apache-tomcat service. |
| 11 | Turn on MySQL slow query logging (see earlier section on how to do this). | Slow queries from inefficient filters in the UI may be causing problems; review them for efficiency. Alternatively, slow queries from other parts of the system may be causing problems (e.g. inefficient Moobot code). Slow queries may also be down to the sheer amount of data in the system. Consider enabling Database Split to move old data and/or using the Archiver to remove old data. |
| 12 | Is Tomcat memory constantly growing over time and a memory leak is suspected? | Note that Tomcat memory typically increases for periods of time and is then trimmed back by java garbage collection. Take periodic heap dumps from the Tomcat java process and send them to Moogsoft support so they can analyse the growth. Use the following commands: |

# Write a timestamped binary heap dump of the Tomcat java process, then compress it.
DUMPFILE=/tmp/tomcat-heapdump-$(date +%s).bin
sudo -u tomcat jmap -dump:format=b,file="$DUMPFILE" $(ps -ef|grep java|grep tomcat|awk '{print $2}')
bzip2 "$DUMPFILE"

Notes:

  • jmap needs Java JDK to be installed. "yum install jdk" should suffice to install this.

  • Generating a heap dump is likely to make the target process very busy for a period of time and also triggers a garbage collection so the memory usage of the process may well reduce.

  • Heap dump files may be very large.
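
As a rough illustration of the network latency check (step 3 above), the following sketch extracts the average round-trip time from ping's summary line and compares it with the 100 ms figure. The helper names avg_rtt_ms and latency_is_high and the hostname moog-server are assumptions made for illustration. The parsing assumes the usual "min/avg/max" summary line that ping prints on Linux and macOS.

```shell
# Sketch only: avg_rtt_ms and latency_is_high are hypothetical helper names.

# Reads ping output on stdin and prints the "avg" value from the
# "min/avg/max" summary line (the fifth "/"-separated field).
avg_rtt_ms() {
  awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Succeeds (exit status 0) when the given latency in ms is 100 or more.
latency_is_high() {
  awk -v rtt="$1" 'BEGIN { exit !(rtt + 0 >= 100) }'
}

# Example usage (moog-server is a placeholder hostname):
#   rtt=$(ping -c 5 moog-server | avg_rtt_ms)
#   latency_is_high "$rtt" && echo "High latency: ${rtt} ms"
```

Run the check from the browser client machine, not from the server, so that it measures the same network path the UI uses.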

Search / Elasticsearch problems

See Configure Search and Indexing for more information.

Elasticsearch is not running or generating errors (such as MySQL connection problems)
  • Check that the Elasticsearch service is running:

    service elasticsearch status
  • Check for errors, which are written to /var/log/elasticsearch/elasticsearch.log.
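
A minimal sketch of one way to pull warnings and errors out of the Elasticsearch log. The helper name es_log_problems is hypothetical, and the pattern assumes each log entry starts with "[timestamp][LEVEL ][logger]", as in the Elasticsearch 5.x log excerpt later in this section.

```shell
# Sketch only: es_log_problems is a hypothetical helper name.

# Prints the WARN and ERROR entries from the log file given as $1.
# Assumes entries begin with "[timestamp][LEVEL ][logger]".
es_log_problems() {
  grep -E '^\[[^]]+\]\[(ERROR|WARN) *\]' "$1"
}

# Example usage: show the 20 most recent warnings and errors.
#   es_log_problems /var/log/elasticsearch/elasticsearch.log | tail -n 20
```

Stack trace lines that follow an entry do not match the pattern, so view the full log around any entry of interest.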

Elasticsearch does not start or restart using process_cntl
Tomcat cannot connect to Elasticsearch
  • Check /usr/share/apache-tomcat/logs/catalina.out for any errors when attempting a search from the UI.

Cron job errors
  • Check that the cron job that runs moog_indexer exists and is not generating any warnings or errors. The moog_init_search.sh script creates this job to re-index against the Moogsoft AIOps database once a minute.

  • List the configured cron jobs:

    crontab -l
  • Check for errors, which are written to /var/log/cron.

  • Depending on the intervals at which Elasticsearch re-indexes against the Moogsoft AIOps database, it is possible that new alerts, Situations, threads or comments have not yet been indexed, and so will not be searchable.

  • To change the interval manually:

    crontab -e
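
The check for the indexer cron job can be sketched as a filter over the crontab listing. The helper name indexer_cron_entries is hypothetical, and the sketch assumes the job's command line contains the string moog_indexer and that you run it as the user who owns the job.

```shell
# Sketch only: indexer_cron_entries is a hypothetical helper name.

# Reads a crontab listing on stdin and prints the active (non-comment)
# entries that mention moog_indexer. Exits non-zero if there are none.
indexer_cron_entries() {
  grep -v '^[[:space:]]*#' | grep 'moog_indexer'
}

# Example usage:
#   crontab -l | indexer_cron_entries || echo "No moog_indexer cron job found"
```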
Elasticsearch fails to start with /tmp directory permission problems

Elasticsearch fails to start with a "java.lang.UnsatisfiedLinkError: /tmp/jna--<text>" error. For example:

[2017-08-07T14:14:31,173][WARN ][o.e.b.Natives] unable to load JNA native support library, native methods will be disabled.
java.lang.UnsatisfiedLinkError: /tmp/jna--1985354563/jna3872404023206022895.tmp: /tmp/jna--1985354563/jna3872404023206022895.tmp: failed to map segment from shared object: Operation not permitted
   at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[?:1.8.0_171]
   at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941) ~[?:1.8.0_171]
   at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824) ~[?:1.8.0_171]
   at java.lang.Runtime.load0(Runtime.java:809) ~[?:1.8.0_171]
   at java.lang.System.load(System.java:1086) ~[?:1.8.0_171]
   at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:851) ~[jna-4.2.2.jar:4.2.2 (b0)]
   at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:826) ~[jna-4.2.2.jar:4.2.2 (b0)]
   at com.sun.jna.Native.<clinit>(Native.java:140) ~[jna-4.2.2.jar:4.2.2 (b0)]
   at java.lang.Class.forName0(Native Method) ~[?:1.8.0_171]
   at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_171]
   at org.elasticsearch.bootstrap.Natives.<clinit>(Natives.java:45) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:104) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:203) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.6.9.jar:5.6.9]
   at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.6.9.jar:5.6.9]

This is most likely due to the noexec directive in the /tmp mount. The solution is to remove the noexec directive, if it is practical to do so:

sudo mount /tmp -o remount,exec

Or set the following in /etc/sysconfig/elasticsearch:

ES_JAVA_OPTS="-Djna.tmpdir=/var/lib/elasticsearch/tmp"

Restart the Elasticsearch service after either of the above changes.
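
To confirm whether noexec is actually set on /tmp before changing anything, you can inspect the mount table. The following is a sketch under the assumption that the system exposes mounts in /proc/mounts (standard on Linux). The helper names mount_options and has_noexec are hypothetical.

```shell
# Sketch only: mount_options and has_noexec are hypothetical helper names.

# Prints the options of the mount point $1, read from a mounts table in
# /proc/mounts format (device, mount point, type, options, ...).
# $2 optionally names the table file and defaults to /proc/mounts.
# The last matching entry wins, because later mounts shadow earlier ones.
mount_options() {
  awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "${2:-/proc/mounts}"
}

# Succeeds when the comma-separated option list $1 contains noexec.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   has_noexec "$(mount_options /tmp)" && echo "/tmp is mounted noexec"
```

If /tmp is not a separate mount point, mount_options prints nothing and the check reports that noexec is not set. In that case the cause of the error lies elsewhere.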

Other UI issues

Unavailable UI login page
  • Check that port 443 is not being blocked by the firewall on the server.

  • Check that the Nginx service is running:

    service nginx status
  • Check that Nginx is listening on port 443. Example command and expected output:

    netstat -anp|grep 443
    tcp    0    0 0.0.0.0:443   0.0.0.0:*   LISTEN    42356/nginx         
    tcp    0    0 :::443        :::*        LISTEN    42356/nginx 
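
The port check above can also be scripted. This sketch assumes the netstat -anp column layout shown in the example output (local address in the fourth column, state in the sixth, PID/program in the seventh). The helper name nginx_listening_on is hypothetical.

```shell
# Sketch only: nginx_listening_on is a hypothetical helper name.

# Reads "netstat -anp" output on stdin and succeeds when a process whose
# name contains "nginx" is in the LISTEN state on port $1.
nginx_listening_on() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") && $7 ~ /nginx/ { found = 1 }
    END { exit !found }
  '
}

# Example usage:
#   netstat -anp | nginx_listening_on 443 || echo "Nginx is not listening on port 443"
```

Run netstat as root. Without root privileges the PID/program column may be shown as "-", and the check then fails even when Nginx is listening.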
UI login fails

A possible error is "You could not be logged in. Please try again". First check /usr/share/apache-tomcat/logs/catalina.out to understand the error better. Possible causes are as follows:

  • Apache Tomcat is not running. Check its status:

    service apache-tomcat status
  • There is a communication problem between the UI and the MySQL database. Check the MySQL service is running:

    service mysqld status

    If MySQL is running on a different server, check that it is accessible from the Moogsoft AIOps web server and the required permissions have been applied.

  • There is an authentication problem between the UI and the MySQL database.

    • Check that the user exists in the MySQL moogdb.users table.

    • Check that the username and password used for authentication are correct.

  • If you're using a load balancer, the hostname in the URL you use to access the UI may not match the webhost in the servlet configuration files.

    • Set the "webhost" in $MOOGSOFT_HOME/config/servlets.conf on each UI server to the hostname of the load balancer.

"Your connection is not private"

The message "Your connection is not private" appears in your browser and you are unable to proceed to the UI.

  • After upgrading macOS to Catalina, the Moogsoft AIOps UI is inaccessible in Chrome, Safari and Edge browsers because self-signed certificates are no longer trusted. For workaround instructions see Catalina Browser Certificate Workaround.

Empty column in alert views

If you are getting an empty column in your alert views for alerts from a particular event source, verify the following:

  • The events are being processed by the LAM.

  • Moogfarmd is running.

After that, check the LAM configuration file for configuration and mapping issues.