Troubleshoot Processes

The following sections provide troubleshooting advice for Moogsoft Onprem, Moogfarmd, RabbitMQ, LAMs, and Nginx, and for alert and Situation processing issues.

Moogsoft Onprem Does Not Start

Check the following:

  • Check the file system with the command df -m and look for partitions that are full.

  • The environment variables in your shell might not be set up correctly. Run env and check the location set for $MOOGSOFT_HOME.
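Both checks can be scripted. The bash functions below are a hypothetical sketch, not part of Moogsoft Onprem: they assume POSIX-format df output (df -P, which keeps each file system on one line) and an arbitrary 90% threshold for "full".

```shell
# Hypothetical helper: print mount points whose usage is at or above a threshold.
# Reads "df -P" output on stdin; column 5 is the use percentage, column 6 the mount point.
full_partitions() {
    local threshold=${1:-90}   # 90% is an assumed default, not a Moogsoft value
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                   # "97%" -> "97"
        if (use + 0 >= t) print $6 " " $5   # mount point and usage
    }'
}

# Hypothetical helper: report whether MOOGSOFT_HOME is set to an existing directory.
check_moogsoft_home() {
    if [ -z "${MOOGSOFT_HOME:-}" ]; then
        echo "MOOGSOFT_HOME is not set"
        return 1
    elif [ ! -d "$MOOGSOFT_HOME" ]; then
        echo "MOOGSOFT_HOME=$MOOGSOFT_HOME is not a directory"
        return 1
    fi
    echo "MOOGSOFT_HOME=$MOOGSOFT_HOME"
}
```

For example, run df -P -m | full_partitions 90 and then check_moogsoft_home in the shell you use to start the product.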

Moogfarmd Does Not Start

The message "No config present" in /var/log/moogsoft/moogfarmd.log indicates a syntax error in the Moogfarmd configuration file, moog_farmd.conf. Do the following:

  • Check the config file for punctuation mistakes. Look for:

    • Missing commas

    • Unbalanced quotes

    • Missing { or }

      Use # to comment out code, not /* and */.

  • Correct the errors in moog_farmd.conf, and then restart the service.
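Counting braces and quotes is a quick way to spot the most common mistakes. The awk-based function below is an illustrative heuristic, not a Moogsoft tool: it skips whole-line # comments but does not understand escaped quotes or quotes inside trailing comments, so treat its output as a hint rather than a verdict.

```shell
# Hypothetical helper: heuristic punctuation check for a Moogsoft config file.
# Skips lines whose first non-blank character is "#"; counts { } and double quotes.
check_balance() {
    awk '
        { s = $0; sub(/^[ \t]+/, "", s) }
        substr(s, 1, 1) == "#" { next }     # whole-line comment
        {
            opens  += gsub(/[{]/, "", s)    # gsub returns the number of matches
            closes += gsub(/[}]/, "", s)
            quotes += gsub(/"/, "", s)
        }
        END {
            bad = 0
            if (opens != closes) { printf "unbalanced braces: %d { vs %d }\n", opens, closes; bad = 1 }
            if (quotes % 2)      { printf "odd number of double quotes: %d\n", quotes; bad = 1 }
            if (!bad) print "braces and quotes look balanced"
            exit bad
        }' "$1"
}
```

For example: check_balance /usr/share/moogsoft/config/moog_farmd.conf (adjust the path to your installation).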

RabbitMQ Errors

See also Message System Deployment.

"No such user" Message on Startup

If you see the message No such user in /var/log/rabbitmq/startup_err, do the following:

  • Check /etc/passwd for user rabbitmq with the following command:

    grep rabbitmq /etc/passwd
  • If no user is found, add the following to /etc/passwd:

    rabbitmq:x:491:488:RabbitMQ messaging server:/var/lib/rabbitmq:/bin/bash
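A plain grep rabbitmq also matches names such as rabbitmq-old, so a stricter check anchors the user name at the start of the line. The functions below are a hypothetical sketch that accepts an optional passwd-format file so you can try it against a copy. getent passwd rabbitmq is an alternative that also consults the other name services configured in /etc/nsswitch.conf.

```shell
# Hypothetical helper: succeed if the user has an entry in a passwd-format file.
user_in_passwd() {
    local user=$1
    local passwd_file=${2:-/etc/passwd}
    # Anchor on "name:" so that e.g. "rabbitmq-old" does not count as "rabbitmq".
    grep -q "^${user}:" "$passwd_file"
}

# Report the result for the rabbitmq user in the given (or default) passwd file.
report_rabbitmq_user() {
    if user_in_passwd rabbitmq "$@"; then
        echo "rabbitmq user found"
    else
        echo "rabbitmq user missing"
        return 1
    fi
}
```

Run report_rabbitmq_user with no arguments to check /etc/passwd.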

"Failed to create aux thread" Message

If you see the message "Failed to create aux thread" in /var/log/rabbitmq/startup_err, the cause is most likely a ulimit setting that is too low for the rabbitmq user.

Do the following:

  • Check the ulimit settings for the rabbitmq user by running the following commands as root:

    su - rabbitmq
    bash-4.1$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 515675
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 1024
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
  • In the example above, "open files" (-n) and "max user processes" (-u) are both 1024, which is likely too low for RabbitMQ.

  • It may be appropriate to increase the "open files" and "max user processes" limits to at least 4096 for development/QA environments and 65536 for production environments.
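The comparison against those thresholds can be scripted. The bash functions below are a hypothetical sketch that only reads the current limits; raising them is a separate step that depends on how RabbitMQ is started on your host.

```shell
# Hypothetical helper: succeed if a ulimit value meets a minimum ("unlimited" always does).
limit_ok() {
    local value=$1 minimum=$2
    [ "$value" = "unlimited" ] && return 0
    [ "$value" -ge "$minimum" ]
}

# Check the current shell's open-files (-n) and max-user-processes (-u) limits.
# Run it as the rabbitmq user (for example after "su - rabbitmq") to see that user's limits.
check_rabbitmq_limits() {
    local minimum=${1:-65536}   # 65536 for production; pass 4096 for development/QA
    local status=0 name value
    for name in n u; do
        value=$(ulimit -"$name")
        if limit_ok "$value" "$minimum"; then
            echo "ulimit -$name = $value (ok)"
        else
            echo "ulimit -$name = $value (below $minimum)"
            status=1
        fi
    done
    return $status
}
```

For a development or QA host, run check_rabbitmq_limits 4096.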

"Unable to create connection" Message

If the message "Unable to create connection" appears in a LAM, Moogfarmd, or Apache Tomcat log, the process is unable to connect to the Message Bus zone in RabbitMQ.

  • Check that the RabbitMQ server is running:

    service rabbitmq-server status
  • If the service is not running, start it:

    service rabbitmq-server start
  • If the service is running, check that the zone used in Moogsoft Onprem matches the vhost in RabbitMQ. List the zones (vhosts) added in RabbitMQ:

     rabbitmqctl list_vhosts
  • Check the MooMS section in /usr/share/moogsoft/config/system.conf for the zone used in Moogsoft Onprem.

  • If the zone is missing, add the zone (vhost) to RabbitMQ manually (see Message System Troubleshooting).

  • Restart the affected process.
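The zone-to-vhost comparison can be scripted by reading the output of rabbitmqctl list_vhosts. The functions below are a hypothetical sketch; header lines such as "Listing vhosts ..." never match a real zone name, so they are harmless.

```shell
# Hypothetical helper: succeed if a MooMS zone name appears as a RabbitMQ vhost.
# Reads "rabbitmqctl list_vhosts" output on stdin.
zone_has_vhost() {
    local zone=$1
    # -F fixed string, -x whole-line match, -q exit status only
    grep -Fxq -- "$zone"
}

# Print a short verdict for the given zone.
report_zone() {
    local zone=$1
    if zone_has_vhost "$zone"; then
        echo "vhost '$zone' exists"
    else
        echo "vhost '$zone' is missing; add it before restarting the affected process"
        return 1
    fi
}
```

For example: rabbitmqctl list_vhosts | report_zone <zone>, where <zone> is the value from the MooMS section of system.conf.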

LAMs Do Not Start from Command Line

If running a LAM from the command line or as a service results in the following error:

[root@moogbox2 bin]# ./socket_lam 
./socket_lam: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory

the JVM server library directory (for example, /usr/java/jdk1.8.0_171/jre/lib/amd64/server) might not have been added to LD_LIBRARY_PATH.

To run LAMs from the command line, set LD_LIBRARY_PATH as follows (the default init.d files already contain this setting):

export LD_LIBRARY_PATH=$MOOGSOFT_HOME/lib:/usr/GNUstep/Local/Library/Libraries:/usr/GNUstep/System/Library/Libraries:$JAVA_HOME/jre/lib/amd64/server
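Before exporting the path, it can help to confirm that libjvm.so actually exists under JAVA_HOME. The bash functions below are a hypothetical sketch that assumes the Java 8 layout shown above (jre/lib/amd64/server); other JDK versions may place libjvm.so elsewhere.

```shell
# Hypothetical helper: print the directory containing libjvm.so under a Java 8 style
# JAVA_HOME (jre/lib/amd64/server), or fail with a message on stderr.
find_libjvm_dir() {
    local java_home=${1:-$JAVA_HOME}
    local dir="$java_home/jre/lib/amd64/server"
    if [ -f "$dir/libjvm.so" ]; then
        echo "$dir"
    else
        echo "libjvm.so not found in $dir" >&2
        return 1
    fi
}

# Build LD_LIBRARY_PATH as in the export line above, but only if libjvm.so is present.
set_lam_library_path() {
    local jvm_dir
    jvm_dir=$(find_libjvm_dir "$@") || return 1
    export LD_LIBRARY_PATH="$MOOGSOFT_HOME/lib:/usr/GNUstep/Local/Library/Libraries:/usr/GNUstep/System/Library/Libraries:$jvm_dir"
}
```

If socket_lam is a native executable, as the error message suggests, ldd ./socket_lam | grep 'not found' afterwards lists any shared libraries that still cannot be resolved.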

Generic LAM Does Not Start

"Unable to parse configuration file" Error

Check the following:

  • The message "Unable to parse configuration file" in /var/log/moogsoft/<lamd_name>.log indicates a syntax error in the LAM configuration file.

  • Check the config file for syntax mistakes. Look for missing commas and unbalanced quotes.

  • Compare the configuration file to a default configuration file for the same LAM. Use the following command to locate differences in the files:

    diff -y <current_lam>.conf <default_lam>.conf | less
  • Edit <current_lam>.conf to resolve any syntax errors, and then restart the LAM.
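To find which of the LAM errors on this page a log contains, you can search for the message strings in one pass. The function below is a hypothetical sketch; its message list is taken from this page and is not exhaustive.

```shell
# Hypothetical helper: print LAM log lines (with line numbers) that contain any of the
# error messages described in the LAM sections of this page.
scan_lam_log() {
    local log_file=$1
    grep -n \
        -e 'Unable to parse configuration file' \
        -e 'Connection refused' \
        -e 'unresolvable' \
        -e 'Failed to open file' \
        -e 'No file path specified' \
        -e 'Could not stat file' \
        -- "$log_file"
}
```

For example: scan_lam_log /var/log/moogsoft/<lamd_name>.log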

"Connection refused" Error

Check the following:

  • The error "Failed to connect to [host:port]: Connection refused" in /var/log/moogsoft/<lamd_name>.log indicates that the port specified in the LAM configuration file is already in use.

  • Use the following command to check that the LAM is not already started:

    ps -ef | grep <lamd_name>
  • Check that another process is not already bound to the port.

  • If required, edit the port setting for the LAM in the configuration file and restart the LAM.
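To see whether something is already listening on the LAM's port, you can parse the output of ss -ltn (part of iproute2). The function below is a hypothetical sketch that reads that output on stdin; port 8411 in the usage note is a made-up example, not a Moogsoft default.

```shell
# Hypothetical helper: succeed if a port appears in the Local Address:Port column
# (column 4) of "ss -ltn" output read from stdin.
port_in_use() {
    local port=$1
    awk -v p="$port" 'NR > 1 {
        n = split($4, parts, ":")     # "0.0.0.0:8411" or "[::]:8411" -> last piece is the port
        if (parts[n] == p) found = 1
    } END { exit !found }'
}
```

For example: ss -ltn | port_in_use 8411 && echo "port 8411 is in use". Running ss -ltnp as root also shows which process owns the socket.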

"Host unresolvable" Error

Check the following:

  • The error "Host [hostname] unresolvable" in /var/log/moogsoft/<lamd_name>.log indicates that the LAM cannot resolve the host, or that the hostname is set incorrectly.

  • In the LAM config file, check the address property and correct any errors.

  • Check that the /etc/hosts file contains an entry for the specified hostname.
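The /etc/hosts check can be scripted as below. This is a hypothetical sketch that ignores comments and compares names case-sensitively. To see what the system actually resolves, including DNS, getent hosts <hostname> uses the sources configured in /etc/nsswitch.conf.

```shell
# Hypothetical helper: succeed if a name appears as a hostname or alias in a
# hosts-format file (default /etc/hosts), ignoring comments.
host_in_hosts_file() {
    local name=$1
    local hosts_file=${2:-/etc/hosts}
    awk -v h="$name" '
        { sub(/#.*/, "") }                                   # strip comments
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # field 1 is the address
        END { exit !found }' "$hosts_file"
}
```

For example: host_in_hosts_file <hostname> || echo "<hostname> is not in /etc/hosts"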

"Failed to open file" Error

Check the following:

  • The error "Failed to open file [<path to file>]" in /var/log/moogsoft/<lamd_name>.log indicates that the LAM cannot locate a file specified in the LAM configuration file.

  • Locate the missing file in $MOOGSOFT_HOME/bots/lambots or $MOOGSOFT_HOME/contrib.

  • Update the LAM configuration file with the correct file path and restart the LAM.
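The search in the two directories can be done with find. The function below is a hypothetical sketch; it takes the base directory as an optional second argument (defaulting to $MOOGSOFT_HOME) and silently skips a directory that does not exist.

```shell
# Hypothetical helper: search bots/lambots and contrib under a base directory for a file by name.
locate_lam_file() {
    local file_name=$1
    local base=${2:-$MOOGSOFT_HOME}
    find "$base/bots/lambots" "$base/contrib" -type f -name "$file_name" -print 2>/dev/null
}
```

For example: locate_lam_file <file_name> prints every matching path, which you can then copy into the LAM configuration file.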

Syntax Error in Presend Filter

Check the following:

  • In the LAM configuration file, locate the presend filter file name.

    Open that file in a JavaScript editor and check the code for syntax errors.

  • If you have Node.js installed, you can run the following command to locate the incorrect line in the code:

    node $MOOGSOFT_HOME/bots/lambots/<path_to_filter_file>.js
  • Edit $MOOGSOFT_HOME/bots/lambots/<path_to_filter_file>.js to resolve the error and restart the LAM.
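Running a file with node executes it, so it can also fail on objects that presumably only exist inside the LAM runtime even when the syntax is correct. node --check parses a file without running it. The function below is a hypothetical wrapper that checks several files at once.

```shell
# Hypothetical helper: syntax-check one or more JavaScript files with "node --check",
# which parses each file without executing it. Requires Node.js on the PATH.
check_js_syntax() {
    local status=0 f
    for f in "$@"; do
        if node --check "$f"; then
            echo "OK: $f"
        else
            echo "Syntax error: $f"   # node prints the offending line to stderr
            status=1
        fi
    done
    return $status
}
```

For example: check_js_syntax $MOOGSOFT_HOME/bots/lambots/*.js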

Empty Columns in Alert Lists

Check the following:

  • Empty columns in alert lists may indicate incorrect field mapping assignments.

  • Check field mappings at the bottom of the LAM configuration file.

  • Edit the configuration file to properly map the field to the column name and then restart the LAM.

Socket LAM is Not Processing

Check the following:

  • The LAM may be set to Server mode rather than Client mode in the configuration file. For a description of the mode types, see Socket LAM.

  • Set the mode correctly in the configuration file and restart the LAM.

JSON Feed is Not Processing

Check the following:

  • The JSON string might be incorrectly formatted. For example, the event data might contain nested JSON.

  • Run the LAM in debug mode and look for nested JSON.

  • Either modify the event data or edit the presend filter to match values in the nested JSON.
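If jq is installed, it can validate a sample event and list which top-level fields hold nested objects or arrays. The functions below are a hypothetical sketch that assumes each sample event is a single JSON object; jq is a separate tool, not part of Moogsoft Onprem.

```shell
# Hypothetical helper: succeed if stdin is valid JSON ("jq empty" parses and prints nothing).
is_valid_json() {
    jq empty >/dev/null 2>&1
}

# Hypothetical helper: list the top-level keys whose values are objects or arrays.
nested_keys() {
    jq -r 'to_entries[] | select(.value | type == "object" or type == "array") | .key'
}
```

For example, with a captured event in event.json: is_valid_json < event.json && nested_keys < event.json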

Logfile LAM Does Not Start

Check the following:

  • In the LAM configuration file, check the target setting and confirm the file path to the target log file.

  • The error "Could not stat file [-1] error: [Bad file descriptor]" in /var/log/moogsoft/<lamd_name>.log indicates that the Logfile LAM cannot locate the log file to read.
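The target path can be checked directly. The function below is a hypothetical sketch; run it as the user the LAM runs as, because the readability test is always true for root.

```shell
# Hypothetical helper: report whether a Logfile LAM target exists, is a regular file,
# and is readable by the current user.
check_target_file() {
    local target=$1
    if [ ! -e "$target" ]; then
        echo "missing: $target"; return 1
    elif [ ! -f "$target" ]; then
        echo "not a regular file: $target"; return 1
    elif [ ! -r "$target" ]; then
        echo "not readable: $target"; return 1
    fi
    echo "ok: $target"
}
```

For example: check_target_file <path_from_target_setting>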

REST LAM Does Not Start

If the REST LAM does not start and you are using SSL, the SSL path might be missing. Do the following:

  • Verify that the following properties are correctly set:

    path_to_ssl_files

    ssl_cert_filename

    ssl_key_filename

  • Verify that the value of the use_ssl property has been set correctly.

  • The error "No file path specified" in /var/log/moogsoft/<lamd_name>.log indicates a missing SSL path in the LAM configuration file.
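The SSL properties can be cross-checked against the file system. The function below is a hypothetical sketch that assumes ssl_cert_filename and ssl_key_filename are file names relative to path_to_ssl_files; if path_to_ssl_files is itself a relative path in your configuration, pass the resolved absolute directory.

```shell
# Hypothetical helper: check that the certificate and key named in the REST LAM
# configuration exist and are readable under the SSL directory.
check_rest_lam_ssl() {
    local ssl_dir=$1 cert=$2 key=$3 f status=0
    if [ -z "$ssl_dir" ]; then
        echo "path_to_ssl_files is empty"; return 1
    fi
    for f in "$cert" "$key"; do
        if [ -r "$ssl_dir/$f" ]; then
            echo "found: $ssl_dir/$f"
        else
            echo "missing: $ssl_dir/$f"; status=1
        fi
    done
    return $status
}
```

For example: check_rest_lam_ssl <path_to_ssl_files> <ssl_cert_filename> <ssl_key_filename>, using the values from the LAM configuration file.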

Nginx Fails on Startup if IPv6 Is Not Configured

Comment out the IPv6 listen directives in two configuration files:

  1. Go to /etc/nginx/conf.d

  2. Comment out the IPv6 references by prefixing them with a hash (#):

    Configuration File    Section

    moog-default.conf     listen 80 default_server;
                          #listen [::]:80 default_server;

    moog-ssl.conf         listen 443 ssl default_server;
                          #listen [::]:443 ssl default_server;
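The same edit can be made with sed. The function below is a hypothetical sketch that prefixes any uncommented listen [::] directive with # and keeps a .bak copy of each file; lines that are already commented are left alone, so running it twice is safe.

```shell
# Hypothetical helper: comment out IPv6 "listen [::]..." directives in place.
# -i.bak edits each file and keeps the original as <file>.bak.
comment_out_ipv6_listen() {
    sed -i.bak -E 's/^([[:space:]]*)(listen[[:space:]]+\[::\])/\1#\2/' "$@"
}
```

For example: comment_out_ipv6_listen /etc/nginx/conf.d/moog-default.conf /etc/nginx/conf.d/moog-ssl.conf && nginx -t, where nginx -t checks the configuration syntax before you restart Nginx.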

Alert Processing Issues

Do the following:

  • Ensure that the product license has been applied.

  • Check that the RabbitMQ server is running:

    service rabbitmq-server status
  • Check that the "run on startup" setting for Alert Builder in the Moogfarmd configuration file is set to true.

  • Check the Moogfarmd log:

    /var/log/moogsoft/moogfarmd.log
  • Check the Alert Builder Moolet for syntax errors or errors in logic.

  • Check that the LAM is correctly parsing/mapping the data feed.

  • Check that post-event processing in the associated LAMbot is not filtering out the events.
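When reading the Moogfarmd log, it often helps to start with the most recent problems. The function below is a hypothetical sketch that assumes the log level appears as a plain word (ERROR, FATAL, or WARN) on each line; adjust the pattern if your log format differs.

```shell
# Hypothetical helper: show the most recent ERROR, FATAL, or WARN lines from a log.
recent_errors() {
    local log_file=${1:-/var/log/moogsoft/moogfarmd.log}
    local count=${2:-20}
    # -w: match whole words only, so "ERRORS" or "WARNING" do not count
    grep -wE 'ERROR|FATAL|WARN' "$log_file" | tail -n "$count"
}
```

For example: recent_errors /var/log/moogsoft/moogfarmd.log 50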

No Situations Created

Do the following:

  • Ensure that the product license has been applied.

  • Check the Moogfarmd configuration file to ensure that the "run on startup" property for the Sigaliser used is set to true.

  • In the Moogfarmd configuration file, check that the "process output of" setting for the Sigaliser used lists the correct Moolet.

  • Check that the Moolet listed in the "process output of" property is running.

  • Review the Moolet settings: if they are too restrictive or too open, they may not produce Situations.

Alerts in Situations Are Not Labelled with PRC Values

If PRC (Probable Root Cause) is not working as expected, that is, no RC values are assigned or no PRC model is generated in the database, check the following possible causes:

  • A missing (or inaccessible) libgfortran3 library. In this case, output such as the following appears in /var/log/messages:

    Dec  8 12:02:25 qatest1 moogfarmd: -- org.jblas ERROR Couldn't load copied link file: java.lang.UnsatisfiedLinkError: /tmp/jblas862842246719293546/libjblas_arch_flavor.so: libgfortran.so.3: cannot open shared object file: Permission denied.
    Dec  8 12:02:25 qatest1 moogfarmd: On Linux 64bit, you need additional support libraries.
    Dec  8 12:02:25 qatest1 moogfarmd: You need to install libgfortran3.
    Dec  8 12:02:25 qatest1 moogfarmd: For example for debian or Ubuntu, type "sudo apt-get install libgfortran3"
    Dec  8 12:02:25 qatest1 moogfarmd: For more information, see https://github.com/mikiobraun/jblas/wiki/Missing-Libraries
    • Solution: Ensure that the libgfortran RPM package is installed on the same host as Moogfarmd. For Moogsoft Onprem v9.x.x, also ensure that the compat-libgfortran-48 RPM package is installed.

  • The /tmp directory has permissions that prevent PRC from running. Specifically, if /tmp is mounted with the noexec option, PRC does not work and output such as the following appears when you run journalctl -xe:

    Dec 08 14:46:05 aiops1 moogfarmd[3370]: -- org.jblas ERROR Couldn't load copied link file: java.lang.UnsatisfiedLinkError: /tmp/jblas2939805401468156573/libjblas_arch_flavor.so: /tmp/jblas2939805401468156573/libjblas_arch_flavor.so: failed to map segment from shared object: Operation not permitted.
    Dec 08 14:46:05 aiops1 moogfarmd[3370]: On Linux 64bit, you need additional support libraries.
    Dec 08 14:46:05 aiops1 moogfarmd[3370]: You need to install libgfortran3.
    Dec 08 14:46:05 aiops1 moogfarmd[3370]: For example for debian or Ubuntu, type "sudo apt-get install libgfortran3"
    Dec 08 14:46:05 aiops1 moogfarmd[3370]: For more information, see https://github.com/mikiobraun/jblas/wiki/Missing-Libraries
    • Solution 1: Remove the noexec option from the /tmp mount point.

    • Solution 2: Change the temporary directory (java.io.tmpdir) that the JVM uses. For example, create an alternative temporary directory:

      mkdir ${MOOGSOFT_HOME}/tmp
      chown -R moogsoft:moogsoft ${MOOGSOFT_HOME}/tmp

      Then update the Moogfarmd startup script to use the new directory: edit ${MOOGSOFT_HOME}/bin/moog_farmd and add -Djava.io.tmpdir=/usr/share/moogsoft/tmp (adjust the path as appropriate) to the following line:

      #
      # Run app
      #
      $java_vm -server $ADDITIONAL_ARGS -XX:+UseThreadPriorities -Djava.io.tmpdir=/usr/share/moogsoft/tmp -DLOG4J_CONTEXT_SELECTOR="$logger_class_loader" -Dlog4j.configurationFile="$MOOGSOFT_CONFIG_FILE_FOR_LOG" -DMoogsoftLogFilename='/dev/null' -DMoogsoftLogAppender="STDOUT" -DprocName="$proc_name" -DMOOGSOFT_HOME="$MOOGSOFT_HOME" -DproductName="$product_name" -classpath $java_classpath $java_main_class "$@" <&0

      Finally, restart the Moogfarmd process.

  • If neither of the above errors appears, check whether /tmp is mounted with the noexec option, and if so, follow the instructions above.
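The noexec check can be scripted with findmnt from util-linux; its -T option finds the mount that contains a path even when the path is not a mount point itself. The functions below are a hypothetical sketch.

```shell
# Hypothetical helper: succeed if a comma-separated mount option list contains noexec.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the file system holding a directory (default /tmp) is mounted noexec.
check_tmp_noexec() {
    local dir=${1:-/tmp} opts
    opts=$(findmnt -n -o OPTIONS -T "$dir") || return 2
    if has_noexec "$opts"; then
        echo "$dir is on a noexec mount ($opts)"
        return 1
    fi
    echo "$dir allows exec ($opts)"
}
```

To check the libraries at the same time, rpm -q libgfortran (and, for v9.x.x, rpm -q compat-libgfortran-48) reports whether each package is installed.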