High Availability Installation Lab
At the end of this lab, you will know how to:
-
Install Moogsoft AIOps on a single host as the root user.
-
Initialize, start, and stop Moogsoft AIOps services, and find and review log files.
-
Configure core processing and message broker software to provide redundancy and failover.
-
Set up two virtual Moogsoft AIOps instances to work together as a highly available (HA) pair.
Prerequisites
-
You have access to a Moogsoft AIOps license and Speedy download credentials through your employer.
-
You are familiar with networking concepts and enterprise IT operations.
-
You are comfortable using Linux and JavaScript Object Notation (JSON).
-
You are familiar with Moogsoft AIOps operations and system architecture.
Requesting Lab Instances
Moogsoft customers and partners can request lab instances by filling out the lab request form. After you submit the form, you will receive an email from Moogsoft University with access information for the pair of virtual machines you will need to complete the lab. Keep in mind that it may take up to two days to receive your lab instances.
Obtaining a Moogsoft AIOps License
This lab requires a valid Moogsoft AIOps license. If you are a Moogsoft partner and do not have a valid license, contact Sales Operations with your name, organization, and email address. We will provide you with a 30-day license to work on the lab.
Scenario
You have just started working for a Moogsoft partner which provides service assurance solutions for large enterprise customers. It’s your fourth day on the job. You’ve met your colleagues, signed up for benefits, and spent a day and a half learning about Moogsoft AIOps by watching Moogsoft University videos. You are justifiably proud of your Linux and IT skills, but some of the Moogsoft proprietary technologies and open source components are new to you.
Today you and your boss have agreed that it’s time for you to get hands-on with the product so that you can make the most of your onboarding time before joining a customer implementation team. She has asked you to do a practice installation of Moogsoft AIOps on two virtual hosts and configure them as a highly available, redundant pair. Your installation won’t be as complex as a typical distributed deployment at a customer site, but if you succeed with your HA implementation, you will know you are off to a good start in your new job.
Before You Begin
To complete this lab, you will draw heavily on the Moogsoft AIOps documentation, specifically the Implementer Guide. Start by scanning the entire guide, and then read through these areas of focus:
Then, review the control utility instructions from the documentation for RabbitMQ, the open source message broker which drives the Moogsoft AIOps message bus. You may wish to scan materials about RabbitMQ clustering and queue mirroring as well.
Take note of the following points from the HA Reference Architecture diagram and the documentation:
-
Different components of Moogsoft AIOps—for example, the user interface; the database; the combined core processing, message bus, and search facilities; and the data adapters—can be installed on different virtual or physical machines.
-
You can set up Moogsoft AIOps to be highly available so that, depending on the component, each component is either active in a redundant group or configured to switch from passive to active if its counterpart fails. You can also set up a backup disaster recovery site, but it won’t automatically come online if the primary site fails.
-
Moogsoft recommends using a load balancer to distribute network traffic across multiple machines. A load balancer can improve responsiveness and also allows you to add machines to maintain performance as demand increases.
-
Moogsoft AIOps uses Hazelcast to provide in-memory data persistence across core processing services.
-
Some third-party components, specifically the message broker (RabbitMQ), the search engine (Elasticsearch), and the database (Percona Server for MySQL), need three running instances to avoid 'split brain.' Thus the minimum size for a fully fault-tolerant Moogsoft AIOps system is three servers.
-
How you configure high availability varies by component. Configure Moogsoft core processing and user interface for HA by editing system configuration files, and configure other components separately according to their own requirements. After components are configured and initialized, you can use control scripts to examine HA status and modify failover behavior.
-
For Moogsoft core processing components, "cluster" refers to the vertical collection of components that make up a complete Moogsoft AIOps system, so that an HA pair comprises two clusters. Each cluster can be distributed across several hosts. The Moogsoft HA topology is further defined by groups (similar components linked to exhibit failover behavior) and instances (individual running components).
-
For the RabbitMQ message broker, "cluster" is a horizontal term analogous to group. It refers to peer nodes which share information.
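The three-instance requirement follows from majority-quorum arithmetic. The Python sketch below is a generic illustration only; it does not model how RabbitMQ, Elasticsearch, or Percona actually detect and handle network partitions, since each product has its own settings for that. It shows that a two-node group cannot lose a node and still hold a majority, while a three-node group can.

```python
# Generic majority-quorum arithmetic (illustrative only; real products
# have their own partition-handling rules and settings).

def majority(nodes: int) -> int:
    """Smallest number of nodes that is a strict majority of the group."""
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """Nodes that can fail while the survivors still form a majority."""
    return nodes - majority(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} node(s): majority = {majority(n)}, "
              f"tolerable failures = {tolerable_failures(n)}")
```

With two nodes, a network split leaves each side with one node, and neither side can tell whether its peer is down or merely unreachable; a third node breaks the tie.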
Check your understanding:
-
For Moogsoft AIOps, what is the difference between a cluster and a host?
-
Which third-party components need to be on the same host as the Moogsoft core processing system?
About the Lab
For simplicity, you won't be doing a fully fault-tolerant distributed installation, where you install, configure, and initialize components separately. Instead you'll install two Moogsoft AIOps clusters, each on a single host, and then configure them for automatic failover. Here are some of the other differences between the lab HA pair installation and a fully fault-tolerant system:
-
You won't set up a third 'redundancy' server for messaging, search, and the database.
-
You’ll use only one database to support both clusters.
-
You won’t use a load balancer. Instead of a single UI served by multiple back-end components, each cluster will have its own active UI.
-
You won’t configure high availability for search or data ingestion.
Despite these simplifications, this lab is still time consuming. You’ll be doing a large number of interdependent operations step-by-step, and even when each step is simple it’s hard to avoid errors when many steps are chained together. You might find you have to repeat parts of the lab several times to get the desired results, or even start over from the beginning. (It might help to think of it like a video game with levels and lives.)
Most of the lab steps give general instructions with specific code or settings under an "Example" heading. If you are comfortable with Linux, try to do the steps on your own when you can, and refer to the "Example" sections when you need help or to check your work.
At the beginning of the lab, you’ll be following instructions in the documentation to install Moogsoft AIOps on a single host. Later you will be editing system configuration files in JSON format. It's easy to make mistakes when editing JSON, because a misplaced comma or curly brace can invalidate the structure of an entire file. Be careful and patient as you work on editing the system settings.
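One way to catch a misplaced comma or brace before restarting a service is to run the file through a strict JSON parser. The Python sketch below assumes that the embedded comments in the configuration files occupy whole lines beginning with '#'; it blanks those lines so that reported line numbers still match the file. Inline comments, and any relaxed syntax that strict JSON rejects (such as unquoted keys), will be reported as errors even if Moogsoft AIOps accepts them, so treat a failure as a pointer to inspect rather than a verdict. The script name check_conf.py in the usage comment is hypothetical.

```python
import json
import sys


def blank_hash_comments(text: str) -> str:
    """Replace whole-line '#' comments with empty lines (keeps line numbers).

    Assumption: comments occupy entire lines. Inline comments and '#'
    characters inside string values are not handled by this sketch.
    """
    return "\n".join(
        "" if line.lstrip().startswith("#") else line
        for line in text.splitlines()
    )


def check_json(text: str) -> str:
    """Return 'OK', or the position and message of the first syntax error."""
    try:
        json.loads(blank_hash_comments(text))
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return "OK"


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_conf.py file1.conf file2.conf
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(f"{path}: {check_json(handle.read())}")
```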
While you work on the lab, ask yourself:
-
How can I test that each step worked?
-
When something goes wrong, what should I do? Should I
-
inspect my work for an error,
-
review log output,
-
do research in the documentation,
-
repeat the last few steps, or
-
all of the above?
-
How would I change the installation design to
-
increase reliability?
-
respond to performance demands?
As you gain product fluency, you’ll be able to move away from step-by-step instructions. With experience you will gain the ability to design and install distributed systems with varying topologies and HA settings.
Log Into Your Lab Instances
You have been given URLs, usernames, and passwords for two Amazon Web Services (AWS) virtual machines running CentOS Linux version 7. You will also need a Moogsoft AIOps license and Speedy download credentials to complete the lab.
The web addresses for the AWS virtual machines in the examples are 'ec2-18-232-176-118.compute-1.amazonaws.com' and 'ec2-54-82-68-91.compute-1.amazonaws.com'. The corresponding Linux hostnames are 'ip-172-31-19-254.ec2.internal' and 'ip-172-31-16-221.ec2.internal'. The AWS nomenclature is unwieldy, but note that the DNS names include the public IP address and the hostnames include the internal IP address.
The example username for both machines is 'bailey'. Substitute the URLs, hostnames, and credentials for your own instances in the examples.
-
Make sure that you can log into each machine using a terminal program.
-
Once you are logged in, switch to the root user. You might also want to install your preferred text editor, for example Vim. Repeat these steps for both instances.
-
Check the hostnames and make a note of the internal IP addresses (formatted like 172.31.19.254). You will need them when you install the Percona database software.
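Because the default EC2 hostnames embed the internal address, you can read the IP address straight from the hostname. The Python helper below is a sketch that assumes the default 'ip-a-b-c-d' naming shown in the examples; if your instances use custom hostnames, it does not apply, and you should check the address with a tool such as ip addr instead.

```python
import ipaddress
import socket


def ip_from_ec2_hostname(hostname: str) -> str:
    """Turn 'ip-172-31-19-254.ec2.internal' into '172.31.19.254'.

    Assumes the default EC2 naming scheme 'ip-a-b-c-d[.domain]'.
    """
    label = hostname.split(".", 1)[0]          # e.g. 'ip-172-31-19-254'
    if not label.startswith("ip-"):
        raise ValueError(f"not a default EC2 hostname: {hostname!r}")
    address = label[len("ip-"):].replace("-", ".")
    ipaddress.IPv4Address(address)              # raises ValueError if malformed
    return address


if __name__ == "__main__":
    name = socket.gethostname()                 # normally what `hostname` prints
    try:
        print(name, "->", ip_from_ec2_hostname(name))
    except ValueError as err:
        print(err)
```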
Do Single-host Installations
Install an independent Moogsoft AIOps system on each of your virtual AWS machines by working through the following sections in the documentation. To save time, you can copy and paste code snippets into your terminal program. If you’re using the Vim editor, enter the :set paste command before pasting multiple lines to avoid formatting problems.
You’ll be using the Yum package utility to install Moogsoft and open source components in RPM format from online repositories. Once you’ve successfully installed one system, repeat the steps to install the second. Doing this manually is tedious, but it is good practice.
-
Download and install open source packages that are used by Moogsoft AIOps. Create .repo files to identify URLs so that Yum can access RabbitMQ, Elasticsearch, Nginx, and Moogsoft AIOps packages online. For the Moogsoft AIOps .repo file, you will need to include your Speedy login and password in the base URL. Follow the instructions for a single-node installation to install the Percona database software. Change SELinux to permissive mode to relax access controls for the installation.
-
Use Yum to install Moogsoft AIOps. The software is distributed in nine RPM package files: database, common, data integrations, UI-based data integrations, the message bus, search, server, user interface, and utilities.
After installing Moogsoft AIOps, set the environment variables and update the path, and then start the system with default settings using an initialization script. Choose a name for your RabbitMQ messaging zone, such as 'MY_ZONE'. Don't configure HAProxy. By default, many Moogsoft AIOps services start during initialization, but not the Moogfarmd core processing service. Start Moogfarmd after the initialization script completes. Since this is a training exercise, you don’t need to update the default user passwords.
-
Run scripts to check that Moogsoft AIOps, the Apache Tomcat webserver, and the MySQL database installed successfully.
-
Log in to Moogsoft AIOps and add a license.
-
Go to the system’s URL in your browser.
-
Click on 'Advanced' when you see the security warning.
-
Click to proceed to your machine’s URL.
-
When you see the Moogsoft login screen, log in using the username 'admin' and the password 'admin'.
-
You’ll see a banner in the user interface warning that you need to apply a license. Follow the instructions to paste your license text string into the license box, and then click 'Update' in the lower right-hand corner.
-
Make sure that your installation can receive and process data. In your terminal, navigate to $MOOGSOFT_HOME/bin/utils. Execute the test_lam script to send test events into Moogsoft AIOps.
-
Go to the user interface and check that test events are populating the Open Alerts and Open Situation views. Then press Ctrl+C in your terminal to terminate the test_lam script.
Check your understanding:
-
Name three open-source packages that you need to install to run Moogsoft AIOps.
-
In which directory is the Moogsoft AIOps initialization script stored?
Reflect
If you’ve run into trouble with your installations and are stuck, you can request two new virtual machines to work through the lab again from the beginning.
Otherwise, congratulations! You have successfully installed Moogsoft AIOps. Email Moogsoft University screenshots of your two Moogsoft AIOps Workbench screens with the URLs visible, and then take a break before you tackle the next section.
Explore Services, Logs, and Control Scripts
Now that you have followed the instructions in the documentation to install Moogsoft AIOps, the next step is to get comfortable stopping and restarting the parts of the system you will reconfigure for high availability, checking logs as you do so. Becoming familiar with the locations of the logs and how the logs record normal startup will boost your ability to troubleshoot when something goes wrong.
-
On one of your installed instances, navigate to $APPSERVER_HOME (/usr/share/apache-tomcat). Change to the logs subdirectory. The primary log for the Apache Tomcat server is catalina.out.
Review its contents. As you scroll through, you should see that all the messages are labeled 'INFO', unless you had problems during your installation that generated warnings. Go to the top of the file and look at the startup records. Depending on how long your instance has been running, catalina.out could be very long.
-
Stop the Apache Tomcat webserver.
-
Go to the URL where your instance is running and reload the web page. You should see that the user interface is no longer running.
-
To give yourself a fresh start with a manageably sized log, delete catalina.out.
-
Restart Apache Tomcat, and examine the contents of catalina.out again.
-
Go to the Moogsoft AIOps user interface in your browser and refresh your instance. It might take a few moments, but after Apache Tomcat has successfully restarted you should see the Moogsoft AIOps screen.
-
Next, consider the Moogfarmd core processing service. Navigate to /var/log/moogsoft and look at the contents of the directory.
There are several Moogsoft logs here. You’ll see the logs of the initialization scripts for the database, data adapters (LAMs), message bus (MOOMS), server, and UI. These component-specific scripts are called by the general initialization script that you ran during installation. Depending on the actions you’ve taken with your instance, there may be other logs as well. You’ll also see the file MOO.moog_farmd.log.
This is the log for the Moogfarmd core processing service, which you started during installation. Note that the service name, moogfarmd, does not have an underscore, but the log file and other files usually include an underscore between 'moog' and 'farmd'. The log file naming convention is to show the HA cluster name followed by the group name, with a period as the separator. If there were an instance name, it would follow the group name.
In this case the cluster name is still the installation default, 'MOO'. If you look at the Moogfarmd log file in your other Moogsoft AIOps instance, you’ll see it has the same cluster name. This isn’t an issue as long as the two instances are running independently, but when you configure them as an HA pair you’ll need to choose distinct cluster names.
-
Examine the log.
-
Stop the moogfarmd service.
-
Delete the Moogfarmd log file so you can quickly locate the beginning records after startup.
-
Restart the moogfarmd service and examine the beginning of the log. You should see a record at the beginning with HA status information that identifies the cluster as 'MOO' and the group as 'moog_farmd', with a note that it is starting as active.
Note that you shouldn’t necessarily perform all these steps whenever you check logs. For example, you’ll often be interested in only the most recent records at the end of a log, and you also might not want to stop a running service when you are in production mode.
-
Look at the log for the rabbitmq-server service. Navigate to /var/log/rabbitmq and check the file names. The primary log file will have the format rabbit@hostname.log, where the hostname is the "short" Linux hostname of your virtual machine.
-
Type hostname to check the hostname of your machine, and you’ll see that RabbitMQ is using the segment of the hostname before the first period. RabbitMQ can be configured to use fully qualified domain names (FQDN), but by default it uses short hostnames to identify its messaging nodes.
-
Examine the log. You can stop the rabbitmq-server service, delete the log, and restart the service if you like, but depending on how long your instance has been running, the file might be short enough to examine as is.
-
Finally, there are two control utilities that you’ll need to be familiar with: one for the Moogsoft HA configuration, and one for RabbitMQ.
-
The ha_cntl utility will let you check the status of your HA system. You can also use it to manually initiate failover for testing. Use the 'view' option to check the HA status. You should see two groups in your 'MOO' cluster: Moogfarmd core processing and UI servlets. Both should be active, and neither one will have an instance name.
-
The rabbitmqctl utility will let you control the RabbitMQ message broker, including clustering multiple nodes and setting up message queue mirroring. Check the status of RabbitMQ with the 'cluster_status' option. Recall that for RabbitMQ, a cluster is a set of peer nodes. At this point there is only one RabbitMQ node present in your system.
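The two log naming conventions described above can be expressed as small Python functions. This is a sketch for orientation only: the Moogsoft pattern (cluster, then group, then an optional instance, joined by periods) follows the description in this lab, and the period before the instance name is an assumption. The RabbitMQ function assumes the default short-hostname node naming.

```python
import os

MOOGSOFT_LOG_DIR = "/var/log/moogsoft"
RABBITMQ_LOG_DIR = "/var/log/rabbitmq"


def moogsoft_log_name(cluster: str, group: str, instance: str = "") -> str:
    """cluster.group[.instance].log, e.g. 'MOO.moog_farmd.log'.

    The '.' separator before the instance name is an assumption.
    """
    parts = [cluster, group] + ([instance] if instance else [])
    return ".".join(parts) + ".log"


def rabbit_log_name(hostname: str) -> str:
    """rabbit@<short hostname>.log; the short name ends at the first period."""
    short = hostname.split(".", 1)[0]
    return f"rabbit@{short}.log"


if __name__ == "__main__":
    print(os.path.join(MOOGSOFT_LOG_DIR, moogsoft_log_name("MOO", "moog_farmd")))
    print(os.path.join(RABBITMQ_LOG_DIR,
                       rabbit_log_name("ip-172-31-19-254.ec2.internal")))
```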
Check your understanding:
-
Based on the HA control utility output, how many UI servlets are there and what are their names?
-
Which service controls the Moogsoft AIOps user interface?
Check Network Connectivity
You have one more housekeeping task before setting up your HA system: you need to verify that your two machines can talk to each other.
-
To start, verify the external IP address, the internal IP address, and the Linux hostname for each of your virtual machines.
-
Your machines are in the same AWS security group and have been configured to accept all network traffic from other members of the same group. They are also set up to accept all incoming ping, ssh, http, and https traffic. Use ping to verify that your instances can communicate with each other using public DNS names, internal IP addresses, external IP addresses, and/or hostnames. If this test fails, you can try editing the /etc/hosts file on each machine to add the hostname and IP address of the other machine.
-
Ping your second instance from your first instance.
-
Ping your first instance from your second instance.
-
When you are working at customer sites, you will likely have to arrange to have specific ports opened in system firewalls so that Moogsoft AIOps components can communicate with each other. Verify systematically that the following ports are accessible remotely between your machines: 9200 and 9300 for Elasticsearch; 5672, 15672, and 4369 for RabbitMQ; and 5701 and 8091 for the Hazelcast in-memory data manager. You may want to install your preferred diagnostic tool.
-
Verify that the ports are accessible in one direction, and then repeat the tests in the other direction.
If you use telnet for these checks, you should either connect successfully or receive a "connection refused" message, either of which indicates that the remote server responded. If your connection times out, the connection test has failed. Networks with more complex security arrangements may require alternate tests.
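A scripted version of the telnet test can make it easier to be systematic. The Python sketch below tries a TCP connection to each port listed above and classifies the result the same way: 'open' or 'refused' both mean the remote host answered, while 'timeout' means the test failed. The port-to-component labels come from this lab, and the three-second timeout is an arbitrary choice. Run it on each machine with the other machine's hostname as the argument. RabbitMQ clustering also uses port 25672 for inter-node communication, which you may want to add to the list.

```python
import socket
import sys

# Ports listed in this lab, labeled by component.
PORTS = {
    9200: "Elasticsearch", 9300: "Elasticsearch",
    5672: "RabbitMQ", 15672: "RabbitMQ", 4369: "RabbitMQ",
    5701: "Hazelcast", 8091: "Hazelcast",
}


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt the way the telnet test does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # remote host answered, but nothing is listening
    except socket.timeout:
        return "timeout"   # no answer at all: the test has failed
    except OSError as err:
        return f"error: {err}"


if __name__ == "__main__":
    remote = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, component in sorted(PORTS.items()):
        print(f"{remote}:{port} ({component}): {check_port(remote, port)}")
```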
Check your understanding:
-
Which ports does the message broker use to communicate?
Edit Configuration Files
Now that you have two running Moogsoft AIOps instances, you’re familiar with some of the Moogsoft AIOps services, and you’ve verified connectivity, you’re ready to configure a high availability system. You’ll do this in several stages.
-
First, you’ll prepare Moogsoft AIOps components for HA by editing JSON configuration files.
-
Then, you’ll use the RabbitMQ control utility to initiate communication between the RabbitMQ nodes on your two machines.
-
Finally, you’ll put it all together by identifying the two hosts to the message bus and memory manager in the system settings.
If you were configuring the Percona MySQL database and Elasticsearch for HA, there would be additional configuration stages for those components. For the Moogsoft HA reference architecture, you would also need a third machine running RabbitMQ, Percona, and Elasticsearch.
-
To configure the Moogsoft HA components, start by choosing two cluster names. Then do the remaining steps in this section for your first Moogsoft AIOps instance, and repeat them for the second instance.
Given that you are using single-host installations, you could use the hostname as the HA cluster name for simplicity, or choose names like 'moog1' and 'moog2'. The specific names you choose for Moogsoft AIOps clusters, groups, and instances aren’t critical, but, as you will see, how you assign them determines the HA architecture. In the examples here 'REMY' is the primary cluster name and 'CHARLIE' is the secondary cluster name. The intention is that REMY’s core processing will initially be active and CHARLIE’s will be passive when the HA installation is complete.
-
Make backups of the configuration files. That way if your HA configuration doesn’t work and you want to start over, you can try reverting to the original configuration files and re-initializing Moogsoft AIOps instead of repeating the installations from scratch. Navigate to $MOOGSOFT_HOME/config and make backup copies of the system, Moogfarmd, and UI servlets configuration files.
-
You’ll be making changes to the 'ha' section at the end of each of these files. All three of the files are in JSON format with embedded comments. Make sure you understand JSON syntax, and make your edits thoughtfully. When you restart after making changes, one comma too many or too few can break your entire system. If this happens, check the service logs to figure out which file is likely to be causing the problem and then look carefully at the JSON structure for that file.
In the system configuration file, change the cluster name in the 'ha' section near the end of the file. Change it to the primary (or secondary, depending on the instance) cluster name that you chose.
-
Then, to remove the rest of the default HA configuration settings, locate the process monitor section. Carefully delete or comment out the contents of the 'processes' array, but keep the empty array (don’t delete the square brackets). In Vim, you can add a hash (#) comment character to multiple lines by using visual block mode. Start at the first line where you want the hash character, press Esc, press Ctrl+V, move down with the arrow key to the last line you want to comment out, press Shift+I, type #, and press Esc.
-
In the Moogfarmd configuration file, uncomment the HA section and declare the cluster, group, and instance names. The cluster name will be the same as in system.conf. For the group name, choose something indicative of function, for example 'moog_farmd'. For the instance name, choose something that distinguishes between the Moogfarmd services on your two machines (that is, between like components in the same group on different clusters). You could use 'primary' and 'secondary' or 'REMY-moogfarmd' and 'CHARLIE-moogfarmd', for example; they are functionally equivalent.
-
Set 'default_leader' and 'start_as_passive' to false. These settings are alternate methods of specifying failover behavior, but as a best practice you should rely on the Moogsoft AIOps default failover behavior.
-
In the UI servlets configuration file, uncomment and edit the HA section to declare the cluster, group, and instance, and set 'start_as_passive' to false. (Add a comma if you need one.) You want the user interfaces in your two installations to be active at the same time, so configure them in active/active mode by giving them different group names. Once you’ve done that, the instance name is immaterial, so you can leave it as the default.
-
As a best practice, you should rename critical files that you edit so they aren't overwritten during upgrades. You can't rename system.conf, but you should rename your customized version of moog_farmd.conf.
The convention is to put a shortened customer name in all caps before the file extension. Since you don't have a specific customer, use 'CUST'. Rename moog_farmd.conf to moog_farmd.CUST.conf.
-
Next, edit the service wrapper file for Moogfarmd so that it will use your renamed configuration file. The moogfarmd service script is in /etc/init.d.
-
Reload the service configuration, since you changed the service script. On CentOS 7, which uses systemd, run systemctl daemon-reload.
-
Now that you have finished editing the configuration files, restart the Apache Tomcat service to make the UI changes take effect. Check $APPSERVER_HOME/logs/catalina.out to verify that startup was successful.
-
Restart the Moogfarmd service, and go to /var/log/moogsoft. List the files in the directory. You’ll see a new log file with a name composed of the cluster name, the group name, and the instance name.
-
Look at the log file contents and you should see an 'INFO' message with an HA Status section that identifies the cluster, group, instance, and whether the instance is starting up as active or passive.
-
Finally, use the HA control utility to check that the components are running and named appropriately. (You can also check running services in the user interface under System Settings > Self Monitoring.)
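The HA settings described in this section can be summarized as Python dictionaries. This is an illustrative sketch, not the literal file contents: the key names 'cluster', 'group', and 'instance', the UI servlets group names 'ui_remy' and 'ui_charlie', and the instance names are hypothetical examples, and the real files are edited in place with their embedded comments. The pairing function encodes the rule this lab describes: members of the same group on different clusters form an active/passive pair, while components in different groups all stay active. The last helper shows the customer-tag renaming convention.

```python
import json


def farmd_ha_section(cluster: str, instance: str, group: str = "moog_farmd") -> dict:
    """HA settings for the renamed Moogfarmd file (key names assumed)."""
    return {
        "cluster": cluster,          # must match the cluster name in system.conf
        "group": group,              # same group on both hosts -> failover pair
        "instance": instance,        # tells the two group members apart
        "default_leader": False,
        "start_as_passive": False,
    }


def servlets_ha_section(cluster: str, group: str) -> dict:
    """HA settings for the UI servlets file; instance name left at its default."""
    return {"cluster": cluster, "group": group, "start_as_passive": False}


def pairing(a: dict, b: dict) -> str:
    """Same group on two clusters -> active/passive; different groups -> both active."""
    return "active/passive" if a["group"] == b["group"] else "active/active"


def customized_name(filename: str, customer: str = "CUST") -> str:
    """Insert the customer tag before the extension: x.conf -> x.CUST.conf."""
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{customer}.{ext}" if dot else f"{filename}.{customer}"


if __name__ == "__main__":
    remy = farmd_ha_section("REMY", "primary")
    charlie = farmd_ha_section("CHARLIE", "secondary")
    print(json.dumps(remy, indent=4))
    print("Moogfarmd:", pairing(remy, charlie))
    print("UI:", pairing(servlets_ha_section("REMY", "ui_remy"),
                         servlets_ha_section("CHARLIE", "ui_charlie")))
    print(customized_name("moog_farmd.conf"))
```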
Check your understanding:
-
Which file would you edit to change the group name for Moogfarmd?
-
True or false: you should use different cluster names for the core processing service and the user interface.
Configure the Message Broker
After you’ve edited the configuration files for both instances, you’re ready to move on to configuring RabbitMQ.
A single RabbitMQ broker, or a cluster of RabbitMQ nodes, can host multiple isolated messaging environments, each with its own queues, exchanges, and permissions. RabbitMQ refers to these environments as virtual hosts, and the Moogsoft AIOps system configuration file calls them zones. If you used the same zone name during installation, both of your instances are using the same virtual host, but be aware that in the real world your RabbitMQ installation might be more complex.
From here onward you’ll need to keep switching back and forth between instances, so it’s a good idea to open two terminal windows and position them so that both of your instances are easy to identify and access.
-
Make sure that the zone for your HA system is defined and that all of your clusters are in the same zone. Check the 'zone' setting under the 'mooms' (Moogsoft Messaging System, or MOOMS) sections in the system configuration files and verify that it was correctly populated during initialization. If you change the zone or password in RabbitMQ, be sure to update the MOOMS settings as well.
-
Run the 'list_vhosts' option of the RabbitMQ control utility to verify that the zone, or virtual host, exists and is the same on both machines.
-
Assuming all is in order, the next step is to connect the RabbitMQ nodes in a single cluster so they share all the messages arriving at either node. The two RabbitMQ nodes will need to have the same Erlang "cookie" file to identify themselves as part of the same cluster. (RabbitMQ is written in the Erlang programming language.) On your secondary machine, stop the rabbitmq-server service.
-
Locate the hidden Erlang cookie file (.erlang.cookie) in /var/lib/rabbitmq and make a backup in case of problems.
-
Use scp to copy the cookie from your primary instance, overwriting the secondary cookie. Adjust the file permissions as needed before you copy the cookie, and then restore the ownership and permissions to their prior settings.
-
Compare the contents of the two cookies to make sure they are identical.
-
Restart the RabbitMQ Server service on your secondary instance and use the 'cluster_status' option of the control utility to get the node names for both instances.
-
Use the 'stop_app' option to stop the RabbitMQ application on your primary instance.
-
Use the 'join_cluster' option on your primary instance with the node name from your secondary instance.
-
Restart using the 'start_app' option.
-
Check the cluster status on both machines. You should see both node names listed on each instance.
-
RabbitMQ is now configured to operate as a single redundant messaging system. But to ensure that waiting messages aren’t lost in the case of a failure or system restart, enable queue mirroring. Do this by defining a named policy that selects queues whose names match a regular expression and applies a JSON definition that mirrors those queues to every node.
Set the queue mirroring policy from either instance. Use 'ha-all' as the name of the policy, ".+\.HA" as the regular expression, and '{"ha-mode":"all"}' as the JSON key-value pair. Policies are defined per virtual host, so apply the policy to your zone by passing the zone name that you chose with the -p option.
-
Use the 'list_policies' option on both instances, again specifying your zone with -p, to check that the operation was successful.
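The RabbitMQ steps above can be scripted. The Python sketch below only builds and prints the rabbitmqctl argument lists; you could pass each one to subprocess.run with check=True, as root, on the appropriate host. It also compares Erlang cookies by checksum, so you can run it on each host and compare the printed values. The node name 'rabbit@ip-172-31-16-221' and the zone 'MY_ZONE' are examples derived from this lab; substitute your own. The sketch assumes RabbitMQ applies the policy pattern as an unanchored regular-expression search. Note also that classic queue mirroring ('ha-mode') is deprecated in recent RabbitMQ releases and was removed in RabbitMQ 4.0, so this policy applies only to versions that still support mirroring.

```python
import hashlib
import json
import re
import shlex
from pathlib import Path

COOKIE = Path("/var/lib/rabbitmq/.erlang.cookie")   # default location on RPM installs
POLICY_NAME = "ha-all"
POLICY_PATTERN = r".+\.HA"
POLICY_DEFINITION = '{"ha-mode":"all"}'


def cookie_fingerprint(path: Path = COOKIE) -> str:
    """SHA-256 of the cookie; the value must be identical on both hosts."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def join_commands(peer_node: str) -> list:
    """Commands run on the node that joins the peer's cluster."""
    return [
        ["rabbitmqctl", "stop_app"],
        ["rabbitmqctl", "join_cluster", peer_node],
        ["rabbitmqctl", "start_app"],
        ["rabbitmqctl", "cluster_status"],
    ]


def policy_command(zone: str) -> list:
    """Mirror matching queues to all nodes in the zone (virtual host)."""
    json.loads(POLICY_DEFINITION)                # fail early if not valid JSON
    return ["rabbitmqctl", "set_policy", "-p", zone,
            POLICY_NAME, POLICY_PATTERN, POLICY_DEFINITION]


def is_mirrored(queue_name: str) -> bool:
    """Assumes RabbitMQ's unanchored regex matching of policy patterns."""
    return re.search(POLICY_PATTERN, queue_name) is not None


if __name__ == "__main__":
    try:
        print("cookie sha256:", cookie_fingerprint())
    except OSError as err:                       # missing file or not running as root
        print("cannot read cookie:", err)
    for cmd in join_commands("rabbit@ip-172-31-16-221") + [policy_command("MY_ZONE")]:
        print(shlex.join(cmd))
```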
Check your understanding:
-
Name two of the things you have to do to join RabbitMQ nodes in a cluster.
-
True or false: you can have only one RabbitMQ node on each virtual host.
Enable Automatic Failover
Now that RabbitMQ is clustered, enable failover behavior for Moogsoft AIOps components by editing the system configuration files on both your instances. Note that the order in which you list hosts in the settings is important. The first host you list will be treated as the primary, active instance by Moogsoft AIOps.
-
Edit the 'brokers' settings in the 'mooms' section of system.conf by adding a second set of host and port values. Keep the port as 5672 for both entries, but change the hosts from 'localhost' to the full hostnames of your primary and secondary instances. Put the second entry inside curly braces, with a comma separating the two entries.
-
Make sure that the 'message_persistence' setting is set to true.
-
In the 'mysql' section, change the host from 'localhost' to your primary hostname on both instances, since you will be running only one database.
-
In the 'failover' section, remove 'localhost' and add the two hostnames to 'hosts'.
-
Set 'automatic failover' to 'true'.
-
To let the changes take effect, restart Apache Tomcat and Moogfarmd. Check the logs for both.
-
After you've made the system configuration changes on both instances, stop the MySQL service (mysqld) on your secondary instance, since it won’t be used.
-
Use the 'list_queues' option of the RabbitMQ control utility, specifying your zone with -p, to check that the Moogfarmd and UI servlets queues are active for both instances.
-
Finally, use the HA control utility to check the HA status of both instances. Verify that Moogfarmd and the UI servlets are shown for both clusters.
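The system.conf changes in this section can be summarized as the nested Python dictionary below, printed as JSON so you can compare it with your edited file. It is a sketch of the values only: section and key names follow the text of this lab, but their exact nesting (for example, whether 'hosts' sits inside a subsection of 'failover') and the spelling of the automatic failover key are assumptions to check against your own file. The hostnames are the example instances from this lab.

```python
import json

PRIMARY = "ip-172-31-19-254.ec2.internal"     # example primary (REMY) host
SECONDARY = "ip-172-31-16-221.ec2.internal"   # example secondary (CHARLIE) host


def ha_system_settings(primary: str, secondary: str) -> dict:
    """Values the lab sets in system.conf on BOTH instances (nesting assumed)."""
    return {
        "mooms": {
            "brokers": [
                {"host": primary, "port": 5672},    # listed first, so treated as primary
                {"host": secondary, "port": 5672},
            ],
            "message_persistence": True,
        },
        "mysql": {"host": primary},                 # one shared database on the primary
        "failover": {
            "hosts": [primary, secondary],          # order matters: primary first
            "automatic_failover": True,             # key spelling assumed
        },
    }


if __name__ == "__main__":
    print(json.dumps(ha_system_settings(PRIMARY, SECONDARY), indent=4))
```

Because both instances use identical values for these settings, the same changes go into both files; the cluster name in the 'ha' section is what differs between the two machines.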
Check your understanding:
-
Which service(s) on which of your instances are in passive mode?
-
What would happen if you stopped Moogfarmd on your primary instance?
Reflect
Congratulations! You’ve successfully completed an HA installation of Moogsoft AIOps. You should be proud of your hard work. Take a screenshot of your ha_cntl -v output and email it to Moogsoft University to verify completion. (You'll probably need two screenshots to show the full output.)
Your boss is very pleased, and she is looking forward to having you learn more about the next implementation stages, including data ingestion, enrichment, and situation design. For now, though, take some time to enjoy your accomplishment.