Troubleshoot Percona

Percona XtraDB Cluster is the database clustering solution installed with this version of Moogsoft AIOps. If you are upgrading from an earlier version, Moogsoft strongly recommends that you upgrade to Percona.

This document describes how to troubleshoot Percona-related issues.

Nodes in the Percona cluster are down

If two nodes in the Percona cluster go down at the same time, the result is a critical failure that can produce the following symptoms:

  • The Alert Builder can appear to be "stuck": it may still consume events from the Message Bus without writing them to the database, which causes a message queue to build up.

    Example output from the HA Control utility:

    WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:214] +|[moog_farmd] POOL DIAGNOSTICS:|+
    WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:216] +|[moog_farmd] Pool created at [20190613 16:24:53.302 +0100].|+
    WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:222] +|[moog_farmd] [12] invalid resources have been removed during the lifetime of the pool.|+
    WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:227] +|[moog_farmd] Pool size is [30] with [23] available connections and [4] busy.|+
    WARN : [0:HA Controller][20190614 10:47:20.198 +0100] [CAbstractPool.java:244] +|The busy resources are as follows:
    0: Held by 1:AlertBuilder for 584832 milliseconds. Currently in 
    		java.net.SocketInputStream#socketRead0 - SocketInputStream.java:-2
    		java.net.SocketInputStream#socketRead - SocketInputStream.java:115
    		java.net.SocketInputStream#read - SocketInputStream.java:168
    		java.net.SocketInputStream#read - SocketInputStream.java:140
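
Output like the above can be scanned for long-held connections. The following is a minimal Python sketch, assuming the log lines keep exactly the format shown in the example. The function name summarize_pool and the 60-second threshold are illustrative choices, not part of Moogsoft AIOps.

```python
import re

# Patterns match the log lines shown in the example output above;
# other versions may format these lines differently (assumption).
POOL_RE = re.compile(
    r"Pool size is \[(\d+)\] with \[(\d+)\] available connections and \[(\d+)\] busy"
)
HELD_RE = re.compile(r"Held by (\S+) for (\d+) milliseconds")


def summarize_pool(log_text, held_threshold_ms=60_000):
    """Return pool counts and any resources held at least held_threshold_ms."""
    summary = {"size": None, "available": None, "busy": None, "long_held": []}
    for line in log_text.splitlines():
        pool = POOL_RE.search(line)
        if pool:
            size, available, busy = (int(g) for g in pool.groups())
            summary.update(size=size, available=available, busy=busy)
        held = HELD_RE.search(line)
        if held and int(held.group(2)) >= held_threshold_ms:
            # Record the holder name and how long it has held the connection.
            summary["long_held"].append((held.group(1), int(held.group(2))))
    return summary


# Two lines taken from the example output above.
sample = """\
WARN : [0:HA Controller][20190614 10:47:20.194 +0100] [CAbstractPool.java:227] +|[moog_farmd] Pool size is [30] with [23] available connections and [4] busy.|+
0: Held by 1:AlertBuilder for 584832 milliseconds. Currently in
"""
print(summarize_pool(sample))
```

A holder such as 1:AlertBuilder that keeps a connection for several minutes while blocked in a socket read is consistent with the "stuck" symptom described above.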

To resolve the issue, try the following:

  1. Ensure that you have bootstrapped the first node, that is, started the node without any known cluster addresses. For more information, see the installation guide for your installation type.

  2. Ensure that the other nodes do not have a file named grastate.dat in the MySQL data directory. If the file is present, delete it.

  3. Restart one of the secondary nodes and wait for it to sync from the bootstrapped node. Note that this can create a temporary write lock on the bootstrapped node.

  4. Once the second node is up and running, start the remaining nodes.
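
The grastate.dat check in step 2 can be scripted. The following is a minimal Python sketch, not a Moogsoft or Percona tool. The helper name remove_grastate is hypothetical, and the default path /var/lib/mysql is an assumption, so confirm the datadir setting in your own MySQL configuration before using it.

```python
from pathlib import Path
import tempfile

# Assumption: the MySQL data directory. /var/lib/mysql is a common default,
# but check the datadir setting in your own configuration.
DEFAULT_DATADIR = Path("/var/lib/mysql")


def remove_grastate(datadir=DEFAULT_DATADIR, dry_run=True):
    """Report whether grastate.dat exists in datadir; delete it unless dry_run."""
    state_file = Path(datadir) / "grastate.dat"
    if not state_file.is_file():
        return "absent"
    if dry_run:
        return "present"
    state_file.unlink()
    return "removed"


# Demonstration against a temporary directory rather than a real data directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "grastate.dat").write_text("placeholder\n")
    print(remove_grastate(tmp))                 # dry run: file is only reported
    print(remove_grastate(tmp, dry_run=False))  # file is deleted
    print(remove_grastate(tmp))                 # nothing left to remove
```

The dry-run default is deliberate: the function only reports the file until you pass dry_run=False. Run it on the secondary nodes only, never on the bootstrapped node.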

You can also refer to the Percona documentation on how to recover a PXC cluster in various scenarios.