Monitor Moogfarmd Health Logs

The CFarmdHealth class in Moogfarmd logs detailed health information in JSON format once a minute. The log file provides the following information:

  • totals: running totals since Moogfarmd was started.

  • interval_totals: running totals since the last 60-second interval.

  • current_state: a snapshot of the important queues in Moogfarmd.

  • garbage_collection: JVM garbage collection data.

  • JVM_memory: JVM memory usage data.

  • message_queues: Queue usage and capacity.

  • locked_thread_count: Sum of locked threads.

  • total_thread_count: Sum of all threads.

  • top_blocking_query: The query that is blocking the most other queries in the database. Only logged if there are blocking queries in the database at the time.

  • top_blocking_query_count: The number of other queries being blocked by the top blocking query. Only logged if there are blocking queries in the database at the time.

For example, you can search on "CFarmdHealth" in the Moogfarmd log to view health messages:

WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] +|{"db_stats":
{"top_blocking_query_count":12,"locked_thread_count":18,"total_thread_count":35,"top_blocking_query":"DELETE FROM notification WHERE sig_id = i_sig_id"}, "garbage_collection":{"total_collections_time":12827,"last_minute_collections":0,"last_minute_collections_time":0,"total_collections":1244},"current_state":{"pending_changed_situations":0,"total_in_memory_situations":4764,"situations_for_resolution":0,"event_processing_metric":0.047474747474747475,"message_queues":{"AlertBuilder":0,"TeamsMgr":0,"Housekeeper":0,"Indexer":0,"bus_thread_pool":0,"Cookbook3":0,"Cookbook1":0,"SituationMgr":0,"SituationRootCause":0,"Cookbook2":0},"in_memory_entropies":452283,"cookbook_resolution_queue":0,"total_in_memory_priority_situations":0,"active_async_tasks_count":0},"interval_totals":{"created_events":1782,"created_priority_situations":0,"created_external_situations":0,"created_situations":10,"messages_processed":{"TeamsMgr":182,"Housekeeper":0,"AlertBuilder":1782,"Indexer":2082,"Cookbook3":1782,"SituationRootCause":172,"Cookbook1":1782,"SituationMgr":172,"Cookbook2":1782},"alerts_added_to_priority_situations":0,"alerts_added_to_situations":111,"situation_db_update_failure":0},"JVM_memory":{"heap_used":1843627096,"heap_committed":3007840256,"heap_init":2113929216,"nonheap_committed":66912256,"heap_max":28631367680,"nonheap_init":2555904,"nonheap_used":64159032,"nonheap_max":-1},"totals":{"created_events":453252,"created_priority_situations":0,"created_external_situations":0,"created_situations":4764,"alerts_added_to_priority_situations":0,"alerts_added_to_situations":36020,"situation_db_update_failure":0}}|+
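
As a minimal sketch of how such a message could be read programmatically, the following Python snippet extracts the JSON payload from a health log entry. It assumes, based only on the sample above, that the entry names CFarmdHealth.java and that the payload sits between "+|" and "|+". The helper name parse_health_entry is hypothetical and not part of Moogfarmd, and the sample string is a shortened copy of the entry above.

```python
import json
import re

# Assumed framing, inferred from the sample entry: "[CFarmdHealth.java:NNN] +|<json>|+".
# DOTALL lets the payload span a line break if the log viewer wraps it.
HEALTH_RE = re.compile(r"\[CFarmdHealth\.java:\d+\]\s*\+\|(.*)\|\+", re.DOTALL)


def parse_health_entry(entry):
    """Return the health payload as a dict, or None if this is not a health entry."""
    match = HEALTH_RE.search(entry)
    if match is None:
        return None
    return json.loads(match.group(1))


# Shortened copy of the sample entry, for illustration only.
sample = (
    'WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] +|'
    '{"db_stats":{"locked_thread_count":18,"total_thread_count":35},'
    '"totals":{"created_events":453252,"created_situations":4764}}|+'
)

health = parse_health_entry(sample)
print(health["totals"]["created_events"])
```

Once parsed, each block (totals, interval_totals, current_state, and so on) is available as a nested dictionary.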

The values in the message_queues block can be strings that include the queue limit. "-" represents an unlimited queue. An example message_queues block is as follows:

"message_queues":{"AlertBuilder":"0/-","Cookbook":"0/-","Housekeeper":"0/-","Indexer":"0/-","bus_thread_pool":"0/-","SituationMgr":"0/-"}

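A script that reads these values might normalize both forms into numbers. The sketch below assumes that the part before the slash is the current queue size and the part after it is the limit; that reading is an interpretation of the example, and parse_queue_value is a hypothetical helper name.

```python
from typing import Optional, Tuple, Union


def parse_queue_value(value: Union[int, str]) -> Tuple[int, Optional[int]]:
    """Return (size, limit); limit is None when the queue is unlimited or unknown."""
    if isinstance(value, int):
        # Plain integer form: no limit information is present.
        return value, None
    # Assumed string form: "<size>/<limit>", where "-" means unlimited.
    size, _, limit = value.partition("/")
    return int(size), None if limit in ("", "-") else int(limit)


print(parse_queue_value("0/-"))
print(parse_queue_value("5/100"))
```

Returning None for an unlimited queue keeps the limit distinct from any real numeric capacity.
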
In a healthy system that is processing data:

  • The created_events and created_situations counts increase.

  • The messages_processed block shows that Moolets are processing messages.

  • The queues in current_state.message_queues do not accumulate messages (there may be spikes).

  • The total_in_memory_situations value increases over time but reduces periodically due to the retention_period.

  • The situation_db_update_failure count is zero.
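
Some of these checks could be automated against parsed health payloads. The following Python sketch is one possible approach, not a Moogfarmd feature: the function names health_warnings and growing_queues are hypothetical, the checks read interval_totals only, and "accumulating" is taken to mean a queue size that rises in every consecutive sample, which is a deliberately simple assumption.

```python
from typing import Dict, List


def health_warnings(health: Dict) -> List[str]:
    """Check one parsed health payload against the single-sample criteria."""
    warnings = []
    interval = health.get("interval_totals", {})
    if interval.get("created_events", 0) == 0:
        warnings.append("no events created in the last interval")
    processed = interval.get("messages_processed", {})
    if processed and not any(processed.values()):
        warnings.append("no Moolet processed any messages in the last interval")
    if interval.get("situation_db_update_failure", 0) != 0:
        warnings.append("situation_db_update_failure is non-zero")
    return warnings


def growing_queues(samples: List[Dict[str, int]]) -> List[str]:
    """Names of queues whose size rose in every consecutive sample (oldest first)."""
    if len(samples) < 2:
        return []
    # Only compare queues that appear in every sample.
    names = set(samples[0]).intersection(*samples[1:])
    return sorted(
        name for name in names
        if all(a[name] < b[name] for a, b in zip(samples, samples[1:]))
    )


# Illustrative data only: queue sizes from three consecutive health messages.
history = [
    {"Indexer": 0, "AlertBuilder": 1},
    {"Indexer": 0, "AlertBuilder": 5},
    {"Indexer": 2, "AlertBuilder": 9},
]
print(growing_queues(history))
```

A queue that spikes and then drains is not reported, because its size does not rise in every sample.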