Monitor Moogfarmd Health Logs

Moogfarmd writes detailed health information in JSON format to its log file once a minute. Information falls into eight logical blocks:

  • totals: running totals since Moogfarmd was started

  • interval_totals: totals accumulated since the previous 60-second interval

  • current_state: a snapshot of the important queues in Moogfarmd

  • garbage_collection: JVM garbage collection data

  • JVM_memory: JVM memory usage data

  • message_queues: queue usage and capacity

  • locked_thread_count: number of locked threads

  • total_thread_count: number of all threads

Example output:

WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] +|{"db_stats":{"locked_thread_count":0,"total_thread_count":3},"garbage_collection":{"total_collections_time":12827,"last_minute_collections":0,"last_minute_collections_time":0,"total_collections":1244},"current_state":{"pending_changed_situations":0,"total_in_memory_situations":4764,"situations_for_resolution":0,"event_processing_metric":0.047474747474747475,"message_queues":{"AlertBuilder":0,"TeamsMgr":0,"Housekeeper":0,"Indexer":0,"bus_thread_pool":0,"Cookbook3":0,"Cookbook1":0,"SituationMgr":0,"SituationRootCause":0,"Cookbook2":0},"in_memory_entropies":452283,"cookbook_resolution_queue":0,"total_in_memory_priority_situations":0,"active_async_tasks_count":0},"interval_totals":{"created_events":1782,"created_priority_situations":0,"created_external_situations":0,"created_situations":10,"messages_processed":{"TeamsMgr":182,"Housekeeper":0,"AlertBuilder":1782,"Indexer":2082,"Cookbook3":1782,"SituationRootCause":172,"Cookbook1":1782,"SituationMgr":172,"Cookbook2":1782},"alerts_added_to_priority_situations":0,"alerts_added_to_situations":111,"situation_db_update_failure":0},"JVM_memory":{"heap_used":1843627096,"heap_committed":3007840256,"heap_init":2113929216,"nonheap_committed":66912256,"heap_max":28631367680,"nonheap_init":2555904,"nonheap_used":64159032,"nonheap_max":-1},"totals":{"created_events":453252,"created_priority_situations":0,"created_external_situations":0,"created_situations":4764,"alerts_added_to_priority_situations":0,"alerts_added_to_situations":36020,"situation_db_update_failure":0}}|+
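
To work with this output programmatically, the JSON payload has to be separated from the log prefix. The following is a minimal Python sketch, assuming that the payload is always wrapped in the +| and |+ delimiters shown in the example above. The function name parse_health_line and the shortened sample line are illustrative and are not part of Moogfarmd.

```python
import json


def parse_health_line(line):
    """Return the health JSON from one log line, or None if no payload is found."""
    start = line.find("+|")
    end = line.rfind("|+")
    # Assumption: the payload sits between the first "+|" and the last "|+".
    if start == -1 or end < start + 2:
        return None
    return json.loads(line[start + 2:end])


# Shortened version of the example line, for illustration only.
sample = (
    'WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] '
    '+|{"db_stats":{"locked_thread_count":0,"total_thread_count":3},'
    '"totals":{"created_events":453252,"created_situations":4764}}|+'
)

health = parse_health_line(sample)
print(health["totals"]["created_events"])
```

Lines that do not contain both delimiters return None, so the function can be applied to every line of the log file and the results filtered.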

The values in the message_queues block are strings that include the queue limit after a slash. A limit of "-" represents an unlimited queue. An example message_queues block is as follows:

"message_queues":{"AlertBuilder":"0/-","Cookbook":"0/-","Housekeeper":"0/-","Indexer":"0/-","bus_thread_pool":"0/-","SituationMgr":"0/-"}

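The string values above can be split into numbers before they are compared or graphed. The sketch below assumes that the part before the slash is the current queue size and the part after it is the limit. That reading is inferred from the example, and the helper name parse_queue_value is hypothetical. Plain integer values, as in the earlier example output, are also accepted.

```python
def parse_queue_value(value):
    """Split a value such as "0/-" into (size, limit); limit is None when unlimited."""
    if isinstance(value, int):
        # Integer form, as in the full example output: no limit is reported.
        return value, None
    size, _, limit = str(value).partition("/")
    # Assumption: "-" (or a missing limit) means the queue is unlimited.
    return int(size), (None if limit in ("-", "") else int(limit))


queues = {"AlertBuilder": "0/-", "Indexer": "0/-", "SituationMgr": "0/-"}
parsed = {name: parse_queue_value(value) for name, value in queues.items()}
print(parsed["AlertBuilder"])
```
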
In a healthy system that is processing data:

  • The count of created events and created Situations increases.

  • The messages_processed block shows that Moolets are processing messages.

  • The queues in current_state.message_queues do not accumulate messages, although there may be spikes.

  • The total_in_memory_situations value increases over time but reduces periodically due to the retention_period.

The situation_db_update_failure count should be zero.
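
The checks above can be automated by comparing two consecutive health records. The following Python sketch is one possible approach and is not part of Moogfarmd. The function names, the queue_threshold value of 100, and the sample records are all hypothetical. The field names are taken from the example output. A quiet minute with no new Situations also triggers a warning, so treat the results as hints, not as alarms.

```python
def queue_size(value):
    """Return the current size from either 0 or "0/-" style queue values."""
    return int(str(value).split("/")[0])


def check_health(previous, current, queue_threshold=100):
    """Compare two consecutive health records and return a list of warnings."""
    warnings = []

    # Created events and Situations should increase while data is processed.
    for key in ("created_events", "created_situations"):
        if current["totals"][key] <= previous["totals"][key]:
            warnings.append(f"totals.{key} did not increase")

    # At least one Moolet should have processed messages in the last interval.
    processed = current["interval_totals"].get("messages_processed", {})
    if not any(count > 0 for count in processed.values()):
        warnings.append("no Moolet processed any messages in the last interval")

    # Spikes are acceptable; flag queues that are large and still growing.
    prev_q = previous["current_state"].get("message_queues", {})
    curr_q = current["current_state"].get("message_queues", {})
    for name, value in curr_q.items():
        size = queue_size(value)
        if size > queue_threshold and size > queue_size(prev_q.get(name, 0)):
            warnings.append(f"queue {name} is growing: {size}")

    # Database update failures should stay at zero.
    failures = current["totals"].get("situation_db_update_failure", 0)
    if failures != 0:
        warnings.append(f"situation_db_update_failure is {failures}")

    return warnings


def make_record(events, situations, processed, queue, failures):
    """Build a minimal, made-up health record for illustration."""
    return {
        "totals": {
            "created_events": events,
            "created_situations": situations,
            "situation_db_update_failure": failures,
        },
        "interval_totals": {"messages_processed": {"AlertBuilder": processed}},
        "current_state": {"message_queues": {"AlertBuilder": queue}},
    }


previous = make_record(100, 5, 10, 0, 0)
healthy = make_record(200, 6, 10, 0, 0)
unhealthy = make_record(100, 5, 0, "500/-", 2)

print(check_health(previous, healthy))
for warning in check_health(previous, unhealthy):
    print(warning)
```

Comparing consecutive records rather than inspecting a single record makes it possible to tell a temporary queue spike from a queue that keeps growing.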