Monitor Moogfarmd Health Logs
Moogfarmd writes detailed health information in JSON format to its log file once a minute. Information falls into eight logical blocks:
- totals: running totals since Moogfarmd was started
- interval_totals: running totals since the last 60-second interval
- current_state: a snapshot of the important queues in Moogfarmd
- garbage_collection: JVM garbage collection data
- JVM_memory: JVM memory usage data
- message_queues: queue usage and capacity
- locked_thread_count: sum of locked threads
- total_thread_count: sum of all threads
Example output:
WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] +|{"db_stats":{"locked_thread_count":0,"total_thread_count":3},"garbage_collection":{"total_collections_time":12827,"last_minute_collections":0,"last_minute_collections_time":0,"total_collections":1244},"current_state":{"pending_changed_situations":0,"total_in_memory_situations":4764,"situations_for_resolution":0,"event_processing_metric":0.047474747474747475,"message_queues":{"AlertBuilder":0,"TeamsMgr":0,"Housekeeper":0,"Indexer":0,"bus_thread_pool":0,"Cookbook3":0,"Cookbook1":0,"SituationMgr":0,"SituationRootCause":0,"Cookbook2":0},"in_memory_entropies":452283,"cookbook_resolution_queue":0,"total_in_memory_priority_situations":0,"active_async_tasks_count":0},"interval_totals":{"created_events":1782,"created_priority_situations":0,"created_external_situations":0,"created_situations":10,"messages_processed":{"TeamsMgr":182,"Housekeeper":0,"AlertBuilder":1782,"Indexer":2082,"Cookbook3":1782,"SituationRootCause":172,"Cookbook1":1782,"SituationMgr":172,"Cookbook2":1782},"alerts_added_to_priority_situations":0,"alerts_added_to_situations":111,"situation_db_update_failure":0},"JVM_memory":{"heap_used":1843627096,"heap_committed":3007840256,"heap_init":2113929216,"nonheap_committed":66912256,"heap_max":28631367680,"nonheap_init":2555904,"nonheap_used":64159032,"nonheap_max":-1},"totals":{"created_events":453252,"created_priority_situations":0,"created_external_situations":0,"created_situations":4764,"alerts_added_to_priority_situations":0,"alerts_added_to_situations":36020,"situation_db_update_failure":0}}|+
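To work with these entries programmatically, the JSON payload first has to be separated from the log prefix. The following is a minimal Python sketch, assuming every health line wraps its JSON between the "+|" and "|+" markers as in the example output. The function name parse_health_line is hypothetical, and the sample line is a shortened copy of the example.

```python
import json


def parse_health_line(line: str) -> dict:
    """Return the JSON health payload from one Moogfarmd health log line.

    Assumes the payload sits between "+|" and "|+", as in the example output.
    """
    start = line.index("+|") + 2  # payload starts right after the "+|" marker
    end = line.rindex("|+")       # and stops before the closing "|+" marker
    return json.loads(line[start:end])


# Shortened sample built from the example output above.
sample = (
    'WARN : [0:HLog][20190730 14:48:28.524 +0100] [CFarmdHealth.java:566] '
    '+|{"db_stats":{"locked_thread_count":0,"total_thread_count":3},'
    '"totals":{"created_events":453252,"created_situations":4764}}|+'
)

health = parse_health_line(sample)
print(health["totals"]["created_events"])  # prints 453252
```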
The message_queues block contains string values and queue limits, where "-" represents an unlimited queue. An example message_queues block is as follows:
"message_queues":{"AlertBuilder":"0/-","Cookbook":"0/-","Housekeeper":"0/-","Indexer":"0/-","bus_thread_pool":"0/-","SituationMgr":"0/-"}
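A script that reads these values needs to split each string into its parts. The sketch below assumes that the number before the slash is the current queue size and that the part after it is the limit. The source only states that "-" means unlimited, so treat that reading as an assumption. The helper name parse_queue_value is hypothetical. It also accepts the plain integer form shown in the longer example output.

```python
def parse_queue_value(value):
    """Split a queue value into (current_size, limit).

    limit is None when the queue is unlimited ("-") or no limit is reported.
    Assumes the "size/limit" layout suggested by the example block.
    """
    if isinstance(value, int):
        return value, None  # plain integer form: no limit reported
    size, _, limit = value.partition("/")
    return int(size), (None if limit in ("", "-") else int(limit))


queues = {"AlertBuilder": "0/-", "Indexer": "0/-", "bus_thread_pool": "0/-"}
for name, raw in queues.items():
    size, limit = parse_queue_value(raw)
    print(name, size, "unlimited" if limit is None else limit)
```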
In a healthy system that is processing data:
- The count of created events and created Situations increases.
- messages_processed shows that Moolets are processing messages.
- current_state.message_queues does not accumulate, although there may be spikes.
- total_in_memory_situations increases over time but reduces periodically due to the retention_period.
- situation_db_update_failure is zero.
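These checks can be automated against parsed health entries. The following Python sketch is one possible approach, not a Moogsoft-provided tool. The function name check_health and the warning wording are hypothetical, and it assumes the integer queue values shown in the example output. Because queues may spike, growth between two consecutive snapshots is only a hint to keep watching, and zero created events can be normal during quiet periods.

```python
def check_health(current, previous=None):
    """Return a list of warnings for one parsed health snapshot (a dict)."""
    warnings = []
    interval = current["interval_totals"]

    if interval["situation_db_update_failure"] != 0:
        warnings.append("situation_db_update_failure is non-zero")
    if interval["created_events"] == 0:
        warnings.append("no events created in the last interval")
    if not any(interval["messages_processed"].values()):
        warnings.append("no Moolet processed any messages")

    if previous is not None:
        before = previous["current_state"]["message_queues"]
        now = current["current_state"]["message_queues"]
        for name, size in now.items():
            # Flag queues that were already non-empty and have grown since
            # the previous snapshot; a single occurrence may be a spike.
            if size > before.get(name, 0) > 0:
                warnings.append(f"queue {name} grew from {before[name]} to {size}")
    return warnings


# Two hypothetical snapshots, reduced to the fields the checks read.
earlier = {
    "interval_totals": {"situation_db_update_failure": 0, "created_events": 1782,
                        "messages_processed": {"AlertBuilder": 1782}},
    "current_state": {"message_queues": {"AlertBuilder": 5}},
}
later = {
    "interval_totals": {"situation_db_update_failure": 1, "created_events": 1500,
                        "messages_processed": {"AlertBuilder": 1500}},
    "current_state": {"message_queues": {"AlertBuilder": 9}},
}

for warning in check_health(later, earlier):
    print(warning)
```

Running a check like this on each new entry and alerting only when the same queue warning repeats over several intervals is one way to separate sustained accumulation from short spikes.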