We have a large infrastructure, with monitoring traffic of about 1200 msg/sec.
- From time to time, our data_engine probe drops its connection and does not reconnect automatically, which causes its queue to grow very large.
- This does not happen on a daily basis, only when the DBA team performs a manual failover or maintenance, or when there is a network issue.
- Restarting data_engine solves this, but then the queue has to be drained, which takes time and, of course, resources.
What concerns me is why, during these disconnection events, data_engine is not able to reconnect and re-attach its queue to the hub by itself.
Is this normal behaviour?
Do you think we should implement a script that checks whether the queue is connected and, if it is not, restarts data_engine? If anyone has done something similar in the past, I would appreciate it if you could share the tip.
Versions: hub is 7.80, data_engine is 8.10, and NAS is 4.60.
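To make the watchdog idea concrete, here is a minimal Python sketch of the decision logic only. It is not a tested solution for UIM: `read_depth` and `restart_probe` are hypothetical callables that would have to be wired to whatever your site uses to read the data_engine queue size and to restart the probe, and the threshold, sample count, and interval are made-up values. It also uses queue growth as a stand-in for "queue not attached", which is an assumption.

```python
import time
from typing import Callable, List, Optional, Sequence


def queue_looks_stuck(depths: Sequence[int], threshold: int,
                      min_samples: int = 3) -> bool:
    """Return True when the last `min_samples` readings are all above
    `threshold` and never decrease, i.e. nothing seems to drain the queue."""
    if len(depths) < min_samples:
        return False
    recent = list(depths[-min_samples:])
    above = all(d > threshold for d in recent)
    not_draining = all(b >= a for a, b in zip(recent, recent[1:]))
    return above and not_draining


def watch(read_depth: Callable[[], int],        # hypothetical: returns queue size
          restart_probe: Callable[[], None],    # hypothetical: restarts data_engine
          threshold: int = 50_000,              # assumed value, tune per site
          min_samples: int = 3,
          interval_s: float = 60.0,
          max_checks: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the queue and restart the probe when it looks stuck.
    Returns the number of restarts that were triggered."""
    history: List[int] = []
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        history.append(read_depth())
        history = history[-min_samples:]
        if queue_looks_stuck(history, threshold, min_samples):
            restart_probe()
            restarts += 1
            history.clear()  # start fresh so one outage causes one restart
        checks += 1
        sleep(interval_s)
    return restarts
```

Requiring several non-decreasing readings above the threshold keeps the check from restarting the probe while a large backlog is still being drained after a restart, since a draining queue shrinks between samples.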