data_engine dropping and unable to re-connect

Question asked by NicolasRoyo81983169 on Dec 21, 2015
Hi folks,


We have a big infrastructure, processing about 1200 msg/sec.


- From time to time, our data_engine probe drops its connection and does not reconnect automatically, which causes its queue to grow very large.

- This does not happen on a daily basis, only when the DBA team performs a manual failover or maintenance, or when there's a network issue.

- Restarting data_engine solves this, but the queue then has to be drained, which takes time and, of course, resources.


What concerns me is why, during these disconnection events, data_engine is not able to reconnect and re-attach its queue to the hub by itself.


Is this normal behaviour?

Do you think we should implement a script that checks whether the queue is connected and, if not, restarts data_engine? If anyone has done something similar in the past, I would appreciate it if you could share it.
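
To make the idea concrete, here is a rough sketch of the kind of watchdog I have in mind. It is only an outline under assumptions: `is_connected` and `restart` are hypothetical placeholders for whatever actually queries the queue status on the hub and restarts data_engine, which I have not worked out. It only restarts after several consecutive failed checks, so a short network blip does not trigger a restart.

```python
import time
from typing import Callable, Optional


def watchdog(is_connected: Callable[[], bool],
             restart: Callable[[], None],
             max_failures: int = 3,
             interval_s: float = 60.0,
             max_checks: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll a connection check and restart after repeated failures.

    is_connected and restart are hypothetical hooks supplied by the caller;
    nothing here talks to the hub or to data_engine directly.
    Returns the number of restarts that were triggered.
    """
    failures = 0   # consecutive failed checks
    restarts = 0   # restarts triggered so far
    checks = 0     # total checks performed
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_connected():
            failures = 0  # healthy again, reset the counter
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0  # give the probe time to come back
        sleep(interval_s)
    return restarts
```

With the defaults this would run forever, checking once a minute and restarting after three failed checks in a row; `max_checks` and `sleep` are there only so the loop can be exercised in a dry run.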


hub is 7.80, data_engine is 8.10 and NAS is 4.60