Automic Unix/Windows Agents - Best practice for monitoring

Discussion created by JohnO'Mullane on Sep 1, 2016
Latest reply on Jan 13, 2018 by Krum_Ganev
We now have over 1000 agents running and I'm struggling a bit to find the best way of monitoring them.
I first started by monitoring the actual process (Unix) and service (Windows) on the host, but soon discovered that even if the process is running, it may have lost its connection to my Engine.

I then looked at using the HOST table, which tells me which agents have lost connection.
This works relatively well, and I was able to build a workflow that tries to recover an agent before escalating.
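
For anyone wanting to try the same approach, here is a rough Python sketch of the "recover before escalation" logic. It is not my actual workflow and not anything provided by Automic: the AgentStatus records stand in for whatever you read from the HOST table, and is_connected, restart and escalate are hypothetical callables you would have to supply yourself (for example a query, a remote service restart and a ticket or e-mail).

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AgentStatus:
    """Hypothetical record: one agent and whether the Engine sees it as connected."""
    name: str
    connected: bool


def find_disconnected(statuses: Iterable[AgentStatus]) -> List[str]:
    """Return the names of agents that are currently not connected."""
    return [s.name for s in statuses if not s.connected]


def recover_or_escalate(
    agent: str,
    is_connected: Callable[[str], bool],  # assumption: re-checks the agent's status
    restart: Callable[[str], None],       # assumption: restarts the agent on its host
    escalate: Callable[[str], None],      # assumption: raises a ticket / sends an alert
    attempts: int = 3,
    wait_seconds: float = 60.0,
) -> bool:
    """Try to restart the agent a few times; escalate only if it never reconnects."""
    for _ in range(attempts):
        restart(agent)
        time.sleep(wait_seconds)  # give the agent time to reconnect
        if is_connected(agent):
            return True
    escalate(agent)
    return False
```

The point of passing the three callables in is that the retry logic stays the same whether the status comes from a database query or some other check, and it can be tested without touching a real host.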

I'm still not 100% convinced that this is the best way.

Tbh, our agents rarely go down or lose connection. It's really only when our sysadmins are doing some work on a host, such as an upgrade, and a restart is needed because the IPs change.

What is the recommended "Best Practice" from Automic's perspective?
Is there any auto recovery built into the Agent?
What are others doing to monitor their agents?