
EM "Holding" on to disconnected agent

Question asked by Srikant.Noorani on Aug 8, 2013
Latest reply on Aug 19, 2013 by venvi05
One thing we have noticed between the EM and the agent is that if the agent is not stopped cleanly (say, the VM in which the agent is running crashes), the EM thinks the agent is still connected even though no metrics are being updated on the EM side. It can take 15 to 20 minutes before the EM marks the agent as disconnected. (Note: this is not the regular agent disconnect, where you can look at the agent connectivity metrics.) If we restart the agent during that period, before the EM realizes the old connection is gone, it comes in as "%1" because the EM treats it as a duplicate agent.

Basically, the EM seems to rely on the network layer to decide whether the socket connection to the agent is stale, and it only kicks the agent out once the OS or network layer marks the connection as stale. I think a better option would be a mechanism in the application layer itself that watches the incoming data and, after a configurable timeout with no traffic or data from the agent, kicks the agent out. The problem is especially evident in a cloud environment, where VMs with agents go up and down all the time and you end up with multiple "%" agent names.

Is there a hidden EM property we can set that would enforce such a timeout on the application side? That is, one that checks whether the socket connection with the agent (or, for that matter, with the collector) is "active" with live data coming in, and knocks the connection out if it is not.
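To make the idea concrete, here is a rough Python sketch of the kind of application-level idle timeout I have in mind. It is not EM code and does not correspond to any existing EM property: `IdleAgentTracker`, `record_data`, `evict_idle`, and the 60-second timeout are all made-up names and values for illustration only.

```python
import time
from typing import Callable, Dict, List


class IdleAgentTracker:
    """Hypothetical tracker: remembers when each agent last sent data."""

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout_s = idle_timeout_s  # configurable idle timeout
        self._clock = clock                   # injectable so it can be tested
        self._last_seen: Dict[str, float] = {}

    def record_data(self, agent_name: str) -> None:
        """Call whenever any data arrives on the agent's connection."""
        self._last_seen[agent_name] = self._clock()

    def is_connected(self, agent_name: str) -> bool:
        return agent_name in self._last_seen

    def evict_idle(self) -> List[str]:
        """Drop agents that have been silent longer than the timeout.

        Returns the evicted agent names (sorted) so the caller can close
        their sockets and free the names for a reconnecting agent.
        """
        now = self._clock()
        stale = [name for name, seen in self._last_seen.items()
                 if now - seen > self.idle_timeout_s]
        for name in stale:
            del self._last_seen[name]
        return sorted(stale)


if __name__ == "__main__":
    tracker = IdleAgentTracker(idle_timeout_s=60.0)
    tracker.record_data("agentA")
    print(tracker.evict_idle())  # nothing is idle yet, so the list is empty
```

The point is that the decision depends only on whether data has arrived recently, not on what the OS reports about the socket, so a crashed VM would be noticed after the configured timeout and a restarted agent could reclaim its original name.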

Thanks for your help in advance.