Release Automation


Anyone else having the issue that all agents appear to be offline?

  • 1.  Anyone else having the issue that all agents appear to be offline?

    Posted Oct 28, 2015 03:40 AM

    Hi there,

     

    Today, for the second time, all of our agents appeared to be offline. This seems to be a problem on the management server, since restarting the Nolio service on that machine brings everything back to life.

     

    I already opened a case for CA to investigate it, but I just wanted to check if some of you might have experienced something similar over the years, as we're still pretty new to the system.

     

    We're currently running 5.5.2.

     

    thanks.

     

    best regards

    michael



  • 2.  Re: Anyone else having the issue that all agents appear to be offline?

    Broadcom Employee
    Posted Oct 28, 2015 05:15 AM

    Hi Michael,

     

    Could you share some more details on your environment?

    1. How many execution servers do you have?

    2. Are the execution servers connected over slow / WAN links?

    3. When the agents disconnect, do you have to restart the management server, or does restarting the execution server resolve the problem?

     

    Regards

    Keith



  • 3.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Oct 28, 2015 05:18 AM

    hi keith,

     

    We have 1 execution server, running on Win2012R2. The connection between the servers is not over slow/WAN links; they're all located in the same LAN.

     

    To fix this I need to restart the management service; restarting only the execution server doesn't help. That's why I attached the management server logs to the case I've opened.

     

    thanks

    michael



  • 4.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Oct 29, 2015 09:16 AM

    We have three execution servers in our environment.  One of them consistently showed the agents offline.  A workaround that we found was to log into the ASAP, navigate to the agent management screen under the administration tab, and then toggle the connection type from HTTP to HTTPS and then back again.  This would reset whatever needs to be reset and then the agents would show online again.  Further investigation has revealed that the issue could be isolated to the message broker service over port 61616.  We do have a WAN connection between this execution server and the management server.  Occasional spikes in traffic on the ISP side cause enough of a disruption that the message broker service drops its connection and then can't re-establish it.

     

    I know that you said you were in a LAN situation so it might not be the same thing at all but we've definitely seen some of the same results. Support informed us that CA is currently working on a patch to address this so if it is the same thing then hopefully the patch will help to resolve it.  In the meantime, you could try the workaround listed above as we have found that to be the quickest and easiest workaround to get the agents back online.
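
    For anyone who wants to rule out basic network trouble on the broker port first, here is a minimal sketch in Python. It only checks whether a TCP connection to port 61616 can be opened from the machine where you run it (for this issue, the management server, connecting to the execution server). The function name is made up for illustration and is not part of CA Release Automation, and a successful connect does not prove the broker session itself is healthy.

```python
import socket


def broker_port_reachable(host: str, port: int = 61616, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it tests plain TCP reachability only and knows
    nothing about the message broker protocol or the state of the agents.
    """
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

    Call it with the host name of your own execution server, for example broker_port_reachable("your-execution-server"). If it returns False while the agents show offline, the problem is more likely on the network side than in the product.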



  • 5.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Oct 29, 2015 09:53 AM

    Thanks for the feedback! I will see what support says. Saurabh is currently looking into it and said that he couldn't see anything in the management server logs, so he's checking 2 agents and the execution logs as well.

     

    Toggling every agent is sadly not an option, as I would need to do it for all agents. Currently what I'm doing is restarting the nolio service on the management server and this works as well, after it is back up, all agents are listed as online again.



  • 6.  Re: Anyone else having the issue that all agents appear to be offline?

    Broadcom Employee
    Posted Oct 29, 2015 10:01 AM

    Just a quick note that toggling HTTP to HTTPS and back to HTTP wouldn't need to be done at the agent level. It would be done at the execution server level. This causes the NAC to attempt a new connection to that NES.

     

    Kind regards,

    Gregg



  • 7.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Oct 29, 2015 10:05 AM

    Oh OK, that's interesting. If this happens again (which I hope it doesn't), I will try this. I'm just not sure it will help, as restarting the Nolio service on the execution server didn't solve the issue.

     

    thanks



  • 8.  Re: Anyone else having the issue that all agents appear to be offline?

    Broadcom Employee
    Posted Oct 29, 2015 10:11 AM

    That is because the NES doesn't try to establish the ActiveMQ connection with the NAC. The NAC establishes the connection with ActiveMQ on the NES.



  • 9.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Oct 29, 2015 10:13 AM

    thanks for the input :-)



  • 10.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Mar 01, 2017 04:56 PM

    Curiously, we have this same trouble in version 6.1.0.852. Thank you, CJFarrar, for your workaround; it was very helpful.



  • 11.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Apr 05, 2016 10:33 AM

    I got this issue twice in the past weeks and each time, it was impossible to deploy on any agent.

    So I opened the ASAP and saw that all agents were offline. I clicked the refresh button and, magically, all agents came back online and deployments were possible again.

    Really strange!



  • 12.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Aug 08, 2016 09:22 AM

    I am having the same problem.

    I have 2 management servers installed to provide HA, but from time to time one of them always shows that all agents are offline. When that happens I have to access the other management server through ROC and then all agents appear online again for both nodes.

     

    I normally access the ROC web interface only through our load balancer, which is configured as active-passive. But when this problem happens, I have to access the web interface by connecting directly to the management server nodes.

     

    PS: We have CA Release Automation 5.5.2.



  • 13.  Re: Anyone else having the issue that all agents appear to be offline?

    Posted Aug 25, 2017 12:32 PM

    CA Release Automation 6.1 here.  We've also seen this issue, most often after one of our maintenance windows installs Windows Updates and causes reboots to happen.  Stopping and restarting the Nolio management service usually does the trick.  For us, the management and execution servers are the same box.  We will consider separating the two the next time we upgrade CA RA (Nolio).