DX Application Performance Management

Setting an Alert when Agent Goes Down

  • 1.  Setting an Alert when Agent Goes Down

    Posted Sep 07, 2010 12:04 PM
    Hi All,

    Is there a way to send an alert when the Wily agent goes down? I tried creating an alert on the agent based on the number of active threads, but it's not working. Please let me know which metric I should use for the alert, or whether there is any other way to be notified when one of the agents goes down. Thanks.


  • 2.  RE: Setting an Alert when Agent Goes Down

    Broadcom Employee
    Posted Sep 07, 2010 02:12 PM
    Currently, it is suggested to monitor the GC: Bytes In Use metric.
    Set up an alert to notify your support teams when the metric goes to zero.

    -Hiko


  • 3.  RE: Setting an Alert when Agent Goes Down?

    Posted Sep 08, 2010 09:58 AM
    Thanks Hiko.

    The alert on GC: Bytes In Use works well as long as the agent is running, but if the agent itself is down, the alert does not fire.


  • 4.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Sep 08, 2010 11:32 AM
    To monitor the agent, set up an alert on the ConnectionStatus metric for your agent.
    This metric can be found in the superdomain. It may look something like this for a clustered environment:

    Superdomain|Custom Metric Host (Virtual)|Custom Metric Agent (Virtual) (EM1) (*Superdomain*)|Agents|MyServer|MyCustomDomain|MyAgent|ConnectionStatus

    Set the alert for when ConnectionStatus is zero.

    -Hiko


  • 5.  RE: Setting an Alert when Agent Goes Down?

    Posted Sep 08, 2010 02:23 PM
    Thanks Hiko,

    I created the alert with the metric expression you mentioned. In our environment, the path for the ConnectionStatus metric is:
    Agents\|XXXXX02\|WebLogic\|XXXXXADmn//XXXXServer01:ConnectionStatus.

    The alerts are working fine, but the other way around :)

    When I bring the agent down, ConnectionStatus goes to 3.
    While the agent is up, ConnectionStatus stays below 3 (0, 1, or 2).

    The alert works if I set the threshold to > 2 (i.e., an email is sent when the server is down).

    I am not sure about setting the value to "0" for ConnectionStatus. It would be very helpful if you could share any documentation on the ConnectionStatus metric.

    Thanks


  • 6.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Sep 08, 2010 02:33 PM
    I've not seen the scenario you're describing with regard to ConnectionStatus.
    It appears that you're trying to monitor Java agents on WebLogic.

    What version of Introscope are you using?
    What version of WebLogic are you using?


  • 7.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Sep 08, 2010 03:44 PM
    I'm going to check in one of our environments and get back to you.
    I'll check with a contact I have at CA about this.

    -Hiko


  • 8.  RE: Setting an Alert when Agent Goes Down?
    Best Answer

    Broadcom Employee
    Posted Sep 08, 2010 04:21 PM
    Okay, let's try this again.... :grin:

    ConnectionStatus has three metric values: 1, 2, and 3.
    1 - indicates the agent is available and reporting.
    2 - indicates an issue with the EM heartbeat with the agent.
    3 - indicates that connectivity with the agent has been lost.

    So you are correct to set your alerts to fire when the value increases. This hasn't been clearly documented in versions 8.x or lower. It is covered in the 9.0 APM documents.

    Sorry for the confusion.

    -Hiko
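
    For later readers, the three states above and the "threshold > 2" alert from post 5 can be sketched in a few lines of Python. This is only an illustration of the logic, not Introscope code: the names ConnectionStatus, CONNECTED, HEARTBEAT_ISSUE, DISCONNECTED, and should_alert are made up for the example.

```python
from enum import IntEnum


class ConnectionStatus(IntEnum):
    """ConnectionStatus values as described in this thread (names are hypothetical)."""
    CONNECTED = 1        # agent is available and reporting
    HEARTBEAT_ISSUE = 2  # issue with the EM heartbeat with the agent
    DISCONNECTED = 3     # connectivity with the agent has been lost


def should_alert(value: int, threshold: int = 2) -> bool:
    """Mirror the alert from post 5: fire when the value is above the threshold."""
    return value > threshold


if __name__ == "__main__":
    for status in ConnectionStatus:
        print(status.name, should_alert(status))
```

    Lowering the threshold to 1 would also fire on the heartbeat state (2). Whether that is desirable depends on how noisy that state is in your environment.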


  • 9.  RE: Setting an Alert when Agent Goes Down?

    Posted Sep 13, 2010 11:18 AM
    Hi Hiko,

    Thanks for your explanation. I have now set up the alerts for when the agent goes down.

    It would be helpful if you could provide any documentation you have on the different metrics we use.


  • 10.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Sep 20, 2010 01:49 AM
    Did you only want the documentation about ConnectionStatus, or are you also interested in the KB article about monitoring the JVM?


  • 11.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Sep 23, 2010 06:10 PM
    Page 41 of the APM 9.0 Workstation Guide discusses the connection state of agents. While it doesn't talk about connection state 2, this was described to me by the CA resource I spoke to.

    -Hiko


  • 12.  Thread Split

    Broadcom Employee
    Posted Jun 09, 2014 12:05 PM
    The original post has been answered. The new thread can be found at https://communities.ca.com/web/ca-wily-global-user-community/message-board/-/message_boards/view_message/119511062.


  • 13.  RE: Setting an Alert when Agent Goes Down?

    Posted Nov 05, 2010 11:51 AM
    There is a problem with monitoring the WebLogic ConnectionStatus when you have a MOM environment with multiple collectors.
    The WebLogic instance can connect to any collector or be load balanced to another collector. Therefore, you have to set up a
    Metric Grouping that includes the WebLogic ConnectionStatus for every collector using a regular expression.

    However, if the WebLogic instance is restarted or load balanced to another collector, the old collector will show
    a "3" and the new collector a "1". I've seen it take up to 30 minutes before the old collector removes the "3" status. Is there any way
    around this?

    Here's what the graph looks like:

    3                  ------------------------- (Collector 1)

    1 ---------------  ----------------------------------- (Collector 3)
      (Collector 1)

    Collector 1 is connected to the WebLogic instance which is bounced or switches to Collector 3.
    Collector 1 continues reporting the agent at "3" or disconnected for another 30 minutes.

    James


  • 14.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Nov 05, 2010 12:17 PM
    James,
    I have done exactly what you're speaking of with our implementation here, so we don't worry about the failovers at all.
    We're only concerned when the agent doesn't go back to 1 within the time range specified for our alert and dashboard.

    I have not, however, seen the condition you're describing with the 30 minute lag.

    What version of Introscope are you using? Do you have your agents only pointed to the MOM?


  • 15.  RE: Setting an Alert when Agent Goes Down?

    Posted Nov 05, 2010 02:46 PM
    We have all the agents pointed towards the MOM. Version 8.2.2

    James


  • 16.  RE: Setting an Alert when Agent Goes Down?

    Posted Nov 07, 2010 04:06 AM
    We wrote a JavaScript script that checks whether any collector is connected to a specific agent. If one collector reports "1", the connection status from every other collector is ignored. Only if every collector reports 2 or 3 do we generate an alarm. This works in cases where the MOM rebalances the collector cluster.

    By the 30-minute delay, do you mean the value of
    introscope.enterprisemanager.autoUnmountDelayInMinutes in IntroscopeEnterpriseManager.properties?
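
    The rule described above (one "1" wins, alarm only when every collector reports 2 or 3) could look roughly like the sketch below. It is written in plain Python to show the decision logic only and is not the original script, which was not posted. The function name agent_is_down and the collector names are hypothetical, and treating "no data from any collector" as "no alarm" is an assumption.

```python
from typing import Mapping


def agent_is_down(status_by_collector: Mapping[str, int]) -> bool:
    """Decide whether to raise an alarm for one agent.

    status_by_collector maps a collector name to the ConnectionStatus
    value (1, 2, or 3) that collector currently reports for the agent.
    """
    values = list(status_by_collector.values())
    if not values:
        # Assumption: with no data at all, do not raise an alarm.
        return False
    if any(value == 1 for value in values):
        # At least one collector sees the agent, so stale 2/3 values
        # left behind on other collectors are ignored.
        return False
    # Alarm only when every collector reports 2 or 3.
    return all(value in (2, 3) for value in values)


if __name__ == "__main__":
    # Agent was rebalanced: the old collector still shows 3.
    print(agent_is_down({"Collector1": 3, "Collector3": 1}))
    # No collector sees the agent.
    print(agent_is_down({"Collector1": 3, "Collector3": 2}))
```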


  • 17.  RE: Setting an Alert when Agent Goes Down?

    Posted Jul 09, 2013 11:12 AM
    We are trying to use the ConnectionStatus metric for this. Would it be possible to get a copy of the JavaScript you implemented?


  • 18.  RE: Setting an Alert when Agent Goes Down?

    Broadcom Employee
    Posted Jun 08, 2014 08:28 AM

    I have previously posted a calculator which brings all of the ConnectionStatus metrics under the MOM to make them easier to manage.

    You can find it either from the wiki or by going directly to the community documents under Tools > JavaScript Calculators.



  • 19.  RE: Setting an Alert when Agent Goes Down?

    Posted Jun 05, 2014 03:51 PM

    When we first went to a load-balanced environment, I noticed this and had a long discussion with some of my former PS friends. Ultimately we came up with a solution that seems to handle a load-balancing change using just the alert setup, with no JavaScript involved.

    You do this by setting your ConnectionStatus metric grouping to pick up all the collectors:

    (.*)\|Custom Metric Process \(Virtual\)\|Custom Metric Agent \(Virtual\).*

    Then set the alert's Combination to All, use the Not Equal To comparison, and set both the Danger and Caution thresholds to 1.

    We have found that 20 for 20 on the Danger and 12 for 12 on the Caution works most of the time. I have seen it take more than 5 minutes to reconnect on a busy MOM/collector cluster, so your mileage may vary.

    We use the Whenever Severity Changes trigger for alert notification, since these alerts go to a high alert authority (NetCool).
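
    To make the alert settings above easier to reason about, here is a rough Python model of them. It is a sketch under assumptions, not how the Enterprise Manager evaluates alerts internally: "20 for 20" is read as "the last 20 observed periods all breached", "Combination: All" is read as "every metric matched by the grouping must breach", and the sample agent names in the usage part are hypothetical. Only the regular expression is taken from the post.

```python
import re
from typing import Mapping, Sequence

# Metric grouping expression from the post above.
AGENT_PATTERN = re.compile(
    r"(.*)\|Custom Metric Process \(Virtual\)\|Custom Metric Agent \(Virtual\).*"
)


def breached_for(history: Sequence[int], periods: int) -> bool:
    """True if the last `periods` samples all satisfy "Not Equal To 1"."""
    if len(history) < periods:
        return False
    return all(value != 1 for value in history[-periods:])


def severity(histories: Mapping[str, Sequence[int]]) -> str:
    """Return "danger", "caution", or "normal" for one agent.

    histories maps a virtual agent name (one per collector) to that
    collector's recent ConnectionStatus samples, oldest first.
    """
    matched = [h for name, h in histories.items() if AGENT_PATTERN.match(name)]
    if not matched:
        return "normal"
    # Combination "All": every matched metric must breach.
    if all(breached_for(h, 20) for h in matched):
        return "danger"
    if all(breached_for(h, 12) for h in matched):
        return "caution"
    return "normal"


if __name__ == "__main__":
    prefix = "Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|"
    old = prefix + "Custom Metric Agent (Virtual) (Collector1)"  # hypothetical name
    new = prefix + "Custom Metric Agent (Virtual) (Collector3)"  # hypothetical name
    # Rebalanced agent: old collector stuck at 3, new collector reports 1.
    print(severity({old: [3] * 20, new: [3] * 5 + [1] * 15}))
```

    Under this reading, a stale "3" on the old collector cannot trigger the alert by itself, because the new collector's "1" keeps the All combination from being satisfied.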