DX Application Performance Management

  • 1.  November 2013 - Webcast Replay - High Availability

    Posted Nov 21, 2013 04:53 PM

    Here are the webcast recording details for the CA Wily/APM Global User Community webcast on November 21, 2013.

    Topic:
    Mike Sydor, CA Sr Engineering Services Architect, will discuss high availability scenarios and solutions for APM.

    Here is the WebEx version, which will expire at some point: https://catechnologies.webex.com/catechnologies/lsr.php?RCID=ec7e7f81226a63cc69dc6d14dc220112

    Here is the YouTube version: http://youtu.be/2mWd2hfWHA8

    Mike's presentation is attached.

    Attachment(s)

    pdf
    APMBP - Failover.pdf   848 KB 1 version


  • 2.  Re: November 2013 - Webcast Replay - High Availability

    Posted Feb 13, 2015 07:35 AM

    First, thank you, Mr. Sydor, for your insights into the fail-over strategies and their purposes.

     

    What changes, improvements or possible new release features have been added to this discussion over the last year?

     

    This topic has come up again at my company. The value argument has some traction, but some reservations remain. As I understand it, there are two points of failure that agent-collector load balancing does not cover.

     

    1.  MOM Failure

        This is the more critical one, since the MOM is the nerve center of our solution: without it, we cannot access the metrics and the solution cannot send out alerts.  The primary concern is the alerts not being sent.  While the band-aid options of a file share and a lock file exist, both come with very undesirable side effects that are worse than the problem they are trying to solve.

         It seems the main issue is that the MOM is both the communication hub and the primary consumer of system requests.  Decoupling the communication bus and moving to a distributed publish/subscribe request model might not only help solve this issue but also remove the artificial limit of only one MOM per cluster.  But that is based only on my guess at how the black-box communication between the MOM and the collectors functions.

     

    2.  APM DB Failure

         This is very much secondary, since the stored app map and CEM transactions matter less than the metrics and capabilities APM provides during a critical failure event.  I would like to know more about how the APM collectors behave when the APM DB has failed, but I think that is a different discussion.

    For CEM, given the 9.6 requirement for a Red Hat OS and my failure to help people understand the worth and use of the CEM metrics, we have removed our TIMs.
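
    The publish/subscribe idea raised under "MOM Failure" above can be sketched in a few lines. This is only an illustration of the general pattern, not how CA APM works internally: the Bus and Mom classes, the "metrics" topic, and the message fields are all hypothetical names invented for the example. It shows collectors publishing to a bus rather than to a single MOM, so a second MOM still raises the alert when the first one is down.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Bus:
    """Hypothetical in-memory message bus; not part of CA APM."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver to every subscriber; return how many accepted the message."""
        delivered = 0
        for handler in list(self._subscribers[topic]):
            try:
                handler(message)
                delivered += 1
            except ConnectionError:
                # A failed consumer must not block delivery to the others.
                continue
        return delivered


class Mom:
    """Hypothetical stand-in for a MOM that evaluates alert thresholds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.alerts: List[str] = []

    def on_metric(self, message: dict) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        if message["value"] > message["threshold"]:
            self.alerts.append(f"{self.name}: {message['metric']} = {message['value']}")


def demo():
    bus = Bus()
    primary, standby = Mom("mom-a"), Mom("mom-b")
    bus.subscribe("metrics", primary.on_metric)
    bus.subscribe("metrics", standby.on_metric)

    primary.up = False  # simulate the MOM failure discussed above
    delivered = bus.publish("metrics", {"metric": "cpu", "value": 95, "threshold": 90})
    return delivered, primary.alerts, standby.alerts
```

    Note that with both MOMs up, each would raise the same alert, so a real design along these lines would also need some way to de-duplicate alerts or elect one active alerter.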

     

     

    Again, thank you for your insight,

     

    Billy



  • 3.  Re: November 2013 - Webcast Replay - High Availability

    Posted Feb 13, 2015 09:37 AM

    I'll have to defer to

     

     

    for the status of any additional features advancing our fail-over capabilities.  A prototype for a distributed fail-over capability was built and validated but Product Management makes the decision when (or if) a feature will be introduced.  You should re-post the "MOM Failure" topic as a "feature request" to immortalize the request!

     

    Likewise for the APM DB - you would need to get a futures presentation to understand the direction regarding the app map.  Florian "knows all" - and is empowered to discuss it (as appropriate).

     

     

     

     



  • 4.  Re: November 2013 - Webcast Replay - High Availability

    Posted Feb 17, 2015 01:17 PM

    Thanks, Mr. Sydor.

     

    As you have suggested, I've posted a feature request (idea) at:

    Fail Over - Protect the MoM, the Collectors can deal