DX Application Performance Management

  • 1.  CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Posted Aug 04, 2017 08:56 AM

    Currently we have CA APM 10.0 running on SuSE virtual servers, and we are planning our upgrade to 10.5.2. Our UNIX admins have offered RHEL virtual servers to replace our current servers.

     

    So my initial plan is to clean install APM 10.5.2 on RHEL 7.x and then, as the agents are upgraded, have them report to the RHEL APM cluster.  I really don't want to figure out a way to export/migrate SmartStor data, so we would keep the SuSE APM 10.0 environment for 90 days and set the unmount time on the agents in the 10.0 environment to 130,000 minutes.  That way, if any of the APM end users need access to the older metrics, they can switch over to the 10.0 APM environment.
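As a quick check on the 130,000 minute figure, here is a minimal sketch of the days-to-minutes arithmetic. The function names are only for illustration and are not part of APM.

```python
# Convert a retention window between days and minutes.
MINUTES_PER_DAY = 24 * 60


def days_to_minutes(days: int) -> int:
    """Number of minutes in the given number of whole days."""
    return days * MINUTES_PER_DAY


def minutes_to_days(minutes: int) -> float:
    """Number of days covered by the given number of minutes."""
    return minutes / MINUTES_PER_DAY


print(days_to_minutes(90))                  # 129600
print(round(minutes_to_days(130_000), 2))   # 90.28
```

So 130,000 minutes is a little over 90 days, which leaves a small margin beyond the planned retention window.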

     

    The created elements such as Management Modules, JavaScript Calculators, and the custom email script should all port directly over.  There are settings that we have had in the Enterprise Manager configuration since the 9.0.5.6 to 9.1.1.1 upgrade that I am not sure we still need.

     

    introscope.enterprisemanager.metrics.live.limit=500000
    introscope.enterprisemanager.query.datapointlimit=0
    transport.outgoingMessageQueueSize=6000
    introscope.enterprisemanager.application.overview.baselines=true
    introscope.enterprisemanager.events.limit=1250
    introscope.enterprisemanager.metrics.historical.limit=4000000
    introscope.enterprisemanager.query.returneddatapointlimit=0
    introscope.enterprisemanager.dbfile=data/baselines.db

     

    Do we still need to add these settings, or is there a guide in addition to the installation guide that covers this type of addition for 10.5.2?
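One way to approach this question is to compare the old properties file against the one shipped with the fresh install and list only the entries that are missing or different. The sketch below assumes simple `key=value` lines. The "new" text in the example is a made-up placeholder, not the real 10.5.2 defaults.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def custom_settings(old: dict, new: dict) -> dict:
    """Old entries whose key is absent from new or whose value differs."""
    return {k: v for k, v in old.items() if new.get(k) != v}


# Two of the settings from the post above.
old_text = """
introscope.enterprisemanager.events.limit=1250
transport.outgoingMessageQueueSize=6000
"""
# Hypothetical content of the new install's file, for illustration only.
new_text = """
introscope.enterprisemanager.events.limit=1250
"""

for key, value in custom_settings(parse_properties(old_text),
                                  parse_properties(new_text)).items():
    print(f"{key}={value}")
```

Anything the comparison prints is a candidate to review by hand before carrying it over.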

     

    Has anyone attempted to switch OS for the APM EM cluster?

    Anyone have any advice on this effort?

     

    Thank you,

     

    Billy



  • 2.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Broadcom Employee
    Posted Aug 04, 2017 02:17 PM

    Dear APM Admins and Community:   

          It is great Billy is sharing his plans. Does anyone have any feedback for how to make this change a success?

    Thanks

    Hal German



  • 3.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x
    Best Answer

    Posted Aug 04, 2017 03:04 PM

    Hi Billy,

     

    Since EMs run on a JRE, they are pretty portable between Linux distros.

     

    I would run the clean installation of the new cluster, then copy the following from the old MOM:

    • {em_home}/config/modules folder from the older version to the new one (do not overwrite the default modules, unless you have customized them in 10.0). 10.5 has a Default module with newer stuff than 10.0. This will migrate all your management modules, scripts, etc.
    • {em_home}/config/realms.xml, server.xml, domains.xml and users.xml should be copied, so that all the sign-on data carries over.
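The "do not overwrite the default modules" part of the first bullet can be sketched as a copy that skips any file already present in the new install. This assumes the management modules are stored as .jar files in the modules folder. The function name and the paths you pass in are hypothetical.

```python
import shutil
from pathlib import Path


def copy_missing_modules(old_modules: Path, new_modules: Path) -> list:
    """Copy .jar files from the old modules folder into the new one,
    skipping any name that already exists there so the modules shipped
    with the new install are left untouched. Returns the copied names."""
    copied = []
    for src in sorted(old_modules.glob("*.jar")):
        dest = new_modules / src.name
        if dest.exists():
            continue  # keep the newer module from the fresh install
        shutil.copy2(src, dest)
        copied.append(src.name)
    return copied
```

You would call it with the old and new `{em_home}/config/modules` paths. If you did customize a default module in 10.0, it will be skipped here and has to be handled by hand.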

     

     

    About the properties you mention, all the limits are available in apm-events-thresholds-config.xml, so you will be able to check them there.

    I'm not really sure about transport.outgoingMessageQueueSize=6000, since I cannot find it as a default in 10.5; anyway, I used it when migrating from 10.3 to 10.5.

     

    If you run any plugin or extension, you should check where you installed it. For example, for customized mails, I'm using a jar that is located under product/enterprisemanager/plugins.

     

    Your plan sounds fine, good luck!!

     

     

    Regards,

    Roger



  • 4.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Posted Aug 04, 2017 03:31 PM

    Thanks Roger.

     

    One problem I think we have is that we have management modules with metric groupings for the epagent and also the Java application agent.  The application agents won't all be transitioned at the same time.  So, I'm trying to plan out how to unwire the epagent alerts, along with the agent connection status, from the management modules so we can do a phased agent transition.

     

    One of the options I've come up with is to transition the epagent at the same time as the application agents, so that no temporary modifications to the management module alerts/metric groupings would need to be made.

     

    The one metric at issue is:

    • *SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual) ([collector]@5001)|Agents|[agenthost]|LinuxAgent|PerfMonAgent:ConnectionStatus

    So when the agent stops reporting to the SLES cluster, alerts will attempt to trigger and will mask any other alerts within the summary alert.

    The rest of the alerts are bound to metrics that produce no data if the agent isn't present. No data, no alert. No problem.

     

    If it weren't for the agent connection status, we could enable the RHEL management modules, knowing that no alerts would be generated until the agent is reporting to RHEL, and then just disable the SLES management modules.

     

    I'll update this when I figure out what we are going to do.

     

    Billy



  • 5.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Posted Aug 04, 2017 03:39 PM

    Billy,

     

    I don't fully get what you say about the EPAgent. If it is the Connection Status, the metrics would not exist, so no alerts should be triggered.

     

    I agree that migrating related EPAgents when app agents are switched to the new cluster is the best option.

     

    Regards,

    Roger



  • 6.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Posted Aug 07, 2017 07:12 AM

    The connection status is a collector-generated metric, and if the agent disconnects, the collector will report a value of three until the agent is unmounted.  So even if the agent no longer reports, this metric keeps reporting, causing the alert to trigger.
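To make the behaviour concrete, here is a toy model of it (not APM code). The value 3 for a disconnected agent comes from the description above. The value 1 for a connected agent, and the use of None for "no data after unmount", are assumptions for illustration.

```python
DISCONNECTED = 3  # value reported after a disconnect, per the post above
CONNECTED = 1     # assumed value while the agent is connected


def connection_status_series(disconnect_at: int, unmount_at: int,
                             total_minutes: int) -> list:
    """One value per minute: CONNECTED until the disconnect, DISCONNECTED
    until the unmount, then None (no data) once the agent is unmounted."""
    series = []
    for minute in range(total_minutes):
        if minute < disconnect_at:
            series.append(CONNECTED)
        elif minute < unmount_at:
            series.append(DISCONNECTED)
        else:
            series.append(None)
    return series


def alerting_minutes(series: list) -> list:
    """Minutes at which an alert on 'status == 3' would be in danger."""
    return [m for m, value in enumerate(series) if value == DISCONNECTED]


series = connection_status_series(disconnect_at=3, unmount_at=7, total_minutes=10)
print(series)                    # [1, 1, 1, 3, 3, 3, 3, None, None, None]
print(alerting_minutes(series))  # [3, 4, 5, 6]
```

In this model the alert only clears once the unmount happens, so a long unmount period on the old cluster keeps the alert raised for just as long.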

     

    We are using the

    • *SuperDomain*|Custom Metric Host (Virtual)|Custom Metric Process (Virtual)|Custom Metric Agent (Virtual) ([collector]

    to detect whether the agent is reporting, since building a calculator for every agent, and contending with agents being load balanced, would take quite a bit of effort given that we have a few thousand agents.

     

    Now to try to capture which servers have epagents with a connection status alert and an application agent connection status.

     

    Hope that helps clarify about the collector.agent.connection status,

     

    Billy



  • 7.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Broadcom Employee
    Posted Aug 04, 2017 04:01 PM

    Thank you Roger for providing such a thorough and helpful response to Billy! Greatly appreciated!



  • 8.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Posted Aug 10, 2017 02:38 PM

    A few wrinkles ....

    So, all of the WebSphere Java agents can't be restarted on the same day.  They currently have 1, 2, 3, 4 Sunday and 1, 3, 4 Saturday groups. This would cause the production environment to be split for three weeks, which isn't selling all that well with the end users.

     

    So I thought of a new possible process to do this.

    1.  Create a network virtual ip address configured to the SLES 10.0 MOM

    2.  Reconfigure all agents to point to the new VIP so when they restart normally over the month, they will be addressing the VIP

    3.  Install 10.5.2 on RHEL (MOM/Collectors)

    4.  Switch the VIP from the 10.0 SLES MOM to the 10.5.2 RHEL MOM

    5.  Shutdown the SLES MOM/Collectors forcing the agents to ask the VIP MOM for a new collector list

    6.  Agents get a RHEL collector list and start reporting to the RHEL cluster

    7.  Keep the SLES environment down long enough for all agents to switch to RHEL

    8.  Start the SLES environment with the unmount period set to 130,000 minutes (90 days) for historic searches

    9.  After 90 days, shutdown and decommission the SLES APM cluster.
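Step 7 depends on knowing when every agent has moved. A minimal sketch of that check follows, assuming you can get a list of agent names from each cluster (the SLES list captured before the shutdown in step 5). How those lists are exported is outside this sketch, and the agent names are made up.

```python
def agents_still_on_old(old_cluster_agents, new_cluster_agents) -> list:
    """Agents seen on the old cluster that have not yet appeared on the
    new cluster, returned sorted for easy reading."""
    return sorted(set(old_cluster_agents) - set(new_cluster_agents))


# Hypothetical agent names, for illustration only.
sles_agents = ["host1|WebSphere|app1", "host2|WebSphere|app2", "host3|EPAgent|ep1"]
rhel_agents = ["host2|WebSphere|app2"]

for name in agents_still_on_old(sles_agents, rhel_agents):
    print(name)
```

When the function returns an empty list, every agent from the old list has shown up on RHEL and the SLES environment can be restarted for historic searches.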

     

    Once we get the 10.0 agents reporting to a 10.5.2 environment, we can then do a phased upgrade of the agents and hopefully this plan will prevent having two different APM instances we would need to review or to triage with.

     

    How does that sound?

     

    Thanks,

     

    Billy



  • 9.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Broadcom Employee
    Posted Aug 11, 2017 12:51 PM

    Anyone have any guidance for Billy? 



  • 10.  Re: CA APM 10.5.2 OS shift from SuSE to RHEL 7.x

    Posted Aug 11, 2017 03:15 PM

    For anyone in the future, if you are building any alerts from the agent connection status, understand the complexities that accompany its use.

    If you build any calculators, define the zero or no-data state, so that if the calculator finds no matching metrics it reports a zero or no data.

    The alerts based on the agent metrics will not be triggered if there is no data, i.e. when the agent is no longer reporting.

     

    Using agent connection status is useful, and a pretty good stop-gap, but APM isn't geared toward availability use cases.  The availability use case would be better served by an infrastructure monitoring system instead, like CA UIM.

     

    Our implementation uses the agent connection status to give a very quick, less than one minute, notice when a system or JVM is in trouble, but that decision is making our upgrade and switching OS quite a bit more complex.

     

    Another part that might be useful: if it is possible, create adapter virtual addresses and do not hand out the actual server names. That way, when you need to switch out a server, you do not have to hunt down all of the links and documentation, or field all of the end users who did not read the 100 notices that the change was occurring and can no longer log into APM.