CA Tuesday Tip: 2013 - Java Agent Hangs, Crashes, OOM, High CPU, Agent conf

Discussion created by SergioMorales Employee on Apr 10, 2013
Latest reply on May 6, 2015 by rodfloprz
CA APM Tuesday Tip by Sergio Morales, Principal Support Engineer for 4/10/2013

Hi Everyone,
Here is an update of my previous post from 2011. Below is a checklist of the points to review whenever you run into any of the Java agent issues mentioned above:

Checklist:

1. Is the issue affecting all the AppServers?

2. Does the problem occur on start-up?

3. Verify that the configuration is supported; see the APM compatibility guides:
https://support.ca.com/irj/portal/anonymous/phpsupcontent?contentID=883df031-705e-425b-9a0e-73130da8a204&productID=5974
If it is not listed, open an enhancement request via the CA APM Community site at the following address:

https://caideation.secure.force.com/ideation/idealist?c=09a30000000JkclAAC&sort=popular&lang=en_US&lrUrl=https://communities.ca.com/web/ca-wily-global-user-community/welcome&isLoggedIn=false&lrCUrl=https://communities.ca.com/web/ca-wily-global-user-community/welcome&lrCname=CAWily/APM Global User Community
Link to instructional video for the proper way to submit enhancement requests:
www.ca.com/media/idea_vision_21610/index.html

4. If the agent is configured correctly, check whether it has produced any log files.

5. Look for possible syntax errors in the Autoprobe.log. The issue could be caused by a syntax error in the PBD/PBL files that prevents the agent from starting up.

6. Find out whether the problem is related to the Agent instrumentation or to a JVM bug. This is a very common issue.

NOTE: A JVM crash indicates a defect in the JVM itself, because the JVM is not supposed to crash under any circumstances.

Open the IntroscopeAgent profile, set introscope.autoprobe.enable=false, and restart the JVM.
If the problem persists, that confirms it is not related to the Agent instrumentation:

a. Try switching from -javaagent to -Xbootclasspath.
If the problem persists, you need to open a support incident with the JVM vendor.
b. Upgrade to the latest JVM or use an alternate JVM.

If the problem does not persist, re-enable the instrumentation but disable SQLAgent, any custom PBD/PBL, and the Agent extensions:
a) Stop the AppServer.
b) In the IntroscopeAgent profile, set
introscope.autoprobe.enable=true
c) Disable SQLAgent by moving SQLAgent.jar out of the Agent directory.
If you are using v9.1, you can instead set: introscope.agent.sqlagent.sql.turnoffmetric=true
d) Disable JMX collection by setting introscope.agent.jmx.enable=false
e) Turn off the network, file system, and System File Metrics tracers in the toggles PBD file by commenting out these lines:

#TurnOn: SocketTracing
#TurnOn: UDPTracing
#TurnOn: FileSystemTracing
#TurnOn: ManagedSocketTracing

f) Disable any additional Agent extensions, such as ChangeDetector, LeakHunter, and PowerPacks.
g) Disable any additional custom PBD, agent extension, or formatter created by the Professional Services team.
If the problem no longer occurs, reintroduce the components one at a time until the problem reappears.
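
While working through the isolation steps in item 6, it is easy to lose track of which profile settings are currently in effect. The following is a minimal sketch, not a CA tool, that loads an agent profile with java.util.Properties and reports settings that differ from the isolation values named above. The class name IsolationCheck and the choice of which properties to check are illustrative assumptions; add further entries as needed for your agent version.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class IsolationCheck {

    // Isolation values taken from item 6 of the checklist (illustrative subset).
    static final Map<String, String> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put("introscope.autoprobe.enable", "true");
        EXPECTED.put("introscope.agent.jmx.enable", "false");
    }

    /** Returns one message per property that is missing or has a different value. */
    static List<String> mismatches(Reader profile) throws IOException {
        Properties props = new Properties();
        props.load(profile);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
            String actual = props.getProperty(e.getKey());
            if (actual == null || !e.getValue().equalsIgnoreCase(actual.trim())) {
                result.add(e.getKey() + ": expected " + e.getValue() + ", found " + actual);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to IntroscopeAgent.profile
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            List<String> problems = mismatches(reader);
            if (problems.isEmpty()) {
                System.out.println("Profile matches the isolation settings.");
            }
            for (String p : problems) {
                System.out.println(p);
            }
        }
    }
}
```

Run it with the profile path as the only argument. It only reads the file, so it is safe to run against a live configuration.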

7. If the issue is related to memory, make sure you set introscope.agent.reduceAgentMemoryOverhead=true in the IntroscopeAgent.profile.

8. If the issue is related to an OOM, confirm whether the problem occurs in native memory or in heap space. You should be able to confirm this from the stack trace or thread dump. If the issue occurs in native memory, try disabling Platform Monitor, which is the only component that works directly with native memory.
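
As a rough aid for item 8, the sketch below classifies an OutOfMemoryError message as heap or native. It is a heuristic only: the strings it matches are the wording used by Sun/Oracle HotSpot JVMs, and other JVMs (for example IBM J9) word these errors differently. The class and method names are illustrative assumptions.

```java
import java.util.Locale;

public class OomKind {

    enum Kind { HEAP, NATIVE, UNKNOWN }

    /** Classifies an OutOfMemoryError message using HotSpot wording (heuristic). */
    static Kind classify(String message) {
        if (message == null) {
            return Kind.UNKNOWN;
        }
        String m = message.toLowerCase(Locale.ROOT);
        // Heap exhaustion as reported by HotSpot.
        if (m.contains("java heap space") || m.contains("gc overhead limit exceeded")) {
            return Kind.HEAP;
        }
        // Failures to obtain memory or threads from the operating system.
        if (m.contains("unable to create new native thread") || m.contains("out of swap space")) {
            return Kind.NATIVE;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Pass the error line from the log as arguments.
        System.out.println(classify(String.join(" ", args)));
    }
}
```

Anything reported as UNKNOWN still needs a manual look at the stack trace.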

9. If the issue is related to high CPU, disable Platform Monitor. For details on how to disable this extension, see the Agent guide.

10. If the issue is related to memory, a crash, or performance and you are using v9.1 with the IBM JDK (J9), switch to AgentNoRedefNoRetrans.jar and IntroscopeAgent.NoRedef.profile instead of Agent.jar and IntroscopeAgent.profile.

11. If the issue is related to memory, a crash, or performance and you are using v9.x, try disabling deep inheritance (introscope.autoprobe.deepinheritance.enabled=false). The deep inheritance cache is built at startup and examines each class loaded by the JVM, so it can add significant memory and CPU overhead when a huge number of classes are loaded at startup. Currently the cache is unlimited in size until entries older than 10 minutes start to age out.

12. If you are using 9.1.1.1+ with Oracle RAC as the backend, you might notice some overhead because the agent uses reflection to get the correct RAC instance name. You should see some relief in CPU utilization if you add the following agent property: introscope.agent.sqlagent.cacheConnectionsURLs=true

13. Mixed mode is NOT supported. If you have legacy extensions, tracers, or PBDs, switch the agent to legacy mode. Turning on mixed mode (new tracers plus legacy) will cause performance issues.

14. If the problem persists and you are using 9.1.x, switch to the pre-9.1 legacy mode by setting:
introscope.agent.configuration.old=true (this is a hidden property, so you must add it to the profile yourself)
The legacy PBD/PBL files are located in wily/examples/legacy. Copy all of them to wily/core/config and reconfigure the agent as follows:
introscope.autoprobe.directivesFile=default-typical-legacy.pbl,hotdeploy,spm-legacy.pbl

15. If you are using v9.1 with Java 1.7, add -XX:-UseSplitVerifier to the normal Agent parameters so that JVM 7 starts without the new verifier. Reason: JSR 202 changed class verification to use type checking. Class files with version number 51 are verified exclusively by the type-checking verifier, so their methods must have StackMapTable attributes where appropriate. Without this option you may see exceptions such as:
java.lang.VerifyError:StackMapTable error:bad offset
java.lang.ClassFormatError:Illegal local variable table
java.lang.InternalError
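
Item 15 only concerns class files with version number 51, so it can help to check which version an affected class was compiled to. The following is a minimal sketch that reads the major version from the class file header (a 4-byte magic number 0xCAFEBABE, then a 2-byte minor and a 2-byte major version). The class name ClassVersion and its command-line usage are illustrative assumptions.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersion {

    /** Reads the class file header and returns the major version number. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a Java class file (bad magic number)");
        }
        data.readUnsignedShort();        // minor version, not needed here
        return data.readUnsignedShort(); // major version: 50 = Java 6, 51 = Java 7
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .class file extracted from the application
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Class file major version: " + majorVersion(in));
        }
    }
}
```

A result of 51 means the class was compiled for Java 7 and is subject to the type-checking verifier described above.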

16. Some AppServers require additional configuration steps to enable the agent.

For example, the error “java.lang.NoClassDefFoundError: com/wily/introscope/agent/trace/IMethodTracer” will occur if you are using OSGi Felix configurations.

a) If GlassFish 3.1.2, open <glassfish_home>\glassfish\config\osgi.properties and add the wily classes to the property: eclipselink.bootdelegation=oracle.sql, oracle.sql.*, com.wily.*
b) If GlassFish 3.1.1, open <glassfish_home>\glassfish\osgi\felix\conf\config.properties and add a pattern for the wily classes: org.osgi.framework.bootdelegation=sun.*,com.sun.*,com.wily.*
c) If WebLogic and Apache Felix, add sling.bootdelegation.com.wily=com.wily.* to the sling.properties file. This setting unconditionally adds the com.wily.* package to the org.osgi.framework.bootdelegation property.
d) If Tomcat and Apache Felix, add the following system property:
-Datlassian.org.osgi.framework.bootdelegation=com.wily.*,sun.*,net.customware.*,org.apache.*
e) If JBoss 6, use the following JVM options:
-javaagent:%WILY_HOME%\Agent.jar -Dcom.wily.introscope.agentProfile=%WILY_HOME%\core\config\IntroscopeAgent.profile -Xbootclasspath/p:%JBOSS6_HOME%\lib\jboss-logmanager.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dorg.jboss.logging.Logger.pluginClass=org.jboss.logging.logmanager.LoggerPluginImpl
f) If JBoss 7: see JBoss7 1 Issues.doc

If you see the exception below, the problem is related to https://issues.jboss.org/browse/JBAS-7427 .
The LogManager is in the wrong state because of a JBoss class loading problem. Application class loading is controlled by the application server, so you should contact the JBoss AS vendor:

java.lang.IllegalStateException: The LogManager was not properly installed (you must set the "java.util.logging.manager" system property to "org.jboss.logmanager.LogManager")
at org.jboss.logmanager.Logger.getLogger(Logger.java:61)
at org.jboss.as.server.Main.main(Main.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

17. Review the “Known issues” section of the latest product readme files:
For v8.x: Introscope8.x.x.x_README.pdf
For v9.0.x: APM_Known_Issues9.0.x.x.pdf
For v9.1.x: APM_Release_Notes_9.1.x.x_EN.pdf

APM Product Releases and Announcement: https://support.ca.com/irj/portal/anonymous/phpsupcontent?contentID=378db89f-3375-42c3-a07d-9e983d13c0a6&productID=5974



What to collect if the problem persists?

Collect the following information and open an incident with CA Support.
1. Zipped content of AGENT_HOME/logs
2. IntroscopeAgent.profile
3. For OOM or high CPU situations, a series of 5 thread dumps from the application server, spaced 5-10 seconds apart.
4. AppServer logs
5. AppServer config or startup script files.
6. Core dump, if applicable.
7. Exact version of the application server, JVM, and OS.
8. In case of OOM, a heap dump. Additional JVM switches are required for this.
For the Sun JVM, add the following JVM switch: -XX:+HeapDumpOnOutOfMemoryError
9. GC log. Additional JVM switches are required to enable it.
For the Sun JVM, add the following JVM switches: -Xloggc:<filename>.log -XX:+PrintGCDetails
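
The series of thread dumps in the list above can be scripted. The following is a minimal sketch that assumes the JDK's jstack tool is on the PATH and that you know the process ID of the application server JVM. The class name, output file names, and the 5-second interval are illustrative choices. On JVMs that do not ship jstack, sending the process a SIGQUIT (kill -3 <pid>) on Unix-like systems is the usual alternative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class ThreadDumpSeries {

    /** Calls the dumper 'count' times, sleeping 'intervalMillis' between calls. */
    static List<String> collect(int count, long intervalMillis, Callable<String> dumper)
            throws Exception {
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dumps.add(dumper.call());
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return dumps;
    }

    /** Runs "jstack <pid>" and returns its output (stdout and stderr merged). */
    static String jstack(String pid) throws IOException, InterruptedException {
        Path tmp = Files.createTempFile("jstack-", ".txt");
        try {
            Process process = new ProcessBuilder("jstack", pid)
                    .redirectErrorStream(true)
                    .redirectOutput(tmp.toFile())
                    .start();
            process.waitFor();
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        String pid = args[0]; // process ID of the application server JVM
        List<String> dumps = collect(5, 5000L, () -> jstack(pid));
        for (int i = 0; i < dumps.size(); i++) {
            Path out = Paths.get("threaddump-" + (i + 1) + ".txt");
            Files.write(out, dumps.get(i).getBytes(StandardCharsets.UTF_8));
            System.out.println("Wrote " + out);
        }
    }
}
```

Run it as the same OS user that owns the application server process, then zip the resulting files together with the other items in the list.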

Regards,
Sergio
