TEC614151: Why are WebSphere nodes in a cluster failing to start, even if one node is not instrumented?

Document created by KulbirNijjer on May 26, 2016. Last modified by SamCreek on May 26, 2016.
Version 2

Document ID:  TEC614151

     Last Modified Date:  7/18/2014

     Author: KulbirNijjer

 

  • Products
    • CA Application Performance Management
    • CA Introscope
  • Releases
    • CA Application Performance Management:Release:9.1.1
  • Components
    • APM AGENTS
    • WILY INTROSCOPE
    • INTROSCOPE
    • INTROSCOPE AGENT
    • JAVA AGENTS
    • APPLICATION PERFORMANCE MANAGEMENT

 

Description:

The IBM Class Sharing feature, which is enabled out of the box, causes classes to be cached locally and shared between JVMs. The cached classes can contain bytecode instrumented by the Agent, so even after the Agent is removed, those references are still loaded and can cause startup issues. This article explains how to resolve the issue and how to configure Class Sharing properly.

 

Solution:

On an instrumented server, WebSphere does not start, even after the Introscope Agent is uninstalled. Additionally, a different node in the cluster on the same machine, which is not instrumented, does not start either. When the first server is started and you then try to start the second server, the second server throws a "class not found" error in the logs even though it is not configured for Introscope. The error message refers to Introscope classes. For example:

SRVE0232E: Internal Server Error.
Exception Message: [com.wily.introscope.agent.trace.IMethodTracer]
or
java.lang.NoClassDefFoundError: com.wily.introscope.agent.AgentShim
at com.ibm.wsspi.bootstrap.WSLauncher.main(WSLauncher.java)
at com.ibm.wsspi.bootstrap.WSLauncher.run(WSLauncher.java:74)
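
To check whether a failing server is hitting this problem, its logs can be searched for references to Introscope classes like the ones above. The following is a minimal sketch, assuming a POSIX shell; the function name count_wily_errors is hypothetical, and the path to the server log is supplied by the operator.

```shell
# Count the log lines that mention Introscope (Wily) classes.
# The log file path is passed in by the caller; it is not assumed here.
count_wily_errors() {
    log_file="$1"
    # grep -c prints the number of matching lines. It exits with status 1
    # when there are no matches, so "|| true" keeps the function usable
    # in scripts that run with "set -e".
    grep -c 'com\.wily\.introscope' "$log_file" || true
}
```

A non-zero count on a server that is not instrumented suggests that it is loading instrumented classes from a shared cache.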

Explanation

This issue is the result of the Class Sharing feature implemented in IBM JREs at version 1.5 or higher (and in 1.4.2 if the J9 VM is used).
The issue manifests itself in the following ways:

    • ClassNotFound exceptions, or nodes unable to start up, after Introscope is removed.
    • Two nodes running on the same machine (managed through a node agent), only one of which is Wily-enabled, with the uninstrumented node failing to start up and complaining about Wily classes.

The issue has been resolved by:

  1. Rebooting the machine.
  2. Shutting down all the JVMs on the machine and running "ipcs" and "ipcrm" commands to clear shared memory segments and semaphores.
  3. Copying the Agent.jar to the other node, even though it is not instrumented.
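
The "ipcs" and "ipcrm" workaround can be sketched as a dry run that only prints the removal commands for review. This is an illustration under stated assumptions: it expects the Linux column layout of "ipcs" output (key, id, owner, ...), which differs on other platforms such as AIX, and the function name print_ipcrm_commands is hypothetical. All JVMs on the machine should be shut down before any printed command is actually run.

```shell
# Read Linux-style "ipcs -m" or "ipcs -s" output on stdin and print
# (not run) one "ipcrm" command per entry owned by the given user.
print_ipcrm_commands() {
    kind="$1"    # m for shared memory segments, s for semaphores
    owner="$2"   # user that runs the WebSphere JVMs
    # Data rows start with a hexadecimal key; header and separator rows do not.
    # Column 2 is the segment or semaphore id, column 3 is the owner.
    awk -v kind="$kind" -v owner="$owner" \
        '$1 ~ /^0x/ && $3 == owner { print "ipcrm -" kind " " $2 }'
}
```

For example, "ipcs -m | print_ipcrm_commands m wasadmin" prints the candidate commands for shared memory segments owned by a hypothetical user wasadmin, and "ipcs -s | print_ipcrm_commands s wasadmin" does the same for semaphores.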

 

However, the root cause appears to be that on startup the IBM 1.5 JVM by default creates a shared class cache, persisted to disk in a directory named "javasharedresources" under /tmp. This cache typically contains all the core Java API classes as well as the core WAS classes, and it is reused by other JVMs started on the same machine. The feature is an enhancement that reduces virtual memory usage when multiple WAS servers run on the same machine, and it also reduces the startup time of JVMs created subsequently.

However, because the APM agent inserts probes into the core Java classes (for example, threads and sockets) as well as the core WAS classes, this can cause the issues above. For example, if the java.lang.Thread class is modified to call the ManagedThread class from the Wily code and the modified class is persisted to the cache, any other JVM on the same machine that tries to use the Thread class from the shared cache will fail, because this core class now has references to the Wily codebase.
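
To see whether a shared class cache directory exists on a machine, the location described above can simply be listed. A minimal sketch, assuming a POSIX shell; the function name list_shared_cache_files is hypothetical, and the default path is the /tmp location named in this article, which may differ on a given installation.

```shell
# List the contents of the shared class cache directory, if it exists.
# Defaults to the location described in this article; pass another
# directory as the first argument to override it.
list_shared_cache_files() {
    cache_dir="${1:-/tmp/javasharedresources}"
    if [ -d "$cache_dir" ]; then
        ls -1 "$cache_dir"
    else
        echo "no shared class cache directory at $cache_dir"
    fi
}
```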

JVMs use this shared cache through Inter-Process Communication (IPC) and shared memory segments, which is why workarounds 1 and 2 above (rebooting, or clearing shared memory with "ipcs" and "ipcrm") work.

Solution

Wily software uses bytecode manipulation to modify WAS and JVM classes so that they reference Wily classes. WAS 6.1 uses class sharing to cache the contents of class files, which provides time and memory performance benefits. However, if the cache is populated with bytecode from classes that have been manipulated by Wily software, then all server JVMs will attempt to load Wily classes. If Wily software is not active on a particular server, failures such as those above can occur. Use one of the following to resolve this:

  1. Consistently enable or disable Wily software on all servers from the specified installation.
  2. Use the -Xshareclasses option:
    • Add -Xshareclasses:none to all servers using Wily software so that classes modified by Wily are not stored in the shared class cache.
    • Add -Xshareclasses:name=wily to all servers using Wily software so that classes modified by Wily are stored in their own shared class cache.

Note: This is possible only with WebSphere 6.1, as the class sharing feature is available only in this release.

Additional Troubleshooting

This shared class feature is controlled through a JVM argument -Xshareclasses[:name=<cachename>] passed on the command line to the JVM.
Options to address this issue are as follows:

    • Disable Class Sharing by adding -Xshareclasses:none to the JVM startup arguments.
    • Use a named cache for the instrumented JVM so that other, unsuspecting JVMs do not try to reuse it.
    • Clear out existing caches on the box:
    • Find any existing caches on the box using "java -Xshareclasses:listAllCaches".
    • Destroy them using -Xshareclasses:name=<cachename>,destroy or -Xshareclasses:destroyAll.
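
The cache cleanup steps above can be sketched as a helper that prints the commands for review instead of running them. This is an illustration, not a tested procedure: the function name shared_cache_commands is hypothetical, the path to the IBM java launcher is supplied by the operator, and the name=<cachename>,destroy form follows the suboption syntax in IBM's -Xshareclasses documentation, which should be checked against the JVM version in use.

```shell
# Print (not run) the IBM JVM commands for listing and destroying
# shared class caches.
shared_cache_commands() {
    java_bin="$1"     # path to the IBM java launcher, supplied by the operator
    cache_name="$2"   # optional: name of a single cache to destroy
    echo "$java_bin -Xshareclasses:listAllCaches"
    if [ -n "$cache_name" ]; then
        # Destroy only the named cache.
        echo "$java_bin -Xshareclasses:name=$cache_name,destroy"
    else
        # No name given: destroy every cache the JVM can find.
        echo "$java_bin -Xshareclasses:destroyAll"
    fi
}
```

For example, "shared_cache_commands java wily" prints the list command followed by the command that destroys a cache named wily. As with the ipcrm workaround, the JVMs using a cache should be stopped before it is destroyed.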

 

However, keep in mind that these options have not been tested by CA Wily QA and should only be implemented after thorough testing in a UAT environment. For more information on the shared class cache feature, see the article "Java technology, IBM style: Class sharing."

 
