Since the WebAgent process is becoming unresponsive, it needs deeper investigation through Support channels.
Is this issue reproducible, or has it occurred repeatedly over a period of time, i.e. not just a one-time occurrence?
Assuming the latter, my recommendation is to investigate deeper in several parts.
NOTE : Running PART-A, PART-B and PART-C in Production could have slight performance implications. It is always safer to run these in non-Prod environments. If you do run them in Production, turn them off once debugging is complete. PART-D is more about performance tuning.
PART-A : Run diagnostics on the LLAWP and w3wp.exe processes.
Running diagnostics ensures that, in the event of a crash or hang, CA Engineering has substantial data on the state of the process and its memory at the moment the process hung.
- Run a debug diagnostics tool on the LLAWP and IIS processes to write a dump when a crash or hang occurs.
- Since the WebAgent OS is Windows, we could use DebugDiag or ADPlus.
- Whichever tool is used, make sure it is configured for a "FullDump" and not a "MiniDump".
- Because "FullDump" is configured, a lot of [FirstChance Exception] dumps will be generated (each may be 700MB to 900MB in size).
- What we are interested in is the [SecondChance Exception] dump, plus the one or two [FirstChance Exception] dumps taken just before it.
- As long as there is no [SecondChance Exception] dump, all [FirstChance Exception] dumps can be deleted.
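With dumps of this size, the disk can fill up quickly, so the clean-up rule above could be scripted. The following is only a minimal sketch in Python. It assumes the dump file names contain "FirstChance" / "SecondChance" (or "1st_chance" / "2nd_chance"); the exact naming depends on the tool and its version, so check your own dump folder first. The function names and the `keep_before` parameter are hypothetical, made up for this illustration.

```python
from pathlib import Path


def is_second_chance(name):
    """Guess from the file name whether a dump is a second-chance dump.

    ASSUMPTION: the tool puts 'SecondChance', 'Second_Chance' or
    '2nd_chance' in the file name. Adjust to match your dump folder.
    """
    normalized = name.lower().replace("_", "").replace(" ", "")
    return "secondchance" in normalized or "2ndchance" in normalized


def dumps_to_delete(names_oldest_first, keep_before=2):
    """Return the dump names that are safe to delete.

    Keeps every second-chance dump plus up to `keep_before` first-chance
    dumps written just before it. Everything else is returned for deletion.
    """
    keep = set()
    for i, name in enumerate(names_oldest_first):
        if not is_second_chance(name):
            continue
        keep.add(i)
        # Indexes of first-chance dumps older than this second-chance dump.
        earlier = [j for j in range(i)
                   if not is_second_chance(names_oldest_first[j])]
        if keep_before > 0:
            keep.update(earlier[-keep_before:])
    return [n for i, n in enumerate(names_oldest_first) if i not in keep]


def list_dumps_oldest_first(dump_dir):
    """List *.dmp file names in a folder, ordered by modification time."""
    dumps = sorted(Path(dump_dir).glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    return [p.name for p in dumps]
```

The sketch only returns the candidate names and deletes nothing itself, so the list can be reviewed before anything is removed, e.g. `dumps_to_delete(list_dumps_oldest_first(r"D:\dumps"))` where the folder path is a placeholder.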
PART-B : Run PerfMon on the LLAWP and w3wp.exe processes.
Run Performance Monitor and save the output in CSV format. This captures the memory state of the LLAWP process over time, so we can make sure there are no memory leaks. Configure Performance Monitor to sample every 5 minutes and to capture the counters below.
- Memory\Available Bytes
- Memory\Committed Bytes
- Process(LLAWP)\% Processor Time
- Process(LLAWP)\Private Bytes
- Process(LLAWP)\Thread Count
- Process(w3wp)\Thread Count
- Process(w3wp)\% Processor Time
- Process(w3wp)\Private Bytes
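Once the CSV log has been running for a while, a quick check for steady growth in Private Bytes can hint at a leak before Support looks at it. Below is a rough sketch in Python. It assumes the usual PerfMon CSV layout (first column is the timestamp, each other column header is a full counter path ending in the counter name, blank cells for missed samples). The function names are hypothetical.

```python
import csv
import io


def read_counter(csv_text, counter_suffix):
    """Return the numeric samples of the first column whose header ends
    with `counter_suffix` (case-insensitive).

    ASSUMPTION: headers look like \\\\HOST\\Process(LLAWP)\\Private Bytes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, col in enumerate(header)
               if col.lower().endswith(counter_suffix.lower())]
    if not matches:
        raise ValueError("counter not found: " + counter_suffix)
    column = matches[0]
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError):
            continue  # blank or missing sample, skip it
    return values


def average_growth_per_sample(values):
    """Average change between consecutive samples (first vs. last)."""
    if len(values) < 2:
        return 0.0
    return (values[-1] - values[0]) / (len(values) - 1)
```

For example, `average_growth_per_sample(read_counter(open("perf.csv").read(), r"Process(LLAWP)\Private Bytes"))` gives the average growth in bytes per 5-minute sample (the file name is a placeholder). A value that stays clearly positive over many hours is worth flagging in the Support ticket; it is only an indicator, not proof of a leak.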
PART-C : Enable WebAgent Tracing.
Enabling tracing does have a performance impact. However, the priority right now is to capture as much detail as possible, so it is worth sacrificing a bit of performance until the issue is resolved. Turn off tracing in Production once the issue is resolved or Support has all the necessary information.
PART-D : Tune the HCO; increase the Max Sockets Per Port.
The trusted host and Policy Server communicate across TCP/IP connections. The number of available sockets for the authorization, authentication, and accounting ports of the Policy Server determines the number of available TCP/IP connections. The number of sockets per port controls the number of simultaneous threads accessing the Policy Server from the web server. Separate web server threads handle each user access request. Each thread requires its own socket. The web server maintains a pool of threads for requests and only creates one when there are no more available threads. As traffic increases, the number of sockets per port must increase. Several settings affect the TCP/IP connections between the trusted host and the Policy Server.
- Maximum Sockets Per Port
Defines the maximum number of TCP/IP connections that the trusted host uses to communicate with the Policy Server. By default, this value is set to 20, which suits low- and medium-traffic web sites. Increase this number in the following situations:
  - You are managing a high-traffic web site.
  - You have defined agent identities for virtual servers.
- Minimum Sockets Per Port
Determines the number of TCP/IP connections open for the Policy Server at startup. The default value is 2. If you are managing a high-traffic web site, increase this number.
- New Socket Step
Specifies the number of TCP/IP connections that the Agent opens when new connections are required. The default value is 2. Increase this number if you want more sockets added each time additional sockets are required.
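To get a feel for how these three settings interact, here is a toy model in Python. It is only a sketch built from the descriptions above: it assumes the pool starts at Minimum Sockets Per Port, grows by New Socket Step whenever more connections are needed, and never exceeds Maximum Sockets Per Port. The real agent behaviour may differ, and the function and parameter names are made up for this illustration.

```python
def pool_size_for_demand(demand, min_sockets=2, step=2, max_sockets=20):
    """Toy model of socket pool growth for a given number of
    simultaneous requests (`demand`).

    Returns (open_sockets, growth_steps, waiting), where `waiting` is how
    many requests would have no socket of their own in this model.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    open_sockets = min_sockets
    growth_steps = 0
    # Grow in increments of `step` until demand is met or the cap is hit.
    while open_sockets < demand and open_sockets < max_sockets:
        open_sockets = min(open_sockets + step, max_sockets)
        growth_steps += 1
    waiting = max(demand - open_sockets, 0)
    return open_sockets, growth_steps, waiting


if __name__ == "__main__":
    # Defaults vs. a raised maximum, for 30 simultaneous requests.
    print(pool_size_for_demand(30))
    print(pool_size_for_demand(30, max_sockets=40))
```

In this model, with the default values and 30 simultaneous requests, the pool stops at 20 sockets and 10 requests are left waiting, whereas raising the maximum to 40 lets the pool reach 30. A larger step simply gets there in fewer increments.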
PART-E : Open a Support ticket.
Unfortunately, we do not have the capability to read and validate dumps on public forums. Hence, once a dump is available for a crash or hang, open a Support ticket and have all of the above information ready for Support; this saves crucial time and back-and-forth communication. Trust me, the back-and-forth can be frustrating, especially during service outages when we do not have the right data for analysis.
PART-F : Provide more detailed information on the setup in the Support ticket.
IIS Setup Info
- Application Pool bit version (32-bit or 64-bit).
- Application Pool mode, i.e. Integrated or Classic.
- Is web gardening being used? If so, how many worker processes is w3wp set to?
- How many web sites are on IIS, or is it just the Default Web Site?
- Are any other plugins enabled on IIS, e.g. any reverse proxy plugins?
PART-G : Check Policy Server Health.
Kindly check the Policy Server for any abnormalities reported during the period when the WebAgent stops responding.
Regards
Hubert