Kelly Wong

Tech Tip - CA Single Sign-On: Secure Proxy Server becomes unresponsive under load

Blog Post created by Kelly Wong Employee on Apr 14, 2016

CA Single Sign-On Tech Tip by Sau Lai Wong, Senior Support Engineer for 14th April 2016

 

ISSUE:

 

Secure Proxy Server logged the exception - java.lang.OutOfMemoryError: unable to create new native thread and becomes unresponsive as load increased to a certain level. SPS needs to be restarted to return to operational state.

 

Following errors are logged corresponding to the event:

== server.log ==

[31/Mar/2016:11:19:56-699] [ERROR] - Exception in thread "ajp-bio-8009-Acceptor-0"

[31/Mar/2016:11:19:56-700] [ERROR] - java.lang.OutOfMemoryError: unable to create new native thread

[31/Mar/2016:11:19:56-700] [ERROR] - at java.lang.Thread.start0(Native Method)

[31/Mar/2016:11:19:56-700] [ERROR] - at java.lang.Thread.start(Thread.java:714)

[31/Mar/2016:11:19:56-700] [ERROR] - at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)

[31/Mar/2016:11:19:56-700] [ERROR] - at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)

[31/Mar/2016:11:19:56-700] [ERROR] - at org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(Unknown Source)

[31/Mar/2016:11:19:56-700] [ERROR] - at org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(Unknown Source)

[31/Mar/2016:11:19:56-700] [ERROR] - at org.apache.tomcat.util.net.JIoEndpoint.processSocket(Unknown Source)

[31/Mar/2016:11:19:56-700] [ERROR] - at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(Unknown Source)

[31/Mar/2016:11:19:56-700] [ERROR] - at java.lang.Thread.run(Thread.java:745)

 

== error_log ==

[Thu Mar 31 11:18:10.576507 2016] [mpm_worker:notice] [pid 45393:tid 4151531264] AH00292: Apache/2.4.4 (Unix) mod_jk/1.2.37 configured -- resuming normal operations

[Thu Mar 31 11:18:10.576572 2016] [core:notice] [pid 45393:tid 4151531264] AH00094: Command line: '/opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd'

[Thu Mar 31 11:19:56.698431 2016] [mpm_worker:alert] [pid 46743:tid 4128971632] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread

[Thu Mar 31 11:19:56.698446 2016] [mpm_worker:alert] [pid 46737:tid 4128971632] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread

[Thu Mar 31 11:19:56.700397 2016] [mpm_worker:alert] [pid 46740:tid 4128971632] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread

[Thu Mar 31 11:19:56.701504 2016] [mpm_worker:alert] [pid 46742:tid 4128971632] (11)Resource temporarily unavailable: apr_thread_create: unable to create worker thread

[Thu Mar 31 11:19:56.702053 2016] [mpm_worker:emerg] [pid 46737:tid 4085238640] (22)Invalid argument: ap_queue_info_set_idle failed. Attempting to shutdown process gracefully.

[Thu Mar 31 11:19:56.702093 2016] [mpm_worker:crit] [pid 46737:tid 4116708208] (22)Invalid argument: ap_queue_pop failed

[Thu Mar 31 11:19:56.702102 2016] [mpm_worker:crit] [pid 46737:tid 4116708208] (22)Invalid argument: ap_queue_pop failed

[Thu Mar 31 11:19:56.702107 2016] [mpm_worker:crit] [pid 46737:tid 4116708208] (22)Invalid argument: ap_queue_pop failed

 

== nohup.out ==
Exception in thread "ajp-bio-8009-Acceptor-0" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(Unknown Source)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(Unknown Source)
at org.apache.tomcat.util.net.JIoEndpoint.processSocket(Unknown Source)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)

== mod_jk.log ==
[[Thu Mar 31 11:19:56.702 2016] [17852:4095728496] [error] ajp_get_reply::jk_ajp_common.c (2126): (ajp13) Tomcat is down or refused connection. No response has been sent to the client (yet)
[[Thu Mar 31 11:19:56.702 2016] [17852:4095728496] [error] ajp_service::jk_ajp_common.c (2643): (ajp13) connecting to tomcat failed.

  

CAUSE:

 

Following are some settings on SPS:

== httpd-mpm.conf ==
<IfModule mpm_worker_module>
StartServers 3
MinSpareThreads 75
MaxSpareThreads 250
ThreadsPerChild 25
MaxRequestWorkers 400
MaxConnectionsPerChild 0
ThreadLimit 128
MaxClients 800
ServerLimit 800
</IfModule>

== Server.conf ==
ajp13.accept_count=10
ajp13.min_spare_threads=10
ajp13.max_threads=410
worker.ajp13.connection_pool_timeout=0

http_connection_pool_min_size="4"
http_connection_pool_max_size="420"
http_connection_pool_incremental_factor="4"

 

== SmSpsProxyEngine.properties ==
SpsNETE_SPS_PROXYENGINE_CMD="%NETE_SPS_JAVA_HOME%\bin\java.exe" -Xms512m –Xmx2048m -XX:MaxPermSize=256M

== proxyserver.sh ==
JVM_MEM_OPTS="-ms256m –mx2048m"

  

Customer is using ‘spsuser’ account to run both Apache and Tomcat processes. ‘spsuser’ account has max user processes (ulimit –u) set to 1024. Hence, the user is restricted to a total of 1024 threads/processes across the whole Linux server.

As load increases, Apache spawns more child processes and each child process takes up a further 27 threads. However, the Tomcat process takes up to a total of ~410 threads. That only leaves 614 threads/processes - which is equivalent to 22 httpd processes (providing the user is not running any other process or command). When 22 httpd processes are reached and Tomcat process logged the “unable to create new native thread” exception and SPS hangs.

 

[spsuser@wonsa03-I151000 proxy-engine]$ ps -o nlwp,pid,lwp,args -u spsuser | sort -n

NLWP PID   LWP COMMAND

   1 11331 11331 bash

   1 11371 11371 -bash

   1 11565 11565 -bash

   1 11761 11761 bash

   1 13262 13262 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

   1 13264 13264 /opt/CA/secure-proxy/httpd/bin/rotatelogs /opt/CA/secure-proxy/httpd/logs/mod_jk.log 10M

   1 16462 16462 ps -o nlwp,pid,lwp,args -u spsuser

   1 16463 16463 sort -n

1  4078  4078 sshd: spsuser@pts/0

1  4079  4079 -bash

  27 13282 13282 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 13283 13283 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 13901 13901 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 13960 13960 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 13961 13961 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 14074 14074 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 14080 14080 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 14299 14299 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 14303 14303 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

  27 14304 14304 /opt/CA/secure-proxy/httpd/bin/httpd -d /opt/CA/secure-proxy/httpd -k start

256 13831 13831 /opt/CA/jdk1.7.0_65/bin/java -ms256m -mx2048m -server -XX:MaxPermSize=256M -Dcatalina.base=/opt/CA/secure-proxy/Tomcat -Dcatalina.home=/opt/CA/secure-proxy/Tomcat -Djava.io.tmpdir=/opt/CA/secure-proxy/Tomcat/temp -DHTTPClient.log.mask=0 -DHTTPClient.Modules=HTTPClient.RetryModule|org.tigris.noodle.NoodleCookieModule|HTTPClient.DefaultModule -Dlogger.properties=/opt/CA/secure-proxy/Tomcat/properties/logger.properties -Dhttp_connection_timeout=60000 -Dhttp_socket_timeout=60000 -Djava.endorsed.dirs=/opt/CA/secure-proxy/Tomcat/endorsed -DNETE_WA_ROOT= -DPWD=/opt/CA/secure-proxy -classpath /opt/CA/secure-proxy/Tomcat/bin/proxybootstrap.jar:/opt/CA/secure-proxy/Tomcat/properties:/opt/CA/secure-proxy/resources:/opt/CA/jdk1.7.0_65/lib/tools.jar:/opt/CA/secure-proxy/Tomcat/bin/bootstrap.jar:/opt/CA/secure-proxy/Tomcat/lib/smi18n.jar com.netegrity.proxy.ProxyBootstrap -config /opt/CA/secure-proxy/proxy-engine/conf/server.conf

 

The above command output all the running threads/processes associated with spsuser. The result indicates that each httpd process is associated with 27 threads while the Tomcat process is associated with 256 threads at the time that this command is executed.

 

[spsuser@wonsa03-I151000 proxy-engine]$ ps -o nlwp,pid,lwp,args -u spsuser | sort -n | grep -v "NLWP" | cut -c1-5 | paste -sd+ | bc

539

 

In total, spsuser is associated with ~539 running threads/processes.

  

The most important directives used to control Apache MPM worker are ThreadsPerChild, which controls the number of threads deployed by each child process and MaxRequestWorkers, which controls the maximum total number of threads that may be launched. The maximum number of clients that may be served simultaneously (i.e., the maximum total number of threads in all processes) is determined by the MaxRequestWorkers directive. The maximum number of active child processes is determined by the MaxRequestWorkers directive divided by the ThreadsPerChild directive.

With MaxRequestWorkers=400 and ThreadsPerChild=25 in httpd-mpm.conf, we expect maximum of 16 active child processes.

However, we observed more than 16 httpd processes during load test. This is due to the conflicting directives in httpd-mpm.conf -- MaxClients vs MaxRequestWorkers. MaxRequestWorkers was called MaxClients before version 2.3.13. However, the old name is still supported.
 

RESOLUTION:

 

Options to overcome the issue:

  • increase the max user processes (ulimit -u)
  • remove the conflicting directive (MaxClients) from httpd-mpm.conf to restrict the number of active Apache child processes

Outcomes