Ujwol Shrestha

Tech Tip: CA Single Sign-On: Web Agent: How to troubleshoot agent initialization issues

Blog post created by Ujwol Shrestha on Sep 6, 2017

Summary:

In this guide, we discuss the steps performed during Web Agent initialization.

We then take a deep dive into some common agent initialization issues and discuss approaches to troubleshoot and resolve them.

 

Environment:

  • Web Agent: 12.5 and above
  • OS: Any

 

For this tech tip, we tested on the following platform:

  • Web Agent version: 12.52 SP1 CR7
  • Web server: Apache 2.2
  • Web server OS: RHEL 6.5 64-bit

 

Web Agent Startup Process 

 

At a high level, the Web Agent startup process happens in the following order:

 

  1. Read WebAgent.conf
  2. Locate the path to the SmHost.conf file from WebAgent.conf and read SmHost.conf
  3. Identify the following details from SmHost.conf :
    • Policy Server IP (this Policy Server is used only for the initial bootstrapping)
    • Shared Secret
    • Trusted Host Name (hostname)
    • Host configuration object (HCO)
      #agentname="<AgentName>, <IPAddress>"
      HostConfigFile="/opt/CA/webagent/config/SmHost.conf"
      AgentConfigObject="aco_rhel65"
      EnableWebAgent="YES"
      ServerPath="/etc/httpd/conf"
      #localconfigfile="/etc/httpd/conf/LocalConfig.conf"
      LoadPlugin="/opt/CA/webagent/bin/libHttpPlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libSessionLinkerPlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libAffiliate10Plugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libSAMLAffiliatePlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libeTSSOPlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libIntroscopePlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libSAMLDataPlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libOpenIDPlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libDisambiguatePlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libOAuthPlugin.so"
      #LoadPlugin="/opt/CA/webagent/bin/libCertSessionLinkerPlugin.so"
      AgentIdFile="/etc/httpd/conf/AgentId.dat"

      Figure: WebAgent.conf

      hostname="th-rhel65-4"
      sharedsecret="{RC2}ovOEr7teMKP9xpKisg157/t4T1tqGXwNT0SWGsfi1QnajkcDjumFEmF9kBbw1d2MZb8CGf2ueSfWkfKmZEWrYVeM3hZ/HRI1F2bh6v4lBQq9uqi0Lp2iIQJb0flOY2oUGLmDE/iZFDQt3ceo7aCij9YQY72YD2iLLtJTKJGlu6e2nJcMzGlf2roiNfd2MwMV"
      sharedsecrettime="0"
      enabledynamichco="NO"
      hostconfigobject="hco"
      #Add additional bootstrap policy servers here for fault tolerance.
      policyserver="shruj01-i1849.ca.com,44441,44441,44441"
      requesttimeout="60"
      cryptoprovider="ETPKI"
      fipsmode="COMPAT"

      Figure: SmHost.conf

  4. Establish an Agent API connection with the Policy Server listed in SmHost.conf. This includes a three-way handshake (described below).
  5. Read the HCO (host configuration object) from the Policy Server/policy store.
  6. Establish an Agent API connection with the primary Policy Server listed in the HCO (this may be the same as, or different from, the bootstrap Policy Server listed in SmHost.conf).
  7. Once the connection with the primary Policy Server listed in the HCO is established, read the ACO (agent configuration object) named in WebAgent.conf.
  8. Initialize agent log/trace file as per the ACO configuration.
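The bootstrap lookups in steps 1 to 3 can be reproduced by hand to check which files and names an agent will actually use. The following bash sketch assumes the simple key="value" layout shown in the figures above; conf_value and show_bootstrap are hypothetical helper names, and matching key names case-insensitively is a choice of this sketch, not documented agent behaviour. It does not contact the Policy Server.

```bash
#!/usr/bin/env bash
# Sketch of steps 1-3: read WebAgent.conf, follow HostConfigFile to
# SmHost.conf, and print the bootstrap values. Assumes key="value" lines;
# lines starting with # are ignored. Key names are matched
# case-insensitively here (an assumption of this sketch).

# conf_value FILE KEY: print the value of the first uncommented KEY line.
conf_value() {
  awk -v k="$2" '
    /^[ \t]*#/ { next }                     # skip commented-out settings
    {
      eq = index($0, "=")
      if (eq == 0) next
      name = substr($0, 1, eq - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim the key
      if (tolower(name) != tolower(k)) next
      val = substr($0, eq + 1)
      gsub(/^[ \t]*"|"[ \t\r]*$/, "", val)  # strip the surrounding quotes
      print val
      exit
    }' "$1"
}

# show_bootstrap WEBAGENT_CONF: print what the agent needs before it can
# contact the bootstrap Policy Server.
show_bootstrap() {
  local smhost
  smhost=$(conf_value "$1" HostConfigFile)                      # step 2
  echo "SmHost.conf : $smhost"
  echo "ACO name    : $(conf_value "$1" AgentConfigObject)"
  echo "Trusted host: $(conf_value "$smhost" hostname)"         # step 3
  echo "HCO name    : $(conf_value "$smhost" hostconfigobject)"
  echo "Bootstrap PS: $(conf_value "$smhost" policyserver)"     # first entry only
}
```

For example, run show_bootstrap /etc/httpd/conf/WebAgent.conf, adjusting the path to wherever your web server's WebAgent.conf actually lives. Only the first uncommented policyserver line is printed.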

 

Three-way agent-to-Policy Server handshake

 

  1. The agent opens a TCP socket connection to the Policy Server.
  2. The agent sends a Hello message that includes, among other details:
    • An MD5 hash of the shared secret and trusted host name combined (a stronger algorithm is used in FIPS-only mode)
    • The trusted host name
  3. The Policy Server validates the shared secret based on the trusted host name passed. It may check both the current and the previous shared secret from the policy store against the one sent by the agent. If validation succeeds, the Policy Server sends a Hello Reply message that contains, among other details:
    • Session keys
    • A new shared secret (optional; sent only if the agent does not have the current shared secret)
    • The time the new shared secret was generated (optional)
  4. The agent sends a Hello Confirm message encrypted with the session keys sent by the Policy Server.
  5. (Optional) The agent updates SmHost.conf with the new shared secret.

 

      Figure: Three-way agent-to-Policy Server handshake

 

Web Agent Initialization Error Codes:

Code          Meaning
00 00 00 00   Debug version of SiteMinder agent is running.
01 00 00 00   Unable to determine SiteMinder agent configuration file path.
02 00 00 00   Unable to open SiteMinder agent configuration file or file is corrupt.
03 00 00 00   Unable to load SiteMinder host configuration object or host configuration file.
04 00 00 00   Unable to load SiteMinder agent configuration object.
05 00 00 00   Unable to load SiteMinder local agent configuration file or file is corrupt. (For example, the web server user does not have permissions on the Web Agent repositories and files.)
06 00 00 00   SiteMinder agent has encountered initialization errors and is exiting.
07 00 00 00   SiteMinder agent has encountered initialization errors and will not service requests.
08 00 00 00   SiteMinder agent is not enabled.
09 00 00 00   SiteMinder agent is enabled.
10 00 00 00   DefaultUserName configured for agent cannot logon to the web server. Please provide a new user name or password through central agent configuration or in the local configuration file. The current user name configured is shown below.
11 00 00 00   Secure credential cache has failed to start. The data is the error code. Please check the System events for problems with service startup.
12 00 00 00   SiteMinder agent is running.
13 00 00 00   There was an error allocating memory for the base configuration object.
14 00 00 00   Sm_AgentApi_Init Failed.
15 00 00 00   Failed to Start the LLAWP process.
16 00 00 00   Resource cache failed to initialize.
17 00 00 00   Session cache failed to initialize.
18 00 00 00   Failed to send message to the LLAWP.
19 00 00 00   Failed to initialize the message bus.
20 00 00 00   Failed to initialize the log queue.
21 00 00 00   Failed to initialize the configuration manager.
22 00 00 00   Server already running.
23 00 00 00   Unable to open file.
24 00 00 00   Configuration file path:
25 00 00 00   Failed to send close message to LLAWP.
26 00 00 00   LogonUser failed for specified user shown below.
27 00 00 00   Invalid character found in the server path variable. Make sure that alphanumeric values are used. the invalid character shown below.
28 00 00 00   Message bus already initialized.
29 00 00 00   PID Cache error.
30 00 00 00   Resource cache re-initialized.
31 00 00 00   Session cache re-initialized.
32 00 00 00   Web-agent process is exiting...
ff ff ff ff   unable to get the HostConfigurationObject from any Policy Server
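When the web server log shows only a code such as 06 00 00 00, a small lookup helper saves a trip to the table. The bash sketch below covers only a subset of the codes above; wa_init_code is a hypothetical name, and the normalisation of case and spacing is an assumption about how the code might be pasted from the log.

```bash
#!/usr/bin/env bash
# Sketch: look up an initialization code copied from the web server error
# log (for example "06 00 00 00"). Only a subset of the table is included;
# add further entries in the same way.

wa_init_code() {
  local code
  # Normalise: lowercase hex digits, single spaces between the bytes.
  code=$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr -s ' \t' ' ' | sed 's/^ //; s/ $//')
  case $code in
    "01 00 00 00") echo "Unable to determine SiteMinder agent configuration file path." ;;
    "02 00 00 00") echo "Unable to open SiteMinder agent configuration file or file is corrupt." ;;
    "03 00 00 00") echo "Unable to load SiteMinder host configuration object or host configuration file." ;;
    "04 00 00 00") echo "Unable to load SiteMinder agent configuration object." ;;
    "05 00 00 00") echo "Unable to load SiteMinder local agent configuration file or file is corrupt." ;;
    "06 00 00 00") echo "SiteMinder agent has encountered initialization errors and is exiting." ;;
    "07 00 00 00") echo "SiteMinder agent has encountered initialization errors and will not service requests." ;;
    "08 00 00 00") echo "SiteMinder agent is not enabled." ;;
    "14 00 00 00") echo "Sm_AgentApi_Init Failed." ;;
    "15 00 00 00") echo "Failed to Start the LLAWP process." ;;
    "21 00 00 00") echo "Failed to initialize the configuration manager." ;;
    "ff ff ff ff") echo "unable to get the HostConfigurationObject from any Policy Server" ;;
    *) echo "Code not in this lookup: $1" >&2; return 1 ;;
  esac
}
```

For example, wa_init_code "06 00 00 00" prints the meaning of the code shown in the Apache log excerpt later in this post.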

 

Basic troubleshooting:

 

  • On UNIX platforms, ensure that you have sourced the Web Agent environment script (ca_wa_env.sh) before starting the web server.
    [root@rhl65 webagent]# pwd
    /opt/CA/webagent
    [root@rhl65 webagent]# source ./ca_wa_env.sh
    [root@rhl65 webagent]#

    On Windows, the Web Agent environment variables are set as system environment variables. Ensure that the user running the web server process has access to these system environment variables.

  • Ensure that the path to the host configuration file (HostConfigFile) in WebAgent.conf is valid.
  • Ensure that the name of the agent configuration object (AgentConfigObject) in WebAgent.conf is valid. This field is case-sensitive and must match the name of the ACO in the policy store.
  • Ensure that the user under which the web server process runs has write permission on SmHost.conf. This is required only if shared secret rollover is used.
  • Ensure that either the defaultagentname or the agentname ACO parameter is set in the policy store.
  • Ensure that the Policy Server FQDN/IP is specified in SmHost.conf, and that you can resolve its DNS name and ping it from the web server host.
  • Ensure that a valid Policy Server FQDN/IP is specified in the host configuration object in the policy store, and that you can resolve its DNS name and ping it from the web server host.
  • If multiple Policy Servers are listed in SmHost.conf and the HCO, it is usually best to start with just one and comment out the rest. This makes troubleshooting easier.
  • On UNIX platforms, if multiple Web Agent instances run on the same host, ensure that each instance has a unique ServerPath.
  • Ensure that you can telnet to the Policy Server ports from the web server:
    telnet <policyserverIP> <policy server port>

    Test connectivity to all three ports: accounting, authentication, and authorization.
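As a non-interactive alternative to telnet, the following bash sketch checks name resolution and TCP reachability for each Policy Server port. It assumes bash (for the /dev/tcp pseudo-device), the coreutils timeout command, and glibc getent; check_ps is a hypothetical helper, and the default ports 44441 to 44443 are taken from the smreghost example in this article, so pass your own ports if they differ. It only proves that a TCP connection opens; it does not perform the agent handshake.

```bash
#!/usr/bin/env bash
# Sketch: non-interactive replacement for "telnet <policyserverIP> <port>".
# Assumes bash (/dev/tcp), coreutils timeout and glibc getent.
# Default ports are the ones used in this article's smreghost example.

check_ps() {
  local host=$1 port rc=0
  shift
  local ports=("$@")
  [ ${#ports[@]} -eq 0 ] && ports=(44441 44442 44443)

  # Resolve names first (skipped for literal IPv4 addresses).
  if [[ ! $host =~ ^[0-9.]+$ ]] && ! getent hosts "$host" >/dev/null; then
    echo "DNS  cannot resolve $host"
    return 2
  fi

  for port in "${ports[@]}"; do
    # Redirecting from /dev/tcp/HOST/PORT makes bash open a TCP connection;
    # timeout stops the attempt if packets are silently dropped.
    if timeout 5 bash -c ": < /dev/tcp/$host/$port" 2>/dev/null; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_ps shruj01-i1849.ca.com 44441 44442 44443. A FAIL line for any port points to a network, firewall, or Policy Server listener problem to fix before looking at agent configuration.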

 

Advanced troubleshooting:

The following logs need to be analyzed for advanced troubleshooting:

Linux:

  • web server error/startup log
  • policy server logs (smps.log) and trace log (smtracedefault.log)

At a minimum, use the following profiler settings for the Policy Server trace log:

components: AgentFunc, Server/Connection_Management, Server/Policy_Server_General, Tunnel_Service
data: Date, Time, Pid, Tid, TransactionID, SrcFile, Function, User, UserDN, Directory, SessionID, SessionSpec, ErrorValue, ErrorString, Realm, Resource, Action, Rule, Policy, Domain, Message, PreciseTime, ReturnValue, Group, AgentName, AgentType, ObjectClass, DomainOID, SearchKey, ObjectOID, Property, IPAddr, IPPort, AuthStatus, AuthReason, AuthScheme, CertSerial, SubjectDN, IssuerDN, CertDistPt, RealmOID, State, ClusterID, HandleCount, FreeHandleCount, BusyHandleCount, ResponseTime, Throughput, MaxThroughput, MinThroughput, Threshold, TransactionName, Data, HexadecimalData, Query, ActiveExpr, CallDetail, RequestIPAddr, Returns, Expression, Result, CacheHits, CacheSize, RefCount, ExecutionTime, Tenant
version: 1.1
  • strace log from the web server startup (this helps identify issues related to file permissions, library dependencies, environment variables, etc.):

    strace -Ff -t -i -v -o strace.log -s 16384 <command to start web server>

    For example:

    strace -Ff -t -i -v -o strace.log -s 16384 apachectl start
  • tcpdump from the web server:

    tcpdump -i <network interface> -s 65535 -w <some-file.pcap>

    To find the available network interfaces, run the following command: ifconfig

[root@rhel65_3 ~]# ifconfig
eth2      Link encap:Ethernet  HWaddr 00:50:56:21:B6:A8 
          inet addr:155.35.245.220  Bcast:155.35.245.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fe21:b6a8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:432695 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3238 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:46333929 (44.1 MiB)  TX bytes:926017 (904.3 KiB)

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:31 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2012 (1.9 KiB)  TX bytes:2012 (1.9 KiB)

[root@rhel65_3 ~]#


Then run the tcpdump command as follows:

[root@rhel65_1 Desktop]#  tcpdump -i eth2 -s 65535 -w watopsnetworkcapture.pcap
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
^C244 packets captured
248 packets received by filter
0 packets dropped by kernel
[root@rhel65_1 Desktop]#
  • Web Agent logs and trace. Note: if there is an agent initialization issue, these logs will most likely not be created, because they are created only at the end of the initialization process. It is still good to configure them just in case. At a minimum, enable the following profiler settings for the Web Agent trace:
    components:  AgentFramework, HTTPAgent, WebAgent, Agent_Con_Manager
    data: Date, Time, Pid, Tid, SrcFile, Function, ResponseTime, IPAddr, IPPort, AgentName, Resource, User, Threshold, Throughput, MinThroughput, MaxThroughput, HandleCount, BusyHandleCount, FreeHandleCount, State, ClusterID, Message
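The strace log from a web server start is usually large. The bash sketch below uses hypothetical helper names: strace_file_errors pulls out system calls that failed with ENOENT, EACCES, or EPERM, which is where missing libraries and permission problems on WebAgent.conf, SmHost.conf, or the agent binaries tend to show up, and ps_capture_filter builds a tcpdump filter that limits a capture to one Policy Server. Expect some ENOENT noise, since the dynamic loader probes several paths before finding each library; the port numbers are assumed to be the defaults used in this article.

```bash
#!/usr/bin/env bash
# Sketch: triage helpers for the Linux captures described above.

# strace_file_errors LOG: print system calls that failed because a file was
# missing (ENOENT) or access was denied (EACCES/EPERM).
strace_file_errors() {
  grep -E '(open|openat|access|stat|execve)\(.*= -1 (ENOENT|EACCES|EPERM)' "$1"
}

# ps_capture_filter HOST: tcpdump filter limited to one Policy Server and
# the accounting/authentication/authorization ports (defaults assumed).
ps_capture_filter() {
  echo "host $1 and (port 44441 or port 44442 or port 44443)"
}
```

For example, strace_file_errors strace.log | grep /opt/CA narrows the output to agent files, and tcpdump -i eth2 -s 65535 -w watopsnetworkcapture.pcap "$(ps_capture_filter shruj01-i1849.ca.com)" captures only Policy Server traffic.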

 

Solaris/AIX

All the Linux logs listed above, except the strace log, also apply to Solaris/AIX systems.

On Solaris, the equivalent of the Linux strace command is truss.

You can capture truss output from the web server startup as follows:

truss -a -e -f -D -l -o /tmp/truss.out -rall -wall <command to start webserver>
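Solaris truss marks a failed system call with Err#<errno> followed by the error name, so missing files and permission problems can be pulled out of /tmp/truss.out with a single grep. A minimal bash sketch; truss_file_errors is a hypothetical helper and its default path is the output file from the command above.

```bash
#!/usr/bin/env bash
# Sketch: list failed system calls from a Solaris truss capture.
# truss ends each failed call with "Err#<errno> <NAME>".
truss_file_errors() {
  grep -E 'Err#[0-9]+ (ENOENT|EACCES|EPERM)' "${1:-/tmp/truss.out}"
}
```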

 

Windows:

  • On Windows, the equivalent of strace is a Process Monitor (procmon.exe) log:

Process Monitor - Windows Sysinternals | Microsoft Docs

  • Instead of tcpdump, use a Wireshark network capture:

Wireshark · Go Deep.

  • Event Viewer logs are also helpful.

 

Common Issues:

Problem: The Policy Server smps.log shows the following errors:

[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2035][ERROR][sm-Tunnel-00010] Bad security handshake attempt. Handshake error: 3154
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2046][ERROR][sm-Tunnel-00050] Handshake error: Shared secret incorrect for this client
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2207][ERROR][sm-Server-01070] Failed handshake with 155.35.245.220:50711

The web server log (Apache for our test) shows agent initialization errors:

[04/Sep/2017:23:41:45] [Error] SiteMinder Agent
Unable to load SiteMinder host configuration object or host configuration file.
/opt/CA/webagent/config/SmHost.conf
06 00 00 00
[04/Sep/2017:23:41:45] [Error] SiteMinder Agent
Failed to initialize the configuration manager.
LLAWP unable to get configuration, exiting.
nm: '/etc/httpd/bin/httpd': No such file

 

No agent log or trace file is created.
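To confirm that a failure matches this particular symptom, search both sides for the message IDs and text quoted above. The following bash sketch is one way to do that; diagnose_handshake is a hypothetical helper, and you pass in the actual smps.log and web server error log paths for your installation. It returns success only when both the Policy Server and the web server show the symptom, and it lists the client IPs whose handshakes failed.

```bash
#!/usr/bin/env bash
# Sketch: check both sides for the "shared secret incorrect" symptom.
# Pass the real log paths for your installation.
diagnose_handshake() {
  local smps_log=$1 web_log=$2 found=0
  if grep -qE 'sm-Tunnel-00050|Shared secret incorrect' "$smps_log"; then
    echo "Policy Server: shared secret rejected during handshake"
    # List the distinct client IPs from "Failed handshake with IP:port".
    sed -n 's/.*Failed handshake with \([0-9.]*\):.*/  client IP: \1/p' "$smps_log" | sort -u
    found=$((found + 1))
  fi
  if grep -q 'Unable to load SiteMinder host configuration object' "$web_log"; then
    echo "Web server: agent could not load the HCO or SmHost.conf"
    found=$((found + 1))
  fi
  # Success only when both sides show the symptom.
  [ "$found" -eq 2 ]
}
```

For example: diagnose_handshake <path to smps.log> <path to Apache error log>.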

 

Cause:

If no changes have been made on either the Policy Server side or the agent side since the last working state, this error indicates a possible change in the hostid (on UNIX-based systems) on the Web Agent side.

 

On all non-Windows platforms, the agent code that encrypts and decrypts the shared secret uses a key derived from a hard-coded value (the Web Agent Host Key) combined with the result of calling gethostid() on the platform in question. gethostid() is a standard C library function that returns a 32-bit host identifier as a long.

Different UNIX systems implement this function differently; for example, Linux, AIX, and Solaris each have their own implementation of gethostid().
As a result, the SiteMinder Web Agent may be unable to decrypt a shared secret generated on one UNIX system after it is moved to another. Moreover, if the hostid of the same system changes (due to a change in IP address, hostname, etc.), the Web Agent can no longer decrypt the shared secret that was originally generated on that system.
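On Linux with glibc, when /etc/hostid does not exist, gethostid() resolves the host's own name to an IPv4 address and returns that address with its two 16-bit halves swapped. On little-endian hardware such as x86, this means the printed hostid for a.b.c.d is the bytes b, a, d, c in hex, which accounts for the values in the Testing section below. The bash sketch models only that glibc fallback (hostid_from_ipv4 is a hypothetical name); AIX and Solaris compute the hostid differently, and an existing /etc/hostid overrides it entirely.

```bash
#!/usr/bin/env bash
# Sketch: predict the glibc fallback hostid (no /etc/hostid present,
# little-endian x86) for an IPv4 address a.b.c.d -> hex bytes b a d c.
hostid_from_ipv4() {
  local IFS=.
  set -- $1                      # split a.b.c.d into $1..$4
  [ $# -eq 4 ] || { echo "expected an IPv4 address" >&2; return 1; }
  # 10# forces decimal so octets such as 08 are not read as octal.
  printf '%02x%02x%02x%02x\n' $((10#$2)) $((10#$1)) $((10#$4)) $((10#$3))
}
```

Comparing hostid_from_ipv4 with the output of the hostid command is a quick way to tell whether a host's hostid is still IP-derived and would therefore change along with its IP address.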

 

Testing:

 

Set:

  • hostname = rhel65_1.ca.com
  • IP = 192.168.0.6 (in the hosts file)

The hostid command outputs a8c00600.

 

 

The agent initializes successfully, and the three-way agent-to-Policy Server handshake succeeds.

 

 

Now, change the IP address to 192.168.0.7 in the hosts file with everything else remaining the same.

This time, the hostid command gives a different result: a8c00700

 

 

The three-way agent-to-Policy Server handshake now fails with the following errors in smps.log:

[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2035][ERROR][sm-Tunnel-00010] Bad security handshake attempt. Handshake error: 3154
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2046][ERROR][sm-Tunnel-00050] Handshake error: Shared secret incorrect for this client
[6512/7048][Tue Sep 05 2017 06:41:45][CServer.cpp:2207][ERROR][sm-Server-01070] Failed handshake with 155.35.245.220:50711

 

 

Also note that in the network capture, the encrypted data preceding the trusted hostname is now different.

As long as the combination of shared secret, trusted hostname, and hostid stays the same, the encrypted data remains the same.

 

Resolution:

As we saw above, a simple change in the IP address changed the hostid on the RHEL system, which in turn invalidated the shared secret stored in SmHost.conf. Depending on the platform, other factors may also change the hostid.

The only ways to fix this issue are to re-register the trusted host or to revert to the previous hostid (in this case, by reverting to the previous IP address).

 

smreghost -i <policyserver_ip>:44441,44442,44443 -u "siteminder" -p <siteminder super user password> -hn <trustedhost> -hc <hco> -cf COMPAT -f <Path_To_SmHost.conf> -o

For example:

 

[root@rhl65 bin]# pwd
/opt/CA/webagent/bin
[root@rhl65 bin]# smreghost -i shruj01-i1849.ca.com:44441,44442,44443 -u "siteminder" -p "siteminder" -hn th-rhel65-5 -hc  hco -cf COMPAT -f /opt/CA/webagent/config/SmHost.conf -o
Host Registration written to '/opt/CA/webagent/config/SmHost.conf'.
[root@rhl65 bin]#
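Re-registration rewrites SmHost.conf (smreghost reports "Host Registration written to ..."), so it can be useful to keep a timestamped copy of the previous file for reference before running the command. A minimal bash sketch; backup_smhost is a hypothetical helper and its default path is the one used in this article's example.

```bash
#!/usr/bin/env bash
# Sketch: keep a timestamped copy of SmHost.conf before re-registering.
# Default path taken from the smreghost example above.
backup_smhost() {
  local f=${1:-/opt/CA/webagent/config/SmHost.conf} b
  b="$f.$(date +%Y%m%d%H%M%S).bak"
  # -p keeps ownership, mode and timestamps; prints the backup path.
  cp -p -- "$f" "$b" && echo "$b"
}
```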

 

If the web server IP address is expected to change frequently (due to reboots, network interface changes, DNS server changes, etc.), it is recommended to set a static hostid.

 

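In RHEL, you can do this by running the genhostid command, which stores a static hostid in the /etc/hostid file. Since the new value will normally differ from the IP-derived one, re-register the trusted host afterward.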

(Refer to your OS documentation for how to set a static hostid.)
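Because the hostid can change without any agent configuration change, a simple guard is to record it right after a successful registration and compare it before each web server start, for example from a start-up wrapper. The bash sketch below is one way to do this; record_hostid, check_hostid, and the state-file path are hypothetical choices, not part of the Web Agent.

```bash
#!/usr/bin/env bash
# Sketch: detect a hostid change before it breaks shared secret decryption.
# The state-file location is a hypothetical choice; override HOSTID_STATE.
HOSTID_STATE=${HOSTID_STATE:-/opt/CA/webagent/config/hostid.at-registration}

# Run right after a successful smreghost registration.
record_hostid() {
  hostid > "$HOSTID_STATE"
}

# Run before starting the web server; non-zero means "investigate first".
check_hostid() {
  local saved current
  current=$(hostid)
  if [ ! -s "$HOSTID_STATE" ]; then
    echo "No recorded hostid; run record_hostid after registering." >&2
    return 2
  fi
  saved=$(cat "$HOSTID_STATE")
  if [ "$saved" != "$current" ]; then
    echo "hostid changed ($saved -> $current): re-register with smreghost." >&2
    return 1
  fi
  echo "hostid unchanged ($current)"
}
```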

 

 

 

To be continued....  

 

 

 

 

 
