
Troubleshooting Web Agent Handshake issues

Blog Post created by SungHoon_Kim Employee on Mar 15, 2016

If the Web Agent is not able to handshake with the Policy Server, the following information will help you capture the data needed to troubleshoot.

 

Basic INFO

IP 10.0.0.1 is Policy Server

IP 10.0.0.2 is WA1

 

HCO Name: hco

ACO Name: aco

TrustedHostName: trustme

 

The Policy Server has multiple IP addresses (to simulate an environment with multiple Policy Servers), which are:

 

PS1: 10.0.0.11

PS2: 10.0.0.12

PS3: 10.0.0.13

 

The SmHost.conf file lists the following IPs.

policyserver="10.0.0.3,44441,44442,44443"

policyserver="10.0.0.1,44441,44442,44443"
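If you want to pull the bootstrap addresses out of SmHost.conf with a script, a minimal Python sketch could look like this. It assumes every entry has the same shape as the two sample lines above (an IP followed by comma-separated ports inside double quotes). The function name parse_policyservers is made up for this example.

```python
import re

# Assumed line shape, taken from the two sample lines above:
#   policyserver="<ip>,<port>,<port>,<port>"
POLICYSERVER_RE = re.compile(r'^\s*policyserver\s*=\s*"([^"]+)"\s*$')


def parse_policyservers(text):
    """Return a list of (ip, [ports]) tuples found in SmHost.conf-style text."""
    servers = []
    for line in text.splitlines():
        match = POLICYSERVER_RE.match(line)
        if not match:
            continue  # skip blank lines and unrelated settings
        ip, *ports = [field.strip() for field in match.group(1).split(",")]
        servers.append((ip, [int(port) for port in ports]))
    return servers


sample = '''
policyserver="10.0.0.3,44441,44442,44443"
policyserver="10.0.0.1,44441,44442,44443"
'''
bootstrap_servers = parse_policyservers(sample)
for ip, ports in bootstrap_servers:
    print(ip, ports)
```

The resulting IP list is what you would expect to see as the destination of the very first connection in the network trace.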

 

The HCO has a cluster with the following IPs.

  1. 10.0.0.11
  2. 10.0.0.12
  3. 10.0.0.13

 

Using different IPs in SmHost.conf and in the HCO makes it easier to track which connection was for the bootstrap and which ones were for agent requests from users.
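The idea can be sketched in a few lines of Python: given the destination IP of a frame, decide whether it belongs to the bootstrap connection or to the HCO cluster. The IP sets are copied from the lists above. The function name and the "bootstrap" / "agent" / "unknown" labels are only illustrative.

```python
# Addresses copied from this post; adjust them to your own environment.
BOOTSTRAP_IPS = {"10.0.0.3", "10.0.0.1"}                    # SmHost.conf
HCO_CLUSTER_IPS = {"10.0.0.11", "10.0.0.12", "10.0.0.13"}   # HCO cluster


def classify_destination(dst_ip):
    """Label a destination IP seen in the trace."""
    if dst_ip in BOOTSTRAP_IPS:
        return "bootstrap"
    if dst_ip in HCO_CLUSTER_IPS:
        return "agent"
    return "unknown"


for ip in ("10.0.0.1", "10.0.0.12", "192.168.1.1"):
    print(ip, classify_destination(ip))
```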

 

Run Wireshark on the PS and use the "tcp.port == 44443" filter to display only the Web Agent related traffic.

Do the same on the Web Agent machine, using the same "tcp.port == 44443" filter.

 

Once sufficient data is captured, stop the network capture on both the PS and the WA side.

In Wireshark, click on "Edit ==> Mark All Displayed Packets".

 

Now that the traffic is selected, go to "File ==> Export Specified Packets…"

 

This will export only the packets that are displayed and selected.

So, no other traffic is exported. This is useful if the customer is concerned about what data is being exported and whether any sensitive information is being exposed. (Yes, IP addresses are still revealed!)
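If you prefer the command line, roughly the same export can be done with tshark, the command-line tool that ships with Wireshark. The sketch below only builds the command: -r reads a capture file, -Y applies a display filter and -w writes the matching packets to a new file. The file names are placeholders, and it assumes tshark is installed and on the PATH.

```python
import shlex

WEB_AGENT_FILTER = "tcp.port == 44443"  # display filter used in this post


def build_export_command(in_file, out_file, display_filter=WEB_AGENT_FILTER):
    """Build a tshark argument list that writes only the matching packets."""
    return ["tshark", "-r", in_file, "-Y", display_filter, "-w", out_file]


# Placeholder file names, for illustration only.
command = build_export_command("full_capture.pcapng", "filtered_capture.pcapng")
print(shlex.join(command))

# To actually run it (requires tshark to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```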

 

Export the captured traffic.

In this sample, it is saved as "TestPSnetwork"; the extension is added automatically.

Double-click the file to open it in Wireshark.

 

With no filter applied, the capture file contains only 174 frames, because only those were exported.

 

The same goes for the network trace captured on the Web Agent side.

It has only 188 frames with no filter applied.

 

Now, you cannot see what is being communicated, because all the data is encrypted with the shared secret in the SmHost.conf file and cannot be decrypted.

However, there is one piece of information that is not encrypted: the "Trusted Host Name".

If you look at frame 4, 10.0.0.2 (WA) is pushing data to 10.0.0.1 (the PS in SmHost.conf) and you can see the trusted host name "trustme".
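Since the trusted host name travels in clear text, you can search for it in the raw payload bytes of a frame. Here is a small Python sketch of that idea. It assumes the name appears as plain ASCII in the payload, and the sample bytes are made up for illustration (a real frame looks different).

```python
def find_trusted_host(payload, trusted_host_name):
    """Return the byte offset of the name in the payload, or -1 if absent."""
    return payload.find(trusted_host_name.encode("ascii"))


# Made-up payload bytes, for illustration only.
sample_payload = b"\x00\x01\x02trustme\x9a\x4f\x10"
offset = find_trusted_host(sample_payload, "trustme")

if offset >= 0:
    print("trusted host name found at byte offset", offset)
else:
    print("trusted host name not found in this payload")
```

A frame that carries the name is a good marker for the start of a handshake on that connection.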

 

The Policy Server acknowledges and pushes some data to the WA.

It is all encrypted, so we do not know what data it is, but we can assume it is the SiteMinder handshake that establishes the encrypted tunnel.

 

Then there is some back-and-forth communication, which should be the HCO being downloaded.

Once the Web Agent has completed the handshake with the Policy Server in SmHost.conf and downloaded the HCO, it closes the connection.

 

You can see that in frame #12. You can also confirm that port 49286 was used for contacting the Policy Server.

So, there was only 1 connection to the Policy Server.

 

Then the Web Agent connects to 10.0.0.11 (PS1 in the HCO) at frame #13.

Because the HCO lists 3 Policy Servers, the Web Agent needs to try connecting to all 3 Policy Servers.

You can see that frame #13 is the connection to PS1, #16 is to PS2 and #19 is to PS3.

The Web Agent used ports 49287, 49288 and 49289 for those connection attempts.

 

At frame #22, the Web Agent performs the 1st shared secret handshake with PS1.

You can see the trusted host name.

After that handshake succeeds, the Web Agent creates another connection to PS1, using port 49290, at frame #28.

Frame #31 shows the Web Agent handshaking with PS1 on the 2nd connection and sending the trusted host name.

 

The reason is that the HCO sets New Socket Step to 2. So, whenever the Web Agent needs to create new connections, it has to create 2 connections each time.

The same is repeated for PS2 and PS3.
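To check whether your own trace shows the expected number of connections, the arithmetic can be written down as a short Python sketch. It only tallies the initial handshake as seen in this sample (1 bootstrap connection, then New Socket Step connections per Policy Server in the HCO). It is not a general model of how the agent manages its connection pool, and the function name is made up.

```python
def expected_connections(hco_policy_servers, new_socket_step, bootstrap_connections=1):
    """Tally the connections expected during the initial handshake."""
    cluster_total = len(hco_policy_servers) * new_socket_step
    return {
        "bootstrap": bootstrap_connections,
        "per_policy_server": new_socket_step,
        "cluster_total": cluster_total,
        "total": bootstrap_connections + cluster_total,
    }


# Values from this post: 3 Policy Servers in the HCO, New Socket Step = 2.
tally = expected_connections(["10.0.0.11", "10.0.0.12", "10.0.0.13"], 2)
print(tally)
```

If your trace shows fewer connections to one of the Policy Servers than to the others, that server is a good place to start looking.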

 

 

By default, Wireshark displays the time in elapsed format.

This means the first frame starts at 0 and every subsequent frame shows the time elapsed since the first frame.

 

You need to change it to display the "Year/Month/Day Hour:Minute:Second" format.

 

To do so, go to "View ==> Time Display Format ==> Date and Time of Day".

 

Wireshark will then display the absolute date and time for each frame.

This timestamp can be matched against your webagenttrace.log and smtracedefault.log to determine what is going wrong when the handshake fails.
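If you only have the elapsed times (for example in notes taken before switching the display format), they can be converted back to absolute time once you know the absolute time of the first frame. A minimal Python sketch follows. The start time and the elapsed value are made-up examples, and the output format simply follows the "Year/Month/Day Hour:Minute:Second" wording used above.

```python
from datetime import datetime, timedelta


def to_absolute(first_frame_time, elapsed_seconds):
    """Convert an elapsed time (seconds since the first frame) to absolute time."""
    return first_frame_time + timedelta(seconds=elapsed_seconds)


# Hypothetical values, for illustration only.
capture_start = datetime(2016, 3, 15, 10, 0, 0)
stamp = to_absolute(capture_start, 12.5)
formatted = stamp.strftime("%Y/%m/%d %H:%M:%S.%f")
print(formatted)
```

Remember that the PS and the WA clocks may differ, so allow for a small offset when matching against the logs from the other machine.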

 

SiteMinder logs by themselves will not be meaningful without the network trace.

(You will need to run "smpolicysrv -stats" every minute to get statistics on the load changes, and enable smtracedefault.log to get more information on whether the Policy Server was receiving unexpected requests or had other problems while handling agent requests.)
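Running the stats command every minute by hand is tedious, so here is a small Python sketch that repeats it. It assumes smpolicysrv is on the PATH of the user running the script. The function name and its parameters are made up for this example, and the runner and sleeper arguments exist only so the loop can be tried out without a Policy Server.

```python
import subprocess
import time


def run_stats_periodically(iterations, interval_seconds=60,
                           runner=subprocess.run, sleeper=time.sleep):
    """Run "smpolicysrv -stats" a fixed number of times, pausing in between."""
    for i in range(iterations):
        # check=False: keep looping even if one invocation returns an error.
        runner(["smpolicysrv", "-stats"], check=False)
        if i < iterations - 1:
            sleeper(interval_seconds)


# Example: collect stats once a minute for 10 minutes (run on the Policy Server).
#   run_stats_periodically(iterations=10)
```

A cron job or scheduled task would do the same job. Use whatever is easier in your environment.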

 

If your environment has handshake issues, try comparing your trace with the sample network trace demonstrated here.

A better way is to capture a working handshake from your own environment and then compare it with the failing case.

 

If you see any "TCP Retransmission", check whether any ACKs are going missing, whether a firewall is blocking them, or whether packets are being dropped.
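In Wireshark, the display filter "tcp.analysis.retransmission" shows these frames directly. To illustrate what a retransmission looks like, here is a simplified Python sketch that flags a data segment whose connection and sequence number have already been seen. This is only a rough heuristic for illustration and not how Wireshark's TCP analysis actually works. The tuple layout and the sample data are made up.

```python
def find_repeated_segments(segments):
    """Return (frame_no, original_frame_no) pairs for repeated data segments.

    segments: iterable of
        (frame_no, src_ip, src_port, dst_ip, dst_port, seq, payload_len)
    """
    first_seen = {}
    repeats = []
    for frame_no, src, sport, dst, dport, seq, payload_len in segments:
        if payload_len == 0:
            continue  # pure ACKs carry no data, so skip them
        key = (src, sport, dst, dport, seq)
        if key in first_seen:
            repeats.append((frame_no, first_seen[key]))
        else:
            first_seen[key] = frame_no
    return repeats


# Made-up sample: frame 3 resends the data first sent in frame 1.
sample_segments = [
    (1, "10.0.0.2", 49287, "10.0.0.11", 44443, 1, 100),
    (2, "10.0.0.11", 44443, "10.0.0.2", 49287, 1, 0),
    (3, "10.0.0.2", 49287, "10.0.0.11", 44443, 1, 100),
]
repeats = find_repeated_segments(sample_segments)
print(repeats)
```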

 

Your network administrator should be able to give you statistics on network health and on changes in traffic load between specific servers (WA to PS, PS to user store, etc.) that might have caused a bottleneck and unexpected behaviour such as the handshake taking longer than usual.

 

 

 

Sample logging options below.

 

smps.log ==> run "smpolicysrv -stats" every minute

smtracedefault.log ==> at a minimum you will need the following 3 lines.

components: AgentFunc/Init, AgentFunc/UnInit, AgentFunc/Tunnel, AgentFunc/GetConfig, AgentFunc/DoManagement, Server/Connection_Management, Server/Policy_Server_General, Tunnel_Service

data: Date, PreciseTime, Pid, Tid, SrcFile, Function, IPAddr, IPPort, Message, ClusterID, Throughput

version: 1.1

WebAgentTrace.log ==> use AgentConMgr.conf for tracing.

 

Also, refer to the attachment "samplefiles.zip", which contains the network traces captured and exported from both the PS and the WA.

 
