Skip navigation
All Places > CA Security > CA Single Sign-On > Blog > 2017 > January
2017

The SMPolicyTraceAnalysis tool can be used to review trace logs from the policy server, agents and a number of other trace logs.  The link to the download is here : 

Siteminder Policy Trace Analysis 

 

 

This techtip is to explain the purpose of two new(ish) graphs that are now generated by the tool. 

 

1. Viewing the Data 

The best way to view the graphs (it is a bit confusing, since there are too many of them, fixing that is on the, long, todo list) - is to view the Bookmarks or Index side tab, as per the screenshot below you can see it is the blue selected item, and then you get a list of the chapter headings.   You can also see the two new graph items as sections :

       1.12. AccumOpenTransTime_ProcessRequest 

       1.13. AvTransTimeOverTime_ProcessRequest

 

 

 

2. Purpose of the Graphs  

The generated pdf report prints a summary at the top with the av transaction time, and a distribution of transactions times for all of the transactions in the reviewed trace files.  

 

But often, it would be good to see what is happening to the transaction time over the course of the trace file, not just a summary at the end of the whole trace.   That way if there is a holdup at a specific point in the trace, perhaps a failover of the LDAP user store, or spike in load, then it would be good to see what effect that had on the transaction times during the timesteps of the failover and how that compared to the background transaction time at other times.  

 

So these graphs are an attempt to see what is happening along the timeline of the trace logs, rather than just a summary of the whole trace log. 

 

 

3. Section 1.13 AvTransTimeOverTime_ProcessRequest 

I've started with the last new graph, as that order makes it easier to explain the Section 1.12 one, even though they are in different order in the pdf report.

 

The AvTransTimeOverTIme graph shows the transaction time over the last 5 recorded steps.  The default is usually 1 sec graph increments, and so this grap will show the average trans time (blue) and max transaction (red) time for all transactions that finished in the last 5 sec.  

 

It gives a reasonable running total of performance over the trace log.  But interestingly, since we need the transaction to be completed before we can plot it.  So if there is an immediate bad effect on performance, the graph will likely be the transaction will take longer, but the visible effect on the graph will be delayed until the transaction is completed, and not start showing as soon as the effect is felt (In this way it is a bit similar to the unix uptime command, which shows running averages, here we have av over the last 5sec, but we could also be graphing averages over the last 10sec, and last 30sec as well).    

 

Another effect on performance is change in load, if we suddenly have spike, and now have 30 active worker threads rather than 15, then that also is not quite as visible in this graph (sure you can see that in the Concurrent Threads graph).  It should have some effect, but if all 30 threads are active, and all take just as long as it took to process 15 threads, then there won't be any clue, in this graph, that we have doubled the load. 

 

 

 

Nevertheless, graphing the running av/max transaction time over the last 5 sec, is still useful - and is also a fairly easy value to explain.  In the below graph for example we have av of about 5sec response, generally with max response of 30sec, and as we can see a peak of 50sec max response, and peak of ~15sec av response. 

 

 

 

 

4. Section 1.13 AccumOpenTransTime_ProcessRequest 

As explained in the previous section, there are limitations to what you can see in graphing a running average and we need completed transactions. This graph is an attempt to get a bit better insight into the working of those incomplete transactions.

 

What we are graphing here is the total accumulated time for all in-progress transactions. That way we can see if the policy server starts going slow, as that will result in both the existing open tranactions taking longer as they are held open, and possibly also an increase in in-progress transactions as new worker threads are started to absorb the continued requests coming into the queue. 

 

So in the graph below, we can see are holding about 450sec worth of transactions (since I peaked at the Parallel/Concurrent Requests graph) I know there are 30 active threads, so each thread has been running for 450/ 30 = 15sec on av.   We can see at 8:30am the accumulated load dropped to 50sec, again since I looked at the threads running at the time, I know it dropped to 5 active threads, so we can see that on av each thread has been open for 50/5 = ~10sec.   Its not perfect, but it does give a bit of an insight into how the in-progress transactions are going, and should give a more immediate response when something adverse happens to the policy server.

 

 

 

Cheers - Mark

This is fairly widely known, but though I should document it since we often get trace log files with default settings, and that isn't enough to assess performance or throughput.

 

The default SSO logging (generally) does not include [PreciseTime].   And if you want better graphs, or better resolution in the SSO products trace logs, then you should add [PreciseTime] [SrcFile] and [Function] to your SSO trace log configuration. 


Graphs & Stats with default [Time] (1sec) resolution 

The default for most SSO products is still to use [Date] and [Time], so often with a first cut at trace analysis we get a graph that looks like this : 

 

Generally most transactions take less than a sec, so the graph and also any estimate of av transaction time is pretty useless.   Even the transactions that took "2 sec" are probably unlucky enough to have occurred at over a timeframe of : 02:24.994 to 02:25.001 - but all we see is 2:24 to 2:25 , so we can only (badly) guess the transaction time  must have taken between 1 - 2 sec. 

 

Graphs & Stats with [PreciseTime] (millisec) resolution 

If we add [PreciseTime] to the trace log then we get a better estimate of av transaction time, and a better understanding of distribution of request transaction times for example  : 

We can see two spikes of transaction times, and a small number that took longer.   Not show but we also include a summary of min max and av transaction times which then also have much more accurate values. 

 

Additional Graphs for [PreciseTime] (millisec) resolution 

Also if we add [PreciseTime][SrcFile] and [Function] to the trace log then we can also tell from the time gaps between each trace line "where" the thread spent time in processing.   

 

So we can produce a pdf report with a graph like this :

Which identifies which trace log steps (effectively thread state transitions) are taking the most time. That is a first step for identifying bottlenecks.

 

Additionally, given [SrcFile] (roughly) identifies the state of processing of the request, we can also show how many worker threads are in the same state at the same time.  for example :  

This helps identify delays and bottlenecks in processing.  if we want to improve throughput, we review this graph, pick the major bottleneck (hopefully) find a configuration change to allow it to have greater throughput.
(A common example of a policy server bottleneck and it's resolution,  if we have a single LDAPBank, we often find in this graph, that all worker threads are queued up in a state waiting to get access to the one LDAPBank.  Adding more LDAPBanks allows more threads to proceed in parallel, that decreases the number of threads in this [SrcFile] "wait"  state, and increases the overall throughput of the server)  

 

How to add PreciseTime, SrcFile and Function to the trace logging. 

The data items for SSO log file configuration are fairly simple.  There will be a configuration file eg : 

ServerTrace config file location
Policy Server/opt/ca/siteminder/config/smtracedefault.txt
Web AgentC:\Program Files\CA\webagent\win32\config\WebAgentTrace.conf

Access Gateway

(formerly Secure Proxy Server)

/opt/ca/secure-proxy/proxy-engine/conf/SecureProxyTrace.conf

 

 

The last two lines of the .conf file are the important ones they will look something like :
components: AgentFramework, HTTPAgent, WebAgent, AgentFunc, Agent_Con_Manager
data: Date, Time, PreciseTime, Pid, Tid, SrcFile, Function, TransactionID, IPAddr, IPPort, AgentName, Resource, User, Message

(note that last line will probably auto-wrap on your screen - it will be one line in the file)

We want to add and make sure that last data: line then contains [PreciseTime, SrcFile and Function. eg: 

data: Date, Time, PreciseTime, Pid, Tid, SrcFile, Function, TransactionID, IPAddr, IPPort, AgentName, Resource, User, Message

 

 

And that is all you need to do, when you next start trace logging, it will have the new values. 

 

SMTraceAnalysisTool  - where do you get it?  

The policy trace analysis tool is available from the CA communities site here: 

 

Siteminder Policy Trace Analysis 

 

It has settings to parse the following log types : 

x Policy Server Trace Log

x Policy Server smps (stats) Log

x SM Agent trace log

x SPS Agent trace log

x FWS Agent trace log

x SM SDK trace log

x ASA Websphere agent trace log

x ASA Weblogic agent trace log

x WWSI trace log

x CA Directory Query trace log

 

Hope that helps with your transaction analysis. 

 

Cheers - Mark

PS: I believe there is also an enhancement request to add these fields to the default trace logs, I will hunt up the link for that as well.

Problem

After upgrading CA SSO policy server to 12.52SP2 , it's no more able to establish a secure connection to LDAP.

The older version of CA SSO (12.52SP1) is able to establish the secure connection to LDAP just fine.

 

smps.log shows :

[2700/3456][Thu Jan 05 2017 15:35:55][SmDsLdapConnMgr.cpp:788][WARNING][sm-Ldap-02910] SSLv3 client protocol is disabled. If connection fails configure LDAP server to support TLS protocols.

[2700/3456][Thu Jan 05 2017 15:35:55][SmDsLdapConnMgr.cpp:950][ERROR][sm-Ldap-01370] SmDsLdapConnMgr Bind. Server ad2k8-01 : 636. Error 81-Can't contact LDAP server

Environment

  • Policy Server : R12.52SP2 and above
  • Policy Server OS : ANY
  • User Store : ANY LDAP

Cause

Starting r12.52 SP2 CA SSO Policy Server, the support for SSLv3 protocol for secure connection to LDAP store is disabled by default.

This change was done to mitigate the SSLv3 Poodle Vulnerability : 

https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2014-3566

 

This can be seen in the smps.log as well :

[2700/3456][Thu Jan 05 2017 15:35:55][SmDsLdapConnMgr.cpp:788][WARNING][sm-Ldap-02910] SSLv3 client protocol is disabled. If connection fails configure LDAP server to support TLS protocols.

 

What this now means is that , Policy Server now uses the TLS protocol instead to establish a secure channel to LDAP store.

The detailed list of TLS protocol supported by different version of CA SSO Policy server is listed here :

https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.TEC2147705.html

So, if the supported TLS protocols are NOT enabled on the LDAP server, Policy server wouldn't be able to establish a secure connection to it.

Resolution

Configure LDAP server to support the TLS protocol supported by the version of CA SSO Policy server as per :

https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.TEC2147705.html

As of r12.52SP2 CR1 (as of this writing) , the Policy server supports only TLSv1.0 and will fail to connect it on any other protocol.

 

So ensure that TLSv1.0 is enabled on the LDAP Server to resolve this connectivity issue. 

For e.g  In case of Active Directory you can configure the SSL protocols as per this guide :

https://technet.microsoft.com/en-us/library/dn786418(v=ws.11).aspx#BKMK_SchannelTR_TLS10

Testing:

With TLSv1.0 Disabled on Active Directory 

1. Screenshot TLSv1.0 Disabled on AD

AD_TLSV1.0_Disabled.jpg

2. Screenshot SSL Handshake failure on Policy server side

NONWorking_TLSV1.0Disabled.png

3. Screenshot Admin UI showing connection failure

ADMINUI.png

 

With TLSv1.0 Enabled on Active Directory 

1. Screenshot TLSv1.0 Enabled on AD

AD_TLSV1.0_Enabled.jpg

2. Screenshot SSL Handshake Successful on Policy server side

WORKING_TLSv1.0.png

3. Screenshot Admin UI showing connection success and retrieving result

Admin UI working.png

 

Additional Information