The SMPolicyTraceAnalysis tool can be used to review trace logs from the policy server, agents and a number of other trace logs. The link to the download is here :
This techtip is to explain the purpose of two new(ish) graphs that are now generated by the tool.
1. Viewing the Data
The best way to view the graphs (it is a bit confusing, since there are too many of them, fixing that is on the, long, todo list) - is to view the Bookmarks or Index side tab, as per the screenshot below you can see it is the blue selected item, and then you get a list of the chapter headings. You can also see the two new graph items as sections :
2. Purpose of the Graphs
The generated pdf report prints a summary at the top with the av transaction time, and a distribution of transactions times for all of the transactions in the reviewed trace files.
But often, it would be good to see what is happening to the transaction time over the course of the trace file, not just a summary at the end of the whole trace. That way if there is a holdup at a specific point in the trace, perhaps a failover of the LDAP user store, or spike in load, then it would be good to see what effect that had on the transaction times during the timesteps of the failover and how that compared to the background transaction time at other times.
So these graphs are an attempt to see what is happening along the timeline of the trace logs, rather than just a summary of the whole trace log.
3. Section 1.13 AvTransTimeOverTime_ProcessRequest
I've started with the last new graph, as that order makes it easier to explain the Section 1.12 one, even though they are in different order in the pdf report.
The AvTransTimeOverTIme graph shows the transaction time over the last 5 recorded steps. The default is usually 1 sec graph increments, and so this grap will show the average trans time (blue) and max transaction (red) time for all transactions that finished in the last 5 sec.
It gives a reasonable running total of performance over the trace log. But interestingly, since we need the transaction to be completed before we can plot it. So if there is an immediate bad effect on performance, the graph will likely be the transaction will take longer, but the visible effect on the graph will be delayed until the transaction is completed, and not start showing as soon as the effect is felt (In this way it is a bit similar to the unix uptime command, which shows running averages, here we have av over the last 5sec, but we could also be graphing averages over the last 10sec, and last 30sec as well).
Another effect on performance is change in load, if we suddenly have spike, and now have 30 active worker threads rather than 15, then that also is not quite as visible in this graph (sure you can see that in the Concurrent Threads graph). It should have some effect, but if all 30 threads are active, and all take just as long as it took to process 15 threads, then there won't be any clue, in this graph, that we have doubled the load.
Nevertheless, graphing the running av/max transaction time over the last 5 sec, is still useful - and is also a fairly easy value to explain. In the below graph for example we have av of about 5sec response, generally with max response of 30sec, and as we can see a peak of 50sec max response, and peak of ~15sec av response.
4. Section 1.13 AccumOpenTransTime_ProcessRequest
As explained in the previous section, there are limitations to what you can see in graphing a running average and we need completed transactions. This graph is an attempt to get a bit better insight into the working of those incomplete transactions.
What we are graphing here is the total accumulated time for all in-progress transactions. That way we can see if the policy server starts going slow, as that will result in both the existing open tranactions taking longer as they are held open, and possibly also an increase in in-progress transactions as new worker threads are started to absorb the continued requests coming into the queue.
So in the graph below, we can see are holding about 450sec worth of transactions (since I peaked at the Parallel/Concurrent Requests graph) I know there are 30 active threads, so each thread has been running for 450/ 30 = 15sec on av. We can see at 8:30am the accumulated load dropped to 50sec, again since I looked at the threads running at the time, I know it dropped to 5 active threads, so we can see that on av each thread has been open for 50/5 = ~10sec. Its not perfect, but it does give a bit of an insight into how the in-progress transactions are going, and should give a more immediate response when something adverse happens to the policy server.
Cheers - Mark