Stall with no response time over 30s?


1. Stall with no response time over 30s?

andfer
Posted Apr 12, 2016 03:39 PM

Hello guys, need help to understand this

We are seeing stall counts on a method we instrumented in our custom PBD, but the response times for that method (we checked max/min) are never greater than 15,000 ms.
Here is what our metrics look like:
At most 8 threads can invoke this class/method, so we believe that the stall count topping out at 7 is not a coincidence. We just do not understand why the stall shows up on the method that is invoked and not on the caller.
We are also not seeing a stall error (error tab) for this method/class. Is there any way to force it to show up there?

Any ideas?
2. Re: Stall with no response time over 30s?

rogelio.dipasquale
Posted Apr 13, 2016 01:53 PM

Are you sure that the call actually ends? Average Response Time is reported only when the call ends, whereas Stall Count is reported while transactions are still active. So, if something kills the stalled transaction without letting it end correctly, we might lose track of the call and never see a response time for it.

It's a guess; I don't know whether it fits your situation.

Regards,
Roger
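
Roger's point can be illustrated with a small model. The sketch below is not Introscope code. It is a hypothetical Python simulation (the class `MethodTracer` and all of its methods are invented here) that assumes a response time is recorded only when an invocation finishes, while the stall count is a sample of invocations still in flight past a threshold.

```python
class MethodTracer:
    """Toy model of a tracer; names and behaviour are assumptions for illustration."""

    def __init__(self, stall_threshold_s=30.0):
        self.stall_threshold_s = stall_threshold_s
        self.in_flight = {}        # invocation id -> start time in seconds
        self.response_times = []   # filled only when an invocation finishes

    def start(self, call_id, now):
        self.in_flight[call_id] = now

    def finish(self, call_id, now):
        # Normal exit: the elapsed time becomes a response-time sample.
        started = self.in_flight.pop(call_id)
        self.response_times.append(now - started)

    def abandon(self, call_id):
        # Assumed failure mode: the invocation disappears without a normal exit,
        # so no response-time sample is ever recorded for it.
        self.in_flight.pop(call_id, None)

    def stall_count(self, now):
        # Count invocations that have been in flight at least the threshold.
        return sum(1 for started in self.in_flight.values()
                   if now - started >= self.stall_threshold_s)


tracer = MethodTracer()
tracer.start("a", now=0.0)
tracer.start("b", now=0.0)
tracer.finish("a", now=12.0)                  # completes normally after 12 s
stalls_at_40 = tracer.stall_count(now=40.0)   # "b" has been in flight for 40 s
tracer.abandon("b")                           # "b" goes away without finishing
stalls_at_70 = tracer.stall_count(now=70.0)
max_response_time = max(tracer.response_times)
```

In this model the stall appears and then disappears, yet the maximum response time stays at 12 s because the stalled invocation never reports an end.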
3. Re: Stall with no response time over 30s?

Broadcom Employee

Haruhiko Davis
Posted Apr 13, 2016 01:56 PM

You must create an entry in your PBD using the tracer type "ExceptionErrorReporter" in order to increment "Errors Per Interval" (EPI). Otherwise, EPI will always report zero.
4. Re: Stall with no response time over 30s?

Billy Cole
Posted Apr 13, 2016 02:41 PM

Howdy Andfer,

Does the target application use AJAX or reverse AJAX (Comet)?

In either case the AJAX call returns a response to the request, which lets APM register the Average Response Time, but the thread stays open and could be producing additional responses/data or handling browser callbacks along the AJAX path. This would keep the thread alive, which would trigger the stall counts.

Hope this helps,
Billy
5. Re: Stall with no response time over 30s?

andfer
Posted Apr 13, 2016 07:09 PM

Thanks for the insights,

This is basically how the app is structured:
There is a pool of threads that stay up for as long as the app is running (sleeping or running as appropriate), and each of these threads might call the class/method we are instrumenting.
The threads are also being monitored, and they do show the endless stalls and huge response times. But I still don't understand why the method that is called on another class behaves like that.

rogelio.dipasquale
There should not be anything killing the transaction (at least not that we know of so far).

Hiko_Davis
We are actually already using the ExceptionErrorReporter. We do see the stall errors from the threads we are monitoring, but none from this class/method.

bwcole
We do not have AJAX/Comet running, but we do have some threads that we keep alive. The weird thing in the graphs is that we see the stalls coming and going, so I expected that at some point the response time would be computed. If we look at the threads we are monitoring, they show a constant number of stalls (as expected).
6. Re: Stall with no response time over 30s?

Billy Cole
Posted Apr 14, 2016 07:15 AM

Hi andfer

I'm on shaky ground on this, but from what I know, the Average Response Time comes from the Java agent/AutoProbe timing the request or method call and the corresponding response or method return.
The Stall metric is based on the active thread ID and monitors threads within the thread pools of the JVM. The Java agent/AutoProbe captures the thread IDs within the thread pool and compares the active thread IDs to the previously captured thread IDs.
This would be like a J2EE data source or EJB pool.

I'm going to try to provide an example of what I think I know...again shaky ground ahead.

1. Request is sent from client to server
2. Server uses a process thread, web container thread from the web container pool. Thread 1234
3. Autoprobe - stall captures that thread 1234 is within the current metric measurement cycle
4. Autoprobe - average response time clocks request on 1234
5. Web container thread processes the request which includes a call to your process thread pool
6. Process thread pool currently has no threads within the thread pool so creates thread 5678
7. The process thread pool assigns thread 5678 to the thread 1234 transactional context
8. Autoprobe - average response time clocks request on 5678
9. Autoprobe - stall captures that thread 5678 is within the current metric measurement cycle
10. Thread 5678 processes the request and returns the data response
11. Autoprobe - clocks response on 5678, determines average response time on thread 5678
12. Process thread pool returns response to thread 1234
13. At this point, thread 5678 is returned to the process thread pool and is still listed among the threads of the JVM. Since your custom thread pool isn't one of the standard JVM thread pools, Autoprobe would not recognize that the thread belongs to a pool and monitors it like any other JVM thread.
14. Web Container thread 1234, returns response to original client thread
15. Autoprobe - clocks response to 1234 determines average response time on thread 1234
16. Autoprobe enters next measurement cycle
17. There are no client threads, but the thread enlisted at step 11 is still active. Stall captures that thread 5678 is still active
18. Now, if that thread ages out of the pool or gets used by another request, then the stall capture, I'm guessing, would understand that the thread is processing a different request and reset the stall marker. If not, each cycle in which the thread remains in the pool becomes another stall tick
19. If the thread is still marked as present within the JVM thread pool for one more cycle, then it is reported as a stall.
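
The hypothesis in the steps above can be sketched as a small simulation. This models only the guess described in this post, not how the agent is known to work. The function `stall_ticks` and the per-cycle snapshot format are invented for illustration: each measurement cycle is a mapping from thread ID to the request that thread is associated with, and a thread counts as stalled when it shows the same request as in the previous cycle.

```python
def stall_ticks(cycles):
    """Return the number of stalled threads per cycle under the hypothesis above.

    cycles: list of dicts mapping thread id -> request id seen in that cycle.
    """
    ticks = []
    previous = {}
    for current in cycles:
        # A thread is flagged when it was seen in the previous cycle
        # with the same request still attached to it.
        stalled = [tid for tid, req in current.items()
                   if tid in previous and previous[tid] == req]
        ticks.append(len(stalled))
        previous = current
    return ticks


cycles = [
    {1234: "req-1", 5678: "req-1"},  # cycle 1: both threads serve req-1
    {5678: "req-1"},                 # cycle 2: 1234 is done, 5678 sits in the pool
    {5678: "req-1"},                 # cycle 3: unchanged, so another stall tick
    {5678: "req-2"},                 # cycle 4: thread reused, marker resets
]
ticks = stall_ticks(cycles)
```

Under these assumptions the pooled thread produces a stall tick in cycles 2 and 3 and none once it is reused, which would match stalls that come and go without a long response time ever being recorded.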

Hoping Hiko_Davis will correct me so I can understand how the ART and stall stuff works.

Billy
7. Re: Stall with no response time over 30s?

Broadcom Employee

Haruhiko Davis
Posted Apr 14, 2016 04:07 PM

Stalls come from BlamePointTracer.
Incrementing EPI comes from ExceptionErrorReporter.

Make sure you have both in your PBD.
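
One way to check this is to scan the PBD text for both tracer names on lines that mention the class (or tracer-group flag) in question. The helper below is a rough sketch, not a PBD parser: `tracers_for` is a hypothetical function, it assumes `#` starts a comment line, and the sample directive line is illustrative only and should be checked against your agent's PBD documentation.

```python
KNOWN_TRACERS = ("BlamePointTracer", "ExceptionErrorReporter")


def tracers_for(pbd_text, target):
    """Return the known tracer names found on directive lines mentioning target."""
    found = set()
    for raw_line in pbd_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        tokens = line.split()
        if target in tokens:
            found.update(t for t in KNOWN_TRACERS if t in tokens)
    return found


# Hypothetical PBD fragment; class, method and metric names are made up.
SAMPLE_PBD = """
# custom.pbd (illustrative)
TraceOneMethodOfClass: com.example.Worker process BlamePointTracer "Custom|Worker|process"
"""

missing = set(KNOWN_TRACERS) - tracers_for(SAMPLE_PBD, "com.example.Worker")
```

Here `missing` ends up containing `ExceptionErrorReporter`, meaning the sample instruments the method for stalls but has no entry that would increment Errors Per Interval.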

DX Application Performance Management
