Clarity

  • 1.  Process Engine Response 'Sluggish' After 13.2 Upgrade

    Posted Dec 17, 2013 05:16 PM

    Hi all. How does one troubleshoot a 'sluggish' background engine?

    We recently upgraded to 13.2 fixpack 4. Since the upgrade, our users are having issues with the responsiveness of 'data management' processes, such as a checkbox we use to copy the previous week's status reports and a few other 'attribute setter' processes. They've been trained that the process engine isn't like an API call, that there *is* some processing time, but usually they give it a beat, click refresh and get their expected results. Since our 13.2 upgrade, the response time on these processes, which usually took seconds, is now minutes.

    Prior to the upgrade we deleted all completed process instances. As you can see it's not sweating and there isn't a deluge of completed processes (every quarter we purge the quarter before last to retain, in effect, at least a running quarter of history).

    While troubleshooting, I 'opened up' the pipelines to the max possible in hopes that this would help. So far this has not.

    JVM Parameters are -Xms2048m -Xmx3072m -XX:MaxPermSize=192m -XX:-UseGCOverheadLimit. Message Time To Live is 120, Message Receiver Interval is 1, Exception Run Interval is Normal.

    This is a dedicated process engine server. As you can see it doesn't appear to be overburdened.

    Any experience with an issue like this, or tips & best practices for configuring, troubleshooting and maintaining the process engine, would be greatly appreciated.



  • 2.  RE: Process Engine Response 'Sluggish' After 13.2 Upgrade

    Broadcom Employee
    Posted Dec 18, 2013 11:05 AM

    Hi Rob,

    To understand better: do you have one app server and one server where you have BG & Beacon running? If that's the case, please make sure multicast is working properly, or else you will see performance issues. To rule out multicast, please run the PE on the app server and see if performance improves.


    Regards

    Suman



  • 3.  RE: Process Engine Response 'Sluggish' After 13.2 Upgrade
    Best Answer

    Posted Dec 18, 2013 12:18 PM

    Hi Rob,

    Sorry to hear about your PE troubles.   Here are some general recommendations I'd make for situations with slow/intermittent PE response times.

    1) Rule out multicast communications troubles between the nodes (I can share a doc with you that includes steps to follow to eliminate multicast issues).  Probably the single most common fix is to ensure that your bindAddress value is set on the CSA tab in the CSA for all nodes and is set to an IP on the subnet that all your nodes share.  After setting it, restart all services.  This ensures that the multicast traffic all goes across the same IP subnet NIC that the other nodes reside on.  If bindAddress isn't set, it's nondeterministic which NIC (including the loopback adaptor) the multicast traffic will go out on.

    2) Return the PE settings to their original values, as the PE functions most efficiently that way (messageTimeToLive=120, messageReceiverInterval=5, exceptionRunInterval=less_often).  The messageReceiverInterval is a fall-back mechanism just in case multicast is not working.  If multicast is working well, it's best to back that off to the default of 5 (minutes).  At that setting the PE polls the NMS_MESSAGES table every 5 minutes to see if it missed any messages.  Under normal conditions the PE receives a multicast message to wake it up when an event occurs and a new row is inserted into NMS_MESSAGES.

    3) Your pipelines are configured fine.  I recommend post cond = 5 to speed up the PE startup time, as it must process the list of running processes and evaluate their states, which more often than not is that they are waiting in a post condition.
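
    The wake-up-plus-fallback behaviour described in point 2 can be pictured with a toy model. This is only a sketch of the general pattern (wait for a notification, fall back to polling when the interval elapses), not the process engine's actual code. The class and method names are hypothetical, and the interval is given in seconds here so the example runs quickly.

```python
import threading


class MessageWaiter:
    """Hypothetical stand-in for the wake-up logic; not the product's implementation."""

    def __init__(self, receiver_interval_seconds: float):
        # Plays the role of messageReceiverInterval (assumed to be in seconds here).
        self._interval = receiver_interval_seconds
        self._wakeup = threading.Event()

    def notify(self) -> None:
        """Simulates the arrival of a multicast wake-up message."""
        self._wakeup.set()

    def wait_for_work(self) -> str:
        """Block until notified, or until the fallback interval elapses."""
        notified = self._wakeup.wait(timeout=self._interval)
        self._wakeup.clear()
        # On a timeout the caller would poll the message table for missed rows.
        return "multicast" if notified else "fallback-poll"
```

    If notifications never arrive, every call returns "fallback-poll" after the full interval, which is roughly what a user would experience as seconds turning into minutes.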

    Sean Harp



  • 4.  RE: Process Engine Response 'Sluggish' After 13.2 Upgrade

    Posted Dec 18, 2013 03:43 PM

    Hi Suman, Sean. 

    Thanks for pointing me at multicast. We moved to Hostfile mode with our 13.2 upgrade because, well... we're tired of chasing multicast gremlins and thought this was eliminating multicast altogether. I've already spotted two issues. Our two dedicated BG servers had a different multicast address (230.0.1.1) than my app server (230.0.0.1). Either A) I screwed this up, or B) when re-adding the BGs after the upgrade, our previous value (230.0.0.1) was lost (CLRT-73069 changed the JVM parameters, yes?). It could be something - once is an observation.

    That fixed, I decided to run my MCAST multicast test harness (we're on Win2K8R2) and I observed that server 1 (app) can broadcast to servers 2 & 3 (bg servers), but any packets sent from 2 or 3 are not received by 1. Off to open a ticket with my infrastructure groups.
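
    For anyone without a test harness handy, a two-way check like the one above can be approximated with a few lines of Python. This is a minimal sketch, not the MCAST tool: the port number is a made-up placeholder, and the bind address argument is assumed to be the IP of the NIC on the subnet the nodes share.

```python
import socket

GROUP = "230.0.0.1"  # multicast address mentioned in this thread
PORT = 9090          # hypothetical port; substitute the one your cluster uses


def membership_request(group: str, bind_address: str) -> bytes:
    """Pack an ip_mreq value: join `group` on the NIC that owns `bind_address`."""
    return socket.inet_aton(group) + socket.inet_aton(bind_address)


def send_probe(bind_address: str, message: bytes,
               group: str = GROUP, port: int = PORT) -> None:
    """Send one multicast datagram out of the NIC that owns `bind_address`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        # Pin the outgoing interface; otherwise the OS picks one for us.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(bind_address))
        # TTL 1 keeps the probe on the local subnet.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(message, (group, port))


def receive_probe(bind_address: str, timeout: float = 10.0,
                  group: str = GROUP, port: int = PORT):
    """Wait for one datagram; return (payload, sender_ip) or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, bind_address))
        sock.settimeout(timeout)
        try:
            payload, (sender_ip, _) = sock.recvfrom(1024)
            return payload, sender_ip
        except socket.timeout:
            return None
```

    Run receive_probe on one server and send_probe on another, then swap roles. If the probe arrives in one direction only, that matches the one-way symptom described here.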

    Q: Who needs to talk via multicast and how? Bi-directional App to PE? Right now it appears I can send multicast from my App to my PE but my App server is not receiving packets from my PE. Is this unidirectional communication OK?



  • 5.  RE: Process Engine Response 'Sluggish' After 13.2 Upgrade

    Posted Dec 18, 2013 04:14 PM

    Rob,

    Unfortunately there is no way to divorce Clarity from multicast at the present time.  Multicast is used extensively by all services for cache consistency, session management and process event messages.  You must have bi-directional functionality with multicast for Clarity to work correctly.
        
      beacon uses multicast for heartbeats to show up in the CSA list (you disabled just this portion by using host mode).
        
      app uses multicast for generating event messages, cache consistency and session management.

      bg (pe only) uses multicast for both receiving and sending bpm event messages, cache consistency and session management.

    Definitely make sure your bind addresses are set correctly on all servers, multicast addresses and ports are identical (you've done that) and that your NSA password is the same for each server. 
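
    A quick way to catch the kind of address mismatch found earlier in this thread is to put each server's settings side by side and flag any that differ. The sketch below assumes you have copied the values out by hand; the server names, setting keys and port are hypothetical, and only the two multicast addresses come from this thread.

```python
from collections import defaultdict


def find_mismatches(nodes, keys=("multicast_address", "multicast_port")):
    """Return {setting: {value: [servers]}} for settings that differ between servers."""
    mismatches = {}
    for key in keys:
        by_value = defaultdict(list)
        for server, settings in nodes.items():
            by_value[settings.get(key)].append(server)
        if len(by_value) > 1:  # more than one distinct value means a mismatch
            mismatches[key] = dict(by_value)
    return mismatches


# Hypothetical inventory; names and port are placeholders.
nodes = {
    "app1": {"multicast_address": "230.0.0.1", "multicast_port": "9090"},
    "bg1": {"multicast_address": "230.0.1.1", "multicast_port": "9090"},
    "bg2": {"multicast_address": "230.0.1.1", "multicast_port": "9090"},
}
```

    Calling find_mismatches(nodes) on this inventory reports the multicast address as the only differing setting and groups the servers by the value each one holds.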
     

    Sean Harp



  • 6.  RE: Process Engine Response 'Sluggish' After 13.2 Upgrade

    Posted Dec 19, 2013 09:04 AM

    Thanks all for the great tips, background information on the architecture & plumbing of the system, troubleshooting doc and configuration best practices. I have my head around this now - time to dig in and do the work on my end.

    Thanks!