Automic Workload Automation

  • 1.  AWA, On what basis the normal WP becomes pWP at the event of an existing pWP crashed/stopped in AE

    Posted Jul 10, 2017 02:44 AM

    AWA: On what basis does a normal WP become the pWP when the existing pWP crashes or is stopped in an AE system with multiple AE nodes?

    Assume a scenario with 3 AE nodes in active-active mode (no NWP licences):

    As far as I understand, on one of the nodes (the very first node) one WP takes the pWP role (usually the very first WP started in the AE system and connected to the DB). All other WPs in the AE system are active and process jobs according to their role.

    If the existing pWP crashes or is stopped, one of the other running WPs takes over as pWP.

    However, I could not find a proper explanation in the documentation of the basis on which this happens. Will it first try one of the WPs on the same node, or can it be a random WP on any node? Does it follow the WP number sequence? Does it depend on the performance value gathered at WP start-up? Or is there some other criterion?

    I would appreciate it if someone could share some knowledge on this.

    Rgds,

    IP



  • 2.  AWA, On what basis the normal WP becomes pWP at the event of an existing pWP crashed/stopped in AE

    Posted Jul 10, 2017 04:15 PM
    I haven't tested this extensively, but I've never seen a PWP on server A crash and the next PWP come up on server B. I've always seen the PWP start up on the same server, but that may just be chance. My assumption is that the next PWP is selected randomly, just like other workload balancing: the role goes to whichever WP is available next. So I don't think it's impossible for the PWP to jump to server B. There are many Automic personnel on this forum; hopefully one of them can answer more definitively. I'm not sure it really matters, though, does it? It might help to post what your concern is so that they can address that as well.


  • 3.  AWA, On what basis the normal WP becomes pWP at the event of an existing pWP crashed/stopped in AE

    Posted Jul 11, 2017 03:15 AM

    Hi,

    I think I can give some explanation on this topic. In general, there are 2 steps a WP needs to accomplish to become primary:

    1. Bind the primary port on OS level (default is 2270).
    2. Register as primary in the database (table MQSRV).
      This registration needs to be renewed continuously.

    During a regular startup, the very first WP started can easily accomplish both steps and will become PWP.

    The second WP started on the same AE node cannot bind the primary port; it will continue as a regular WP.

    Another WP started on a different AE node can bind the primary port, but cannot register as primary in the database (the first WP already did); it will continue as a regular WP.
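
    The two steps above can be sketched as a toy in-memory model. This is only an illustration of the described behaviour, not the Engine's actual code; the names ToyAESystem, start_wp, port_owner and db_primary are hypothetical, and db_primary merely stands in for the primary entry in MQSRV.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyAESystem:
    """Toy in-memory model of the two-step primary election (hypothetical names)."""

    # node name -> WP holding the primary port on that node (one holder per node)
    port_owner: Dict[str, str] = field(default_factory=dict)
    # WP currently registered as primary; stands in for the MQSRV entry
    db_primary: Optional[str] = None

    def start_wp(self, wp: str, node: str) -> str:
        # Step 1: bind the primary port on OS level.
        if node in self.port_owner:
            return "regular WP (port busy on this node)"
        self.port_owner[node] = wp
        # Step 2: register as primary in the shared database.
        if self.db_primary is not None:
            return "regular WP (holds port, not registered)"
        self.db_primary = wp
        return "PWP"


system = ToyAESystem()
print(system.start_wp("WP1", "node1"))  # first WP gets both steps
print(system.start_wp("WP2", "node1"))  # same node: port already bound
print(system.start_wp("WP3", "node2"))  # other node: binds port, registration taken
```

    The model ignores the continuous renewal of the registration; it only shows why a WP on a second node ends up holding the port without being primary.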

     

    In case the current PWP somehow crashes, it’s difficult to forecast which WP will become PWP. It depends on the system setup (e.g. 1 active AE node, 2 active-active AE nodes, 3 nodes, etc.) and on the way the PWP disappeared.

     

    Ways the PWP can become unresponsive are:

    1. Process stopped – the ucsrvwp process was stopped on OS level (regular stop).
    2. Process crashed – the ucsrvwp process aborted on OS level or was killed.
    3. Process hangs – the ucsrvwp process is still running on OS level, but has somehow stopped acting within the AE system, e.g. waiting endlessly for the database, a “dead end” in the code, etc.
    4. Process loops – the ucsrvwp process is still running on OS level and consumes a lot of CPU, typically one core.

    In all 4 situations the primary can no longer renew the PWP registration described in step 2 above.

    For the PWP port it’s different. Simplified, we can merge 1. and 2.: in both situations the PWP port becomes available again. We can also merge 3. and 4.: in both situations the PWP port does not become available again.
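
    The four situations and their effect on the two resources can be written down as a small lookup. This is a sketch of the description above only; PwpFailure, port_released and registration_renewed are hypothetical names, not part of any Automic API.

```python
from enum import Enum


class PwpFailure(Enum):
    """The four ways the PWP can become unresponsive (hypothetical names)."""

    STOPPED = 1  # regular stop of ucsrvwp
    CRASHED = 2  # process aborted or was killed
    HUNG = 3     # process alive but no longer acting within the AE system
    LOOPING = 4  # process alive and busy, typically using one core


def port_released(failure: PwpFailure) -> bool:
    # The port is freed only when the process is actually gone (cases 1 and 2).
    return failure in (PwpFailure.STOPPED, PwpFailure.CRASHED)


def registration_renewed(failure: PwpFailure) -> bool:
    # In all four situations the registration is no longer renewed.
    return False


for failure in PwpFailure:
    print(failure.name, port_released(failure), registration_renewed(failure))
```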

     

    Knowing this, think about some examples:

     

    Example A) System setup: 1 AE node & PWP hang / loop situation:

    • No other WP will become PWP, because the PWP port is not released.


    Example B) System setup: 2 AE nodes & PWP hang / loop situation:

    • In this case we can forecast which WP will become PWP. It will be the WP on the other node that has already bound the PWP port.


    Example C) System setup: 3 AE nodes & PWP hang / loop situation:

    • In this case it’s between 2 WPs: on nodes 2 and 3 there is a WP that has already bound the PWP port. Which one makes it is random; it will be the one that is faster to register in MQSRV.


    Example D) System setup: n AE nodes & PWP crash situation:

    • In this case we cannot forecast which WP will become PWP. Most likely it will be a WP on another node that has already bound the primary port, because it has already accomplished 1 of the 2 steps to become primary. However, due to timing constraints it can also happen that another WP on the first node becomes PWP, even though it needs to do both steps.
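
    Examples A to D can be condensed into one small helper that lists the nodes whose WPs could take over. This is a simplified reading of the examples, not Engine logic; takeover_candidates and its parameters are hypothetical, and the returned order does not predict the winner, which depends on timing.

```python
from typing import List


def takeover_candidates(nodes: List[str], pwp_node: str, port_released: bool) -> List[str]:
    """Return the nodes on which a WP could become the next PWP (toy model)."""
    # On every other node, one WP is assumed to hold the primary port already.
    others = [n for n in nodes if n != pwp_node]
    if port_released:
        # Stop or crash: the port on the old PWP node is free again, so a WP
        # there can also compete, although it still needs both steps.
        return others + [pwp_node]
    # Hang or loop: the old PWP node keeps its port blocked.
    return others


print(takeover_candidates(["node1"], "node1", port_released=False))                    # Example A
print(takeover_candidates(["node1", "node2"], "node1", port_released=False))           # Example B
print(takeover_candidates(["node1", "node2", "node3"], "node1", port_released=False))  # Example C
print(takeover_candidates(["node1", "node2", "node3"], "node1", port_released=True))   # Example D
```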


    I hope these internals help to understand how the Engine works.
    Enjoy working with the AE.

    KR, Josef



  • 4.  AWA, On what basis the normal WP becomes pWP at the event of an existing pWP crashed/stopped in AE

    Posted Jul 12, 2017 01:45 AM

    Thank you Josef_Scharl_103 and LauraAlbrecht608310 for the valuable inputs. 

    I’m collecting information for learning purposes and I might have some more questions on this topic soon  :)

    I have another post on “AE nonstop-server operation (NWP)” and I’m sure you have good knowledge of that too.

    My end goal is to create good "One page references for AE". I appreciate your input in advance.

    Rgds,

    IP