Automic Workload Automation

  • 1.  What is 'Keep Alive' and how to use it

    Posted Sep 26, 2016 06:55 AM
    General:
    The so-called "Keep Alive" is a health-check mechanism that periodically verifies that the TCP/IP connection between the Automic Agent and the Automic Server works.
     
    The KEEP_ALIVE parameter is set in the UC_HOSTCHAR_DEFAULT (or the corresponding UC_HOSTCHAR_*) variable in client 0:
    Time interval for the periodic Automic Automation Engine check; Allowed values: 60 and above; Default value: 600 seconds
    The value that is defined here must not be less than 60 seconds. Otherwise, the default value is used.
    The specified value must also result in complete minutes (such as 60, 120, 180). If you use a different value, it is rounded up to the next minute (for example, a value of 99 seconds results in 120 seconds).
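The rounding and fallback rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this post, not Automic code; the function name `effective_keep_alive` and the constants are hypothetical.

```python
DEFAULT_KEEP_ALIVE = 600  # seconds, default value per the description above
MIN_KEEP_ALIVE = 60       # values below this fall back to the default


def effective_keep_alive(configured: int) -> int:
    """Return the interval (in seconds) that would take effect.

    Hypothetical helper: it only mirrors the two rules stated above.
    """
    if configured < MIN_KEEP_ALIVE:
        # Too small: the default is used instead.
        return DEFAULT_KEEP_ALIVE
    # Round up to the next complete minute using integer arithmetic.
    return (configured + 59) // 60 * 60
```

For example, `effective_keep_alive(99)` gives 120 and `effective_keep_alive(30)` gives 600, matching the examples in the text.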

    The mechanism in detail:
    The mechanism works in the following way:
    1. The PWP sends an EXQUERY message to the Agent and waits for an EXINFO message sent back by the Agent.

    1.1. If the answer arrives at the PWP within the KEEP_ALIVE time, everything is fine. The PWP starts the next check after the KEEP_ALIVE time is over (→ back to 1.).
    1.2. If the answer doesn't arrive within the KEEP_ALIVE time, the Server drops the connection to the Agent.

    2. The Agent gets the KEEP_ALIVE parameter when it connects to the Server and adds 60 seconds to it. This is logged in the Agent's log (where &02 = KEEP_ALIVE + 60):
    U2000017 The check interval for 'Server' has been set to '&02' seconds.

    If the Agent gets no EXQUERY message within that time (KEEP_ALIVE + 60), it will send a SRVQUERY message to the Server. This happens only if, for any reason, the EXQUERY from the Server doesn't reach the Agent!

    2.1. If the answer from the Server (PWP) arrives within the KEEP_ALIVE + 60 time, everything is fine. The Agent starts the next check, if necessary (→ back to 2.).
    2.2. If the answer doesn't arrive within the KEEP_ALIVE + 60 time, the Agent drops the connection to the Server.
     
    Once an Agent is disconnected (case 1.2. or 2.2.), it will try to reconnect within the reconnect interval until the reconnect was successful.
     
    So the KEEP_ALIVE is a bi-directional health check for the Agent - Server connection, which guarantees a reconnect in case of any connection failure.
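The two directions of the check can be modelled roughly as follows. This is a simplified sketch of the timing rules described above, not the real protocol implementation: the names `server_check`, `agent_check` and `CheckResult` are hypothetical, and treating "within" as "less than or equal" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

AGENT_EXTRA = 60  # seconds the Agent adds to KEEP_ALIVE (see U2000017)


@dataclass
class CheckResult:
    initiator: str   # "server" (EXQUERY) or "agent" (SRVQUERY)
    connected: bool  # False means the initiator drops the connection


def server_check(keep_alive: int, exinfo_delay: Optional[float]) -> CheckResult:
    """Case 1: the PWP sends EXQUERY and waits up to KEEP_ALIVE for EXINFO.

    exinfo_delay is None when no answer arrives at all.
    """
    ok = exinfo_delay is not None and exinfo_delay <= keep_alive
    return CheckResult("server", ok)


def agent_check(
    keep_alive: int,
    since_last_exquery: float,
    answer_delay: Optional[float],
) -> Optional[CheckResult]:
    """Case 2: the Agent sends SRVQUERY only if no EXQUERY arrived in time."""
    limit = keep_alive + AGENT_EXTRA
    if since_last_exquery <= limit:
        return None  # EXQUERY arrived in time, so no SRVQUERY is sent
    ok = answer_delay is not None and answer_delay <= limit
    return CheckResult("agent", ok)
```

With KEEP_ALIVE = 600, `server_check(600, None)` models case 1.2 (the Server drops the connection), and `agent_check(600, 700, None)` models case 2.2 (the Agent drops the connection and then starts reconnecting).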

     
    Note: Other parameters that influence the KEEP_ALIVE processing:
    A.) UC_SYSTEM_SETTINGS: SERVER_OPTIONS -->  9th digit
    With this setting, the Agent is not disconnected if the time specified in KEEP_ALIVE is exceeded (see case 1.2. above). A message is written to the Server logging and the monitoring period is extended for the time specified in KEEP_ALIVE.

    B.) UCSRV.INI: [CPMsgTypes], srvquery
    Performance optimization if many (several thousand) Agents log on at the same time. Allowed values: "0" (default value) and "1"; (see case 2.1. above)

    "0" - The primary work process responds to the Agents' live messages.
    "1" - The communication processes can process these specific messages and in doing so, they increase the Automic system performance.
     
    Recommendations:
     
     I. KEEP_ALIVE: Should be left at the default, which is 600.
    In environments that perform really well (Automic Server, Automic Database, Automic Agents and network) it can also be set lower, but never lower than 300!
    A setting lower than 600 should only be used for individual Agents that need the highest availability!
     II. UC_SYSTEM_SETTINGS: SERVER_OPTIONS 9th digit:
    Should not be used. It causes more problems than it fixes.
     III. UCSRV.INI: [CPMsgTypes], srvquery:
    Should be set to 1. It makes more sense for a CP to answer a connection query than for the PWP. This is good for PWP performance, especially in high PWP load situations.
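To check which process would answer SRVQUERY messages for a given setting, something like the following Python sketch could be used. The `key=value` layout of the sample fragment and the helper name `srvquery_responder` are assumptions for illustration; a real UCSRV.INI contains many more sections and may not parse cleanly with `configparser`.

```python
import configparser

# Assumed layout of the relevant fragment; only the section and key names
# come from the text above.
SAMPLE_INI = """
[CPMsgTypes]
srvquery=1
"""


def srvquery_responder(ini_text: str) -> str:
    """Return "CP" if srvquery is 1, otherwise "PWP" (the default, 0)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    value = parser.get("CPMsgTypes", "srvquery", fallback="0").strip()
    return "CP" if value == "1" else "PWP"
```

Here `srvquery_responder(SAMPLE_INI)` returns "CP"; a missing key or section is treated like the default "0", so the PWP answers.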

     
    Note:
    The KEEP_ALIVE is independent of job submission and any other processing (file transfers, events, etc.) of the Agent.


  • 2.  What is 'Keep Alive' and how to use it

    Posted Sep 26, 2016 07:21 AM
    Hi,  
    This is interesting, thanks for sharing.
    Just to make sure I fully understand this bit...


     
    Once an Agent is disconnected (case 1.2. or 2.2.), it will try to reconnect within the reconnect interval until the reconnect was successful.
     

    Are you saying that if an Agent ever goes down, i.e. loses connection with the Engine, every 600 secs(default), there is an attempt made to reconnect the Agent to the Engine?

    John.



  • 3.  What is 'Keep Alive' and how to use it

    Posted Sep 27, 2016 06:37 AM
    Thanks for sharing.


  • 4.  What is 'Keep Alive' and how to use it

    Posted Sep 28, 2016 05:33 AM
    JohnO'Mullane
    "Are you saying that if an Agent ever goes down, i.e. loses connection with the Engine, every 600 secs(default), there is an attempt made to reconnect the Agent to the Engine?"

    Yes exactly - the connection is always initiated from the agent to the CP.


  • 5.  What is 'Keep Alive' and how to use it

    Posted Sep 28, 2016 05:46 AM
    So, if I go to my system overview now and "Quit" one of my unix agents, in 600 seconds an attempt will be made to "Start" this agent?

    That's not what I'm seeing happen..


  • 6.  What is 'Keep Alive' and how to use it

    Posted Sep 29, 2016 03:56 PM
    No, quit will stop the agent.

    We have to distinguish between stop (a.k.a. quit or shutdown) and disconnect.

    Every time the connection is lost or interrupted while the agent process on the OS keeps running, the agent will try to reconnect.

    If the agent process on the OS crashes or is killed, then the agent has really died and there is nothing left to reconnect :-)
    Then you must start the agent via shell, smgr or the UI.

    If you disconnect the agent from the System Overview, the process on the OS keeps running and tries to reconnect.
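Put as a small Python sketch (the names `Event` and `will_reconnect` are just made up for illustration, this is not anything from the product), the rule is simply: the agent reconnects only if its OS process is still alive.

```python
from enum import Enum


class Event(Enum):
    QUIT = "quit"                  # stop/shutdown: the agent process ends
    PROCESS_KILLED = "killed"      # OS process crashed or was killed
    DISCONNECT = "disconnect"      # disconnect from System Overview
    CONNECTION_LOST = "lost"       # connection lost or interrupted


# Events after which the agent process is still running on the OS.
PROCESS_SURVIVES = {Event.DISCONNECT, Event.CONNECTION_LOST}


def will_reconnect(event: Event) -> bool:
    """The agent can only try to reconnect if its process keeps running."""
    return event in PROCESS_SURVIVES
```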




  • 7.  What is 'Keep Alive' and how to use it

    Posted Sep 30, 2016 03:54 AM
    FrankMuffke
    Yes, that makes sense, thanks for that.
    Does this include all agents like SQL, Webservice etc?



  • 8.  What is 'Keep Alive' and how to use it

    Posted Sep 30, 2016 04:01 AM
    You are welcome :-)

    Yes that includes every type of Agent.
    Yes and RA Agents too :-)

    Just looked at my UI - "disconnect Agent connection" is the menu name in System Overview/Agents to disconnect the agent :-)


  • 9.  What is 'Keep Alive' and how to use it

    Posted Sep 30, 2016 04:16 AM
    Excellent, thanks.


  • 10.  What is 'Keep Alive' and how to use it

    Posted Sep 30, 2016 05:25 AM
    THX too :-)