Automic Workload Automation

Check if the WP and CPs are running fine

  • 1.  Check if the WP and CPs are running fine

    Posted Jun 22, 2016 08:30 AM

    Hi,

    Is there a way to check whether the WPs and CPs are up and running once the server has rebooted, and if not, send out a notification?

    I think we can use a scripting language (PowerShell) to check that. Should that go through a database query, or is there a file we can access and check?

    Thanks in advance.

    Avinash



  • 2.  Check if the WP and CPs are running fine

    Posted Jun 22, 2016 08:36 AM
    I seem to remember someone else posting this question recently.

    This doesn't answer your question, but once everything is up, we are running a script (compliments of Automic staff) that checks for WPs that are over 90%, and it notifies staff.

    :IF SYS_BUSY_01() > 90
    :   set &send_email# = activate_uc_object(CALL.SYS_INFO_WP_BUSY)
    :ENDIF


  • 3.  Check if the WP and CPs are running fine

    Posted Jun 22, 2016 08:46 AM

    Thanks Rick.

    But what I need covers the case where the server gets rebooted and some or all of the WPs, CPs, or the web service don't come up.

    When that happens Automic may not be responding at all, so we would not be able to check through Automic itself.

    Can something be done in that case?

    Thanks and Regards,

    Avinash



  • 4.  Check if the WP and CPs are running fine

    Posted Jun 22, 2016 10:24 AM
    Hello Avinash,

    I can think of two different ways to do that:

    1 - You may use the ServiceManager in batch mode to get the list of active processes on a given host, with the parameter "GET_PROCESS_LIST".

    2 - You can also use SYS_SERVER_ALIVE to get the status of a server process, or SYS_HOST_ALIVE for agents, and trigger the execution of a CALL object with ACTIVATE_UC_OBJECT if the result is negative.

    Best regards,
    Antoine


  • 5.  Check if the WP and CPs are running fine

    Posted Jun 22, 2016 01:13 PM
    We run lights-out quite a bit, so this was a big concern for us too.

    Here is the PowerShell script that we run every 30 minutes from the Windows scheduler to make sure our WPs and CPs are running.  Sometimes a WP or CP can crash without giving any indication that it has crashed, but this script detects the problem and sends an alarm, even if all WPs are down or the database has become unavailable. It also overwrites its log file every time it runs, so you can tell when it last checked.  I sleep a lot better knowing this is covering our backs. What I still lose sleep over, though, is what happens if our email server is unreachable... then none of these alarm solutions will work.

    Note that it is configured for a CPThreshhold of 2 and a WPThreshhold of 5.  These are how many CPs and WPs you expect to be running.

    <#
    .Synopsis
       Check for UC4 running processes.  Sends alarm emails if some of them are inactive.
       EXECUTED BY WINDOWS SCHEDULER PERIODICALLY
       12/30/2014, Pete Wirfs, SAIF Corporation
    .DESCRIPTION
       Load the email function
       Set the two variables with the process information
       Check if the process count is less than the specified amount
       Send an email alert if below amount is true.
    .EXAMPLE
       Get-UC4Processes
    #>
    function Get-UC4Processes
    {
        Begin{
            $ServerName   = get-content env:computername
            $LogFileName  = "Get-UC4Processes.log"
            write-host "ServerName: $ServerName"
            echo       "ServerName: $ServerName" > $LogFileName
            $Server       = 'mail.yourorganizations.email.server.com'
            $To           = "emailaddr1@target.com","emailaddr2@target.com","emailaddr3@target.com"
            $Subject      = $ServerName + " Key Process Alert"
            $From         = "Alert_" + $ServerName + "@Saif.com"
            $CPThreshhold = 2
            $WPThreshhold = 5
            # Set process variables; @() forces an array so .Count is reliable
            # even when zero or one process matches
            $CPProcess = @(Get-Process | Where ProcessName -Like "UCSrvC*")
            $WPProcess = @(Get-Process | Where ProcessName -Like "UCSrvW*")
            # Capture and display the counts
            $CPCount = $CPProcess.Count
            $WPCount = $WPProcess.Count
            write-host "CPCount: $CPCount, expected: $CPThreshhold"
            write-host "WPCount: $WPCount, expected: $WPThreshhold"
            echo       "CPCount: $CPCount, expected: $CPThreshhold" >> $LogFileName
            echo       "WPCount: $WPCount, expected: $WPThreshhold" >> $LogFileName
        }
        Process{
            if ($CPCount -lt $CPThreshhold) {
                $Body = "UC4 CP process count of $CPCount is less than $CPThreshhold! (PowerShell script Get-UC4Processes.ps1)"
                Send-MailMessage -smtpServer $Server -to $To -from $From -subject $Subject -Body $Body -priority High
                write-host "CPCount alarm email sent!"
                echo       "CPCount alarm email sent!" >> $LogFileName
            }
            if ($WPCount -lt $WPThreshhold) {
                $Body = "UC4 WP process count of $WPCount is less than $WPThreshhold! (PowerShell script Get-UC4Processes.ps1)"
                Send-MailMessage -smtpServer $Server -to $To -from $From -subject $Subject -Body $Body -priority High
                write-host "WPCount alarm email sent!"
                echo       "WPCount alarm email sent!" >> $LogFileName
            }
            echo       "****** end of process ******"
            echo       "****** end of process ******" >> $LogFileName
        }
        End{
            
        }
    }
    Get-UC4Processes
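
    For anyone who is not on PowerShell, the same count-and-compare idea can be sketched in Python. This is only a rough sketch under assumptions: the names `EXPECTED` and `find_shortfalls` are made up for illustration, the process list is a hand-written sample rather than a live lookup, and the patterns and thresholds are simply copied from the script above.

```python
from fnmatch import fnmatch

# Patterns and expected counts mirror the PowerShell script above.
EXPECTED = {"UCSrvC*": 2, "UCSrvW*": 5}


def find_shortfalls(process_names, expected=EXPECTED):
    """Return {pattern: (found, expected)} for each pattern below its expected count."""
    shortfalls = {}
    for pattern, minimum in expected.items():
        # Lowercase both sides so matching is case-insensitive, like -Like.
        found = sum(fnmatch(name.lower(), pattern.lower()) for name in process_names)
        if found < minimum:
            shortfalls[pattern] = (found, minimum)
    return shortfalls


if __name__ == "__main__":
    # Hypothetical snapshot of process names; a real check would query the OS.
    sample = ["UCSrvCP", "UCSrvCP", "UCSrvWP", "UCSrvWP", "UCSrvWP", "explorer"]
    for pattern, (found, minimum) in find_shortfalls(sample).items():
        print(f"{pattern}: {found} running, expected {minimum}")
```

    An empty result means every pattern met its expected count; anything else is what you would put in the alarm email.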




  • 6.  Check if the WP and CPs are running fine

    Posted Jun 23, 2016 07:46 AM
    Thanks a lot Pete..


  • 7.  Check if the WP and CPs are running fine

    Posted Jun 24, 2016 06:49 AM
    Hi,

    I'm running this script every 15 minutes to check whether all CP and WP processes are running. If one or more of them are NOT running, I send an email (SEND_MAIL) to myself and activate (ACTIVATE_UC_OBJECT) a job which sends an alert to our monitoring system.

    Here's the script:

    :set &i# = 1
    :define &processType#, string, 2
    :set &processType#[1] = "WP"
    :set &processType#[2] = "CP"
    :while &i# <= 2
    :  print "Starting &processType#[&i#] check."
    :  set &count# = 1
    :  set &count# = format(&count#, "000")
    :  set &processName# = "&$SYSTEM##&processType#[&i#]&count#"
    :  set &ret# = sys_server_alive(&processName#)
    !  Return code 20349 indicates process does not exist, so stop loop.
    :  WHILE &ret# <> 20349
    :    IF &ret# = "N"
    :      SET &Message# = '&processName# is not running !!'
    :      PRINT &Message#
    :      SET &ACT# = MODIFY_SYSTEM("STARTUP",&processName#)
    :      PUT_READ_BUFFER Message# = '&Message#'
    !      SET &OUT# = SEND_MAIL('keld.mollnitz@nordea.com',,'&Message#','The following Server Process is not running:  &processName#')
    !      SET &rn# = ACTIVATE_UC_OBJECT('TEC.UC4.SERVER.PROCESS',,,,,PASS_VALUES)
    :    ELSE
    :      PRINT '&processName# is OK'
    :    ENDIF
    :    SET &count# = &count# + 1
    :    SET &count# = format(&count#, "000")
    :    SET &processName# = "&$SYSTEM##&processType#[&i#]&count#"
    :    SET &ret# = sys_server_alive(&processName#)
    :  ENDWHILE
    :  PRINT " &processName# does not exist. &processType#[&i#] check is complete."
    :  PRINT ""
    :  SET &i# = &i# + 1
    :ENDWHILE


  • 8.  Check if the WP and CPs are running fine

    Posted Jun 24, 2016 08:38 AM

    Thanks Keld.

    But I was looking at the case where the server has been rebooted and I want to check whether the WPs and CPs are up and running.

    Sometimes they don't come up, so I wanted to check from outside the scope of Automic.

    - Avinash



  • 9.  Check if the WP and CPs are running fine

    Posted Jun 24, 2016 08:44 AM

    Keld,

    Thanks for posting that.  It may not be what Avinash is looking for, but I may be putting this to use.




  • 10.  Check if the WP and CPs are running fine

    Posted Jun 24, 2016 09:48 AM
    Hi Avinash,

    Another option is to query the Automic database directly.  Though not officially supported, the MQSRV table is (basically) where the System Overview gets its information.  The entirety of the MQ* tables is a workspace that handles the temporary information of an Automation Engine.  Doing a simple SELECT statement on startup shouldn't really affect the system.  You could easily output the results of this to a dashboard of some kind.


  • 11.  Check if the WP and CPs are running fine

    Posted Jun 27, 2016 05:12 PM
    I seem to remember someone else posting this question recently.

    This doesn't answer your question, but once everything is up, we are running a script (compliments of Automic staff) that checks for WPs that are over 90%, and it notifies staff.

    :IF SYS_BUSY_01() > 90
    :   set &send_email# = activate_uc_object(CALL.SYS_INFO_WP_BUSY)
    :ENDIF
    I'm curious if you have seen this solution catch any issue? I've been looking over the documentation on this function recently, and the way it is worded suggests that it reports back the usage on the particular WP that is used to run the script.

    If you have one overburdened WP, won't the system select a lesser used one to run your script, effectively remaining blind to the issue?

    From the docs: Script Function: Returns the workload percent of the Automation Engine work process where the script is executed during the last minute.

    http://docs.automic.com/documentation/AE/11.2/english/AE_WEBHELP/help.htm?product=awa#ucaask.htm?Highlight=SYS_BUSY_01


  • 12.  Check if the WP and CPs are running fine

    Posted Jun 28, 2016 07:25 AM
    After reading the doc, I would say that you are correct.  However, this seems to be reporting on "any" WPs that are over 90%.  Maybe it is just a coincidence, but it does appear to be checking all of them.

    I looked at a few more pages of doc, and found:  Returns the size of the workload of the Automation Engine during the last minute (in percent).

    This sways away from the individual WP and more towards all of them.

    Hopefully someone with a little more experience with this script can chime in.


  • 13.  Check if the WP and CPs are running fine

    Posted Jun 28, 2016 07:27 AM
    On catching the issue, it would probably be helpful to get a script that could check the logs for that specific WP and send an email.

    Mark from Automic was onsite for a few days last month, and he worked on a few scripts for us.  Wish I had thought about this back then.


  • 14.  Check if the WP and CPs are running fine

    Posted Jun 29, 2016 09:33 AM
    Rick Murray said:
    After reading the doc, I would say that you are correct.  However, this seems to be reporting on "any" WPs that are over 90%.  Maybe it is just a coincidence, but it does appear to be checking all of them.

    I looked at a few more pages of doc, and found:  Returns the size of the workload of the Automation Engine during the last minute (in percent).

    This sways away from the individual WP and more towards all of them.

    Hopefully someone with a little more experience with this script can chime in.
    Thanks for that Rick. Yes, it'd be nice to hear some definitive feedback on exactly what these internal functions are reporting on. I set it up in our environment anyway to see how it looks over time.


  • 15.  Check if the WP and CPs are running fine

    Posted Jul 12, 2016 11:05 PM
    Hi everyone,

    Eric and I spent some time today building a small tool that uses the Java API to retrieve various Engine-related data (such as list of CPs and WPs, their respective status, B01, B10, B60 etc.). 

    I believe Eric will adapt it to be able to continuously monitor his AE processes and related performance.

    This tool can be made available if needed: just send me a request at bsp@automic.com and I will be happy to help.

    Bren




  • 16.  Check if the WP and CPs are running fine

    Posted Jul 13, 2016 07:18 AM
    Just sent you an email on this.

    If it is more accurate than this script we have been running, then I am very interested.


  • 17.  Check if the WP and CPs are running fine

    Posted Jul 13, 2016 08:38 AM
    Huge thanks to Brendan for the time and expertise working with the Java API on this solution!

    To be clear, Brendan developed the code to pull the individual WP and CP process usage metrics out of the system, which does not appear to be possible via any of the internal functions. SYS_BUSY_*() seems to be an aggregate (would like this confirmed), which will not help you if only a small number of your processes end up having trouble and you're hoping to be alerted. For example, yesterday we had a CP process running at ~100 for at least an hour, but when I look at the SYS_BUSY_10() readings from that same time, the values are in the 15-23 range.

    So Brendan's solution outputs the current usage (01:10:60) at runtime, for example:
    UC4OMS#CP001[2:2:2]
    UC4OMS#CP002[3:3:3]
    UC4OMS#CP003[0:0:0]
    UC4OMS#CP004[1:1:1]
    UC4OMS#CP005[0:0:0]
    UC4OMS#WP001[4:3:3]
    UC4OMS#WP002[3:3:3]
    UC4OMS#WP003[6:4:5]
    UC4OMS#WP004[16:18:10]
    UC4OMS#WP005[3:3:3]
    UC4OMS#WP006[14:17:28]
    UC4OMS#WP007[16:14:24]
    UC4OMS#WP008[3:5:5]
    It's up to us to take the numbers and do something with them. I'm going to massage the output and feed it into an in-house tool we use for graphing metrics over time, so we'll have persistent records of these going forward. I will also set up monitoring, so we'll be alerted if any individual proc is having trouble. I've been wanting this for a long long time.
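
    As a starting point for "doing something with them", here is a small Python sketch that turns output in the format shown above into numbers. It assumes every line looks exactly like `NAME[B01:B10:B60]`; the names `LINE_RE` and `parse_usage` are made up for illustration and are not part of Brendan's tool.

```python
import re

# Matches lines such as "UC4OMS#WP004[16:18:10]": name, then B01:B10:B60.
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+?)\[(?P<b01>\d+):(?P<b10>\d+):(?P<b60>\d+)\]\s*$"
)


def parse_usage(text):
    """Return {process_name: (b01, b10, b60)} for every line that matches."""
    usage = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            usage[match["name"]] = tuple(
                int(match[key]) for key in ("b01", "b10", "b60")
            )
    return usage


if __name__ == "__main__":
    sample = "UC4OMS#CP001[2:2:2]\nUC4OMS#WP004[16:18:10]\n"
    for name, values in parse_usage(sample).items():
        print(name, values)
```

    Lines that do not match the pattern are silently skipped, so banner or log lines mixed into the output will not break the parse.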



  • 18.  Check if the WP and CPs are running fine

    Posted Jul 13, 2016 09:09 AM

    I was hoping that Brendan's solution would do the actual notification, but I am glad to see that it is part way there.

    What are you looking at to do the notification piece?

    I was thinking about a process to read through the results, and if there is anything 90+ in each of the 01:10:60 columns, have an email sent out with the proc# and the usage.

    This sounds good in my head, but writing it out is another thing.  I need to get more practice on that side.




  • 19.  Check if the WP and CPs are running fine

    Posted Jul 13, 2016 09:47 AM

    What are you looking at to do the notification piece?

    I was thinking about a process to read through the results, and if there is anything 90+ in each of the 01:10:60 columns, have an email sent out with the proc# and the usage.


    What you described is exactly what needs to happen, although this may be easier said than done depending on your background.

    I'm going to explore our trending/graphing tool, which I think has an alerting mechanism. Sorry I know this doesn't help you.

    Personally, while we'll capture all metrics for trending, we will probably use only the B10 metric for monitoring/alerting, at least as a starting point. B01 would be noisy for us, as activity over the past 1 minute varies greatly and does get naturally high around the turn of the hour, since any recurring jobs that are scheduled at any of 3/5/10/15/20/30/60 minute intervals will all run at the turn of the hour. And B60 feels like averaging over too much time at once, although I might revisit that.

    Java has classes for sending email. Would take some dev cycles, but the parsing and alerting piece could itself be built into the solution. At that point you could just run the .jar at perhaps 10 minute intervals and alerting would be baked in. And of course you'd also want to alert if the solution can't get at the metrics--that could mean you have total platform meltdown.

    Much more involved than using the internal Automic functions, but the API appears to be the only way to get at individual proc usage. Also, in general you don't want to monitor the health of a tool with itself, so externalizing this is a step in the right direction.
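
    To make the alerting piece a bit more concrete, here is a rough Python sketch of the two rules described above: alert when any process has B10 at 90 or more, and alert when no metrics could be retrieved at all. The function names, the message wording and the sample numbers are all invented for illustration; the input is assumed to be a dict of process name to (B01, B10, B60).

```python
from email.message import EmailMessage


def build_alerts(usage, threshold=90):
    """Return alert lines; usage maps process name -> (b01, b10, b60), or is None."""
    if not usage:
        # No metrics at all is itself alarming: the engine may be unreachable.
        return ["No process metrics could be retrieved"]
    return [
        f"{name} B10 is {b10} (threshold {threshold})"
        for name, (_b01, b10, _b60) in sorted(usage.items())
        if b10 >= threshold
    ]


def make_email(alerts, sender, recipients):
    """Wrap alert lines in an EmailMessage; actually sending it is left to the caller."""
    msg = EmailMessage()
    msg["Subject"] = f"AE process alert ({len(alerts)})"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(alerts))
    return msg


if __name__ == "__main__":
    # Hypothetical readings, not taken from a real system.
    sample = {"UC4OMS#CP001": (2, 2, 2), "UC4OMS#WP004": (97, 93, 40)}
    for line in build_alerts(sample):
        print(line)
```

    Only B10 is checked here, in line with the reasoning above that B01 is too noisy; the resulting message could be handed to `smtplib.SMTP(...).send_message(msg)` or to whatever alerting channel you already have.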




  • 20.  Check if the WP and CPs are running fine

    Posted Jul 13, 2016 10:02 AM

    I see what you are saying.  It would be best to have the metrics stored somewhere, and then check that file to make sure:

    - it has been updated within the last minute

    - B10 has anything showing 90+
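
    The "updated within the last minute" check could look roughly like this in Python. It is a sketch only: `is_stale` is a made-up name, the 60 second limit comes from the first bullet above, and a missing file is treated the same as an old one on the assumption that both mean the collector has stopped writing.

```python
import os
import time


def is_stale(path, max_age_seconds=60, now=None):
    """True if the file is missing or was last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # Missing or unreadable file: treat it as stale so it raises an alert.
        return True
    return (now - modified) > max_age_seconds


if __name__ == "__main__":
    # "metrics.json" is a placeholder name for wherever the metrics are stored.
    print("stale" if is_stale("metrics.json") else "fresh")
```

    The optional `now` argument is only there so the check can be exercised without waiting for a real minute to pass.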




  • 21.  Check if the WP and CPs are running fine

    Posted Jul 13, 2016 09:35 PM
    agreed! i'd avoid putting any logic not directly related to pulling data from AE into the Java binary.

    i like the idea of keeping it as small as possible and using another tool as a monitoring means as Eric is setting up.

    the binary in the link below is a bit more complete and can pull most of the info available in the System Overview (per individual section or in bulk) and outputs it as a JSON structure.

    https://github.com/brendanSapience/UC4-Automic-AE-CLI-Binary-Repository/blob/master/UTIL_SystemOverview_Show.jar

    its source code is available here:

    https://github.com/brendanSapience/UC4-Automic-AE-CLI-Show-System-Overview


  • 22.  Check if the WP and CPs are running fine

    Posted Jul 29, 2016 01:48 PM
    KarthikMalali604894, you might find this discussion interesting.

    there is no script function to retrieve data from the System Overview, but you could use this:

    https://github.com/brendanSapience/UC4-Automic-AE-CLI-Binary-Repository

    the binary of interest to you is called: UTIL_SystemOverview_Show.jar

    you can read the page itself for a quick primer on how to use it, or you can simply use the binary with option “-h” for available options.

    -C: Client number (must be 0)
    -L: login
    -D: Department
    -W: password
    -H: AE Host name or IP
    -P: primary AE port
    -all: extract all info from system overview
    -file: record info extracted in file

    Ex:

    C:\Automic\CLI\UC4-Automic-AE-CLI-Binary-Repository>java -jar UTIL_SystemOverview_Show.jar -C 0 -L UC -D UC -W UC -H AETestHost -P 2217 -all -file ./MyFile.json

    %% Saving Server Info to File: ./MyFile.json
    %% Saving User Info to File: ./MyFile.json
    %% Saving Agent Info to File: ./MyFile.json
    %% Saving Cache Info to File: ./MyFile.json
    %% Saving Client Info to File: ./MyFile.json
    %% Saving Quarantine Info to File: ./MyFile.json
    %% Saving Queue Info to File: ./MyFile.json