DX Unified Infrastructure Management

Verify all Robots / Probes On Specific HUB

  • 1.  Verify all Robots / Probes On Specific HUB

    Posted May 04, 2018 06:05 AM

    We have a script that runs every 15 minutes. It looks for the controller alarm message "FAILED to start" and then runs probe_verify and probe_activate against the probes named in those alarms.

     

    We have a dedicated HUB for servers built from images, and hundreds of probes on them end up in this state every day. Does anyone have a similar script that we could run against all Robots on a specific HUB? We want to remove the reliance on the "FAILED to start" alarm messages.



  • 2.  Re: Verify all Robots / Probes On Specific HUB

    Broadcom Employee
    Posted May 04, 2018 06:37 AM

    Share your existing script and I will add in the necessary loop, if you like.



  • 3.  Re: Verify all Robots / Probes On Specific HUB

    Posted May 04, 2018 06:41 AM

    Cheers Rowan, current script below.  We would like to run a separate script against all Robots, only on one HUB.

     

    -- Start of script
    local al = alarm.list() -- Get alarm list

    local re = "%p%a+%d*_*%a*%d*%p" -- Lua pattern to match a quoted probe name made of letters, digits and underscore

    if al ~= nil then
       for i = 1,#al do
          -- First, keep alarms from the controller probe only;
          -- second, keep only those containing the text "FAILED to start"
          if al[i].prid == "controller" and string.match(al[i].message,"FAILED to start") then

             local quoted = string.match(al[i].message,re) -- Quoted probe name from the alarm message
             if quoted ~= nil then -- Skip alarms where no probe name could be found
                local probe = string.gsub(quoted,"'","") -- Remove quotes from the probe name to use in the probe_verify callback
                --print(al[i].message.."! Probe-> "..probe) -- View alarms with probe names which failed to start

                local addr = "/"..al[i].domain.."/"..al[i].hub.."/"..al[i].robot.."/".."controller" -- Build Nimsoft address
                print(addr.."<->Probe="..probe) -- Print Nimsoft address and the probe that failed to start

                -- Now run the probe_verify and probe_activate callbacks on the probe which FAILED to start
                local args = pds.create()
                pds.putString(args,"name",probe)
                nimbus.request(addr,"probe_verify",args)
                nimbus.request(addr,"probe_activate",args)
                pds.delete(args)
                sleep(100) -- A little delay between each probe callback
             end
          end
       end
    end
    -- End of script
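
To illustrate what the pattern in the script above extracts, here is a rough plain-Lua sketch that runs it against two sample messages. The message wording, the probe names and the alternative capture pattern are assumptions for illustration only. They are not taken from the controller documentation.

```lua
-- Pattern from the script above: punctuation, letters, optional digits,
-- optional underscores, optional letters, optional digits, punctuation.
local re = "%p%a+%d*_*%a*%d*%p"

-- Extract the probe name the same way the script above does.
function extract_with_original_pattern(message)
   local quoted = string.match(message, re)
   if quoted == nil then return nil end
   -- Parentheses drop the second return value of gsub (the replacement count)
   return (string.gsub(quoted, "'", ""))
end

-- Hypothetical alternative: capture whatever sits between single quotes,
-- allowing letters, digits, underscores and hyphens.
function extract_with_capture(message)
   return string.match(message, "'([%w_%-]+)'")
end

-- Sample messages (assumed wording, example probe names)
local one = "Probe 'net_connect' FAILED to start"
local two = "Probe 'cm_data_import' FAILED to start"

print(extract_with_original_pattern(one), extract_with_capture(one))
print(extract_with_original_pattern(two), extract_with_capture(two))
```

Because Lua's %p class also matches an underscore, the original pattern cuts off a name that contains two underscores, while the capture-based pattern returns it whole. If every probe on the HUB has at most one underscore in its name, the original pattern is sufficient.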



  • 4.  Re: Verify all Robots / Probes On Specific HUB

    Posted May 04, 2018 06:56 AM

    Hi,

     

    Did you open a support ticket about your problem? Or have you troubleshot the core issue? Most of the time it's because a probe has to start after another one (this is configurable in the probe definition if you edit the nimsoft package).

     

    Making a callback for all robots (all probes) seems to be a bad solution (that's my opinion as a developer).

     

     

    Best Regards,

    Thomas



  • 5.  Re: Verify all Robots / Probes On Specific HUB

    Posted May 04, 2018 07:10 AM

    The issue occurs because the servers in question are spun up from an image on a regular basis. We have a local script that clears the niscache on them when they start, but nothing else.

     

    What do you mean by this?

     

    "this is configurable in the probe definition if you edit the nimsoft package"



  • 6.  Re: Verify all Robots / Probes On Specific HUB
    Best Answer

    Broadcom Employee
    Posted May 04, 2018 07:10 AM

    Agreed, you should really solve the underlying problem, as this certainly isn't normal behaviour.

     

    But if you want a quick fix by hub then you can modify this script for your purpose…

     

    --
    -- check_robot_probe_by_hub.lua
    -- rowan collis @ ca
    --
    print('Robot & Probe Status')
    print('====================')
    print(' ')

    hublist = nimbus.request("hub","gethubs");
    hubs = hublist.hublist
    args = pds.create()
    for hub_key,hub_table in pairs(hubs) do
       hub = hubs[hub_key]

       if hub.name == "xxxxxxxxxxx" then

          print ("Processing hub: " .. hub.name .. "\n")
          robots = nimbus.request(hub.addr,"getrobots")
          if robots ~= nil then
             for r_key,r_value in pairs(robots.robotlist) do
                controller = r_value.addr.."/controller"
                print ("  Processing robot: " .. r_value.name .. "\n")
                probes = nimbus.request(controller,"probe_list")
                if probes ~= nil then
                   for p_key,p_value in pairs(probes) do
                      if p_value.active == 0 then
                         print ("    * Probe: " .. p_value.name .. " is Inactive on robot: " .. r_value.name .. " *\n")
    --                   local resp,rc = nimbus.alarm(1, "Check_robot_probe_status - Probe: " .. p_value.name .. " is Inactive on robot: " .. r_value.name .. "","check_probe_"..r_value.name.."_"..p_value.name)
                      end
                   end
                else
                   print("  ** Robot: " .. r_value.name.." is Inactive **\n")
    --             local resp,rc = nimbus.alarm(1, "Check_robot_probe_status - Robot: " .. r_value.name.." is Inactive","check_robot_"..r_value.name)
                end
             end
          end
       end

     



  • 7.  Re: Verify all Robots / Probes On Specific HUB

    Posted May 04, 2018 07:27 AM

    Great stuff, I'll have a play and update the thread early next week.



  • 8.  Re: Verify all Robots / Probes On Specific HUB

    Posted May 04, 2018 11:09 AM

    I ran a quick test of the script in our UAT environment and got the following error:

     

    Error in line 36: 'end' expected (to close 'for' at line 11) near '<eof>'



  • 9.  Re: Verify all Robots / Probes On Specific HUB

    Broadcom Employee
    Posted May 04, 2018 11:17 AM

    Sorry, stick another "end" on the end.

    Cut-and-paste casualty!
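
The "'end' expected" message above is an ordinary Lua syntax error, so a pasted script can be checked for unclosed blocks before it is run. Below is a minimal sketch in standalone Lua, assuming `loadstring` (Lua 5.1) or `load` (Lua 5.2 and later) is available. Whether the NAS script editor exposes either function is not confirmed here, and `check_syntax` is a hypothetical helper name.

```lua
-- Compile a script without executing it and report any syntax error.
function check_syntax(source)
   -- Lua 5.1 compiles strings with loadstring; later versions use load
   local loader = loadstring or load
   local chunk, err = loader(source, "pasted_script")
   if chunk == nil then
      return false, err
   end
   return true
end

-- A fragment with a 'for' that is never closed, like the script in post 6
local broken = "for i = 1, 2 do\n  if i == 1 then\n    print(i)\n  end\n"
-- The same fragment with the final 'end' added
local fixed = broken .. "end\n"

local ok, err = check_syntax(broken)
print(ok, err)
print(check_syntax(fixed))
```

The first call returns false together with the compiler message naming the block that was left open. The second returns true, since adding the last "end" balances the loop.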



  • 10.  Re: Verify all Robots / Probes On Specific HUB

    Posted May 08, 2018 05:27 AM

    The script works a treat for listing the failed / deactivated probes per HUB. However, we would like to use that list to run the probe_verify commands against the failed ones. Is it possible to add this to the script?



  • 11.  Re: Verify all Robots / Probes On Specific HUB

    Broadcom Employee
    Posted May 08, 2018 06:08 AM

    Mick, you will need a way to identify your error situation… maybe "inactive" is sufficient… and then change the script to check for this in the loop.

    Then, inside the "if p_value.active ~= 1 then" statement (or whatever check you use), put this…

     

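                           -- Pass the probe name in a PDS under "name", as the script in post 3 does
                           local pargs = pds.create()
                           pds.putString(pargs,"name",p_value.name)

                           verify, rc = nimbus.request(controller,"probe_verify",pargs)
                           if rc == 0 then
                                print (" probe "..p_value.name.." verified ")
                           else
                                print (" probe "..p_value.name.." verify failed - code:"..tostring(rc))
                           end

                           activate, rc = nimbus.request(controller,"probe_activate",pargs)
                           if rc == 0 then
                                print (" probe "..p_value.name.." activated ")
                           else
                                print (" probe "..p_value.name.." activate failed - code:"..tostring(rc))
                           end

                           pds.delete(pargs) -- Release the PDS once both callbacks have run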

    hope this helps

    cheers
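
For reference, the pieces discussed in this thread could be combined roughly as follows. This is a sketch under stated assumptions, not a tested production script. It uses only the calls that already appear above (`nimbus.request`, `pds.create`, `pds.putString`, `pds.delete`) and assumes, as in post 3, that `probe_verify` and `probe_activate` take the probe name in a PDS under "name". The function name `reactivate_inactive_probes` and the idea of passing `nimbus` and `pds` in as parameters are illustrative choices so that the logic can be exercised outside NAS with stand-in tables. The hub name stays a placeholder.

```lua
-- Verify and activate every inactive probe on the robots of one hub.
-- nimbus, pds : the NAS-provided tables (or stand-ins for dry runs)
-- hub_name    : name of the hub to process
-- log         : function used for output (defaults to print)
-- Returns the number of probes whose probe_activate call returned 0.
function reactivate_inactive_probes(nimbus, pds, hub_name, log)
   log = log or print
   local activated = 0

   local hublist = nimbus.request("hub", "gethubs")
   if hublist == nil or hublist.hublist == nil then
      return activated
   end

   for _, hub in pairs(hublist.hublist) do
      if hub.name == hub_name then
         log("Processing hub: " .. hub.name)
         local robots = nimbus.request(hub.addr, "getrobots")
         if robots ~= nil and robots.robotlist ~= nil then
            for _, robot in pairs(robots.robotlist) do
               local controller = robot.addr .. "/controller"
               local probes = nimbus.request(controller, "probe_list")
               if probes == nil then
                  log("  ** Robot: " .. robot.name .. " is Inactive **")
               else
                  for _, probe in pairs(probes) do
                     if probe.active == 0 then
                        -- Probe name goes into a PDS, as in post 3
                        local args = pds.create()
                        pds.putString(args, "name", probe.name)
                        local _, rc_verify = nimbus.request(controller, "probe_verify", args)
                        local _, rc_activate = nimbus.request(controller, "probe_activate", args)
                        pds.delete(args)
                        log("    Probe " .. probe.name .. " on " .. robot.name ..
                            ": verify rc=" .. tostring(rc_verify) ..
                            ", activate rc=" .. tostring(rc_activate))
                        if rc_activate == 0 then
                           activated = activated + 1
                        end
                     end
                  end
               end
            end
         end
      end
   end
   return activated
end

-- Inside NAS the globals nimbus and pds exist, so the call runs there;
-- in a plain Lua interpreter this block is skipped.
if nimbus ~= nil and pds ~= nil then
   reactivate_inactive_probes(nimbus, pds, "xxxxxxxxxxx", print)
end
```

As noted earlier in the thread, this treats the symptom only. It re-activates any probe reported as inactive, including ones that were deactivated on purpose, so the check on `probe.active` may need tightening for a given environment.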