DX Unified Infrastructure Management

changing severity of Robot inactive alarms.

1. changing severity of Robot inactive alarms.

rkalidh87
Posted Feb 27, 2018 06:19 AM

Hi,

I am trying to change the severity of Robot Inactive alarms. When the alarm arrives, the script pings the host name and, if the ping succeeds, changes the severity and message. The code is below:

alerts = alarm.list("Message", "%inactive%")
if alerts ~= nil then
   for i = 1, #alerts do
      a = alerts[i]
      --print(a.hostname)
      --print(a.severity)
      if (action.ping(a.hostname)) then
         --print("Success")
         message_add_OK = "server responds to ping OK"
         update = {}
         update.nimid = a.nimid
         update.message = a.message.." "..message_add_OK
         update.level = NIML_MAJOR
         alarm.set(update)
      else
         -- print("Failure")
      end
   end
end

The challenge is that if I create an AO profile with the script, the alarm arrives in NAS and its severity and message are changed, but whenever the message counter increases, the severity reverts to the original value (critical) and then changes to major again.

If I place the script in pre-processing rules, the alarm does not show up in NAS at all; if I remove it from pre-processing, alarms arrive in NAS again.

Please help me achieve this.
2. Re: changing severity of Robot inactive alarms.

Garin Walsh
Posted Feb 27, 2018 07:45 AM

This script gets all alarms that match the pattern - if you are doing this with a profile you probably want to use alarm.get() in order to get just the single alarm that fired the profile.

When you do your alarm.set() it puts the alarm back on the message bus and it's processed again. You need to make sure that your profile doesn't keep getting triggered. Filter on the original priority for instance so that when you change the priority, it doesn't trigger the profile a second time.

I wouldn't make a copy of the alarm in the code. Instead, just update the result of alarm.get(). That way you can be sure you don't accidentally drop a necessary identifier.
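
A minimal sketch of the advice above, assuming it runs as an AO profile script filtered on critical severity. alarm.get(), alarm.set() and action.ping() are the nas calls already used in this thread; the stand-in definitions at the top are hypothetical and exist only so the sketch can be run outside the nas. The numeric levels (5 = critical, 4 = major) are the ones used in the script posted later in this thread.

```lua
-- Hypothetical stand-ins so the sketch runs outside the nas.
-- Inside the nas, alarm and action are provided by the script environment.
local sent = {}
local alarm = alarm or {
   get = function()
      return { nimid = "DUMMY-NIMID", hostname = "server01", level = 5,
               message = "Robot server01 is inactive" }
   end,
   set = function(a) sent[#sent + 1] = a end,
}
local action = action or { ping = function(host) return true end }

local LEVEL_CRITICAL, LEVEL_MAJOR = 5, 4
local SUFFIX = "server responds to ping OK"

-- Fetch only the alarm that fired the profile and update it in place,
-- so identifiers such as nimid are carried along untouched.
local a = alarm.get()
if a ~= nil
   and a.level == LEVEL_CRITICAL                    -- skip alarms already downgraded
   and not string.find(a.message, SUFFIX, 1, true)  -- skip alarms already annotated
   and action.ping(a.hostname) then
   a.message = a.message .. " " .. SUFFIX
   a.level = LEVEL_MAJOR
   alarm.set(a)
end
```

The two guard conditions are one way to keep the script from acting twice on an alarm it has already changed; filtering on severity in the profile itself serves the same purpose.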

And finally, when iterating through a list of items in Lua, you are much better off using pairs(). The count of items in a table is somewhat badly defined (accurate in its definition but usually unusable), and tables don't have to have sequential indexes. It is perfectly fine to have a table with 50 items in it where the # operator returns 3. Or 200. Only in the special case where the indexes are sequential, starting at 1, does # give you the exact number of entries in a table.
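
A small plain-Lua illustration of the point about # and pairs(). The table contents and the count helper are made up for the example.

```lua
-- A table with only string keys: it has no integer keys at all, so # is 0.
local by_name = { alpha = "a", beta = "b", gamma = "c" }

-- Count entries by visiting every key/value pair.
local function count(t)
   local n = 0
   for _ in pairs(t) do n = n + 1 end
   return n
end

print(#by_name, count(by_name))   -- # reports 0, pairs() finds 3

-- A table with gaps in its integer keys: # may return any "border",
-- so its value is not relied on here; pairs() still visits every entry.
local sparse = { [1] = "x", [2] = "y", [10] = "z" }
print(count(sparse))              -- 3
```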

-Garin
3. Re: changing severity of Robot inactive alarms.

rkalidh87
Posted Feb 27, 2018 08:10 AM

Hi Garin,

Thanks. So alarm.get() should be used to get the alarm in AO. Please help me ensure that alarms are not processed again once the severity has changed. I selected only critical severity and a message counter greater than or equal to 1 in the AO profile.

4. Re: changing severity of Robot inactive alarms.

Garin Walsh
Posted Feb 27, 2018 10:12 AM

The sequence of events here is what's causing you issues.

1. Something happens that makes the robot unresponsive
2. Hub detects that and creates alert
3. Nas receives alert and stores it
4. Nas evaluates AO profiles that match and schedules them for execution
5. Matching AO profile executes and updates alert and puts it back on message bus
6. Nas receives alert (created by the profile) and recognizes based on suppression rules that it matches an existing alert so existing alert is updated to match new information
7. Nas evaluates AO profiles that match and finds none
8. Time passes
9. Hub detects robot still inactive and creates alert
10. Nas receives alert and recognizes based on suppression rules that it matches an existing alert so existing alert is updated to match new information and count is increased
11. Nas evaluates AO profiles that match and schedules them for execution
12. Matching AO profile executes and updates alert and puts it back on message bus
13. Nas receives alert and recognizes based on suppression rules that it matches an existing alert so existing alert is updated to match new information
14. Nas evaluates AO profiles that match and finds none
15. Time passes
16. Hub detects that robot is inactive and creates alert
........

So the problem here with using an AO profile to adjust the level of the alert is that the alarm level is guaranteed to bounce around. If you want to avoid this bouncing then you need to use a preprocessing rule which fires before storing the alarm. Problem there is that you are severely limited on what you can do - you can't ping something for instance.
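
The bouncing can be seen in a toy model of the sequence above. This is plain Lua, not nas code: nas_receive and run_ao_profile are hypothetical names standing in for the nas storing an incoming alert and for an AO profile filtered on critical, and the levels 5 (critical) and 4 (major) follow the numbering used elsewhere in this thread.

```lua
local CRITICAL, MAJOR = 5, 4

local stored = nil      -- the single alarm row kept for this robot
local history = {}      -- every level the stored alarm has shown, in order

-- An incoming message is folded into the existing row by suppression,
-- overwriting the level with whatever the message carries.
local function nas_receive(level)
   stored = stored or {}
   stored.level = level
   history[#history + 1] = level
end

-- AO profile filtered on critical: re-sends the alarm as major.
local function run_ao_profile()
   if stored.level == CRITICAL then
      nas_receive(MAJOR)
   end
end

-- Three hub check intervals while the robot stays inactive.
for _ = 1, 3 do
   nas_receive(CRITICAL)   -- hub raises "robot inactive" again
   run_ao_profile()        -- profile downgrades it
end

print(table.concat(history, " "))   -- 5 4 5 4 5 4
```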

And you need to use "on arrival" on your AO profile because each time the alert arrives it's going to undo your changes so you have to redo them.

There are versions of the nas that have problems with the counter function. You mention using the counter as part of the profile. I'd remove that and just rely on priority and matching the message text.

The thing that I'd consider doing given a guess at what your goal is would be to use triggers. One trigger would be the robot inactive alert, one trigger the result of the ping via net_connect. Then set up trigger logic such that failure from the robot trigger plus failure from the net_connect trigger results in a new alarm.

The other alternative would be to solve the robot inactive problem. Presumably adding the test to ping the server is a double check to remove the erroneous robot inactive alerts that constantly get generated. Would be great if CA engineering could fix that problem. Do you have an open support case?

A third alternative, if this is part of a maintenance process, would be to use the maintenance window feature to suppress alerts during the period of expected outage.

-Garin
5. Re: changing severity of Robot inactive alarms.

rkalidh87
Posted Feb 27, 2018 10:24 AM

Hi Garin,

Thanks for your explanation. Now I understand the complete process behind AO.

Let me check with CA.
6. Re: changing severity of Robot inactive alarms.

Garin Walsh
Posted Feb 27, 2018 10:42 AM

And not to confuse things further but the NAS processes alerts in the order they arrive. On a small system with no interruptions you don't see the weird things but consider the following:

In your data center you have two power strips: one powers the hub server and the other powers the robot and the switch between this hub and the hub running your nas. All is well and good until someone trips the switch on the robot's power strip.

Your robot's hub starts generating robot not responding alerts but has nowhere to send them because the switch it is plugged into is also down.

An hour goes by and you get four or five not responding alerts in queue on that local hub.

Nothing has reached your nas yet either.

Now someone notices the power strip and turns it back on.

Seconds later the switch is working and your central hub successfully connects to this robot's hub and does its get of messages - because the block size is 100, it gets all of them in one block and drops them onto the nas queue on the central hub in one sequential but extremely short time period.

Because the reading from queues is based on polling, it is likely that all four (or however many) robot not responding messages will be dropped into the queue in sequential order with nothing between them and the nas not reading any out.

Now the nas reads the first one of these messages out of queue and executes your AO profile and puts the resulting update back out there on the queue - behind the three old ones that haven't been processed yet.

Now the nas reads the next old alert and fires the AO again because the incoming alert message matches the profile's priority test, so it makes the same changes again and queues the update. Now you have two remaining old messages and two new updated messages.

Etc.

This may or may not make a difference to your processing but if you make the assumption that nothing will change between the alarm.set() call and when the full results of that call are complete, you will at some point experience some grief.
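
The backlog scenario can be sketched as a toy queue in plain Lua. Nothing here is nas code: the queue table, the seen list and the ao_runs counter are illustrative, and each update produced by the profile is simply appended behind the messages still waiting.

```lua
local CRITICAL, MAJOR = 5, 4

-- Four backlogged "robot not responding" messages arrive in one block.
local queue = { CRITICAL, CRITICAL, CRITICAL, CRITICAL }
local head = 1
local seen = {}          -- level of the stored alarm after each message
local ao_runs = 0

while head <= #queue do
   local level = queue[head]
   head = head + 1
   seen[#seen + 1] = level
   if level == CRITICAL then
      -- The AO profile fires; its update is queued behind what is waiting.
      ao_runs = ao_runs + 1
      queue[#queue + 1] = MAJOR
   end
end

print(table.concat(seen, " "))   -- 5 5 5 5 4 4 4 4
print(ao_runs)                   -- 4
```

In this model the stored alarm stays critical until every old message has been read, and the profile runs once per old message rather than once in total.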

-Garin
7. Re: changing severity of Robot inactive alarms.

rkalidh87
Posted Mar 02, 2018 04:31 AM

Hi,

How can I export Lua script output to a file (Excel or Word)? Please suggest.
8. Re: changing severity of Robot inactive alarms.

Broadcom Employee

Nestor Falcon Gonzalez
Posted Mar 02, 2018 09:11 AM

Hi,
this extract might help you:
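dir = "C://"
file = <variable>     -- replace with your file name (a string)
ext = ".txt"
filename = dir..file..ext
io.output(filename)   -- make this file the default output

io.write("text")      -- goes to the file
print("text")         -- still goes to the console
io.close()            -- flush and close the default output file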
9. Re: changing severity of Robot inactive alarms.

NGHIA VAN
Posted Mar 02, 2018 09:19 AM

That code snippet will get you to writing *.txt files, but to write to a doc or an xls file you will need to import a module into Lua to handle that or, worst case, create your own module. For example, I use this one for xlsx: jmcnamara/xlsxwriter.lua on GitHub ("A lua module for creating Excel XLSX files"), because it uses the MIT license, which is a fairly friendly license to use.
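
If a plain file that Excel can open is enough, one module-free option is CSV. The sketch below is an assumption-laden example, not part of the posts above: csv_field and write_csv are hypothetical helper names, the rows are made-up data, and only standard Lua io and os functions are used.

```lua
-- Quote a field for CSV: wrap it in double quotes and double any embedded quotes.
local function csv_field(value)
   local escaped = tostring(value):gsub('"', '""')
   return '"' .. escaped .. '"'
end

-- Write rows (a list of lists) to path.
-- Returns true on success, or nil plus an error message.
local function write_csv(path, rows)
   local f, err = io.open(path, "w")
   if not f then return nil, err end
   for _, row in ipairs(rows) do
      local fields = {}
      for i, value in ipairs(row) do fields[i] = csv_field(value) end
      f:write(table.concat(fields, ","), "\n")
   end
   f:close()
   return true
end

local rows = {
   { "hostname", "level", "message" },
   { "server01", 4, 'Robot "server01" is inactive' },
}
local path = os.tmpname()      -- temporary file; use your own path in practice
assert(write_csv(path, rows))
print(path)
```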
10. Re: changing severity of Robot inactive alarms.

Sam Green
Posted Jul 13, 2018 05:46 AM

This might help:

-- Find inactive robots and ping them to see whether just the robot is down or the whole server.
-- The script assumes robot inactive alarms from the hub have been changed to major. This could
-- always be handled by the script instead: just uncomment the following two lines in the
-- ping-success branch
-- a.level = 4
-- a.severity = "major"

-- Find inactive robot alarm(s)
al = alarm.list("message", "Robot % is inactive")
if al ~= nil then
   for i = 1, #al do
      -- Place current row al[i] into a (for readability)
      a = al[i]
      -- Print index, source, severity and message for troubleshooting
      printf("%02d %s %s %s", i, a.source, a.severity, a.message)
      -- Get the ip of the robot from the alarm
      ip_addr = a.source
      -- Print for troubleshooting
      print(ip_addr)
      -- Ping the ip
      ping_success = action.ping(ip_addr)
      if ping_success then
         -- Print the status for troubleshooting
         print("Ping success "..ip_addr)
         -- Edit the alarm message to assist ops
         message_add_OK = "but server responds to ping OK"
         a.message = a.message.." "..message_add_OK
         -- Change severity to major
         --a.level = 4
         --a.severity = "major"
         alarm.set(a)
      else
         -- Print the status for troubleshooting
         print("Ping fail "..ip_addr)
         -- Edit the alarm message to assist ops
         message_add_fail = "and no response to ping!"
         a.message = a.message.." "..message_add_fail
         -- Change the severity to critical (quoted: a bare name would be a nil global)
         a.level = 5
         a.severity = "critical"
         alarm.set(a)
      end
   end
end
