DX Infrastructure Management

Tech Tip: UIM - correlating several alarms in the NAS 

Jun 22, 2017 10:33 AM

This document explains how to correlate alarms in the NAS using an Auto Operator rule and a Lua script.

The use case is as follows: as a customer, I want to be alerted when two alarms are active at the same time in the console, for instance a high CPU load alarm and a high processor queue length alarm.

As both alerts come from the cdm probe, we can start by creating a NAS Auto Operator (AO) rule that catches either of the two alarms.

The Action Type in this case is set to "script" because we want to check in the backend database whether both alerts (CPU load and processor queue length) are active at the same time for the same robot.

The content of the script is as follows (you will need to edit the database server, user and password):

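-- Open a connection to the UIM database (edit the server, user and password)
database.open("Provider=SQLOLEDB;Initial Catalog=CA_UIM;Data Source=<databaseserver>,1433;User ID=sa;Password=<databasepassword>;Network Library=dbmssocn;Language=us_english")
-- Alarm that triggered the Auto Operator rule
local a = alarm.get()
if a ~= nil then
  -- Double any single quotes in the robot name so they cannot break the SQL string
  local robot = (string.gsub(a.robot or "", "'", "''"))
  -- Active CPU load and processor queue length alarms for the same robot
  local rs = database.query("select * from nas_alarms WITH (NOLOCK) where robot = '"..robot.."' and (message like '%total cpu is now%' or message like '%processor queue length%')")
  if rs ~= nil and #rs == 2 then
    -- Escalate the triggering alarm (updated in place via its nimid)
    local new_alarm = {}
    new_alarm.nimid = a.nimid
    new_alarm.message = "ATTENTION: "..a.message
    new_alarm.sid = a.sid
    new_alarm.level = 5
    new_alarm.severity = "critical"
    new_alarm.user_tag1 = "Detected high CPU load and high CPU queue simultaneous alarms"
    alarm.set(new_alarm)
  end
end
database.close()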
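
The #rs == 2 check only counts rows: it could be satisfied by two alarms of the same kind, and it stops matching if a third matching alarm is open. A stricter variant is to confirm that each expected message fragment appears at least once. The helper below is a hypothetical plain-Lua sketch (all_present is an illustrative name, not part of the NAS Lua API), assuming each row returned by database.query exposes the alarm text as a message field. The sample messages are made up for illustration:

```lua
-- Hypothetical helper (not part of the NAS API): returns true only when
-- every fragment in `patterns` appears in the message of at least one row.
-- Matching is a case-insensitive plain substring search, similar to the
-- SQL "like '%...%'" clauses under a case-insensitive collation.
local function all_present(rows, patterns)
  for _, pattern in ipairs(patterns) do
    local found = false
    local needle = string.lower(pattern)
    for _, row in ipairs(rows) do
      local msg = string.lower(row.message or "")
      -- 4th argument true = plain find, so no Lua pattern magic characters
      if string.find(msg, needle, 1, true) then
        found = true
        break
      end
    end
    if not found then
      return false
    end
  end
  return true
end

-- Illustrative rows (assumed shape: one table per row with a message field)
local rows = {
  { message = "Average (4 samples) total cpu is now 97.52%, which is above the error threshold (90%)" },
  { message = "Processor queue length is 12, which is above the error threshold (8)" },
}
local fragments = { "total cpu is now", "processor queue length" }
print(all_present(rows, fragments))               -- true
print(all_present({ rows[1], rows[1] }, fragments)) -- false
```

In the NAS script, the condition could then read: if rs ~= nil and all_present(rs, fragments) then ... end.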

The script is simple and straightforward; its average runtime is about 5 ms in a low-load environment.

The script updates the alarm that triggered the rule as follows:
a. Raises its severity to critical (level 5).
b. Prefixes the alarm message with "ATTENTION:" to draw the operators' attention.
c. Sets the user_tag1 field to "Detected high CPU load and high CPU queue simultaneous alarms".
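
As written, the script escalates only the alarm that triggered the rule, because alarm.set() is called once with a.nimid. To escalate every correlated alarm, one option is to build one update table per row returned by the query and pass each one to alarm.set(). The sketch below is hypothetical: build_escalations is an illustrative name, it assumes each row exposes the nas_alarms nimid and message columns as fields, and it leaves out the sid field that the original copies from the triggering alarm. It also skips the "ATTENTION: " prefix when it is already present, so re-running the rule does not stack prefixes:

```lua
-- Hypothetical helper: turn correlated rows into update tables shaped like
-- the one the original script passes to alarm.set().
-- Assumes each row has "nimid" and "message" fields.
local PREFIX = "ATTENTION: "

local function build_escalations(rows, note)
  local updates = {}
  for _, row in ipairs(rows) do
    local msg = row.message or ""
    -- Only add the prefix once, so repeated runs do not stack it
    if string.sub(msg, 1, #PREFIX) ~= PREFIX then
      msg = PREFIX .. msg
    end
    updates[#updates + 1] = {
      nimid = row.nimid,
      message = msg,
      level = 5,             -- same level and severity as the original script
      severity = "critical",
      user_tag1 = note,
    }
  end
  return updates
end

-- Inside the NAS script (not runnable outside the NAS), the updates would
-- be applied with the same call the original script uses:
--   for _, u in ipairs(build_escalations(rs, "Detected high CPU load and high CPU queue simultaneous alarms")) do
--     alarm.set(u)
--   end
```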

You can adapt it to your needs and change the query to detect other active alarms in the environment.
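
When adapting the query, it can help to generate the WHERE clause from a list of message fragments instead of editing the SQL string by hand. The following is a hypothetical plain-Lua sketch (build_query and sql_quote are illustrative names). It doubles single quotes so that a robot name or fragment containing ' cannot break the statement; note that % and _ inside a fragment are still treated as LIKE wildcards by SQL Server:

```lua
-- Wrap a value in single quotes for SQL, doubling any embedded quotes.
-- The extra parentheses keep only gsub's first return value (the string).
local function sql_quote(s)
  return "'" .. (string.gsub(s, "'", "''")) .. "'"
end

-- Hypothetical helper: build the nas_alarms query for one robot and a
-- list of message fragments, OR-ing one LIKE clause per fragment.
local function build_query(robot, fragments)
  local clauses = {}
  for _, fragment in ipairs(fragments) do
    clauses[#clauses + 1] = "message like " .. sql_quote("%" .. fragment .. "%")
  end
  return "select * from nas_alarms WITH (NOLOCK) where robot = " .. sql_quote(robot)
    .. " and (" .. table.concat(clauses, " or ") .. ")"
end

print(build_query("srv01", { "total cpu is now", "processor queue length" }))
-- select * from nas_alarms WITH (NOLOCK) where robot = 'srv01' and (message like '%total cpu is now%' or message like '%processor queue length%')
```

In the NAS script, the query line could then become local rs = database.query(build_query(a.robot, fragments)), with fragments listing whichever alarms you want to correlate.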

Keep in mind that Lua scripts can put a significant load on the NAS when it handles a high volume of alarms. Tune the Auto Operator rule filters so that the script runs only for the alarms it actually needs to check.

Note that there are other ways to accomplish alarm correlation (e.g. via the ems probe or NAS triggers).

HTH,

Nestor
