alarm if % of tests fail within the hour

Feb 28, 2013
I have a scenario where I was asked to raise alarms if 3 consecutive tests fail or if 20% of the tests within an hour fail. This covers the processes and net_connect probes. As far as I can see, there is no readily available way to alarm on the percentage of tests failing within the hour. SLAs could do that, but the minimum period available is one day.
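To make the two conditions concrete, here is a rough sketch in plain Lua of how they could be evaluated over one hour of results. Nothing here is a NAS API: should_alarm and its parameters are names made up for illustration, and it assumes the samples are already available as an array ordered oldest to newest, with false marking a failed test.

```lua
-- Hypothetical helper, not part of NAS.
-- samples: array ordered oldest to newest; true = test passed, false = test failed.
-- max_consecutive: alarm when this many failures occur in a row (e.g. 3).
-- max_ratio: alarm when the failed share reaches this value (e.g. 0.20).
function should_alarm(samples, max_consecutive, max_ratio)
   local failed, streak, longest = 0, 0, 0
   for _, ok in ipairs(samples) do
      if ok then
         streak = 0
      else
         failed = failed + 1
         streak = streak + 1
         if streak > longest then longest = streak end
      end
   end
   -- An empty window counts as 0% failed rather than dividing by zero.
   local ratio = (#samples > 0) and (failed / #samples) or 0
   return longest >= max_consecutive or ratio >= max_ratio, longest, ratio
end

-- One hour at a 5 minute interval: a single failure out of 12 samples.
local hour = { true, true, false, true, true, true, true, true, true, true, true, true }
local alarm, longest, ratio = should_alarm(hour, 3, 0.20)
print(alarm, longest, ratio)
```

The threshold comparison uses >= so that exactly 20% failed already counts as over the threshold.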


I came up with this NAS script to generate the alarm. The alarm call itself is not in the script; you basically just replace the printf with nimbus.alarm. It covers net_connect and a single target that I looked up for testing, so the query still needs to be adjusted to be dynamic.


rc = database.open("provider=nis;database=nis;driver=none")
if rc == 0 then
   -- @var = number of samples expected per hour (3600 s / sample interval)
   -- perc = share of the latest @var samples that are NULL, i.e. failed tests
   query = 'declare @var int;select @var = (select top 1 3600/samplerate from RN_QOS_DATA_0019 where table_id = 17565 order by sampletime desc);select COUNT(*)/CAST(@var as decimal) perc from (select top (@var) samplevalue from RN_QOS_DATA_0019 WHERE table_id = 17565 order by sampletime desc) as A where A.samplevalue is null;'
   tab, rc = database.query(query)
   if rc == 0 then
      if tab[1].perc < 0.20 then
         printf("NULL samples under threshold")
      else
         printf("NULL samples over threshold!")
      end
   end
else
   printf("COULD NOT OPEN DATABASE: "..rc)
end
database.close()
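As for making the query dynamic, one possible approach is to build the SQL text from the table name and table_id instead of hardcoding them. This is only a sketch: build_null_ratio_query is a hypothetical helper name, and it assumes the RN_QOS_DATA table name and the table_id for the target have already been looked up by whatever means you use.

```lua
-- Hypothetical helper: builds the same query as above for any QoS series.
-- table_name (e.g. "RN_QOS_DATA_0019") and table_id (e.g. 17565) are assumed
-- to have been looked up beforehand.
function build_null_ratio_query(table_name, table_id)
   -- Both values are concatenated into the SQL text, so validate them first.
   assert(type(table_name) == "string" and table_name:match("^[%w_]+$"),
          "unexpected table name")
   local id = math.floor(assert(tonumber(table_id), "table_id must be numeric"))
   return string.format(
      "declare @var int;" ..
      "select @var = (select top 1 3600/samplerate from %s where table_id = %d order by sampletime desc);" ..
      "select COUNT(*)/CAST(@var as decimal) perc from " ..
      "(select top (@var) samplevalue from %s where table_id = %d order by sampletime desc) as A " ..
      "where A.samplevalue is null;",
      table_name, id, table_name, id)
end

-- Reproduces the hardcoded query from the script for the test target.
print(build_null_ratio_query("RN_QOS_DATA_0019", 17565))
```

The returned string can then be passed to database.query in place of the fixed query, once per target you want to check.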

This at least works for me. Another option I thought of is to send an alarm on each sample and process those with triggers and AO profiles, but this seemed simpler. If anyone knows of a better, readily implemented solution, that would be great too :)