I have a scenario where I've been asked to raise an alarm when 3 consecutive tests fail or when 20% of the tests within an hour fail. This applies to both the processes and net_connect probes. As far as I can tell, there's no built-in way to alarm on the percentage of tests failing within an hour. SLAs could do that, but the minimum period available for them is one day.
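For reference, the two conditions can be written out in plain Lua (the language NAS scripts use). This is only a minimal sketch, not a UIM API: `should_alarm` is a hypothetical helper, and it assumes the last hour's results are already available as an array of booleans, oldest first, where `true` means the test failed. It also assumes "3 consecutive" means the three most recent tests.

```lua
-- Hypothetical helper (not part of NAS): decide whether to alarm.
-- results: array of booleans, oldest first, true = test failed.
-- Alarms on N consecutive failures at the end of the window, or when
-- at least `percent_limit` percent of the window failed.
local function should_alarm(results, consecutive_limit, percent_limit)
  consecutive_limit = consecutive_limit or 3
  percent_limit = percent_limit or 20
  local n = #results
  if n == 0 then return false, nil end

  -- Count the run of failures among the most recent results.
  local run = 0
  for i = n, 1, -1 do
    if not results[i] then break end
    run = run + 1
  end
  if run >= consecutive_limit then
    return true, "consecutive"
  end

  -- Share of failed tests in the whole window; the integer comparison
  -- avoids floating-point rounding right at the 20% boundary.
  local failed = 0
  for i = 1, n do
    if results[i] then failed = failed + 1 end
  end
  if failed * 100 >= percent_limit * n then
    return true, "ratio"
  end
  return false, nil
end

-- Twelve 5-minute samples with two scattered failures (about 16.7%).
local window = {false, true, false, false, false, false,
                false, false, true, false, false, false}
print(should_alarm(window)) --> false nil (tab-separated)
```

Checking the trailing run means the consecutive rule fires as soon as the third failure in a row arrives, rather than for any old run still inside the hour.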
I came up with the NAS script below to generate the alarm. Strictly speaking, it doesn't raise the alarm yet; for that you basically just replace the printf calls with nimbus.alarm. It covers net_connect and a single target that I looked up for testing, so the query still needs to be made dynamic.
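```lua
-- Open the UIM database through the NIS data source.
local rc = database.open("provider=nis;database=nis;driver=none")
if rc == 0 then
  -- @var = number of samples in one hour (3600 / samplerate).
  -- The outer select returns the share of NULL (failed) samples among
  -- the newest @var samples. RN_QOS_DATA_0019 and table_id 17565 are
  -- the hard-coded test target.
  local query = 'declare @var int;' ..
    'select @var = (select top 1 3600/samplerate from RN_QOS_DATA_0019 ' ..
    'where table_id = 17565 order by sampletime desc);' ..
    'select COUNT(*)/CAST(@var as decimal) perc from (select top (@var) samplevalue ' ..
    'from RN_QOS_DATA_0019 WHERE table_id = 17565 order by sampletime desc) as A ' ..
    'where A.samplevalue is null;'
  local tab, qrc = database.query(query)
  if qrc == 0 then
    -- Alarm when 20% or more of the last hour's samples are NULL.
    -- Replace printf with nimbus.alarm to raise an actual alarm.
    if tab.perc < 0.20 then
      printf("NULL samples under threshold")
    else
      printf("NULL samples over threshold!")
    end
  end
else
  printf("COULD NOT OPEN DATABASE: " .. rc)
end
database.close()
```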
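To see what the query actually enforces, its arithmetic can be reproduced in plain Lua. This is a minimal sketch with hypothetical helper names, assuming `samplerate` holds the sample interval in seconds, as the query's `3600/samplerate` implies. With a 5-minute interval there are 12 samples per hour, so 3 NULL samples are already enough to reach 20%.

```lua
-- Samples per hour. SQL Server's 3600/samplerate is integer division,
-- which math.floor matches for positive values.
local function hourly_window(samplerate_seconds)
  return math.floor(3600 / samplerate_seconds)
end

-- Same ratio the query returns as `perc`: NULL samples / window size.
local function null_ratio(samplerate_seconds, null_count)
  return null_count / hourly_window(samplerate_seconds)
end

-- Smallest number of failed samples that reaches percent_limit.
-- The percentage is an integer so the boundary avoids float rounding.
local function min_failures(samplerate_seconds, percent_limit)
  local window = hourly_window(samplerate_seconds)
  return math.ceil(window * percent_limit / 100), window
end

local need, window = min_failures(300, 20)
print(("%d of %d samples"):format(need, window)) --> 3 of 12 samples
```

The same calculation shows why a coarse sample interval makes the 20% rule jumpy: with few samples per hour, each failure moves the ratio by a large step.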
This works for me, at least. Another option I considered was to send an alarm on each sample and process those with triggers and AO (Auto Operator) profiles, but this seemed simpler. If anyone knows of a better solution that is already built in, I'd be glad to hear about it.