Logmon issues: multiple thresholds, alarm clearing, and origin question

Discussion created by tdsnoc on Apr 18, 2014
Latest reply on Apr 29, 2014 by tdsnoc

Hi all,

 

This seems like it should be pretty simple, but I can't get it working.

 

Basically I'm running a command every few minutes that will return lines in the following format:

 

<hostname1> <errors>

<hostname2> <errors>

...

 

What I want to do is this:

If a host shows up in that list, send an alarm with source <hostnameX> (this part works)

If the error count is below 10, make the alarm minor

If the error count is 10 or above, make the alarm major
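
To pin down the behavior I'm after, here is a minimal sketch in Python. This is not logmon configuration, just the intended logic; the function name `classify` and the severity strings are made up for illustration, and it assumes each line is exactly "<hostname> <errors>" with an integer error count.

```python
def classify(line):
    """Map one '<hostname> <errors>' line to (host, severity).

    Hypothetical helper for illustration only; returns None for
    lines that don't match the expected two-field format.
    """
    parts = line.split()
    if len(parts) != 2:
        return None
    host = parts[0]
    try:
        errors = int(parts[1])
    except ValueError:
        return None
    if errors <= 0:
        return (host, "clear")   # only clear once the count reaches 0
    if errors < 10:
        return (host, "minor")   # below 10 errors
    return (host, "major")       # 10 errors or above


if __name__ == "__main__":
    for sample in ["host1 3", "host2 10", "host3 0"]:
        print(classify(sample))
```

So each line should end up with exactly one severity, and a host at 10 or more errors should never fall through to minor or clear.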

 

I've tried all kinds of things. Currently I have two watcher rules defined, one for major and one for minor. Here are the variable definitions for both:

major

name   | source | expect | value
host   | 1-1    | =      | *
errors | 2-2    | <=     | 10

 

minor

name   | source | expect | value
host   | 1-1    | =      | *
errors | 2-2    | <      | 0

 

Now, first of all, those less-than signs seem counterintuitive, but the first watcher rule works: it creates a major alarm whenever the error count is above 10. I have it set to abort on match because, when I didn't, it would send a major and then immediately follow it up with a minor, since the minor watcher matches as well. However, whenever I have abort on match enabled, it never seems to get to the minor rule (at least, it doesn't generate any minor alarms).

 

Stuff I've tried:

Changing the minor rule to expect >10 (again, counterintuitive; maybe "expect" was a bad choice for the header name) and turning off 'abort on match' in the major rule. This doesn't work because it clears the alarm if the error count is 10 or more (I only want it to clear if it reaches 0).

Adding an errorsMax variable with >10 to the minor rule, making it not clear alarms, and turning off abort on match. This doesn't seem to create any alarms either.

Putting the minor rule first with no abort on match. With this I only received major alarms.

 

This seems like it should be very simple, but I can't seem to get it working.

 

Also, any ideas on how I could clear an alarm for a host that isn't on the list at all anymore? For example, let's say host1 had a dozen errors and the logmon probe generated the corresponding alarm, but the next time the logmon probe did its check, that host wasn't on the list at all. Is there any way to send clears for hosts that were on the list but aren't anymore?
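
In other words, what I'm picturing is something that remembers which hosts were listed on the previous run and compares them against the current run. A rough Python sketch of that idea follows; it is not something logmon does as far as I know, and `hosts_to_clear` is a hypothetical name used only to illustrate the comparison.

```python
def hosts_to_clear(previously_alarmed, current_lines):
    """Return (hosts needing a clear, hosts listed now).

    previously_alarmed: set of hostnames alarmed on the last run.
    current_lines: iterable of '<hostname> <errors>' lines from this run.
    Hypothetical helper for illustration only.
    """
    current_hosts = set()
    for line in current_lines:
        parts = line.split()
        if parts:                      # skip blank lines
            current_hosts.add(parts[0])
    # Hosts seen last time but missing now should get a clear.
    cleared = previously_alarmed - current_hosts
    return cleared, current_hosts


if __name__ == "__main__":
    cleared, now = hosts_to_clear({"host1", "host2"}, ["host2 4", "host3 12"])
    print(sorted(cleared), sorted(now))
```

The second return value would be saved and passed in as `previously_alarmed` on the next run.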

 

Finally, I'd like to be able to change the Origin of the alarm based on the hostname. Do I have to do that via a nas script, or does anyone know of a trick that'll let me do that directly with the logmon probe?

 

Thanks in advance for any feedback!

 

-Martin
