- Some metrics, such as temperature, can differ widely between devices and between components within a device.
- Baseline-based alerting would be a good way to raise an alarm when temperatures are higher than normal.
- Baseline-based event rules can only use multiples of the standard deviation, e.g. trigger when a value is more than 3 standard deviations above the baseline.
- Standard deviations increase when there is more variation in the metric values
- However, when there is very little variation, the standard deviation can be as low as 0.1 or even 0.0.
- If you configure an alarm for values more than 3 standard deviations above the baseline and the standard deviation is 0.1, the alarm is generated when the temperature is just 0.3 degrees above the baseline. This is far too sensitive. In this scenario, to get realistic alarms you need to set the threshold to 10 or 20 times the standard deviation.
- In some cases the standard deviation is 0.0. If you multiply 0.0 by 20 it is still 0.0, so the threshold is the baseline itself and any value above the baseline, however slightly, triggers the event. This makes this kind of event rule impractical.
- However, in other cases there is a lot more variation. If the standard deviation is e.g. 5 degrees, a multiplier of 10 or 20 means the alarm is only generated when the temperature is 50 to 100 degrees above normal. This makes the alarm useless for devices with high variation.
- --> Standard-deviation-based event rules are either triggered too early (when there is almost no variation in the data) or not at all (when the metric varies more). Therefore, in many cases they cannot be used in practice.
- Create the ability to set the threshold as a constant value (e.g. 10 degrees) or as a percentage above the baseline.
- This is a feature that is already present in SNMPCollector in UIM and works quite well
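
The arithmetic behind the points above can be sketched in a few lines of Python. This is only an illustration, not how the product computes thresholds: the function names, the baseline of 40 degrees, and the 10-degree and 25-percent settings are assumptions chosen for the example.

```python
# Hypothetical helpers illustrating the three threshold styles discussed above.
# None of these names come from the product; they exist only for this sketch.

def stddev_threshold(baseline, stddev, multiplier):
    """Current behaviour: threshold is baseline + N standard deviations."""
    return baseline + multiplier * stddev


def constant_threshold(baseline, offset):
    """Requested option: threshold is a fixed amount above the baseline."""
    return baseline + offset


def percent_threshold(baseline, percent):
    """Requested option: threshold is a percentage above the baseline.

    abs() keeps the threshold above the baseline for negative baselines
    (an assumption made for this sketch).
    """
    return baseline + abs(baseline) * percent / 100.0


if __name__ == "__main__":
    baseline = 40.0   # assumed normal temperature in degrees
    multiplier = 20   # the "10 or 20 times" setting from the text

    # Standard deviations taken from the examples in the list above.
    for stddev in (0.0, 0.1, 5.0):
        by_stddev = stddev_threshold(baseline, stddev, multiplier)
        print(f"stddev={stddev}: alarm above {by_stddev:.1f} "
              f"({by_stddev - baseline:.1f} degrees over baseline)")

    # The proposed options do not depend on the standard deviation at all.
    print(f"constant +10: alarm above {constant_threshold(baseline, 10.0):.1f}")
    print(f"percent +25%: alarm above {percent_threshold(baseline, 25.0):.1f}")
```

With a standard-deviation multiple, the distance between baseline and threshold ranges from 0 to 100 degrees depending on how much the metric happens to vary. The constant and percentage options keep that distance predictable for every device.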