Problem detecting unexpected process restarts for a process that restarts on a schedule

Discussion created by Garin on Aug 31, 2014
Latest reply on Sep 4, 2014 by 1_keithk

I need to monitor for unexpected restarts of a process that also restarts on a schedule. Specifically, the process starts during the first 10 minutes of the hour and runs through to the end of the hour. It then shuts itself down and restarts during the first 10 minutes of the next hour, and the cycle repeats.


I can create a schedule that reflects this:
recurrent_event_spec = RRULE:FREQ=MINUTELY;INTERVAL=1|EXRULE:BYMINUTE=0,1,2,3,4,5,6,7,8,9,10


and it works. A process-down event during minutes 0-10 doesn't generate an alert, and one during minutes 11-59 does.
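
As a rough illustration of the behavior described above, here is a small Python model of the schedule. It is only a sketch of how the spec appears to behave from the outside, not the probe's own parser; the names `EXCLUDED_MINUTES` and `alerts_enabled` are made up for this example.

```python
# Assumed reading of the spec: checks run every minute
# (FREQ=MINUTELY;INTERVAL=1), except during the minutes
# listed in the EXRULE (BYMINUTE=0..10).
EXCLUDED_MINUTES = set(range(0, 11))


def alerts_enabled(minute: int) -> bool:
    """Return True when a process-down event at this minute should alert."""
    return minute not in EXCLUDED_MINUTES


if __name__ == "__main__":
    alerting = [m for m in range(60) if alerts_enabled(m)]
    # First alerting minute, last alerting minute, count of alerting minutes
    print(alerting[0], alerting[-1], len(alerting))  # 11 59 49
```

Note that the excluded window covers minutes 0 through 10 inclusive, so it spans 11 sample points rather than 10.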


The problem is that I also need to detect restarts. Checking once a minute for the process to be there has only a tiny chance of finding it down, because the process is unlikely to fail exactly on the minute and the restart is automatic and immediate.


The solution appeared to be enabling the detect restart option, which requires tracking by PID. This successfully detects process restarts.


The problem with that is what happens around the allowed window. If the process restarts during the first 10 minutes, when a restart is allowed, no alert is generated, as expected. But at minute 11 an alert is generated, because the PID is now different from the one seen at the last check, during minute 59 of the previous hour.


So, is there a graceful way of handling this? I'd like to avoid a preprocessor script, because that seems like a sledgehammer solution and would be hard to maintain. I've considered a scheduled restart of the process probe, but I don't control where in that 10-minute period the process restarts, so synchronizing the two is difficult, and I'd lose monitoring during the restart period.


What I really want is a way to make the processes probe forget the tracked PID whenever an excluded interval (one where a restart is expected) occurs.
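
To make the difference concrete, here is a minimal Python sketch comparing the behavior I see now with the behavior I'm after. Everything in it is hypothetical: `PidTracker`, `forget_in_window`, and the sample PIDs are invented for illustration and are not the probe's real logic or configuration keys.

```python
from typing import List, Optional, Tuple

EXCLUDED_MINUTES = set(range(0, 11))  # same window as the EXRULE (minutes 0-10)


class PidTracker:
    """Hypothetical model of PID-based restart detection (not probe code)."""

    def __init__(self, forget_in_window: bool) -> None:
        self.forget_in_window = forget_in_window
        self.last_pid: Optional[int] = None

    def check(self, minute: int, pid: int) -> bool:
        """Record one sample; return True if a restart alert would fire."""
        if minute in EXCLUDED_MINUTES:
            if self.forget_in_window:
                self.last_pid = None  # desired: drop the baseline here
            return False  # no alerts inside the excluded window either way
        restarted = self.last_pid is not None and pid != self.last_pid
        self.last_pid = pid  # learn (or relearn) the baseline PID
        return restarted


def run(forget_in_window: bool, samples: List[Tuple[int, int]]) -> List[bool]:
    tracker = PidTracker(forget_in_window)
    return [tracker.check(minute, pid) for minute, pid in samples]


if __name__ == "__main__":
    # (minute, pid): minute 59, scheduled restart seen at minute 5,
    # first alerting check at minute 11, unexpected restart seen at minute 30.
    samples = [(59, 100), (5, 200), (11, 200), (30, 300)]
    print(run(False, samples))  # [False, False, True, True]
    print(run(True, samples))   # [False, False, False, True]
```

In the first run the stale PID from minute 59 causes the unwanted alert at minute 11. In the second run the baseline is relearned at minute 11, so only the unexpected restart at minute 30 alerts.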