On Automic and the passing of time (and also, possibly some inspiration on monitoring technique)

Discussion created by Carsten_Schmitz on Feb 26, 2018
I'm admittedly sometimes rather critical of some things the Automation Engine does. But this one, man, this convinces me that Automic is not only great but also manipulates the very fabric of time and space.

I have a monitoring job that checks whether our agents are not only alive but actually process jobs. It runs every 15 minutes on our 74 most vital agents. Nagios picks up on any (repeated) failures.

Additionally, I have a cron job that, once a week, collects the performance data of all successful runs of this job from the Oracle database. I am specifically interested in the time between activation and start of the monitoring job - we found that this is a good indicator of actual agent performance:

select AH_OH_IDNR as OHID,
       AH_IDNR as Runid,
       AH_TIMESTAMP1 as Activation,
       AH_STATUS as Status,
       AH_HOSTDST as Destination,
       AH_TIMESTAMP2 as Launch,
       round((AH_TIMESTAMP2 - AH_TIMESTAMP1) * 24 * 60 * 60,0) as seconds_diff
from AH
where AH_OH_IDNR in (
        select OH_IDNR
        from GFD_IS_UC4_O_01.OH
        where OH_NAME like '%MY_MONITORING_JOBS_NAME%'
          AND OH_NAME NOT LIKE '%OLD.%'
          and OH_NAME not like 'JOBP%'
      )
  AND AH_STATUS = 1900;

Occasionally, I break this down by server, take out the huge numbers (e.g. over the last eight months, there was one job that ran for 30 hours due to a file transfer bug), calculate the average "startup lag", i.e. the average lag between activation and start of this job, and make one of those newfangled, fancy-coloured Excel charts from it.
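
For anyone who would rather script that step than do it in Excel, here is a minimal Python sketch of the same idea: group the exported rows by agent, drop the outliers, and average the remaining lag. The function name, the (agent, seconds_diff) row layout, and the one-hour cutoff are all hypothetical choices for illustration, not what I actually run.

```python
from collections import defaultdict
from statistics import mean


def average_startup_lag(rows, max_lag_seconds=3600):
    """Return {agent: mean startup lag in seconds}.

    rows is an iterable of (agent, seconds_diff) pairs, e.g. the
    Destination and seconds_diff columns of the query result.
    max_lag_seconds is an assumed outlier cutoff; pick your own.
    """
    per_agent = defaultdict(list)
    for agent, seconds in rows:
        # Skip missing values and anything above the cutoff.
        if seconds is None or seconds > max_lag_seconds:
            continue
        per_agent[agent].append(seconds)
    return {agent: mean(values) for agent, values in per_agent.items()}


# Made-up sample data: the 108000-second row stands in for a 30-hour outlier.
sample = [("AGENT_A", 2), ("AGENT_A", 4), ("AGENT_A", 108000), ("AGENT_B", 1)]
print(average_startup_lag(sample))
```

Note that this only cuts off values at the top end. Whether a lower bound is needed as well depends on what the data looks like.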

And behold, Ladies and Gents, in 1.8 million unique records for the last eight months, I found one job that has ...

(wait for it ...)

Negative time!

2070694| 455149157|05-09-17 00:09:30|      1900|AGENT_NAME_REMOVED|05-09-17 00:09:09|          -21

Yes. This job, as the only one among 1.8m jobs, started, according to the UC4 database, 21 seconds before it was activated.
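
If you want to double-check the arithmetic, here is a small Python sketch that recomputes the difference from the two timestamps in the row above. The format string is an assumption (I am guessing at the day/month order of the export), but since both timestamps share the same date the result is the same either way. The function name is made up for this example.

```python
from datetime import datetime

# Assumed layout of the exported timestamps; the day/month order is a guess.
FMT = "%d-%m-%y %H:%M:%S"


def startup_lag_seconds(activation: str, launch: str) -> int:
    """Return launch minus activation in whole seconds.

    The result is negative when the launch timestamp is earlier
    than the activation timestamp.
    """
    delta = datetime.strptime(launch, FMT) - datetime.strptime(activation, FMT)
    return round(delta.total_seconds())


# Activation and Launch values taken from the row shown above.
print(startup_lag_seconds("05-09-17 00:09:30", "05-09-17 00:09:09"))  # -21
```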

Stephen Hawking, you know nothing!