I'm trying to find the best way to monitor the last status of multiple SQL jobs. Currently we receive a status email, however this is unreliable because depending on how much data the jobs have to process greatly effects the time for completion and the times we receive the email can vary by hours.
I see this topic was raised a couple times previously: Monitored for SQL failed jobs? This would bea good idea to have failed jobs write to the event log however this will only limit us to knowing about job failures only. I also came across Question about processing external emails through Nimbus which is an interesting workaround to the problem however it seems to require alot of initial customization to get off the ground.
I was going down the path of creating a SQL query to pull job information, run time, job status, and job outcome like the following:
USE MSDB SELECT name AS [syspolicy_purge_history] ,CONVERT(VARCHAR,DATEADD(S,(run_time/10000)*60*60 /* hours */ +((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */ + (run_time - (run_time/100) * 100) /* secs */ ,CONVERT(DATETIME,RTRIM(run_date),113)),100) AS [Time Run] ,CASE WHEN enabled=1 THEN 'Enabled' ELSE 'Disabled' END [Job Status] ,CASE WHEN SJH.run_status=0 THEN 'Failed' WHEN SJH.run_status=1 THEN 'Succeeded' WHEN SJH.run_status=2 THEN 'Retry' WHEN SJH.run_status=3 THEN 'Cancelled' ELSE 'Unknown' END [Job Outcome] FROM sysjobhistory SJH JOIN sysjobs SJ ON SJH.job_id=sj.job_id WHERE step_id=0 AND DATEADD(S, (run_time/10000)*60*60 /* hours */ +((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */ + (run_time - (run_time/100) * 100) /* secs */, CONVERT(DATETIME,RTRIM(run_date),113)) >= DATEADD(d,-1,GetDate()) ORDER BY name,run_date,run_time
I attempted to use the SQL response probe to check the column Job Outcome and generate alarms against anything other than Succeeded however it seems it will only accepts numeric values. Is it possible to compare a word using the regular expression comparison under the alarms?