DX Unified Infrastructure Management

  • 1.  SQL Agent Job failure Monitoring

    Posted Oct 03, 2018 12:06 PM

    Hi Folks,

     

     

    We received SQL job failure alerts for a job that failed yesterday, 2nd October, at 10:55 AM.

     

    The check interval is 5 minutes.

     

    The QoS is still being appended to the old alerts in the alarm history tab of the UMP console, even though the job failed again afterwards (that is, after 10:55 AM on 2nd October) and new alerts were triggered for the latest job failure.

     

    I am not able to understand why the QoS is being populated in old alerts that are no longer relevant. Could anyone let me know whether this is the expected behaviour? We have a requirement to acknowledge these alerts from CA UIM, but new alerts keep getting triggered because the QoS is generated continuously.

     

    Many Thanks,

    Vineesha.



  • 2.  Re: SQL Agent Job failure Monitoring

    Posted Oct 03, 2018 12:48 PM

    Hi Vineesha,

     

    This sounds a bit confusing. Can you elaborate with some screenshots?

     

    What is the schedule interval of the failing job? Is it possible that the job succeeds in between and fails at random intervals? And what suppression key do you see in both alerts?



  • 3.  Re: SQL Agent Job failure Monitoring

    Posted Oct 03, 2018 01:17 PM

    Hi Hitesh,

     

    This is the suppression key for the job failure alerts:

     

    Profile $profile, instance $instance, job $job_name (category $category_name), has failed. Run time of job: $rundate
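
    To make the placeholders concrete, here is a minimal sketch of how the template quoted above expands. It uses Python's string.Template only because it happens to share the $name placeholder syntax; it is not the sqlserver probe's own code, and every value in the example dictionary is hypothetical.

```python
from string import Template

# The text quoted in the post above, used here as a template.
MESSAGE = Template(
    "Profile $profile, instance $instance, job $job_name "
    "(category $category_name), has failed. Run time of job: $rundate"
)


def render_alarm(values: dict) -> str:
    """Fill every $placeholder; raises KeyError if one is missing."""
    return MESSAGE.substitute(values)


# Hypothetical values, chosen only for illustration.
example = {
    "profile": "prod_sql",
    "instance": "SQLNODE1",
    "job_name": "nightly_backup",
    "category_name": "Database Maintenance",
    "rundate": "2018-10-02 10:55:00",
}

print(render_alarm(example))
```

    Note that $rundate is part of the text, so the rendered string changes whenever the reported run time changes and stays identical for as long as the same run is being reported.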

     

    The job was failing every 5 minutes yesterday, and we received bulk alerts in our queue. The database team has disabled the job on their end, as it is now running on another node.

    The team is now asking us to close the alerts.

     

    So when I acknowledge the alarm, I get a new alarm stating that the job failed, with yesterday's date and time as the job run time, which should not be the case.
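
    One way to picture the behaviour described here is the toy model below. It rests on an assumption that is not confirmed anywhere in this thread: that each check interval looks only at the job's most recent recorded run, and that a disabled job's most recent run stays the old failed one. The names JobRun, check_cycle, and the key format are hypothetical and are not part of the sqlserver probe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    job_name: str
    rundate: str
    succeeded: bool


def check_cycle(last_run: JobRun, open_alarms: set):
    """One check interval: report a failure if the last recorded run failed.

    Returns None when there is nothing to report, otherwise a tuple
    (suppression_key, is_new_alarm, rundate).
    """
    if last_run.succeeded:
        return None
    key = f"{last_run.job_name}/job_failed"  # hypothetical key format
    is_new = key not in open_alarms  # False means it joins an existing alarm
    open_alarms.add(key)
    return key, is_new, last_run.rundate


# The job is disabled, so its last recorded run never changes.
last_run = JobRun("nightly_backup", "2018-10-02 10:55:00", succeeded=False)
open_alarms = set()

first = check_cycle(last_run, open_alarms)    # opens a new alarm
second = check_cycle(last_run, open_alarms)   # same key: updates the open alarm
open_alarms.discard(first[0])                 # operator acknowledges the alarm
third = check_cycle(last_run, open_alarms)    # a "new" alarm with the old rundate

for result in (first, second, third):
    print(result)
```

    Under that assumption, acknowledging the alarm only clears the open entry; the next interval still sees the same failed run and raises it again with yesterday's run time.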

     

    Regards,

    Vineesha.



  • 4.  Re: SQL Agent Job failure Monitoring

    Posted Oct 03, 2018 01:18 PM

    The suppression key is the same for all the job failure alerts.



  • 5.  Re: SQL Agent Job failure Monitoring

    Posted Oct 03, 2018 01:30 PM

    Can you check the configured threshold values? I hope you are using the latest version of the sqlserver probe.

    Please try disabling and re-enabling the probe. If you are still getting alerts, open a support case and attach logs captured at loglevel 5, since this needs deeper investigation.