Tech Tip - CA UIM sqlserver probe is not sending alarms for the check_dbalive checkpoint.

Document created by DavidM (Employee) on Jul 25, 2016. Last modified by SamCreek on Dec 17, 2016.
Version 3

Problem:

The sqlserver probe is installed on a robot to remotely monitor MS SQL Servers. When the database becomes unavailable, for example after a hardware crash, the probe fails to send the alarm for the check_dbalive checkpoint.

Instead, the customer receives many messages like this:

2016-01-04 07:58:05 | OPEN | Profile <Profile Name>, failed to execute in scheduled time interval, delayed by 308 seconds | 1 | SQL-Server | minor

'failed to execute in scheduled time interval' indicates the probe failed to complete all the checkpoints within the configured timeouts.

 

Environment:

This can potentially affect all versions of the probe and of SQL Server.

 

Cause:

The sqlserver probe has several configurable timeouts that limit the time available to process all checkpoints. When a timeout is reached, the probe stops processing checkpoints and generates the message above. Because check_dbalive is just one of those checkpoints, it may never run if the timeout is reached before the probe gets to it.

 

Resolution:

Either increase the timeouts or, because checkpoints are processed in order, move the check_dbalive checkpoint to the top so that it is processed first.
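
The effect of checkpoint order can be illustrated with a small toy model in Python. This is only a sketch of the reasoning, not the probe's actual scheduler: the single time budget, the durations, and the checkpoint name "slow_checkpoint" are all invented for illustration.

```python
def run_checkpoints(checkpoints, budget_seconds):
    """Toy model: run (name, duration) checkpoints in order within a time budget.

    Returns (completed, skipped). Everything from the first checkpoint that
    would exceed the budget onward is skipped.
    """
    elapsed = 0
    completed = []
    for name, duration in checkpoints:
        if elapsed + duration > budget_seconds:
            break
        elapsed += duration
        completed.append(name)
    skipped = [name for name, _ in checkpoints[len(completed):]]
    return completed, skipped


# Durations, budget, and "slow_checkpoint" are assumptions, not probe data.
default_order = [
    ("active_connection_ratio", 200),
    ("slow_checkpoint", 150),
    ("check_dbalive", 5),
]
dbalive_first = [default_order[2]] + default_order[:2]

if __name__ == "__main__":
    for label, order in (("default", default_order),
                         ("check_dbalive first", dbalive_first)):
        completed, skipped = run_checkpoints(order, budget_seconds=300)
        print(f"{label}: completed={completed} skipped={skipped}")
```

In this model, check_dbalive is skipped when it sits last in the list and always runs when it sits first, regardless of how slow the other checkpoints are.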

Edit C:\Program Files (x86)\Nimsoft\probes\database\sqlserver\sqlserver_monitor.cfg and move the <check_dbalive> section to the top of the <checkpoints> section, like this:

<groups>
   <UMP>
      description = To fill default UMP dashboards
      <checkpoints>
         <check_dbalive>
            active = yes
            description = Monitors connectivity to the database instance
            qos = yes
            qos_list = yes
            clear_msg = check_dbalive_1
            clear_sev = clear
            interval = 5 min
            sql_timeout =
            scheduling = rules
            use_exclude = no
            use_include = no
            samples = 1
            <thresholds>
               <default>
                  <0>
                     tagid = 0
                     value = 1
                     unit =
                     sev = major
                     msg = check_dbalive_2
                     condition = !=
                     clear_msg = check_dbalive_1
                     scheduling =
                     key_col_name =
                     key_col_value = default
                  </0>
               </default>
            </thresholds>
            <qos_lists>
               <0>
                  qos_name = check_dbalive
                  qos_desc = SQL Server Availability
                  qos_unit = Availability
                  qos_abbr = Avail.
                  qos_max = 1
                  qos_value = status
                  qos_key =
               </0>
            </qos_lists>
         </check_dbalive>
         <active_connection_ratio>
            active = no
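
If the move needs to be repeated on several robots, it could be scripted. The following Python sketch is an assumption-laden helper, not a CA-supplied tool: the function name move_section_first is hypothetical, and it assumes that each section tag sits on its own line, as in the excerpt above, and that the first <checkpoints> section in the file is the one to change.

```python
def move_section_first(cfg_text, parent="checkpoints", section="check_dbalive"):
    """Return cfg_text with <section>...</section> moved to the top of <parent>.

    Assumes every opening and closing tag is on its own line and that the
    first <parent> in the text is the one that contains <section>.
    """
    lines = cfg_text.splitlines()
    stripped = [line.strip() for line in lines]

    parent_open = stripped.index(f"<{parent}>")
    start = stripped.index(f"<{section}>", parent_open + 1)
    end = stripped.index(f"</{section}>", start)

    block = lines[start:end + 1]
    rest = lines[:start] + lines[end + 1:]

    # parent_open comes before start, so its index is unchanged in rest.
    insert_at = parent_open + 1
    result = "\n".join(rest[:insert_at] + block + rest[insert_at:])
    return result + "\n" if cfg_text.endswith("\n") else result


# Shortened, made-up sample in the same layout as sqlserver_monitor.cfg.
SAMPLE = """<groups>
   <UMP>
      <checkpoints>
         <active_connection_ratio>
            active = no
         </active_connection_ratio>
         <check_dbalive>
            active = yes
         </check_dbalive>
      </checkpoints>
   </UMP>
</groups>
"""

if __name__ == "__main__":
    print(move_section_first(SAMPLE))
```

The function only rearranges text and returns the result; it does not write to disk. Working on a backup copy of sqlserver_monitor.cfg and reviewing the output before replacing the original is advisable.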

 

Additional Information:

For an explanation of the timeouts and other settings for the sqlserver probe, see TEC000004014, "sqlserver probe checkpoints - query timed out or failed to execute alarms".

 

 

This is a copy of my knowledge base document:

http://www.ca.com/us/support/ca-support-online/product-content/knowledgebase-articles/tec1346032.aspx
