How to use the nas and alarm_enrichment probes to enrich alarms

Document created by danst04 (Employee) on Oct 20, 2017. Last modified by danst04 (Employee) on Oct 20, 2017.
Version 2

Environment:

 

References:

nas configuration

alarm_enrichment Raw Configuration

 

Use Case: 

In this document, we discuss alarm enrichment in the context of a specific use case. This is just one example of how to leverage the alarm enrichment functionality.

 

The end result of this configuration is that the following UIM alarm attributes, custom_2 through custom_4, are modified dynamically:

custom_4 --> ci_description "Configuration Item Description"

custom_3 --> met_description "Metric Description Data"

custom_2 --> target "Target data from S_QOS_DATA table"

 

 

CMDB Data Source

 

The alarm_enrichment probe can be configured to read data from various data sources. Each data source is referred to as a “CMDB” (Configuration Management Database); examples include CA Service Desk Manager and other products. In this example, we use the UIM database as the CMDB data source. Currently, only JDBC-compliant SQL database sources are supported.

 

Each data source is defined as:

 

  • JDBC connect string
  • user login
  • database password
  • query to extract the data from the CMDB

 

Enrichment Rules

 

Every data source allows a user-defined name to be referenced in the enrichment rules. Each enrichment_rule can reference one data source. A data source can be used by many enrichment rules.

Once you have defined the CMDBs/data sources, you must define at least one enrichment rule.

 

Each enrichment rule defines a matching condition that selects the alarms the rule applies to. The rule also defines what enrichment should be performed and from which data source the additional information for the alarm should be read. When an alarm is processed by the alarm_enrichment probe, it is copied to a new event where:

 

  • the message identifier NimId is modified to ensure it is still unique
  • the fields qsize, md5sum and subject are removed from the incoming alarm
  • all fields starting with "hop" are copied with the prefix "original_", so that the field "hop0" becomes "original_hop0" in the outgoing alarm.
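The three changes listed above can be sketched in Python. This is only an illustration, not the probe's actual code. The dictionary representation of an alarm, the function name copy_alarm, the key name "nimid", and the way a new identifier is generated (a random UUID string) are all assumptions. The sketch also assumes that the un-prefixed "hop" fields are not kept in the outgoing event.

```python
import uuid

# Fields that the text says are removed from the incoming alarm.
REMOVED_FIELDS = {"qsize", "md5sum", "subject"}


def copy_alarm(alarm: dict) -> dict:
    """Build a new event from an incoming alarm (illustrative only)."""
    event = {}
    for key, value in alarm.items():
        if key in REMOVED_FIELDS:
            continue  # qsize, md5sum and subject are dropped
        if key.startswith("hop"):
            event["original_" + key] = value  # hop0 -> original_hop0
        else:
            event[key] = value
    # Assumption: any fresh unique string is acceptable as the new identifier.
    event["nimid"] = uuid.uuid4().hex
    return event


# Made-up sample alarm, for illustration only.
incoming = {
    "nimid": "AB12345678-00001",
    "subject": "alarm",
    "qsize": 3,
    "md5sum": "d41d8cd9",
    "hop0": "hubA",
    "hop1": "hubB",
    "message": "CPU usage high",
}
outgoing = copy_alarm(incoming)
```

After the call, outgoing carries original_hop0 and original_hop1, no longer contains qsize, md5sum or subject, and has a different identifier from the incoming alarm.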

 

The alarm is then matched against the configured alarm enrichment rules. An overwrite rule defines an alarm attribute, e.g., custom_2, and a value to which the alarm attribute should be set, e.g., target. Once an alarm has been processed against the alarm enrichment rules, it is passed on to the nas probe for further processing.

 

Routing Rules

At a minimum, you need one routing rule (routing-rule) to forward your alarms to your Alarm Server (nas).

You may want to create more than one routing rule, e.g., to send alarms to a different ‘receiver’.

 

Overview Diagram

 

It is highly recommended to test and develop the alarm enrichment in a sandbox environment following good change control processes prior to implementing in a production environment.

 

 

CMDB Data Source and Query Prep

 

  • Ensure the data source is reachable (via host:port)
  • Ensure the data sources you are using can handle the number of requests the alarm_enrichment probe makes to retrieve alarm information
  • Test your query against a single CI metric ID (ci_metric_id in this scenario) to make sure it works as expected and returns the expected results within a reasonable time frame. For MS SQL Server, speak with your DBA about “Execution Plan” and “Client Statistics” while running the query. You may need an index or maintenance plan to improve the speed of the query. MS SQL query below:

 

select

case when d.target is null then

dev.dev_name + ':' + ci.ci_name else d.target end as target,

ccim.ci_metric_id,

ccim.ci_metric_type,

ccimd.met_description,

ccid.ci_description

from CM_CONFIGURATION_ITEM_METRIC ccim (nolock)

inner join CM_CONFIGURATION_ITEM ci (nolock) on ci.ci_id=ccim.ci_id

inner join CM_DEVICE dev (nolock) on ci.dev_id=dev.dev_id

inner join CM_CONFIGURATION_ITEM_METRIC_DEFINITION ccimd (nolock) on ccim.ci_metric_type = ccimd.met_type

inner join CM_CONFIGURATION_ITEM_DEFINITION ccid (nolock) on ccid.ci_type = ccimd.ci_type

left join S_QOS_DATA d (nolock) on ccim.ci_metric_id=d.ci_metric_id where ccim.ci_metric_id = 'M014F2FE1FEFC1F60130A3EAC07D4C938'

 

  • Test an abbreviated version of the query (e.g., the query below without its final ‘where’ clause containing the ? variable) and make sure you get results quickly:

 

select case when d.target is null then dev.dev_name + ':' + ci.ci_name else d.target end as target,ccim.ci_metric_id,ccim.ci_metric_type,ccimd.met_description,ccid.ci_description from CM_CONFIGURATION_ITEM_METRIC ccim (nolock) inner join CM_CONFIGURATION_ITEM ci (nolock) on ci.ci_id=ccim.ci_id inner join CM_DEVICE dev (nolock) on ci.dev_id=dev.dev_id inner join CM_CONFIGURATION_ITEM_METRIC_DEFINITION ccimd (nolock) on ccim.ci_metric_type = ccimd.met_type inner join CM_CONFIGURATION_ITEM_DEFINITION ccid (nolock) on ccid.ci_type = ccimd.ci_type left join S_QOS_DATA d (nolock) on ccim.ci_metric_id=d.ci_metric_id where ccim.ci_metric_id = ?

 

  • Keep an eye on latency to make sure your data source can return results quickly. When you run the query or population query, the first row should be returned within a matter of seconds.
  • When accessing large and busy databases, consider running a ‘shadow’ database for read-only queries. A shadow database is essentially a mirror of the production database that you can use for testing and development.
  • Create and use a separate database user for the connection to the data source, which makes troubleshooting easier if there is a problem.  For example:

 

 

 

 

Sample JDBC connect URLs

connection_url = jdbc:oracle:thin:@//172.17.4.12:1521/ORCL

connection_url = jdbc:sqlserver://172.17.8.12:1433;DatabaseName=CA_UIM;

connection_url = jdbc:mysql://172.17.0.12:3306/choslm
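
As a quick way to check that a data source is reachable via host:port before configuring the probe, the sample URLs above can be parsed and probed with a plain TCP connection. The following Python sketch is an illustration only. The helper names jdbc_host_port and is_reachable are hypothetical, and the parsing covers only the three URL shapes shown above. A successful TCP connection shows that the port is open, not that the login or the query will work.

```python
import re
import socket
from typing import Tuple

# Matches "//host:port" as it appears in the three sample URLs above.
_HOST_PORT = re.compile(r"//([^:/;]+):(\d+)")


def jdbc_host_port(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a JDBC connect URL (hypothetical helper)."""
    match = _HOST_PORT.search(url)
    if match is None:
        raise ValueError(f"no host:port found in {url!r}")
    return match.group(1), int(match.group(2))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host:port succeeds."""
    host, port = jdbc_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


oracle = jdbc_host_port("jdbc:oracle:thin:@//172.17.4.12:1521/ORCL")
sqlserver = jdbc_host_port("jdbc:sqlserver://172.17.8.12:1433;DatabaseName=CA_UIM;")
mysql = jdbc_host_port("jdbc:mysql://172.17.0.12:3306/choslm")
```

Calling is_reachable(connection_url) from the host that runs the probe gives a first indication of firewall or routing problems.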

 

Alarm Enrichment - Raw Configure

 

 

The alarm_enrichment probe is configured using the Raw Configure option in the nas probe. The configuration settings for this probe are stored in the nas configuration file. Memory settings for the alarm_enrichment probe are maintained in the startup->opt section of the alarm_enrichment Raw Configure option. We recommend a min/max of at least 2048/4096 respectively.

 

The alarm_enrichment configuration settings are contained in the enrichment-source, enrichment-rules, and routing-rules sections of the raw configuration for the nas probe. The alarm_enrichment probe subscribes to "alarm" messages, modifies the alarm, and submits a new message to the nas with a modified subject of "alarm2". The nas probe subscribes to the "alarm2" messages.

 

Note: The alarm_enrichment probe processes the enrichment and routing rules in alphanumeric order. You can control the order in which the rules are processed by using a naming convention for the section names. Users are allowed to change the subject (queue) names. By default, the alarm_enrichment probe uses the "alarm" subject and forwards messages to the "alarm2" subject for the nas probe. Warning: if the subject name is changed, any existing content in the queues will be lost.
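
To illustrate why a naming convention matters, the short Python sketch below sorts a few section names as plain strings. It assumes that "alphanumeric order" means ordinary string comparison. The probe's exact comparison is not confirmed here, and the rule names are made up.

```python
# Hypothetical rule section names.
rule_names = ["1", "2", "10", "11"]

# Plain string sorting places "10" and "11" before "2".
plain_order = sorted(rule_names)
print(plain_order)   # ['1', '10', '11', '2']

# Zero-padding the names makes string order agree with numeric order.
padded_order = sorted(name.zfill(3) for name in rule_names)
print(padded_order)  # ['001', '002', '010', '011']
```

If that assumption holds, zero-padded or otherwise fixed-width section names keep the processing order predictable as rules are added.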

 

population_query

 

population_query is the non-targeted pre-population query that is executed when the probe starts and at regular intervals. There should not be a "?" in this query, as no ID substitution occurs. The results of this query are placed in the alarm_enrichment cache for quick retrieval. The example further below gathers name, ip, and os_type: name and ip are used to help match the alarm, and os_type is used for updating custom_4.

 

query

This is the targeted counterpart of the population_query. It is executed if the required data is not found in the alarm_enrichment (AE) cache. Specify a "?" at the end of the query where the ID of the item being looked up is filled in.

 

Example query which simply returns name, ip and os_type data:

select name,ip,os_type from cm_computer_system

 

Note that when the tables being queried are large, the pre-population query may be left empty for better performance.

Note also that when millions of items are stored in the AE cache, cache initialization can take a very long time, so the AE queue may take some time to process through the nas.
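
The interplay between population_query, the targeted query, and the cache can be modelled with a small Python sketch. It uses an in-memory SQLite database as a stand-in for the CMDB. The class EnrichmentCache, its attributes, the choice of the first column as the cache key, and the sample rows are all assumptions made for illustration. The probe's real cache is not documented here.

```python
import sqlite3


class EnrichmentCache:
    """Toy model: pre-populated cache with a targeted-query fallback."""

    def __init__(self, conn, population_query, query, cache_misses=True):
        self.conn = conn
        self.population_query = population_query
        self.query = query
        self.cache_misses = cache_misses  # mirrors cache_enrichment_query_misses
        self.cache = {}
        self.targeted_calls = 0

    def prepopulate(self):
        if not self.population_query:
            return  # an empty population_query skips pre-population
        for row in self.conn.execute(self.population_query):
            self.cache[row[0]] = row  # assumption: first column is the key

    def lookup(self, key):
        if key in self.cache:
            return self.cache[key]
        self.targeted_calls += 1
        row = self.conn.execute(self.query, (key,)).fetchone()  # fills the "?"
        if row is not None or self.cache_misses:
            self.cache[key] = row  # None records a remembered miss
        return row


conn = sqlite3.connect(":memory:")
conn.execute("create table cm_computer_system (name text, ip text, os_type text)")
conn.execute("insert into cm_computer_system values ('web01', '10.0.0.5', 'Linux')")

ae = EnrichmentCache(
    conn,
    population_query="select name,ip,os_type from cm_computer_system",
    query="select name,ip,os_type from cm_computer_system where name = ?",
)
ae.prepopulate()
hit = ae.lookup("web01")        # served from the pre-populated cache

conn.execute("insert into cm_computer_system values ('db01', '10.0.0.6', 'Windows')")
late = ae.lookup("db01")        # not cached yet: the targeted query runs
miss_first = ae.lookup("ghost")   # targeted query runs and finds nothing
miss_second = ae.lookup("ghost")  # remembered miss: no second query
```

In this model, leaving population_query empty simply means every first lookup goes through the targeted query, which trades startup time for per-alarm latency.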

 

The alarm_enrichment ‘bulk_size’ variable controls how many messages the probe reads at a time. If AE is able to read a higher number of messages successfully, it continues to read that many. However, if it cannot handle the number you set, e.g., 1200, it automatically drops back to 100. We recommend keeping it set to 100.

 

Deployment Example

Listed below is an example nas/alarm_enrichment configuration from a lab environment.

nas.cfg example listed below (with edited areas highlighted)

 

<setup>

   cache_enrichment_query_misses = yes

   enrichment_loglevel = 3

   enrichment_logsize = 50000

   enrichment_cache_prepopulation_interval_in_seconds = 21600

   enrichment_logfile = alarm_enrichment.log

   enrichment_subject = alarm

   debug = 3

   subject = alarm2

   logfile = nas.log

   bulk_read_size = 100

   …

   …

   …  

</setup>

<enrichment-source>

   <cmdbs>

      <os_enricher>

         active = true

         connection_url = jdbc:sqlserver://abcd-1234.LAB.COM:1433;DatabaseName=CA_UIM;loginTimeout=1800;

         user = <omitted>

         password = <omitted>

         query = select case when d.target is null then dev.dev_name + ':' + ci.ci_name else d.target end as target,ccim.ci_metric_id,ccim.ci_metric_type,ccimd.met_description,ccid.ci_description from CM_CONFIGURATION_ITEM_METRIC ccim (nolock) inner join CM_CONFIGURATION_ITEM ci (nolock) on ci.ci_id=ccim.ci_id inner join CM_DEVICE dev (nolock) on ci.dev_id=dev.dev_id inner join CM_CONFIGURATION_ITEM_METRIC_DEFINITION ccimd (nolock) on ccim.ci_metric_type = ccimd.met_type inner join CM_CONFIGURATION_ITEM_DEFINITION ccid (nolock) on ccid.ci_type = ccimd.ci_type left join S_QOS_DATA d (nolock) on ccim.ci_metric_id=d.ci_metric_id where ccim.ci_metric_id = ?

 

         population_query =

      </os_enricher>

   </cmdbs>

</enrichment-source>

<enrichment-rules>

   exclusive_enrichment = no

   <1>

      match_alarm_field = met_id

      match_alarm_regexp = [\d\D]+

      use_enricher = os_enricher

      lookup_by_alarm_field = met_id

      lookup_by_regexp =

      <overwrite-rules>

         udata.custom_4 = [cmdb.ci_description]

         udata.custom_3 = [cmdb.met_description]

         udata.custom_2 = [cmdb.target]

      </overwrite-rules>

   </1>

</enrichment-rules>
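
How rule <1> above acts on a single alarm can be sketched in Python. This is a simplified model, not the probe's implementation. The function enrich, the flat dictionary alarm, the full-match semantics for match_alarm_regexp, and the sample CMDB row are assumptions. The key names and patterns mirror the configuration shown above.

```python
import re

rule = {
    "match_alarm_field": "met_id",
    "match_alarm_regexp": r"[\d\D]+",
    "lookup_by_alarm_field": "met_id",
    "overwrite_rules": {
        "udata.custom_4": "[cmdb.ci_description]",
        "udata.custom_3": "[cmdb.met_description]",
        "udata.custom_2": "[cmdb.target]",
    },
}


def enrich(alarm, rule, lookup):
    """Apply one enrichment rule to an alarm dict (illustrative only)."""
    value = alarm.get(rule["match_alarm_field"], "")
    if not re.fullmatch(rule["match_alarm_regexp"], value):
        return alarm  # rule does not match: alarm passes through unchanged
    row = lookup(alarm[rule["lookup_by_alarm_field"]])
    if row is None:
        return alarm  # nothing found in the data source
    enriched = dict(alarm)
    for field, template in rule["overwrite_rules"].items():
        # Replace each [cmdb.<column>] token with the column's value.
        enriched[field] = re.sub(
            r"\[cmdb\.(\w+)\]",
            lambda m: str(row.get(m.group(1), "")),
            template,
        )
    return enriched


# Made-up row standing in for the result of the targeted query.
fake_cmdb = {
    "M014F2FE1FEFC1F60130A3EAC07D4C938": {
        "target": "web01:cpu-0",
        "met_description": "CPU usage",
        "ci_description": "System CPU",
    }
}

alarm = {"met_id": "M014F2FE1FEFC1F60130A3EAC07D4C938", "message": "CPU high"}
result = enrich(alarm, rule, fake_cmdb.get)
untouched = enrich({"message": "no metric id"}, rule, fake_cmdb.get)
```

An alarm without a met_id does not satisfy the catch-all expression in this model, so it is returned without any custom fields set.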

 

nas.cfg in Raw Configure Mode:

 

   - enrichment source->cmdbs->os_enricher (database connection)

 

 

Enrichment rules

In all cases, match on the alarm field met_id.

 

Use a regexp to select the alarms to process. [\d\D]+ is a ‘catch-all’ expression that matches all alarms.

 

   If preferred, you can restrict the rule to specific probes using an OR operator, e.g., (cdm|ntevl|netapp)
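
The behaviour of the two expressions mentioned above can be checked quickly in Python. The sketch uses Python's re module purely for illustration and assumes the probe matches the whole field value. The probe's own regex engine and matching mode may differ in detail. The value "ntperf" is a made-up non-matching example.

```python
import re

catch_all = re.compile(r"[\d\D]+")          # any character, one or more times
some_probes = re.compile(r"(cdm|ntevl|netapp)")  # one of three alternatives

matches_metric_id = catch_all.fullmatch("M014F2FE1FEFC1F60130A3EAC07D4C938") is not None
matches_empty = catch_all.fullmatch("") is not None
matches_cdm = some_probes.fullmatch("cdm") is not None
matches_other = some_probes.fullmatch("ntperf") is not None

print(matches_metric_id, matches_empty, matches_cdm, matches_other)
```

Note that the catch-all still requires at least one character, so an alarm whose matched field is empty would not be selected under this assumption.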

 

 

 

Overwrite rules: overwrite the custom_2, custom_3, and custom_4 alarm attributes with the results of the query.

 

  • The custom_2 alarm field is overwritten with target data from S_QOS_DATA (target), or with dev_name:ci_name when no target exists
  • The custom_3 alarm field is overwritten with metric description data from CM_CONFIGURATION_ITEM_METRIC_DEFINITION (met_description)
  • The custom_4 alarm field is overwritten with the configuration item description from CM_CONFIGURATION_ITEM_DEFINITION (ci_description)

 

 

 

Running the aforementioned query in MS SQL Server Management Studio yields these results:

 

 

 

If everything is configured properly, you can connect to the database and run the query successfully, and the AE and nas queues are processing alarm messages, you should see the custom fields populate within about a minute in a small-to-medium environment, and possibly within several minutes in a large environment. Check the hub Status tab to make sure the AE and nas queues are sending messages.

 

Shown below is an example of UIM alarms in the IM alarm sub-console with custom_2 through custom_4 fields populated by the query:

 

 

 

Other nas.cfg notes:

cache_enrichment_query_misses = yes

# Caches query misses so that AE does not rerun the query. Applicable to nas v8.56 and higher.

 

enrichment_cache_prepopulation_interval_in_seconds = 21600

#If using pre-population, run the prepopulation query every 6 hours to refresh the cache.

 

Additional Information:

alarm_enrichment Raw Configuration

https://docops.ca.com/ca-unified-infrastructure-management-probes/ga/en/alphabetical-probe-articles/nas-alarm-server/alarm_enrichment-raw-configuration

 

Using alarm_enrichment rule to lookup the device details in our CMDB, based on the short name / hostname and not the FQDN

https://support.ca.com/us/knowledge-base-articles.TEC1206036.html   

 

How to update origin for robot inactive alarm

https://communities.ca.com/docs/DOC-231177725-tec-tip-how-to-update-origin-for-robot-inactive-alarm?et=watches.email.document
