
Developing custom UIM integrations part 1: General design

Blog Post created by jonhcw on Mar 27, 2015

Introduction

I've worked on a couple of custom integrations with the UIM product, the most important being with Service-Now and SCOM. Most recently I've been working on a new, somewhat more extensive Service-Now integration, and I thought I'd write down some thoughts and observations that have emerged through the process (which is still ongoing).

 

While I'm going to focus on general design matters in Part 1, I will first briefly explain why I'm not using the sngtw probe that CA provides for Service-Now integrations. Service-Now is a Service Management platform, not only a ticketing system. You could have your ticketing, CMDB, Service Level Management, Knowledge Base and whatnot running in there. How the data is modeled in the CMDB, and how forms (Incident, CI, etc.), tables and the like are used and customized, can vary a lot from one case to another. While sngtw provides a functional method for creating and, to an extent, even managing incidents, that is mostly where it ends.

You can also map alarm fields to Incident fields such as Configuration Item, but the input needed by the Incident form is likely quite different from the "hostname", "source", or "robot" in the alarm message. Moreover, you'll likely need more data for the incident, such as Service Offering, Business Offering or Service Contract. This data does not exist in the alarm message as such. One could argue that you can enrich the alarms with alarm_enrichment. My view is that both the concept and the implementation of alarm_enrichment are suspect, so I try to avoid using it as much as I can. I do not agree with intercepting alarms before they are published, and I do not like the idea of querying a database each time an alarm occurs that matches the suppression criteria of an already published alarm.

   

The principal things I mean to achieve with the integration mechanism are:

  1. Creating tickets with correct CI, correct Service Offering and correct Priority (from alarm severity). This will provide, for example, automatic event SLA calculations, easier RCA (CIs are linked to others), easier reporting etc.
  2. CMDB synchronization between UIM and Service-Now. This is vital not only for creating tickets but also for automating UIM SLA creation and other additional features. In the future I hope to extend this to further manage alarms in UIM by altering severities or even suppressing alarms based on CI relationships modeled in the Service-Now CMDB. I plan to discuss this further in another blog entry.
  3. SLA automation. While this is not directly a part of the Service-Now integration, I'm mentioning it here because it requires data from the CMDB (service hours, SLA agreements) and because maintaining these manually is a major headache.
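
To make the first goal a little more concrete, here is a minimal sketch of deriving an incident priority from alarm severity. The severity levels are the usual nas ones (1 = informational up to 5 = critical); the priority values, the default, and the name `priority_for` are assumptions for illustration only, since the right mapping depends on how the Service-Now instance is configured.

```python
# Hypothetical mapping from UIM alarm severity to a Service-Now incident
# priority. The right-hand values are illustrative, not taken from any
# particular Service-Now instance.
SEVERITY_TO_PRIORITY = {
    5: 1,  # critical      -> priority 1
    4: 2,  # major         -> priority 2
    3: 3,  # minor         -> priority 3
    2: 4,  # warning       -> priority 4
    1: 5,  # informational -> priority 5
}


def priority_for(severity: int, default: int = 4) -> int:
    """Return the incident priority for an alarm severity, or a default
    when the severity is not in the mapping (for example 0 = clear)."""
    return SEVERITY_TO_PRIORITY.get(severity, default)
```

Keeping the mapping in a single table makes it easy to move into configuration later.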

 

I'm going to explain how I handle these things and more in a series of blog posts. Some things can be done several ways, and where it really matters I aim to explain why I decided to go one way and not the other. Although I've described my specific use case, I want to talk about creating integrations in general. Therefore I try to avoid going too deep into implementation-specific details such as table structures, queries etc. To that end, I'll also make an effort to focus more on the UIM side of things than on SN.

 

The general design

 

The previous Service-Now integration I made was a custom Perl probe that interacted with Service-Now through direct web services. It maintained no device data in a database or in memory, and had to query the CMDB or Incidents in SN for several things each time it needed to create, close or update a ticket. This time around I chose to work with C#, continuing to use direct web services (SOAP) to contact Service-Now. The fundamental difference is that I'm now going to store device data both in the UIM database and, when needed, in memory. I'm also splitting the integration into three components: CMDB integration, custom alarm enrichment and a ticket mechanism.

 

The components

For better manageability I decided to create four separate components that each handle a specific part of the alarm flow process.

 

CMDB probe

This is the heart of the integration and the automations I am building. To be able to create tickets with correct data, I want to bring that data into the UIM database, where it is easily queryable. I only bring in the data I need. This component also helps with keeping the CMDB up to date, since it enables us to report devices that are missing either from the CMDB or from monitoring. This will be the topic of part 2.
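
The "missing from the CMDB or from monitoring" report boils down to a set difference in both directions. The sketch below assumes devices can be matched on a case-insensitive name; the function name `reconcile` and the sample device names are made up for illustration, and a real comparison would likely need a more robust key than the name alone.

```python
def reconcile(cmdb_names, monitored_names):
    """Compare two collections of device names, ignoring case and
    surrounding whitespace, and report what each side is missing."""
    cmdb = {name.strip().lower() for name in cmdb_names}
    monitored = {name.strip().lower() for name in monitored_names}
    return {
        "missing_from_monitoring": sorted(cmdb - monitored),
        "missing_from_cmdb": sorted(monitored - cmdb),
    }


# Hypothetical sample data: three CIs in the CMDB, three monitored devices.
report = reconcile(["web01", "db01", "app01"], ["WEB01", "db01", "lab07"])
```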

 

Enricher probe

This probe reads alarm_new messages and enriches them based on the data synchronized from the CMDB. The probe also contains a copy of the CMDB data synced to UIM in memory for faster processing, much like data_engine.
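
The in-memory copy can be pictured as a dictionary that is reloaded on demand. This is only a sketch under assumptions: the class name `CiCache`, the `loader` callable (standing in for whatever query reads the synchronized CMDB data from the UIM database) and the `ci_id` field are all hypothetical.

```python
class CiCache:
    """Keeps a copy of the synchronized CMDB data in memory.

    `loader` is any callable returning {device_name: ci_record}; in a real
    probe it would read from the UIM database.
    """

    def __init__(self, loader):
        self._loader = loader
        self._data = {}
        self.refresh()

    def refresh(self):
        # Rebuild the whole copy; keys are lower-cased for matching.
        self._data = {name.lower(): ci for name, ci in self._loader().items()}

    def lookup(self, device):
        # Returns the CI record, or None when the device is unknown.
        return self._data.get(device.lower())


# Usage with a fake data source instead of a database query.
source = {"WEB01": {"ci_id": "ci-001"}}
cache = CiCache(lambda: dict(source))
source["DB01"] = {"ci_id": "ci-002"}
before_refresh = cache.lookup("db01")  # not visible until the next refresh
cache.refresh()
after_refresh = cache.lookup("db01")
```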

 

Ticket probe

This probe handles creating, updating and closing tickets created from alarms. It uses the data on the alarm, put there by the Enricher probe.
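
Conceptually the Ticket probe is a dispatcher: it looks at the subject of the incoming message and picks the matching action. The sketch below uses the subjects assigned_to, alarm_update and alarm_close; the handler functions are placeholders that only return a description, the alarm is assumed to be a plain dictionary, and the use of the `nimid` field as the alarm identifier is an assumption for illustration.

```python
def create_ticket(alarm):
    return "create ticket for " + alarm["nimid"]


def update_ticket(alarm):
    return "update ticket for " + alarm["nimid"]


def close_ticket(alarm):
    return "close ticket for " + alarm["nimid"]


# Message subject -> action. Placeholder handlers; a real probe would call
# the ticketing system here.
HANDLERS = {
    "assigned_to": create_ticket,
    "alarm_update": update_ticket,
    "alarm_close": close_ticket,
}


def handle(subject, alarm):
    """Run the action for a subject; return None for subjects we ignore."""
    handler = HANDLERS.get(subject)
    if handler is None:
        return None
    return handler(alarm)
```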

 

sla_helper

This probe is not part of the integration, strictly speaking. However, it uses the data synchronized to the UIM database to automatically create SLAs for configured devices.

 

The alarm flow

Here's a basic step by step of how the components work.

 

  1. New alarm arrives, alarm_new message is published by nas
  2. There's an attach queue on the hub for alarm_new. The Enricher probe picks this up.
  3. Enricher checks if it has data in memory for this device. If it does not, it lets the CMDB probe know.
    1. If the CMDB probe has data or can resolve data for this device, it informs Enricher to refresh its cache and check again.
  4. Enricher probe puts data in the custom_1-5 fields of the alarm by submitting a new alarm message that matches the suppression criteria for this alarm. These include necessary CMDB identifiers, service hours, action code (such as SMS), and alarm handling status code (bitmask). If it doesn't have data, it also puts that in the alarm as handling status and clear text.
  5. After the alarm hits overdue age X, it is assigned to a specific user, which tells the Ticket probe to create a ticket.
  6. Ticket probe handles assigned_to, alarm_update and alarm_close messages and acts accordingly.
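
Step 4 above can be sketched as follows. The handling status is a bitmask, so several flags fit in one custom field. Everything specific here is an assumption for illustration: the flag names and values, which custom field holds what, and the `ci_id`, `service_hours` and `action` keys of the CI record.

```python
from enum import IntFlag


class HandlingStatus(IntFlag):
    # Hypothetical flags; the actual bit assignments are not specified.
    ENRICHED = 1
    NO_CMDB_DATA = 2
    TICKET_ALLOWED = 4


def enrich(alarm, ci):
    """Build custom_1..custom_5 for an alarm from a CI record (or None)."""
    if ci is None:
        # No data: record that as a status code and as clear text.
        status = HandlingStatus.NO_CMDB_DATA
        return {
            "custom_1": "",
            "custom_2": "",
            "custom_3": "",
            "custom_4": str(int(status)),
            "custom_5": "No CMDB data for " + alarm["hostname"],
        }
    status = HandlingStatus.ENRICHED | HandlingStatus.TICKET_ALLOWED
    return {
        "custom_1": ci["ci_id"],          # CMDB identifier
        "custom_2": ci["service_hours"],  # e.g. "08-16"
        "custom_3": ci["action"],         # action code such as SMS
        "custom_4": str(int(status)),     # bitmask stored as a number
        "custom_5": "",
    }


fields = enrich({"hostname": "web01"},
                {"ci_id": "ci-001", "service_hours": "08-16", "action": "SMS"})
may_ticket = bool(HandlingStatus(int(fields["custom_4"])) & HandlingStatus.TICKET_ALLOWED)
```

A downstream component only needs to test the bit it cares about, as the last line does, without knowing about the other flags.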

 

Conclusion

I've listed the very basic idea of how the integration works. In the next part I'm going to write about the CMDB in more detail and on a bit more technical level.
