Creating and managing tickets in Service-Now is the simplest part of the integration once alarms are enriched with sufficient data. These are the main tasks I need to achieve:
- Create tickets
- Update ticket if certain fields in alarm are updated
- Assign tickets in ticket system based on assignment in UIM
- Close tickets
- Acknowledge alarm when corresponding ticket is closed
A custom attach queue is created for the probe to listen to. The probe needs the following messages: alarm_assign, alarm_update, alarm_close and alarm_close_i, which is a custom message. I will explain this message in the Closing tickets part.
Creating tickets is triggered by assignment. As mentioned above, the probe attaches to a queue which contains alarm_assign messages. The probe detects which user the alarm is assigned to, and if that user is ticket, it attempts to create a ticket for the alarm, unless there is already a valid ticket id in custom_5. As I mentioned in the CMDB and Enrichment entry, I put all the unique identifiers needed to create a ticket in the alarm's custom_1-4 fields. The probe simply needs to pick up this information from those fields and insert the ticket through a webservice call.
I'm using an Auto-Operator profile in nas to assign alarms to the ticket user if they meet certain criteria.
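The creation decision described above could be sketched roughly as follows. This is only an illustration: it assumes the alarm_assign message has already been decoded into a plain dict, the ticket id pattern is a guess at what "valid" might mean, and the function names and payload layout are hypothetical rather than part of any UIM or Service-Now API.

```python
import re

TICKET_USER = "ticket"  # the UIM user whose assignment triggers ticket creation

# Assumption: a valid ticket id looks like "INC" followed by digits.
TICKET_ID_PATTERN = re.compile(r"^INC\d+$")


def should_create_ticket(alarm: dict) -> bool:
    """Return True when an alarm_assign message should result in a new ticket."""
    if alarm.get("assigned_to") != TICKET_USER:
        return False
    existing = (alarm.get("custom_5") or "").strip()
    # Skip creation if custom_5 already holds something that looks like a ticket id.
    return not TICKET_ID_PATTERN.match(existing)


def build_ticket_payload(alarm: dict) -> dict:
    """Collect the identifiers stored in custom_1..custom_4 during enrichment."""
    return {
        "nimid": alarm["nimid"],
        "identifiers": [alarm.get(f"custom_{i}", "") for i in range(1, 5)],
        "description": alarm.get("message", ""),
        "level": alarm.get("level"),
    }
```

The payload would then be handed to whatever webservice client inserts the ticket; that part depends entirely on the ticketing system and is left out here.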
Update ticket id to alarm and refresh it to alarm consoles
The webservice call that creates the ticket also returns two unique identifiers for the ticket in the ticketing system. Since USM supports URL actions, it makes sense to bring the ticket id to the alarm, as the ticket can then be opened directly from USM alarm views. I discussed how to update alarms and force nas to publish alarm_update messages in the previous entry, so I will only mention here that I put the ticket ID in the custom_5 field, add a "." to the alarm message, and then post the alarm message again.
Updating the ticket is somewhat more complicated if you want to do it on parameters other than severity. The only change that is obvious in an alarm_update message is a level change, as the message contains the keys level and prevlevel. If any other field changes, you have to keep track of the fields of interest yourself. Another option would be to query the alarm transaction history from nas or the database, but I don't consider that to be a viable option at all. In my case, the only thing I currently need to update to tickets is the level, as it impacts the ticket's priority, so I'm in luck here. In the future it might be necessary to also track changes in the message, but that is not currently planned.
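A minimal sketch of the level check, assuming the alarm_update message is available as a dict with level and prevlevel keys. The level-to-priority mapping is purely illustrative; the real mapping depends on how priorities are defined in the ticketing system.

```python
# Hypothetical mapping from UIM alarm level (1 = information ... 5 = critical)
# to a ticket priority (1 = highest). Adjust to your own process.
LEVEL_TO_PRIORITY = {5: 1, 4: 2, 3: 3, 2: 4, 1: 5}


def priority_update(msg: dict):
    """Return the new ticket priority if the level changed, otherwise None."""
    level = int(msg["level"])
    prevlevel = int(msg["prevlevel"])
    if level == prevlevel:
        # Some other field changed; nothing to push to the ticket.
        return None
    # Levels outside the mapping (for example 0, a clear) yield None as well.
    return LEVEL_TO_PRIORITY.get(level)
```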
On the UIM end, triggering ticket assignment is very similar to the mechanism that triggers the creation of a ticket. Only the user changes. If the probe receives an alarm_assign message where the assigned_to field is something other than ticket, it assigns the corresponding ticket to that user in the ticketing system. There is also the special case where the value is empty, which means the alarm has been unassigned.
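The three cases for an alarm_assign message can be told apart with a small helper like the one below. It is a sketch only: the message is assumed to be a dict, and the action names are made up for the example.

```python
def assignment_action(msg: dict) -> tuple:
    """Classify an alarm_assign message as (action, assignee)."""
    assignee = (msg.get("assigned_to") or "").strip()
    if not assignee:
        # Empty value: the alarm has been unassigned.
        return ("unassign", None)
    if assignee == "ticket":
        # The special user that triggers ticket creation.
        return ("create", None)
    # Any other user: assign the existing ticket to that user.
    return ("assign", assignee)
```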
Closing tickets is pretty much the only thing in this part of the integration where something interesting happens. The basic functionality is very simple. There are two ways in which alarms are closed:
- A clear alarm comes in
- Someone acknowledges the alarms (close_alarms callback)
When alarms are closed, nas issues the alarm_close message. This message contains the string table closed_alarms, which is a list of the nimids that have been closed. If you can close tickets in the ticketing system based on nimid and that is all you need, this is enough. I could do that, but I additionally want to know who acknowledged the alarm or whether it was cleared by an alarm message. The alarm_close message does not contain this information. You could query the nas transaction history and get the details, but that is slow and inefficient.
What I've discovered is that if you repost clear level alarms with an Auto-Operator profile, you can get more information. So I created an AO profile that reposts clears with the subject alarm_close_i. This message is very similar to an alarm. Nas posts one message for each alarm that was cleared by a level 0 alarm, and two messages for each alarm that was acknowledged. After it's done posting those for all alarms that were closed at once, it posts the alarm_close message. Alarm_close_i messages contain the field event_type. If the alarm was acknowledged, there's one message with event_type 4 and one with 32; otherwise they're identical. If the alarm was closed by a level 0 alarm, there is only one message, with event_type 4. Therefore, as the probe receives alarm_close_i messages, it only processes the ones that have event_type 4. These messages contain the field acknowledged_by if the alarm was acknowledged; if it was closed by another alarm, the field isn't there.
Now that I can deduce from alarm_close_i messages who closed the alarm, it's pretty straightforward to move on. I keep a simple dictionary in memory that links nimid to acknowledger. If the alarm was closed by an alarm, I just use "System" for acknowledged_by. Each alarm_close_i message adds to this dictionary, and when the alarm_close message comes, the probe closes all the tickets (one by one though, so the gained efficiency is sort of small) and then clears the dictionary. The advantage I gain by closing them at alarm_close instead of alarm_close_i is that I need to check their status from the ticketing system before closing them (I don't want to alter a ticket that has been closed once). As I do it at alarm_close, I can do a single webservice call with a query like "this ticket or this ticket", whereas if I did it at alarm_close_i, I'd have to do more webservice calls. Processing them in bulk also means that the probe will have to reconnect to the queue if it processes a large number of alarms in one message, as the connection will time out. So you'll want to make sure you're handling that correctly.
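The filtering rule for alarm_close_i messages might look something like this. The message is assumed to be a dict, and the return values are just labels invented for the sketch.

```python
def closer_of(msg: dict):
    """Return how an alarm was closed, or None if the message should be skipped."""
    # Acknowledged alarms produce a second message with event_type 32; ignore it.
    if int(msg.get("event_type", -1)) != 4:
        return None
    user = msg.get("acknowledged_by")
    if user:
        return ("acknowledged", user)
    # No acknowledged_by field: the alarm was closed by a level 0 alarm.
    return ("cleared", None)
```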
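The buffering approach could be sketched along these lines. The two callables passed in are hypothetical wrappers around the ticketing webservice (one bulk status query, one close call per ticket), and messages are assumed to be dicts; none of this is an actual probe or Service-Now API.

```python
class CloseBuffer:
    """Remembers who closed each alarm until the alarm_close message arrives."""

    def __init__(self, find_open_tickets, close_ticket):
        # Hypothetical webservice wrappers:
        #   find_open_tickets(nimids) -> set of nimids whose tickets are still open
        #   close_ticket(nimid, closed_by) -> None
        self._find_open_tickets = find_open_tickets
        self._close_ticket = close_ticket
        self._closed_by = {}

    def on_alarm_close_i(self, msg: dict) -> None:
        """Record the acknowledger for messages with event_type 4."""
        if int(msg.get("event_type", -1)) != 4:
            return
        self._closed_by[msg["nimid"]] = msg.get("acknowledged_by") or "System"

    def on_alarm_close(self, msg: dict) -> list:
        """Close the still-open tickets for all nimids in closed_alarms."""
        nimids = list(msg.get("closed_alarms", []))
        # One status query for the whole batch instead of one per alarm.
        open_tickets = self._find_open_tickets(nimids) if nimids else set()
        closed = []
        for nimid in nimids:
            if nimid in open_tickets:
                self._close_ticket(nimid, self._closed_by.get(nimid, "System"))
                closed.append(nimid)
        self._closed_by.clear()
        return closed
```

Tickets that are already closed in the ticketing system are simply skipped, which matches the goal of never altering a ticket that has been closed once.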
Acknowledging alarms based on tickets being closed is a different issue. The way I'm going to approach this down the road is likely to have Service-Now send a REST call to UMP, which will then close the alarms. For the time being, I'm stuck with the ugly method of polling for tickets that have been closed since the last poll and meet certain other conditions, such as having a nimid in them. This is actually what sngtw also does, or at least used to do.
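One polling cycle could be sketched as below. Both callables are hypothetical placeholders: fetch_closed_since stands for whatever query returns tickets closed after a given time, and acknowledge_alarm for whatever mechanism closes the alarm in UIM. The ticket records are assumed to be dicts with an optional nimid key.

```python
from datetime import datetime, timezone


def poll_closed_tickets(last_poll, fetch_closed_since, acknowledge_alarm, now=None):
    """Acknowledge alarms for tickets closed since last_poll; return the new poll time."""
    # Take the timestamp before querying so nothing closed during the poll is missed.
    now = now or datetime.now(timezone.utc)
    for ticket in fetch_closed_since(last_poll):
        nimid = ticket.get("nimid")
        if nimid:
            # Only tickets that carry a nimid originate from an alarm.
            acknowledge_alarm(nimid)
    return now
```

The returned timestamp would be stored and passed in as last_poll on the next cycle.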
- How much strain can you put on your ticketing system webservices?
- Many things, such as assigning tickets, closing them, updating them, might require you query the ticket status first so you know if/how to update it.
- Amount of queries vs time to complete them
- Does your process allow you to automatically close tickets?
- Do you need to know who closes the alarms? If necessary, this information is also available in nas transaction log. The retention time for that is probably shorter than for your ticketing system, though.
- Alarm volume
- Will you need to separate different messages into different queues, so they can be processed in multiple threads?
- How quick is the API in your ticketing system?