APM - Meaningful data to SOI

Question asked by Bob_Lomax_UK on Mar 1, 2013
Latest reply on May 28, 2015 by Guenter_Grossberger
Hello all,

I have APM 9.1 reporting CEM Incidents to SOI 3.0 as alerts currently.

My issue is how to make the APM data more meaningful, because CEM Incidents remain open until an APM operator closes them, regardless of whether the problem has gone away. An example:

07:00 First slow time defect.
08:00 Slow time defects reach impact 1000 which triggers a moderate impact level on APM - in turn providing an alert to SOI. SOI treats this as a 'slight' health issue. Not so much of a problem.
09:00 Defect impact then reaches 2000 which means Severe on APM - providing an alert update to SOI, changing SOI health to 'moderate'. Now, by default, SOI treats this as an outage - the start of a period of service unavailability.
09:15 The defects now stop as the underlying problem goes away. APM still has an Open incident. SOI is still thinking the outage is ongoing.

Eight hours later... the APM operator closes the CEM Incident. SOI (if you are at APM v9.1.5) now gets an alert close message and removes the alert, bringing the service up to normal health and considering that the end of the outage.

However, I have an 8 hour plus downtime on my SOI stats, reports and SLA calculations...
Actual downtime should really be measured from the start of the outage to the time of the last defect, i.e. 09:00 to 09:15, just 15 minutes.
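
To put numbers on the gap, here is a small Python sketch of the two calculations using the timeline above. The calendar date is arbitrary, and the closure time assumes "eight hours later" means eight hours after the last defect; neither value comes from APM or SOI.

```python
from datetime import datetime, timedelta

# Arbitrary calendar date, chosen only for illustration.
day = datetime(2013, 3, 1)

outage_start = day.replace(hour=9, minute=0)    # impact reaches 2000 (Severe on APM)
last_defect = day.replace(hour=9, minute=15)    # defects stop
# Assumption: the operator closes the incident 8 hours after the last defect.
incident_closed = last_defect + timedelta(hours=8)

# What SOI records: outage start to alert close.
reported_downtime = incident_closed - outage_start
# What it should arguably record: outage start to last defect.
actual_downtime = last_defect - outage_start

print("Reported downtime:", reported_downtime)
print("Actual downtime:  ", actual_downtime)
print("Overstated by:    ", reported_downtime - actual_downtime)
```

With these assumptions, the reported figure overstates the real outage by exactly the time the incident sat open after the last defect.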

There is little that can be done in SOI to deal with this case, as it relies on the APM feed to tell it when CEM Incidents open and close.
I can make sure SOI doesn't treat a CEM Incident as an outage until it reaches a 'Critical' level, but that doesn't fix the issue of waiting for APM to send a closure.

Is there any feature or workaround in APM 9.1 to allow Age-out of Open (not Pending) Incidents?
Are there any Auto-close facilities in APM that can be set by GUI configuration?
Would an IT PAM Workflow be required to detect and call an API/web service to get APM to close or age-out an Incident?
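
To illustrate the kind of age-out logic such a workflow might apply, here is a rough Python sketch. Everything in it is hypothetical: the `Incident` record, its field names, the quiet-period threshold and the `close_incident` callback are assumptions made for illustration, not an APM or IT PAM API. How incidents would actually be fetched and closed is exactly the open question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass
class Incident:
    # Hypothetical record; field names are not taken from APM.
    incident_id: str
    status: str                # "Open" or "Pending"
    last_defect_at: datetime   # time of the most recent defect


def select_for_age_out(incidents: Iterable[Incident],
                       now: datetime,
                       quiet_period: timedelta) -> List[Incident]:
    """Return Open (not Pending) incidents with no defects for quiet_period."""
    return [
        inc for inc in incidents
        if inc.status == "Open" and now - inc.last_defect_at >= quiet_period
    ]


def age_out(incidents: Iterable[Incident],
            now: datetime,
            quiet_period: timedelta,
            close_incident: Callable[[str], None]) -> List[str]:
    """Close every aged-out incident via the supplied (hypothetical) callback."""
    closed = []
    for inc in select_for_age_out(incidents, now, quiet_period):
        close_incident(inc.incident_id)  # placeholder for whatever closes it
        closed.append(inc.incident_id)
    return closed


# Usage example with made-up data.
now = datetime(2013, 3, 1, 10, 0)
sample = [
    Incident("A", "Open", datetime(2013, 3, 1, 9, 15)),     # quiet for 45 min
    Incident("B", "Open", datetime(2013, 3, 1, 9, 50)),     # still active
    Incident("C", "Pending", datetime(2013, 3, 1, 7, 0)),   # not Open, skipped
]
closed_log: List[str] = []
closed_ids = age_out(sample, now, timedelta(minutes=30), closed_log.append)
print(closed_ids)
```

The quiet period would need tuning so that an intermittent problem is not closed and reopened repeatedly.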