ESP Workload Automation

  • 1.  Any ideas for checking an event for suspension and providing an alert?

    Posted Feb 24, 2017 08:42 PM

    Hello folks. Does anyone know of a solution that will check a specific ESP event (identified as "critical" by the business) and provide an alert if it has been suspended? Use cases: An application team has requested to suspend one of their events due to a large application release. After the smoke has cleared, the team missed the step to re-activate the event. After several days, it was revealed, through a customer channel, that data was not being transmitted. Or, Operations suspended the wrong event or missed turning an event back on.



  • 2.  Re: Any ideas for checking an event for suspension and providing an alert?
    Best Answer

    Posted Feb 27, 2017 10:04 AM

    You could use the LIST command to display event information.

    Note: When you issue the LIST command from page mode, you must use the abbreviated form of the LIST command (L).

     

    You could do this for a specific event or, using the hyphen ("-"), for all events under a prefix.

    An example from my Development system.

     

    L LEVEL(<PREFIX>.-)                    

    <PREFIX>.EVENT1            IS SUSPENDED      

    <PREFIX>.EVENT2            REQUIRES TRIGGER  

    <PREFIX>.EVENT3            NEXT DUE AT 07.58.00 ON TUE FEB 28TH, 2017

     

    In the example above I used the hyphen to get many events. You could do that, or list just the specific event you are looking for.

    You could create a REXX program that sends the L LEVEL(<PREFIX>.-) command to ESP and checks the output for SUSPENDED. Call the REXX program from JCL that runs as an ESP job scheduled at your desired interval.

    Does that sound like something that would provide what you need?
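
    The "check for SUSPENDED" step could look roughly like the sketch below. It is written in Python only to show the logic; on z/OS the same check would be done in REXX as suggested above. It assumes each status line has the form shown in the example output (event name, then "IS SUSPENDED"), and the function name suspended_events is made up for illustration.

```python
import re

# Sample text copied from the example LIST output earlier in this thread.
SAMPLE = """\
<PREFIX>.EVENT1            IS SUSPENDED
<PREFIX>.EVENT2            REQUIRES TRIGGER
<PREFIX>.EVENT3            NEXT DUE AT 07.58.00 ON TUE FEB 28TH, 2017
"""


def suspended_events(list_output):
    """Return the names of events whose status line reads 'IS SUSPENDED'.

    Assumption: one event per line, name first, status text after it,
    as in the example output. Real output may contain other lines,
    which are simply ignored here.
    """
    found = []
    for line in list_output.splitlines():
        match = re.match(r"\s*(\S+)\s+IS SUSPENDED\b", line)
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    for name in suspended_events(SAMPLE):
        print("ALERT: " + name + " is suspended")
```

    How the alert is actually delivered (message, email, ticket) is left open; the print call is just a placeholder.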



  • 3.  Re: Any ideas for checking an event for suspension and providing an alert?

    Posted Feb 27, 2017 04:39 PM

    Hi Rick,

     

    I like your idea, and I believe we may have a function like that provided by our product support team. The missing link is identifying which events are considered "critical" so that an action can be taken without a manual review. It sounds like another process is needed to scrub the LIST results and check them against a control member that contains the critical list. A concern here is that such a list would need an owner and would have to be maintained.
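
    To make the control-member idea concrete, here is a minimal sketch of that matching step, in Python purely for illustration. The control member layout (one event name per line, lines starting with "*" treated as comments) and the PAY.* event names are assumptions for the example, not anything ESP prescribes.

```python
def critical_suspended(suspended, critical_list_text):
    """Return the suspended events that also appear in the critical list.

    suspended: event names already found to be suspended.
    critical_list_text: contents of a hypothetical control member,
    one event name per line; blank lines and lines starting with '*'
    are skipped.
    """
    critical = set()
    for line in critical_list_text.splitlines():
        name = line.strip()
        if name and not name.startswith("*"):
            critical.add(name.upper())
    # Keep the original order of the suspended list.
    return [event for event in suspended if event.upper() in critical]


if __name__ == "__main__":
    control_member = "* critical events\nPAY.EVENT2\n"
    print(critical_suspended(["PAY.EVENT1", "PAY.EVENT2"], control_member))
```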

     

    A tentative solution that I'm kicking around:

     

    1. In each "critical" event, insert a self-completing task (SCT) in the triggered APPL. The task would simply execute at a specific time. Could be right after generation load or 2 hours before execution.

     

    2. Create a new event and APPL that defines a self-completing task for each "critical" event and uses the above task as an external job dependency. I could then use the DUEOUT sensor to flag a suspended event and dispatch an alert. Another thought is to replace the externals with resources: flip the resource on (using the above method), check it for availability, and then reset it back to off for the next generation load. I would be able to use DUEOUT for this method as well.

     

    It's not 100% foolproof, but I think it covers the 80/20 rule.

     

    Thoughts?



  • 4.  Re: Any ideas for checking an event for suspension and providing an alert?

    Posted Feb 28, 2017 08:15 AM

    The issues I see are

    • If the event is not identified as critical you will not report on it
    • Maintenance of the list

     

    Maybe it would be better to report on all suspended events.

    • Schedule a job at defined intervals to do L LEVEL(<PREFIX>.-) and write the output to a dataset
    • The next run of the job would compare the before and after datasets for differences, and suspended events would be reported. In my environment we have SuperC, and I created JCL with PGM=ISRSUPC,PARM=('DELTAL,LINECMP') for comparing datasets.
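
    The before/after comparison in the two steps above could be sketched as follows. This is a Python stand-in for the logic, not a replacement for SuperC. It assumes each saved line is an event name followed by its status text, as in the LIST example earlier in the thread; parse_status and newly_suspended are illustrative names, and the A.* events are made-up test data.

```python
def parse_status(list_output):
    """Map event name -> status text for each 'NAME   STATUS' line."""
    status = {}
    for line in list_output.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            status[parts[0]] = parts[1].strip()
    return status


def newly_suspended(before_text, after_text):
    """Return events suspended in the new snapshot but not in the old one."""
    before = parse_status(before_text)
    after = parse_status(after_text)
    return sorted(
        name
        for name, state in after.items()
        if state == "IS SUSPENDED" and before.get(name) != "IS SUSPENDED"
    )


if __name__ == "__main__":
    before = "A.E1   REQUIRES TRIGGER\nA.E2   IS SUSPENDED\n"
    after = "A.E1   IS SUSPENDED\nA.E2   IS SUSPENDED\nA.E3   IS SUSPENDED\n"
    print(newly_suspended(before, after))
```

    Reporting only the newly suspended events keeps repeat alerts down; to report every suspended event on each run instead, drop the comparison against the earlier snapshot.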

     

    This method would be self-maintaining, since it does not require the application group to tell you what is critical.


    The list of suspended events would need to be researched by Operations/Application owner.

     

    In a Production environment shouldn't all events be in a non-suspended state?