Automic Workload Automation


U000006 DEADLOCK: All WP try to update EH, anyone had this before?

  • 1.  U000006 DEADLOCK: All WP try to update EH, anyone had this before?

    Posted Jan 10, 2018 07:05 AM
    Hi,

    We just had an outage of our production AE system (in its last weeks of running version 10.0.3+hf2), because all work processes (WP) tried to run this statement simultaneously:

    SELECT EH.*,ROWID FROM EH WHERE EH_AH_Idnr = :A0001 FOR UPDATE

    Deadlock hilarity ensued, no new jobs could be started, and the "Messages" window printed nothing but "U000006: DEADLOCK".

    I resolved it by killing all but one WP. I was then able to restart the killed WPs, and the problem was gone.

    An incident has been filed with Automic, but out of curiosity: Anyone had this before?

    Cheers,
    Carsten


  • 2.  U000006 DEADLOCK: All WP try to update EH, anyone had this before?

    Posted Jan 10, 2018 11:22 AM
    To whom it may concern:

    As a result of the incident we filed, it was concluded very quickly (kudos to Automic Support for this!) that the recursive cancellation of a JOBP effectively killed *) UC4 in version 10 **).

    I just analyzed the offending job plan: it contains 59 JOBPs in total across four nesting levels, and about 2,100 objects overall.

    So you might want to take away from this that friends don't let friends cancel somewhat large job plans.
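
    For anyone who wants to gauge how big a nested job plan is before cancelling it, here is a minimal sketch in Python. It is only an illustration: the Node class, the "JOBS"/"JOBP" kind strings and plan_stats are made-up names, not an Automic API, and the tree would have to be built from whatever export or query you have available. It counts the same three figures quoted above (total objects, number of JOBPs, nesting levels), with the top-level plan counted as one object and as level 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one object in a job plan (not an Automic API)."""
    name: str
    kind: str = "JOBS"  # assumption: "JOBP" marks a nested job plan
    children: list["Node"] = field(default_factory=list)


def plan_stats(root):
    """Return (total objects, number of JOBPs, JOBP nesting levels) for a plan tree."""
    total = jobps = depth = 0
    # Each stack entry carries the JOBP nesting level the node lives in.
    stack = [(root, 1 if root.kind == "JOBP" else 0)]
    while stack:
        node, level = stack.pop()
        total += 1
        if node.kind == "JOBP":
            jobps += 1
        depth = max(depth, level)
        for child in node.children:
            child_level = level + 1 if child.kind == "JOBP" else level
            stack.append((child, child_level))
    return total, jobps, depth


# Tiny made-up plan: a top-level JOBP with one job and one nested JOBP.
example = Node("TOP", "JOBP", [
    Node("A"),
    Node("SUB", "JOBP", [Node("B"), Node("C")]),
])
print(plan_stats(example))  # (5, 2, 2)
```

    The traversal uses an explicit stack rather than recursion, so deep nesting cannot hit Python's recursion limit. What counts as "too large to cancel" is not something this thread establishes, so any threshold you put on these numbers is your own call.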


    *) I say "effectively killed" because this has been deemed a performance problem, which will not be fixed in version 10. Automic maintains that it would have sorted itself out eventually, though how long an effective service outage one would have had to endure until then is anyone's guess.

    **) Whether this is fixed in version 12 has not been determined. The Automic support contact mentioned that there are major improvements to the handling of large job plans in V12, but it is unclear whether they address this particular problem.

    Hope this helps.