AutoSys Workload Automation

We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

  • 1.  We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Jul 27, 2018 09:38 AM

    CHASE report:

     

    FILE WATCHER JOB:

    CAUAJM_I_50161 Examining job: <file watcher job>

      CAUAJM_I_50162 Job has been in the STARTING state more than 120 seconds. Manual intervention may be required.

     

    FILE TRIGGER JOB:

    CAUAJM_I_50161 Examining job: <file trigger job>

      CAUAJM_I_50162 Job has been in the STARTING state more than 120 seconds. Manual intervention may be required.

     

    COMMAND JOBs:

    CAUAJM_I_50161 Examining job: <command job>

      CAUAJM_I_50162 Job has been in the STARTING state more than 120 seconds. Manual intervention may be required.

     

    CAUAJM_I_50161 Examining job: <command job>  

      CAUAJM_I_50162 Job has been in the STARTING state more than 120 seconds. Manual intervention may be required.



  • 2.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Jul 28, 2018 08:10 PM

    Hello Noe,

     

    I am unsure of the ask. Do you want to stop the CHASE alarms, clear them automatically, or are you saying these are false positives?

    If the jobs really are getting stuck in STARTING, then that is what needs to be addressed.

     

    Thanks,

    Chandru



  • 3.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 05:35 AM

    Hello Noe

     

    We have fixed an issue with false chase alarms in the past.

     

    The System Agent and Scheduler must be upgraded

    To see the minimum maintenance level of the System Agent and Scheduler, click this link: CHASE

     

    If you are already at these levels or above, further investigation is required, and I would suggest opening a case with CA Support.

     

     

    Regards

    Jean Paul



  • 4.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 08:39 AM

    Thanks Jean & Chandru,

     

    The issue here is that a job is reported as STUCK in STARTING state by the CHASE command.

    The job is STUCK and needs to be dealt with manually.

     

    So we must investigate and reset the job's status to TERMINATED so that its processing can be restarted.

     

    This situation comes about when the Scheduler has posted the job as STARTING and is waiting for the Agent to start the process on that machine. That never happens, because the Agent has been shut down, maybe by a reboot or other patching. So the job is STUCK in STARTING state.

     

    What I am looking for is a way to automate the resetting of the job either to TERMINATED or request the Scheduler to re-send the START of the job to the Agent after it comes back up. 

     

    The problem is that we only know this by running the CHASE command and reporting on its output.

     

    Looking for an automated solution.

     

    Thanks,

    Noe
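
As a starting point for the detection side of this request, here is a minimal sketch that parses saved CHASE output and lists the jobs flagged as stuck in STARTING. It relies only on the two message IDs quoted in the report at the top of this thread (CAUAJM_I_50161 and CAUAJM_I_50162). The function name `stuck_in_starting` is hypothetical, and it assumes each CAUAJM_I_50162 line refers to the most recent "Examining job" line before it.

```python
import re
from typing import List, Tuple

# Message IDs and wording copied from the CHASE report quoted in this thread.
EXAMINING = re.compile(r"CAUAJM_I_50161\s+Examining job:\s*(?P<job>\S.*?)\s*$")
STUCK = re.compile(
    r"CAUAJM_I_50162\s+Job has been in the STARTING state "
    r"more than (?P<secs>\d+) seconds"
)


def stuck_in_starting(report: str) -> List[Tuple[str, int]]:
    """Return (job_name, threshold_seconds) for each job flagged as stuck.

    Assumption: a CAUAJM_I_50162 line belongs to the most recent
    CAUAJM_I_50161 "Examining job" line that precedes it.
    """
    stuck = []
    current_job = None
    for line in report.splitlines():
        match = EXAMINING.search(line)
        if match:
            current_job = match.group("job")
            continue
        match = STUCK.search(line)
        if match and current_job is not None:
            stuck.append((current_job, int(match.group("secs"))))
            current_job = None
    return stuck
```

The result is only a list of candidates. Whether a job on that list can safely be reset still has to be decided separately.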



  • 5.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 08:50 AM

    Hi Noe

     

    Thanks for the details. In this specific situation, what about the chase -E option?

     

    -E
    (Optional) Puts a job in FAILURE status when the job and its agent are not running on the client but the database indicates they should be.

    In this case, the job restarts if the job definition includes the n_retrys attribute.

    The scheduler must be running for chase to automatically restart jobs.

     

    Regards

    Jean Paul

     



  • 6.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 08:56 AM

    Yes, we use the -E option to restart/fail jobs stuck in RUNNING state, but use of that option does nothing for jobs stuck in STARTING state.



  • 7.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 09:00 AM

    Are these real machines, or are they NetScaler or F5 load-balanced addresses?

    Also, never use -E, as it can be wrong sometimes. You are better off getting the alarm and checking on it by hand.

    You can also increase the interval it waits with CHASE_STARTING_WAIT_INTERVAL=<seconds>.

    Also be warned: STUCK in STARTING DOES NOT mean the job didn't run (unless you have an internal auto_ping alarm or a comm_err_5/14).

    STUCK in STARTING just means that the agent has NOT reported back yet, and many things can cause that, so I wouldn't terminate the job without knowing it hasn't actually run.

    This paradigm was always true from 2-4X through R11. Never assume it didn't run without really checking first.

    hope that helps 

    good luck 

    Steve C.
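
For anyone who wants to try a longer wait before CHASE reports STARTING jobs, here is a minimal sketch of running chase with CHASE_STARTING_WAIT_INTERVAL set. It assumes the variable is read from the environment of the chase process, which should be confirmed against the product documentation. The helper names `chase_env` and `run_chase` are hypothetical, and the default `-A` argument is also an assumption about how chase is invoked at your site.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def chase_env(wait_seconds: int,
              base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and set CHASE_STARTING_WAIT_INTERVAL in it.

    Assumption: chase picks this variable up from its process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CHASE_STARTING_WAIT_INTERVAL"] = str(wait_seconds)
    return env


def run_chase(wait_seconds: int, args: Sequence[str] = ("-A",)) -> str:
    """Run chase with the longer wait interval and return its output.

    Requires the AutoSys environment to be sourced so that chase is on PATH.
    """
    result = subprocess.run(
        ["chase", *args],
        env=chase_env(wait_seconds),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, `run_chase(300)` would ask chase to wait 300 seconds instead of the 120 seconds shown in the report above, if the assumption about the variable holds.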



  • 8.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 09:20 AM

    I agree with Steve, many things can cause that.

    Another example: jobs on an agent host are stuck in STARTING status, and the job detail report displays a STATE_CHANGE event with the comment <Waiting for initiator>.

     

    It would be better to find the root cause of such a problem by looking at the scheduler and system agent log files together, for the same time period.

     

    Some details about CHASE_STARTING_WAIT_INTERVAL and CHASE_SLEEP at this url: Chase variables

     

    Regards

     

    Jean Paul



  • 9.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 09:35 AM

    We always manually check jobs listed in STARTING state.

    They are always related to a shutdown of the agent due to patching or reboots.

    The -E option works great for jobs stuck in RUNNING state, as it forces the job to FAILURE, and then application support can review the failure, or the job restarts if it is coded for restart (n_retrys).

     

    Also, this is not mainframe related (<Waiting for initiator>).

     

    This situation has been with us since r4.5.

    We have looked at the agent log and the agent knows nothing about this job.

     

    The Scheduler sent it to the Agent, but the Agent never got it, thus the STARTING state.

     

    Just looking for an automated way to resolve, without manual intervention.
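
One way to combine the two points of view in this thread (automate the reset, but never assume the job did not run) is to build the reset command only when the agent log shows no trace of the job. The sketch below is an illustration under stated assumptions, not a supported procedure: `agent_log_mentions` and `reset_command` are hypothetical names, matching the job name as a plain substring of the agent log is a heuristic that must be checked against your own agent logs, and the sendevent arguments should be verified against the documentation for your release.

```python
from typing import List, Optional


def agent_log_mentions(job_name: str, agent_log_text: str) -> bool:
    """Heuristic: treat any occurrence of the job name as evidence it arrived.

    Assumption: the agent log refers to the job by its AutoSys job name.
    """
    return job_name in agent_log_text


def reset_command(job_name: str, agent_log_text: str,
                  restart: bool = False) -> Optional[List[str]]:
    """Return a sendevent command line, or None when a human should look.

    None means the agent log mentions the job, so it may actually have run.
    """
    if agent_log_mentions(job_name, agent_log_text):
        return None
    if restart:
        # Ask the scheduler to start the job again.
        return ["sendevent", "-E", "FORCE_STARTJOB", "-J", job_name]
    # Mark the job TERMINATED so normal processing can resume.
    return ["sendevent", "-E", "CHANGE_STATUS", "-s", "TERMINATED",
            "-J", job_name]
```

The function returns the command rather than running it, so the result can be logged or reviewed before anything is sent to the scheduler.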



  • 10.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 07, 2018 10:00 AM

    If you have the most recent agent, then during a reboot you should get a job failure.

    It also depends on the order in which the agent shuts down.

    All of these issues come into play.

     

     

     

    Steve C.

     

     




  • 11.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 19, 2018 03:51 AM

    Hello Noe

     

    If there is still an issue, please open a case with CA Support and attach the following documentation to the ticket:

    - the event_demon log with the chase alarm

    - all the files from the log and spool directories of the System Agent

    - the agentparm.txt file

    - the output of 'autoflags -a' from the scheduler

    - the output of 'cybAgent -v' from the System agent

    - if the System Agent is on Unix/Linux, a full 'ps -ef ' listing

    - the output of 'autorep -J job -d' of the Job reported in the chase alarm

     

    Regards

    Jean Paul
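
The command outputs in the list above can be gathered with a small script. This is a minimal sketch: `support_commands` and `capture` are hypothetical helper names, running autorep alongside autoflags on the scheduler host is an assumption (any host with the AutoSys environment sourced should do), and it assumes cybAgent is on PATH on the agent host. The log, spool, and agentparm.txt files still have to be copied separately.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def support_commands(job_name: str,
                     agent_is_unix: bool = True) -> Dict[str, List[List[str]]]:
    """Group the commands from the list above by the host they run on."""
    scheduler = [
        ["autoflags", "-a"],
        ["autorep", "-J", job_name, "-d"],
    ]
    agent = [["cybAgent", "-v"]]
    if agent_is_unix:
        agent.append(["ps", "-ef"])
    return {"scheduler": scheduler, "agent": agent}


def capture(commands: List[List[str]], out_dir: Path) -> List[Path]:
    """Run each command on the local host and save its output to a file.

    Raises FileNotFoundError if a command is not installed on this host.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True,
                                check=False)
        target = out_dir / ("_".join(argv).replace("/", "_") + ".txt")
        target.write_text(result.stdout + result.stderr)
        written.append(target)
    return written
```

On the scheduler host this could be called as `capture(support_commands("my_job")["scheduler"], Path("case_files"))`, and on the agent host with the `"agent"` group.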



  • 12.  Re: We see Autosys jobs STUCK in STARTING state (Mostly File Watcher or File Trigger, but can be Command job) appearing in the CHASE reports, is there an automated way to clear/resolve this issue?

    Posted Sep 19, 2018 05:46 AM

    Hi NoeParenteau

     

    This issue has already been reported in the community at the following link:

     

    https://communities.ca.com/message/242115655-re-chase-stuck-in-starting-for-x-mins-question

     

    There are some suggestions there that resolved the problem.

     

    Please let me know if the answer provided resolved your issue.

     

    Thanks and warm regards

     

    Faouzia