CA Tuesday Tip: (CA ESP) Why Can the HISTFILE Archive Job Cause a Big Impact?

Discussion created by Lucy_zhang Employee on May 31, 2011
CA Tuesday Tip by Lucy Zhang, Principal Support Engineer for 5/31/2011

The HISTFILE retains job statistics on a long-term basis and is used to create history reports. Because a new record is written for every run of every job, the file grows quickly over time, so you must archive data from the history data sets to tape or disk. Running the HISTFILE archive job periodically (weekly or monthly) is recommended.

Note: Starting with r11.3, you can generate application history reports in much the same way as job history reports.

Reason for critical impact
The archive job must turn tracking off before it starts and turn it back on after it completes. While TRACKING is set to NOSTORE, tracking requests accumulate in the checkpoint data set, which is very small and runs out of space quickly. And because tracking data cannot be processed, job status is not updated.

If the archive job runs too long or fails, you will notice:
- Job status in CSF or the GUI is not updated; new jobs show “Queued for Submission”.
- Error messages starting with “INSUFFICIENT CHECKPOINT”, such as ESP1156E, ESP1157E and ESP335E, appear in JESMSGLG. These errors may mean loss of schedule data, job tracking data and monitor notification data.
- The HISTFILE archive job did not run its last step, so the following commands were not issued:
OPER HISTFILE HIST1 OPEN
OPER TRACKING STORE

Important! Because TRACKING is turned off, a failure of the archive job cannot be reported automatically.
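
Since the failure is not reported automatically, one option is to check the JESMSGLG text for the message IDs listed above by some external means. The following is only a minimal sketch in Python, assuming the joblog has already been copied somewhere as plain text; the function names `scan_joblog` and `summarize` are hypothetical, and ESP277E (HISTFILE out of free space) is included because it is the other message this tip refers to.

```python
import re

# Message IDs named in this tip. The first three are the examples given for
# "INSUFFICIENT CHECKPOINT" errors; ESP277E means the HISTFILE has no free space.
CHECKPOINT_MSGS = {"ESP1156E", "ESP1157E", "ESP335E"}
HISTFILE_FULL_MSGS = {"ESP277E"}

# Matches tokens such as ESP1156E: "ESP", one or more digits, then a trailing E.
MSG_ID = re.compile(r"\bESP\d+E\b")


def scan_joblog(text):
    """Return the watched message IDs that occur anywhere in the joblog text."""
    watched = CHECKPOINT_MSGS | HISTFILE_FULL_MSGS
    return {msg for msg in MSG_ID.findall(text) if msg in watched}


def summarize(found):
    """Turn the set from scan_joblog into short, human-readable findings."""
    findings = []
    if found & CHECKPOINT_MSGS:
        findings.append("checkpoint data set is running out of space")
    if found & HISTFILE_FULL_MSGS:
        findings.append("HISTFILE has no free space")
    return findings


if __name__ == "__main__":
    # Made-up sample lines for illustration only; real JESMSGLG layout differs.
    sample = "ESP1156E <message text>\nESP277E <message text>\n"
    print(summarize(scan_joblog(sample)))
```

The sketch only recognizes message IDs; it does not issue any operator commands, which should still be entered by a person who has confirmed the state of the archive job.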

How to avoid
- The HISTFILE should have few or no extents, enough free space and a large enough secondary allocation to stay in good shape even in exceptional conditions. The recommended secondary allocation is 20% of the primary allocation; use your judgement when the primary allocation is very large.
- When you increase the HISTFILE size, remember to increase the archive data set and temporary data set sizes accordingly (for example, ESP.HISTARCH and ESP.HISTTEMP in the sample JCL in the “Programmer’s Guide”); otherwise the job will abend with B37. This is the most common reason the archive job fails.
- Run the archive job at a time of low system activity.
Note: On Support Online, search for “ESP HISTFILE” to find a number of knowledge documents covering FAQs.
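
The sizing advice above comes down to simple arithmetic. The sketch below, in Python, works through it under stated assumptions: the helper names are hypothetical, sizes are in whatever unit the data set is allocated in, and growing the archive and temporary data sets in proportion to the HISTFILE is an assumption (the tip itself only says "accordingly").

```python
def suggested_secondary(primary, percent=20):
    """Secondary allocation as a percentage of primary, rounded up.

    `primary` is in whatever unit the data set is allocated in;
    the result is in the same unit.
    """
    if primary <= 0:
        raise ValueError("primary allocation must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (primary * percent + 99) // 100


def scaled_size(current_size, old_histfile, new_histfile):
    """Scale an archive or temporary data set in proportion to the HISTFILE.

    Proportional scaling is an assumption, not a rule from the product
    documentation.
    """
    if old_histfile <= 0:
        raise ValueError("old HISTFILE size must be positive")
    # Ceiling division, so the result is never rounded below the exact ratio.
    return (current_size * new_histfile + old_histfile - 1) // old_histfile
```

For example, `suggested_secondary(500)` gives 100, and growing a HISTFILE from 500 to 750 units would take a 300-unit archive data set to `scaled_size(300, 500, 750)`, which is 450. For a very large primary allocation, 20% may be more than is sensible, which is where the judgement mentioned above comes in.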

What to do if it occurs
- Restore tracking by issuing “OPER TRACKING STORE” from page mode to avoid delays in active jobs.
- Check whether the HISTFILE still has free space. ESP277E appears in JESMSGLG if it has none.
  - If it does, issue “OPER HISTFILE HIST1 OPEN” to let the workload catch up, and run the archive job later, after the error is corrected.
  - If it does not, choose one of the following:
    1. Run the workload without an open HISTFILE. The real-time schedule runs fine, but activity is not stored in the HISTFILE and is therefore not available for history reports.
    2. Correct the archive job and rerun it. If this takes a long time, some critical jobs may miss their SLO.
    3. Allocate a new HISTFILE and use it temporarily:
      - Use IDCAMS to define the new HISTFILE.
      - Rename the existing HISTFILE to .OLD, and rename the new HISTFILE to the existing name.
      - Issue “OPER HISTFILE HIST1 OPEN”.

It normally takes 5-30 minutes for processing to resume. If “LISTCKPT”, issued from page mode, shows no difference in “HIGHEST ADDRESS USED” and “BYTES IMBEDDED FREE SPACE”, a COLD start is needed to reformat the checkpoint file. Please review Chapter 3 of the “Operator’s Guide” for the impact of starting the CA WA EE system with parm=COLD.
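
The LISTCKPT check can be written down as a small comparison. The sketch below, in Python, assumes that "no difference" means the two values stay the same across successive LISTCKPT displays taken some minutes apart; that reading is an interpretation, not something the product documentation is quoted on here. The values are assumed to be copied by hand from the display, since the output layout is not reproduced in this tip, and the names `CheckpointReading` and `cold_start_indicated` are hypothetical.

```python
from typing import NamedTuple


class CheckpointReading(NamedTuple):
    """Two values read by hand from one LISTCKPT display."""
    highest_address_used: int
    bytes_imbedded_free_space: int


def cold_start_indicated(readings):
    """True when every reading is identical to the first one.

    Assumption: unchanged values across successive displays mean the
    checkpoint backlog is not being worked off.
    """
    if len(readings) < 2:
        raise ValueError("need at least two LISTCKPT readings to compare")
    return all(r == readings[0] for r in readings[1:])
```

If either value moves between readings, processing is resuming and it is worth waiting out the 5-30 minutes before considering a COLD start.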
