Please Enhance Garbage Collect Stored Procedure

Idea created by Chris_Ryan_Thomas on May 16, 2016
    Under review
    • Eric Laney
    • ronro03
    • sgangaraboina
    • Sri_FAA
    • Breandan
    • Chris_Ryan_Thomas
    • MK_1

    Garbage Collect stored procedure running with a huge volume of data (# of records) will adversely affect CA Identity Manager J2EE IME performance due to the stress on the underlying CA Identity Manager Object store and Task persistence run-time databases. When running smaller IAMgarbageCollectTaskPersistence jobs, it will put less overall stress on the Identity Manager deployed environment


    There's a time limit within the IAMgarbageCollectTaskPersistence procedure hard-coded as one hour, which will prevent it from taking up too much sql resources with a this long running task, why not also include record count as a parameter to this procedure rather than just a cutoff time. Cutoff time is ambiguous and it will be easier for DBA's to decipher what are appropriate parameters to pass, especially when running this on a task persistence database in the millions of records. It already accepts cut-off date, but perhaps it could instead to be run in batches, similar to how the bulk loader client works to scale smaller batches of submissions into the IME via TEWS. 


    Additionally Garbage Collect Statistics table (iam_gcstats) lists all the cleanup data for the previous run, but instead produce these statistics prior to initiating the cleanup and provide feedback when trying to clean up too large of a load. Once a record count could be accounted for then it will be easier for a sql developer to automate the whole thing in a nightly sql job by passing a resonable amount of records to be cleaned per run

    run_id finishtasksession12_5archive_event12_5archive_event_object12_5object12_5runtimeStatusDetail12event12_5event_object12_5lock12_5archive_tasksession12_5archive_runtimeStatusDetail12archive_object12_5runtimeStatusDetailAttribute12archive_rsdAttribute12
    2016-04-29 16:20:38.9672016-04-29 16:20:40.100941     47     23     11     32     9       41         47     23     11     0       0

    Whether or not they’re nested tasks, or just records scattered throughout the various View Submitted tables, the records being archived / purged will be much more than tasksession12_5 root record count used (or calculated by trying various cutoff_times).  Suggest using a smaller record counts because their will be diminished performance benefits to using a larger record counts with this new IAMgarbageCollectTaskPersistence Stored procedure  The impact on CA Identity Manager will be less, if the duration of each run is shorter, but the frequency of each run is greater. I.E. loop the calls to procedure by sequentially passing new cutoff_times for a set duration. (for a few hours during the middle of the night). 


    To get comfortable with record count size to use and how long each IAMgarbageCollectTaskPersistence execution takes, it would be much easier if garbage collection statistics were outputted to the statistics table prior to the run and even better if a supplemental parameter of record count could be passed, which would cause the procedure to loop through the clean up in record count chunks.