One possible cause: an object in IdM that was under workflow control, and had an active job in the queue, was renamed during the modification. If you've come across others, please post and comment below.
Problem and Troubleshooting:
We are seeing large server.log files and what appears to be "looping" in the CA Identity Manager (IdM) application. The application logs fill disk capacity very quickly. At first this was occurring on only 1 of 2 servers in a cluster. We restarted the application multiple times, both with and without the other server in the cluster being available: no change. We deleted the data, tmp, and work folders under …JBoss/server/all/ and restarted: no change. The application runs on the other server with no issues.
The errors we see in the logs are very similar, and are repeated over and over:
2015-02-25 09:23:06,843 WARN [ims.tmt.eventlistener] (WorkManager(2)-10) Evt IMTaskEvent:3b65e232-61ee1279-bb6aea66-96d0e8 is invalid.
2015-02-25 09:23:06,843 ERROR [ims.tmt.IMSMessageListener] (WorkManager(2)-10) Exception Occured during event processing.
[facility=4 severity=2 reason=0 status=38 message=No items found]
It appears we have a "hung" task; this was after updating the Provisioning Manager IdMgr setup to point to the DNS/LB only.
When we stopped and restarted the server showing the errors, the errors moved to the other server in the cluster. We now see the errors on both servers in the cluster, and the server.log files are filling up fast. We turned on DEBUG (for ims) and captured more info:
2015-02-27 13:56:12,932 INFO [ims.tmt.persistence] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: starting commitTransaction
2015-02-27 13:56:12,932 INFO [ims.tmt.persistence] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: finished commitTransaction
2015-02-27 13:56:12,932 INFO [ims.tmt.persistence] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: getLock: lockid=3b65e232-61ee1279-bb6aea66-96d0e8 lockType=TASK
2015-02-27 13:56:12,932 DEBUG [ims.tmt.persistence.sql] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: conn: org.jboss.resource.adapter.jdbc.jdk5.WrappedConnectionJDK5@70c73944 sql: update lock12_5 WITH (ROWLOCK) set lockinfo=? where lockid=? and lockinfo=?
2015-02-27 13:56:12,933 DEBUG [ims.tmt.persistence] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: lockObject: locktype=TASK lockid=3b65e232-61ee1279-bb6aea66-96d0e8 exists - retried 0 times
2015-02-27 13:56:12,933 INFO [ims.tmt.persistence] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: updateEvent1: eventID=3b65e232-61ee1279-bb6aea66-96d0e8 newState=invalid
2015-02-27 13:56:12,933 DEBUG [ims.tmt.events] (WorkManager(2)-15) IMTaskEvent IMSEvent.setAttribute next_state:null >> invalid
2015-02-27 13:56:12,934 DEBUG [ims.tmt.persistence.sql] (WorkManager(2)-15) PersistenceProvider: JMS:ID:JBM-f669fc4a-2f7b-4910-bd1c-07aa22205728: conn: org.jboss.resource.adapter.jdbc.jdk5.WrappedConnectionJDK5@70c73944 sql: select * from tasksession12_5 WITH (NOLOCK) where tasksessionid=?
In Identity Manager's View Submitted Tasks, this task appears to have "FAILED", yet its timestamp continues to grow.
Diagnosis and Solution:
NOT FIT FOR PRODUCTION USE. IF HELP IS NEEDED ON PRODUCTION, PLEASE CONTACT SUPPORT.
*Please back up your IdM object store and stop your IdM application server before proceeding.
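For the backup itself, a minimal T-SQL sketch, assuming the object store lives on SQL Server (the WITH (ROWLOCK)/(NOLOCK) hints in the logs suggest it) and using a hypothetical database name and path:

-- Hypothetical names: replace idm_objectstore and the disk path with your own.
BACKUP DATABASE idm_objectstore
TO DISK = 'C:\backups\idm_objectstore_pre_cleanup.bak'
WITH INIT, NAME = 'IdM object store before looping-event cleanup';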
Drop all your JMS tables. Also delete everything under jboss\server\default\tmp.
drop table HILOSEQUENCES;
drop table JBM_COUNTER;
drop table JBM_DUAL;
drop table JBM_ID_CACHE;
drop table JBM_MSG;
drop table JBM_MSG_REF;
drop table JBM_POSTOFFICE;
drop table JBM_ROLE;
drop table JBM_TX;
drop table JBM_USER;
drop table TIMERS;
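If some of these tables are missing from your schema, the plain drops above will error out partway through. A hedged T-SQL variant that drops only the tables that exist (same list, same order):

-- Drop each JMS table only if it is present, so a missing table
-- doesn't abort the script.
IF OBJECT_ID('HILOSEQUENCES', 'U') IS NOT NULL DROP TABLE HILOSEQUENCES;
IF OBJECT_ID('JBM_COUNTER', 'U') IS NOT NULL DROP TABLE JBM_COUNTER;
IF OBJECT_ID('JBM_DUAL', 'U') IS NOT NULL DROP TABLE JBM_DUAL;
IF OBJECT_ID('JBM_ID_CACHE', 'U') IS NOT NULL DROP TABLE JBM_ID_CACHE;
IF OBJECT_ID('JBM_MSG', 'U') IS NOT NULL DROP TABLE JBM_MSG;
IF OBJECT_ID('JBM_MSG_REF', 'U') IS NOT NULL DROP TABLE JBM_MSG_REF;
IF OBJECT_ID('JBM_POSTOFFICE', 'U') IS NOT NULL DROP TABLE JBM_POSTOFFICE;
IF OBJECT_ID('JBM_ROLE', 'U') IS NOT NULL DROP TABLE JBM_ROLE;
IF OBJECT_ID('JBM_TX', 'U') IS NOT NULL DROP TABLE JBM_TX;
IF OBJECT_ID('JBM_USER', 'U') IS NOT NULL DROP TABLE JBM_USER;
IF OBJECT_ID('TIMERS', 'U') IS NOT NULL DROP TABLE TIMERS;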
Some object associated with the user task has likely been renamed or changed in some way: a workflow approval is still referencing the old object and is unable to find it. It may be worth looking over your backups of roles and tasks in this environment to try to identify what the change was, or, if you have auditing enabled in this environment, you can look it up there.
Here's how to figure out and identify the problematic looping events; if it gets too technical, I would suggest you just take a backup of your object store schema and upload it to Support.
Analyze your data in View Submitted Tasks (VST):
*Use the following query to find currently looping events that are not completed or cancelled:
select * from event12_5 where state not in (128,256) and description like '%No items found:%';
(This returns the looping event rows, including the tasksessionid of each.)
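If you only want the distinct tasksessionids rather than the full event rows, a variant of the same query:

-- Same filter as above, reduced to the distinct task session ids.
select distinct tasksessionid from event12_5
where state not in (128,256) and description like '%No items found:%';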
We're going to need to update all of these records to correct this. It's probably best for your DBA to develop a cursor that iterates through the results of the previous query and updates the rows returned as follows (a sketch of such a cursor appears after the two statements below):
update tasksession12_5 set state='256' where tasksessionid='***';
update event12_5 set state='256'
where tasksessionid='***' and state not in (128);
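One possible shape for that cursor, as a T-SQL sketch (it assumes SQL Server and that tasksessionid is a varchar column; verify both, and test against a backup first):

-- Sketch: iterate each distinct looping tasksessionid and apply the
-- two corrective updates shown above.
DECLARE @tsid varchar(64);  -- assumed wide enough for the session ids
DECLARE looping_cur CURSOR FOR
    SELECT DISTINCT tasksessionid FROM event12_5
    WHERE state NOT IN (128,256)
      AND description LIKE '%No items found:%';
OPEN looping_cur;
FETCH NEXT FROM looping_cur INTO @tsid;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE tasksession12_5 SET state = '256' WHERE tasksessionid = @tsid;
    UPDATE event12_5 SET state = '256'
     WHERE tasksessionid = @tsid AND state NOT IN (128);
    FETCH NEXT FROM looping_cur INTO @tsid;
END
CLOSE looping_cur;
DEALLOCATE looping_cur;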
You can then query for the list of currently In Progress tasks, review that list of tasksessionids, and decide which ones you want to cancel:
SELECT * FROM tasksession12_5 where state='4';
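After running the updates, a quick sanity check (a sketch reusing the filter from the first query) should come back zero once no looping events remain:

-- Expect 0 after the cleanup.
select count(*) from event12_5
where state not in (128,256) and description like '%No items found:%';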
More info on states and what they mean can be found here. (Based on the queries above: state 4 is In Progress, and 128 and 256 are the cancelled/completed states.)
Please post with any questions or concerns.
Thank you.
Regards,
Chris Thomas
CA Technologies
Principal Support Engineer
IdentityMinder Reporting Expert
Tel: +1-631-342-4360
Chris.Thomas@ca.com