Hi Ruber, I can speak to why this occurs in our shop. We are using DE and primarily have Windows and Unix agents. For Windows job types, 99% of the time when a job sits in READY state for more than the normal couple of seconds, the cause in our shop is that the ID the job runs under does not have the correct password defined for the agent in the topology. From time to time our customers update the password in Active Directory but do not have it updated in the topology, so the mismatch causes the ID to fail to log in to the server when the job runs (you can see this in the Security log in Event Viewer on the server the agent sits on). With the exception of a few of our Windows agents, the ID will attempt to log in every minute for up to 30 minutes, and the job stays in READY state during that time. After 30 minutes the job goes to SUBERROR state. Once or twice I have also seen the drive holding the spool file location fill up, which likewise causes jobs to stay in READY state.
For Unix job types we see a lot more issues where the filesystem containing the spool location is full, which prevents jobs from running. We have also seen the filesystem where the agent resides run out of available inodes, which causes the same thing.
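If it helps, here is a rough sketch of the kind of check that can catch both conditions (low free space and low free inodes) before jobs start hanging. This is not something that ships with DE; the function names, the 5% threshold, and the example path are all my own assumptions, so adjust them for your agent's actual spool location.

```python
import os
import shutil


def is_low(available, total, min_fraction):
    """Return True when available/total is below min_fraction.

    A total of 0 means the filesystem does not report this figure,
    so it is treated as "not low" rather than raising an error.
    """
    return total > 0 and (available / total) < min_fraction


def check_spool_filesystem(path, min_fraction=0.05):
    """Return a list of warning strings for the filesystem holding `path`.

    `path` and `min_fraction` are hypothetical defaults, not DE settings.
    """
    warnings = []

    # Free space: shutil.disk_usage works on both Windows and Unix.
    usage = shutil.disk_usage(path)
    if is_low(usage.free, usage.total, min_fraction):
        warnings.append(f"{path}: low free space ({usage.free} of {usage.total} bytes)")

    # Free inodes: os.statvfs is only available on Unix-like systems.
    if hasattr(os, "statvfs"):
        st = os.statvfs(path)
        if is_low(st.f_favail, st.f_files, min_fraction):
            warnings.append(f"{path}: low free inodes ({st.f_favail} of {st.f_files})")

    return warnings


if __name__ == "__main__":
    # Replace "." with the agent's spool directory (path not shown here on purpose).
    for line in check_spool_filesystem("."):
        print(line)
```

On the Unix side, `df -h` and `df -i` on the spool filesystem show the same two numbers if you just want a quick manual look.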
For jobs stuck in READIED, as noted above it does not happen as much, and when we do see it, it seems to be a system-wide issue affecting all agents (essentially an issue with the server that houses the DE manager). This issue occurred a lot in our TEST environment. I recently increased the heap and stack sizes for the manager and have not seen the issue since (it has been over a month now). I think, but am not sure, that it was related to not enough memory being allocated to Java.
Hope this helps some.