I have a support case open and am expecting some valuable information from Automic, which I will share here. In the meantime, though:
Automation Engine communicates via message queues. During a Zero Downtime Upgrade ("ZDU"), there are two sets of message queues, one for the old and one for the new version (each set collectively referred to as an "MQ set"). Automic recently told me that one cannot finalize the ZDU until all affected objects in the old MQ set (meaning those started before the ZDU began) have been deactivated once in AE and restarted on the new MQ set.
I have not found sufficient documentation on this yet. It apparently does not affect all object types, because the FAQ claims that JOBS do not block the ZDU (thanks FrankMuffke for the info). BUT we have a wealth of jobs in "Waiting for Predecessor", and according to Automic, those do in fact block the ZDU.
It probably also affects Schedulers, Events and C_PERIOD objects. But I have also been told that one or more object types (Schedulers, Events or both, it's not entirely clear to me) restart themselves every 2 weeks in version 12, and thus do not block ZDU completion if you wait up to two weeks between starting and finishing the ZDU.
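To make the rule as I currently understand it concrete, here is a minimal sketch in Python. It is not an Automic API and does not query AE: the `Activity` record, the type labels ("SCHEDULER", "EVENT", "C_PERIOD"), the status string and the function name are all my own hypothetical stand-ins, and the rule itself is only what I was told, not something confirmed by documentation. It simply flags activities that were activated before the ZDU began and that are either of a suspected blocking type or in "Waiting for Predecessor".

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: object types suspected (not confirmed) to block ZDU finalization.
# The labels are hypothetical placeholders, not official AE type codes.
SUSPECTED_BLOCKING_TYPES = {"SCHEDULER", "EVENT", "C_PERIOD"}

# Assumption: a status that reportedly blocks the ZDU even for plain jobs.
SUSPECTED_BLOCKING_STATUSES = {"Waiting for Predecessor"}


@dataclass
class Activity:
    """Hypothetical record for one active object, e.g. from a manual export."""
    name: str
    object_type: str
    status: str
    activated_at: datetime


def possible_zdu_blockers(activities, zdu_start):
    """Return activities that might still live on the old MQ set.

    An activity is flagged if it was activated before the ZDU began AND
    it is of a suspected blocking type or in a suspected blocking status.
    """
    blockers = []
    for act in activities:
        if act.activated_at >= zdu_start:
            # Started after the ZDU began, so presumably on the new MQ set.
            continue
        if (act.object_type in SUSPECTED_BLOCKING_TYPES
                or act.status in SUSPECTED_BLOCKING_STATUSES):
            blockers.append(act)
    return blockers


if __name__ == "__main__":
    zdu_start = datetime(2020, 1, 10)
    sample = [
        Activity("JOBS.RUNNING", "JOBS", "Active", datetime(2020, 1, 5)),
        Activity("JOBS.WAITING", "JOBS", "Waiting for Predecessor", datetime(2020, 1, 5)),
        Activity("EVNT.OLD", "EVENT", "Sleeping", datetime(2020, 1, 2)),
        Activity("EVNT.NEW", "EVENT", "Sleeping", datetime(2020, 1, 12)),
    ]
    for act in possible_zdu_blockers(sample, zdu_start):
        print(act.name, act.object_type, act.status)
```

If the two-week automatic restart really applies to some types, those could be dropped from `SUSPECTED_BLOCKING_TYPES` once the ZDU has been open for longer than that, but I have no confirmation of which types that covers.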
Please let me know whether you already knew about this, or whether this is news to you as well.
While it sounds logical in retrospect, it severely diminishes the value of a ZDU for us, since I'd have to get all my users to deactivate their stuff ("stuff" yet to be uniquely identified), which will be difficult, given the autonomy our Automic users (no pun intended) enjoy.
Side note: it is as yet unexplained why some of our past ZDUs worked at all. Given this new information, I would have expected every single ZDU to fail, but some actually worked.
P.S. If you can provide any insight into the details of this mechanism, e.g. which objects or states block the ZDU and which don't, it would be much appreciated.