We currently use DUAS in a DQM load-balancing configuration: one node for scheduling, with logical queues that point to multiple remote physical queues residing on execution nodes.
If a job runs on a logical queue that points to two servers, the job could end up running on either of them (round robin):
-> PhysicalQueue1 (ExecutionNode1)
-> PhysicalQueue2 (ExecutionNode2)
When the job reaches its launch window, it can start running on either of the two servers.
When we try to rerun the job, it can likewise rerun on either of the two servers.
Is there a way to force the job to rerun on the same server?
Rerunning the job on a different server does not make sense. For example, if we rerun the job from a given step, steps 1 to x would have run on server 1 while steps x to z would run on server 2.
Unfortunately, it's not possible to assign a job to a specific node.
Jobs submitted to a logical queue will be dynamically distributed across its associated physical queues in order to balance machine workload.
A job submitted to a logical queue can therefore run on any of the physical queues associated with that logical queue, and consequently on any node where one of those physical queues resides. Note, however, that by default DQM explores local physical queues first.
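To illustrate why a rerun can land on a different node, here is a minimal Python sketch of the behaviour described in the question. It is only a toy model and not the DQM API: the LogicalQueue class, its submit method and the job name JOB_A are hypothetical, and it assumes plain round-robin dispatch, whereas the real DQM balances on machine workload and explores local physical queues first.

```python
from itertools import cycle


class LogicalQueue:
    """Toy model of a logical queue (hypothetical, not the DQM API)."""

    def __init__(self, physical_queues):
        # Assumption: strict round robin over the associated physical queues.
        self._next_queue = cycle(physical_queues)

    def submit(self, job_name):
        # The dispatcher keeps no memory of where job_name ran before,
        # so a rerun is treated like any other submission.
        _queue_name, node = next(self._next_queue)
        return node


logical_queue = LogicalQueue([
    ("PhysicalQueue1", "ExecutionNode1"),
    ("PhysicalQueue2", "ExecutionNode2"),
])

first_run_node = logical_queue.submit("JOB_A")  # initial run
rerun_node = logical_queue.submit("JOB_A")      # rerun of the same job

print(first_run_node, rerun_node)
```

In this model the two submissions of the same job land on different nodes, because the choice of physical queue depends only on the dispatcher's state and not on the job's execution history.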