Rerun jobs on same server in DQM Load Balancing configuration

Discussion created by Anh-Thai_Nguyen_7342 on Feb 27, 2017
Latest reply on Sep 15, 2017 by Adm10_Groupe_4407

Scenario

We currently use DUAS in a DQM load-balancing configuration: one node for scheduling, with logical queues that point to multiple remote physical queues residing on execution nodes.
If a job is submitted to a logical queue that points to two servers, it can end up running on either of them (round robin):
LogicalQueue 
-> PhysicalQueue1 (ExecutionNode1)
-> PhysicalQueue2 (ExecutionNode2)

When the job reaches its launch window, it can start on either of the two servers.
When we rerun the job, the rerun can also land on either of the two servers.

Is there a way to force the job to rerun on the same server?
Rerunning the job on a different server does not make sense, e.g. when rerunning from a given step: steps 1 to x would have run on server 1, while steps x to z would run on server 2.


Answer

Unfortunately, it is not possible to assign a job to a specific node.

Jobs submitted to a logical queue will be dynamically distributed across its associated physical queues in order to balance machine workload.

So a job submitted to a logical queue can run on any of the physical queues associated with it, and therefore on any node that hosts one of those physical queues. That said, by default, DQM explores local physical queues first.
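
To illustrate why a rerun can land on a different node, here is a minimal Python sketch of the behaviour described above. It is a toy model only: the LogicalQueue class and its submit method are hypothetical names, not part of any DUAS or DQM API, and the strict round-robin rotation is an assumption taken from the question. The real DQM distributes jobs according to workload and explores local physical queues first.

```python
from itertools import cycle


class LogicalQueue:
    """Toy model of a logical queue (hypothetical, not DQM's real algorithm)."""

    def __init__(self, name, physical_queues):
        # physical_queues: list of (physical queue name, execution node) pairs
        self.name = name
        self._rotation = cycle(physical_queues)

    def submit(self, job_name):
        # Assumption: strict round robin over the physical queues.
        # The job name plays no role in the choice, so the queue has
        # no memory of where a previous run of the same job executed.
        _queue_name, node = next(self._rotation)
        return node


logical_queue = LogicalQueue(
    "LogicalQueue",
    [("PhysicalQueue1", "ExecutionNode1"), ("PhysicalQueue2", "ExecutionNode2")],
)

first_run = logical_queue.submit("JOB_A")
rerun = logical_queue.submit("JOB_A")
print(first_run, rerun)  # ExecutionNode1 ExecutionNode2
```

In this model the dispatch decision never looks at the job itself, which is why the first run and the rerun of the same job end up on different nodes.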