We have been seeing in-flight transactions fail during failover between the SiteMinder Policy Server and CA Directory nodes.
Whenever we take down one of the CA Directory nodes (shutting down both the router DSA and the data DSA on that box), the Policy Server does not fail over seamlessly to the other node.
The transactions that are in flight at that moment fail.
1. We understand that the Policy Server supports built-in load balancing and failover, yet shutting down a directory node still causes in-flight transactions to fail. What could be the reason?
2. We understand that the Policy Server opens a ping connection and pings each directory node every 30 seconds to check its health. Is there an option to reduce this ping interval?
Because of the 30-second interval, the Policy Server may take up to about 30 seconds to detect that a node is down, and all ongoing operations to that node will fail in the meantime. Is this understanding correct?
3. We believe every authentication call to the Policy Server consists of an LDAP ping, bind, and search, and that the Policy Server keeps persistent connections to the LDAP directory. Is this correct?
4. If so, does this mean that when one directory server goes down, all ongoing operations to that node will fail until failover to the other node has completed?
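To put rough numbers on items 2 and 4, here is a back-of-the-envelope sketch of how long a dead node can keep being treated as healthy, and how many requests could land on it in that window. It is only a model under stated assumptions, not a description of how the Policy Server actually works: it assumes probes run on a fixed schedule, the node dies right after a successful probe, and a failing probe is declared failed only when its timeout expires. The probe timeouts, the "one failed ping marks the node down" figure for the Policy Server, and the traffic rate are all hypothetical values chosen for illustration.

```python
def worst_case_detection_s(probe_interval_s: float, failures_to_mark_down: int,
                           probe_timeout_s: float) -> float:
    """Rough upper bound on how long a dead node is still treated as healthy.

    Assumes the node dies right after a successful probe, probes run on a fixed
    schedule, and each failing probe is only declared failed once its timeout
    expires (probe_timeout_s < probe_interval_s).
    """
    # The n-th failing probe is sent n full intervals after the node died ...
    last_failing_probe_sent = probe_interval_s * failures_to_mark_down
    # ... and is declared failed only after it times out.
    return last_failing_probe_sent + probe_timeout_s


def requests_exposed(requests_per_s: float, share_to_node: float,
                     window_s: float) -> float:
    """Requests routed to the dead node during the detection window (no retries)."""
    return requests_per_s * share_to_node * window_s


# Hypothetical numbers: 5 s timeout for the 30 s ping (assumed to mark the node
# down after 1 failure), 0.5 s timeout for a 1 s load-balancer monitor that needs
# 2 failures, and 100 authentications/s split evenly over 2 nodes.
current = worst_case_detection_s(30, 1, 5)      # 35 s
proposed = worst_case_detection_s(1, 2, 0.5)    # 2.5 s
print(requests_exposed(100, 0.5, current))      # 1750.0
print(requests_exposed(100, 0.5, proposed))     # 125.0
```

The point of the model is that the exposure window scales with the probe interval times the number of failures required, so shortening the interval matters far more than the timeout.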
A possible setup (we would like views on whether this is a sound approach):
a. We are considering placing a load balancer between the SiteMinder Policy Server and the CA Directory nodes.
b. In the Policy Server user directory configuration, we would list the LDAP load balancer URL multiple times so that the Policy Server maintains multiple concurrent connections to the directory nodes.
c. Configure an LDAP health-check monitor on the load balancer that performs an LDAP bind and search every second, marks a node down after 2 failed checks, and fails over to the other node.
d. This would shorten the time needed to detect that a directory node is down and reduce the number of in-flight transaction failures.
e. This should cover both planned and unplanned shutdowns of CA Directory nodes.
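The monitor behaviour described in point c can be sketched as a small state machine: each node counts consecutive failed probes and is marked down once it reaches a threshold, and new traffic goes to the first node still marked up. This is a hypothetical illustration, not any particular load balancer's implementation. The probe is injected as a plain function (in a real deployment it would be the LDAP bind and search); the node names `dir-a`/`dir-b`, the active/standby selection, the "2 consecutive failures" reading of point c, and the `rise` threshold for bringing a node back are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class NodeHealth:
    """Up/down state for one backend, driven by periodic probe results."""
    fall: int = 2  # consecutive failed probes before the node is marked down
    rise: int = 2  # consecutive good probes before it is marked up again (assumed)
    up: bool = True
    consecutive_fails: int = 0
    consecutive_oks: int = 0

    def record(self, probe_ok: bool) -> None:
        if probe_ok:
            self.consecutive_fails = 0
            self.consecutive_oks += 1
            if not self.up and self.consecutive_oks >= self.rise:
                self.up = True
        else:
            self.consecutive_oks = 0
            self.consecutive_fails += 1
            if self.up and self.consecutive_fails >= self.fall:
                self.up = False


class HealthCheckedPool:
    """Active/standby selection: new traffic goes to the first node marked up."""

    def __init__(self, names: List[str], fall: int = 2, rise: int = 2) -> None:
        self.order = list(names)
        self.health: Dict[str, NodeHealth] = {
            n: NodeHealth(fall=fall, rise=rise) for n in names
        }

    def run_checks(self, probe: Callable[[str], bool]) -> None:
        # Called once per probe interval (every second in point c).
        for name in self.order:
            self.health[name].record(probe(name))

    def pick(self) -> Optional[str]:
        for name in self.order:
            if self.health[name].up:
                return name
        return None  # every node is marked down


def simulate(fail_from: int = 3, seconds: int = 6) -> List[Optional[str]]:
    """Hypothetical scenario: dir-a stops answering probes from `fail_from` on."""
    pool = HealthCheckedPool(["dir-a", "dir-b"])
    timeline: List[Optional[str]] = []
    for second in range(seconds):
        pool.run_checks(lambda name: not (name == "dir-a" and second >= fail_from))
        timeline.append(pool.pick())
    return timeline


print(simulate())
# ['dir-a', 'dir-a', 'dir-a', 'dir-a', 'dir-b', 'dir-b']
```

Note that the sketch only models where new requests are routed; it says nothing about requests already sent to dir-a, or persistent connections already open to it, at the moment it fails.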
Please let me know if you see any issues with this setup.