My comments below.
Questions:
1. We understand that the Policy Server supports built-in load balancing and failover, yet shutting down directory nodes still causes in-flight transactions to fail. Is there a reason for this?
Ujwol => No reason. This is not expected behavior, and it will need analysis of the Policy Server trace logs. There have been a few LDAP failover issues in the past. Please open a support ticket so we can check this in detail.
2. We understand that the Policy Server establishes a ping connection and pings each directory node every 30 seconds to check its health. Is there an option to reduce this ping interval?
Ujwol => Yes, there is an option to change the LDAP server check interval.
Configure LDAP Storage Options - CA Single Sign-On - 12.52 SP1 - CA Technologies Documentation
Configure LDAP Server Checker Interval
Once the connections with the LDAP directory servers are established, CA SSO regularly checks the availability of the LDAP servers.
LDAPServerCheckerInterval
Specifies how often (in seconds) the Policy Server polls the LDAP servers to retrieve the availability information.
Default: 30 sec (This value is also used when the registry setting does not exist.)
To configure this setting, you must add the DWORD value key LDAPServerCheckerInterval in the following registry location and update the value:
HKEY_LOCAL_MACHINE\SOFTWARE\Netegrity\SiteMinder\CurrentVersion\Debug
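As a minimal sketch of the registry change described above, the following Python snippet renders the text of a .reg file that sets the value. The key path and value name are taken from the documentation quoted above; the 10-second interval and the `render_reg` helper are illustrative assumptions, not recommended settings.

```python
# Sketch: build .reg file text for the LDAPServerCheckerInterval setting.
# The 10-second value used below is only an example, not a recommendation.

KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Netegrity\SiteMinder\CurrentVersion\Debug"
VALUE_NAME = "LDAPServerCheckerInterval"


def render_reg(interval_seconds: int) -> str:
    """Return .reg file text that sets the DWORD value to interval_seconds."""
    if not 0 < interval_seconds <= 0xFFFFFFFF:
        raise ValueError("interval must fit in a positive 32-bit DWORD")
    # .reg files store DWORD values as eight hexadecimal digits.
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{KEY_PATH}]\r\n"
        f'"{VALUE_NAME}"=dword:{interval_seconds:08x}\r\n'
    )


if __name__ == "__main__":
    print(render_reg(10))
```

The output could be saved to a file and imported with `reg import` on a Windows Policy Server host, or the same value can be added by hand in the registry editor.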
Also have a look at this:
Tech Tip - CA Single Sign-On: PolicyServer :: LDAPPingTimeout Explained
3. Because of the 30-second interval, the Policy Server may take up to 30 seconds to detect that a node is down, and all ongoing operations to that node will fail in the meantime. Is this understanding correct?
Ujwol => 30 seconds is the maximum time it may take to detect a bad server using the ping thread.
It is possible that it is detected earlier, during the authentication call itself.
If an operation to a particular LDAP server fails, the PS immediately fails over to the next available server.
Requests are no longer sent to the bad server once the connection to it fails.
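The behavior described in this answer can be pictured with a small conceptual model. This is only a sketch under stated assumptions: the `LdapPool` class, its method names, and the server names are hypothetical, and this is not how the Policy Server is actually implemented. It shows two detection paths (a periodic ping and a failed request) feeding the same "bad server" set, with requests failing over to the next server.

```python
# Conceptual model only; names here are hypothetical, not SiteMinder APIs.
from typing import Callable, List, Set


class LdapPool:
    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.bad: Set[str] = set()

    def execute(self, operation: Callable[[str], str]) -> str:
        """Try each usable server in order, failing over on connection errors."""
        for server in self.servers:
            if server in self.bad:
                continue  # never send requests to a server already marked bad
            try:
                return operation(server)
            except ConnectionError:
                # Detected on the request path, without waiting for the next ping.
                self.bad.add(server)
        raise ConnectionError("no LDAP server available")

    def ping_all(self, is_alive: Callable[[str], bool]) -> None:
        """Periodic check: mark unreachable servers bad (recovery is omitted)."""
        for server in self.servers:
            if not is_alive(server):
                self.bad.add(server)


if __name__ == "__main__":
    def search(server: str) -> str:
        if server == "ldap1":  # pretend ldap1 has just been shut down
            raise ConnectionError(server)
        return f"result from {server}"

    pool = LdapPool(["ldap1", "ldap2"])
    print(pool.execute(search))  # fails over within the same call
    print(sorted(pool.bad))
```

In this model the request that hits the dead server still succeeds because it is retried on the next server; whether a real in-flight operation is retried or fails depends on the product behavior, which is what the trace logs would show.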
3. Every authentication call to the Policy Server consists of an LDAP ping, bind, and search, and the Policy Server establishes persistent connections to LDAP. Is this correct?
Ujwol => No, not every authentication call to the Policy Server results in creating those 3 connections.
Yes, the connections are persistent.
Please refer to this blog to understand more about how the connections are managed:
Tech Tip - CA Single Sign-On: Policy Server :: Policy Server Hung if LDAP User Directory is unresponsive/slowly performing.
4. If yes, does it mean that when one directory server is going down, all the ongoing operations to that node should fail until failover has happened to other node?
Ujwol => As I said before, the first thread that detects that the connection is bad marks it bad, and no other thread uses it after that.
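To illustrate the "first thread marks it bad" idea, here is a minimal sketch of a shared connection state guarded by a lock. The `SharedConnection` class and its methods are hypothetical and only model the concept; they say nothing about the Policy Server's internals.

```python
# Conceptual model only: one shared "bad" flag, set once, seen by all threads.
import threading
from typing import List, Optional


class SharedConnection:
    def __init__(self, name: str) -> None:
        self.name = name
        self.marked_by: Optional[str] = None
        self._bad = False
        self._lock = threading.Lock()

    def is_usable(self) -> bool:
        with self._lock:
            return not self._bad

    def mark_bad(self, thread_name: str) -> bool:
        """Return True only for the first caller that marks the connection bad."""
        with self._lock:
            if self._bad:
                return False
            self._bad = True
            self.marked_by = thread_name
            return True


if __name__ == "__main__":
    conn = SharedConnection("ldap1")
    results: List[bool] = []

    def worker(name: str) -> None:
        # Every worker sees the failure, but only one gets to mark it.
        results.append(conn.mark_bad(name))

    threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results.count(True), conn.is_usable())
```

After any one thread has marked the connection, `is_usable()` returns False for every other thread, so they would pick a different server instead of retrying the dead one.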
Regarding the suggestion to configure an external LB for LDAP, you can certainly do that.
That means we are basically offloading LDAP load balancing and failover to the LB.
Regards,
Ujwol