Please see attached picture as the forum editor will not allow me to cut and paste my question into this space...
First of all, please go through the following KB to understand how web agent to Policy Server connectivity works:
Then, coming back to your question of when the web agent determines that the Policy Server connection is bad:
It determines the connection is bad when it cannot open the socket connection to the policy server on the policy server ports.
I believe that in your case, when the Policy Server has trouble connecting to the policy store and logs "reactor down", it is still accepting TCP socket connections on those ports.
It is just that reactor threads are not available to properly initialize the agent to PS connections, which is why the agent is not failing over to the next available policy server.
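To illustrate the distinction described above, here is a minimal Python sketch. It is not SiteMinder code and does not use the agent's real protocol. It assumes a hypothetical local listener that accepts TCP connections but never answers, standing in for a server whose worker threads are unavailable. The function names (`tcp_port_open`, `handshake_ok`, `start_silent_server`) and the `PING` probe are invented for illustration only.

```python
import socket
import threading


def tcp_port_open(host, port, timeout=1.0):
    """Connect-only check: True if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def handshake_ok(host, port, timeout=1.0):
    """Hypothetical application-level check: send a probe and expect a reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"PING\n")  # made-up probe, not a real agent message
            return len(conn.recv(16)) > 0
    except OSError:  # includes timeouts while waiting for the reply
        return False


def start_silent_server():
    """Start a listener that accepts connections but never replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(5)
    held = []  # keep accepted sockets open so clients are not disconnected

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


if __name__ == "__main__":
    server, port = start_silent_server()
    print("port open:", tcp_port_open("127.0.0.1", port))
    print("handshake ok:", handshake_ok("127.0.0.1", port, timeout=0.5))
    server.close()
```

In this sketch the connect-only check succeeds while the probe times out, which is the situation where a failover decision based only on socket connectivity would not be triggered.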
I am not in front of my computer at the moment, but I could do some testing and validate this sometime next week and get back to you.
In the interim, if you have any questions, please let us know.
Thank you for your reply... If I understand correctly, the web agents simply check whether they can connect to a server. There is no "health monitor" check by the agents against the actual Policy Server application to determine whether it is running properly, correct? If that is the case, it would seem that health monitoring would be a very important feature for agents in high availability environments. Or are there recommended workarounds for that scenario? If not, it would seem that a "health monitor" that agents can check to verify the Policy Server application is running correctly should be an important feature for CA to add, would it not?
Thank you for your time!
That feature is already there.
The load balancing algorithm in the web agent takes into consideration the health/response times of the policy servers, such that the healthiest policy servers get to serve most of the requests.
I was talking more from the fail-over perspective earlier.
Please go through this new thread, which I posted just recently; it clarifies a few of these points:
Tech Tip : CA Single Sign-On :: PolicyServer::Cluster vs Non-Cluster Load balancing
Thank you for your quick reply... I'm trying to understand why the feature does not seem to work. Perhaps the algorithms do not take into consideration the "reactor down" issue when the Policy Store was booted up after the Policy Server?
Also, we have seen scenarios where only a single policy server (among 4 policy servers listed on the HCO) has problems with a User Directory, and an application web agent simply locks onto that single problem Policy Server. In the most recent example, the problem policy server was listed first on the HCO, but the HCO did have 3 other working Policy Servers listed. The application in question was considered down, since no one could access the SiteMinder protected application. I can only conclude that the algorithm used by the web agent still thought the "problem" Policy Server was the most "efficient" Policy Server to work with, even though all requests (as opposed to connections) to that policy server essentially failed with authentication errors because that specific policy server was having issues working with the User Directory.
It would be interesting to understand what is included in the algorithm that web agents use to determine what is a "valid" policy server.
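The hypothesis in the post above can be sketched in Python. This is not the web agent's actual algorithm, which is not described in this thread. It only shows, under assumed and made-up numbers, how a selection based purely on response time could keep choosing a server that fails quickly. The `ServerStats` type, both selection functions, and the `max_error_rate` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    avg_response_ms: float  # observed average response time
    error_rate: float       # fraction of requests that failed (0.0 to 1.0)


def pick_by_latency(servers):
    """Choose the fastest responder, ignoring whether its responses succeed."""
    return min(servers, key=lambda s: s.avg_response_ms)


def pick_by_latency_and_errors(servers, max_error_rate=0.5):
    """Choose the fastest responder among servers below an error threshold.

    Falls back to the full list if every server exceeds the threshold.
    """
    healthy = [s for s in servers if s.error_rate <= max_error_rate]
    return min(healthy or servers, key=lambda s: s.avg_response_ms)


if __name__ == "__main__":
    # Illustrative figures only: ps1 answers quickly because every request fails fast.
    servers = [
        ServerStats("ps1", avg_response_ms=5.0, error_rate=1.0),
        ServerStats("ps2", avg_response_ms=40.0, error_rate=0.0),
        ServerStats("ps3", avg_response_ms=45.0, error_rate=0.01),
        ServerStats("ps4", avg_response_ms=50.0, error_rate=0.0),
    ]
    print("latency only:", pick_by_latency(servers).name)
    print("latency and errors:", pick_by_latency_and_errors(servers).name)
```

With these assumed figures, the latency-only choice is the fast-failing server, while the variant that also considers request failures selects a working one.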
Something to add:
1) If the policy store goes offline after the Policy Server has completed loading the policy store, the Policy Server should continue to work with its local cached copy.
2) You said the HCO had 4 policy servers. Was it configured in load balance or failover mode?
To answer your questions :
(1) Since the Policy Store was offline when the Policy Server was booted up, there was no local cached copy, hence the “Reactor Down” error.
(2) Load Balance mode : see attached picture