Symantec Access Management


CA SSO & CA Directory failover

  • 1.  CA SSO & CA Directory failover

    Posted Feb 13, 2018 09:38 AM

    Hi All,
    We have been facing failures for in-flight transactions during failover between Siteminder policy server & CA Directory nodes.

     

    Problem:
    Whenever we take down one of the CA Directory nodes (router and data DSA both shut down) on one box, the Policy Server does not fail over seamlessly to the other node.
    The in-flight transactions are failing.

     

    Questions:
    1. We understand that the Policy Server supports built-in load balancing and failover, yet shutting down a directory node still causes in-flight transactions to fail. Is there a reason for this?

    2. We understand that the Policy Server establishes a ping connection and pings each directory node every 30 seconds to check its health. Is there an option to reduce this ping interval?
    Because of the 30-second interval, the Policy Server may take up to 30 seconds to detect that a node is down, and all ongoing operations to that node will fail in the meantime. Is this understanding correct?

    3. Every authentication call to the Policy Server consists of an LDAP ping, bind and search, and the Policy Server establishes persistent connections to LDAP. Is this correct?

    4. If yes, does that mean that when one directory server goes down, all ongoing operations to that node will fail until failover to the other node has happened?


    A possible setup (we need views on whether this is a sound approach):

     

    a. We are thinking of setting up a load balancer between the Siteminder Policy Server and the CA Directory nodes.
    b. In the Policy Server user directory configuration, we will list the LDAP load balancer URL multiple times to maintain multiple concurrent connections to the directory nodes.
    c. We will configure an LDAP health-check monitor on the load balancer that makes an LDAP bind and search call every second and, after 2 failures, marks a node down and fails over to the other node.
    d. This would reduce the time taken to detect that a directory node is down and so reduce the number of in-flight transaction failures.
    e. This should take care of both planned and unplanned shutdowns of CA Directory nodes.
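
    The health-check rule in point (c) can be sketched as a small state machine. This is only an illustrative model, not the configuration of any particular load balancer: the class and function names are hypothetical, the rule that a single successful probe marks the node up again is an assumption, and the detection bound ignores the time a probe itself takes to time out.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """Tracks consecutive probe failures for one directory node (hypothetical model)."""
    fail_threshold: int = 2        # from the post: mark down after 2 failures
    consecutive_failures: int = 0
    is_up: bool = True

    def record_probe(self, probe_ok: bool) -> bool:
        """Record one bind-and-search probe result and return the node's up/down state."""
        if probe_ok:
            # Assumption: one good probe is enough to mark the node up again.
            self.consecutive_failures = 0
            self.is_up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.fail_threshold:
                self.is_up = False
        return self.is_up


def worst_case_detection_seconds(probe_interval_s: float, fail_threshold: int) -> float:
    """Upper bound on detection time when the node dies just after a good probe.

    Ignores probe timeouts, so the real figure would be somewhat higher.
    """
    return probe_interval_s * fail_threshold
```

    Under these assumptions, a 1-second probe interval with a threshold of 2 gives a detection window of roughly 2 seconds, compared with up to 30 seconds for the default ping interval mentioned above.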

     

    Let me know if anyone sees issues with this setup.

      

    Regards,
    Neeraj Tati



  • 2.  Re: CA SSO & CA Directory failover
    Best Answer

    Posted Feb 13, 2018 08:36 PM

    My comments below.

     

    Questions:
    1. We understand that the Policy Server supports built-in load balancing and failover, yet shutting down a directory node still causes in-flight transactions to fail. Is there a reason for this?

     

    Ujwol => No reason. This will need analysis of the Policy Server trace logs. This is not expected behavior. There were a few LDAP failover issues in the past. Please open a support ticket so we can check this in detail.

     

    2. We understand that the Policy Server establishes a ping connection and pings each directory node every 30 seconds to check its health. Is there an option to reduce this ping interval?

     

    Ujwol => Yes, there is an option to change the LDAP server check interval.

    Configure LDAP Storage Options - CA Single Sign-On - 12.52 SP1 - CA Technologies Documentation 

    Configure LDAP Server Checker Interval
    Once the connections with the LDAP directory servers are established, CA SSO regularly checks the availability of the LDAP servers.
    LDAPServerCheckerInterval
    Specifies how often (in seconds) the Policy Server polls the LDAP servers to retrieve the availability information.
    Default: 30 sec (This value is also used when the registry setting does not exist.)
    To configure this setting, you must add the DWORD value key LDAPServerCheckerInterval in the following registry location and update the value:

    HKEY_LOCAL_MACHINE\SOFTWARE\Netegrity\SiteMinder\CurrentVersion\Debug


     

     

    Also have a look at this :

    Tech Tip - CA Single Sign-On: PolicyServer :: LDAPPingTimeout Explained 


    Because of the 30-second interval, the Policy Server may take up to 30 seconds to detect that a node is down, and all ongoing operations to that node will fail in the meantime. Is this understanding correct?

     

    Ujwol => 30 seconds is the maximum time it may take to detect a bad server using the Ping thread.

    It is possible that it could be detected earlier, during the authentication call itself.

    If an operation to a particular LDAP server fails, the PS will immediately fail over to the next available server.

    Requests will no longer be sent to the bad server once the connection to it fails.
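
    The behavior described in this answer can be modelled roughly as follows. This is a sketch under stated assumptions, not the Policy Server's actual implementation: `FailoverGroup`, `make_operation` and the server names are hypothetical, and retrying the failed operation on the next server is an assumption, since the thread does not confirm whether an in-flight operation is retried.

```python
from typing import Callable, List, Set


class FailoverGroup:
    """Hypothetical model of an ordered failover list of LDAP servers."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.bad: Set[str] = set()  # servers marked bad; shared by all callers

    def execute(self, operation: Callable[[str], str]) -> str:
        """Run operation on the first server not marked bad, failing over on error."""
        for server in self.servers:
            if server in self.bad:
                continue  # never send new requests to a server already marked bad
            try:
                return operation(server)
            except ConnectionError:
                self.bad.add(server)  # the first failure marks the server bad
        raise RuntimeError("no LDAP server available")

    def mark_recovered(self, server: str) -> None:
        """To be called when a later periodic ping to the server succeeds again."""
        self.bad.discard(server)


def make_operation(down: Set[str], calls: List[str]) -> Callable[[str], str]:
    """Build a fake LDAP operation that fails for every server listed in down."""
    def operation(server: str) -> str:
        calls.append(server)
        if server in down:
            raise ConnectionError(f"{server} unreachable")
        return f"ok from {server}"
    return operation


calls: List[str] = []
group = FailoverGroup(["ldap1", "ldap2"])
op = make_operation({"ldap1"}, calls)
first = group.execute(op)   # tries ldap1, fails, then succeeds on ldap2
second = group.execute(op)  # skips ldap1 entirely
print(first, second, calls)
```

    In this model only the first caller pays for the failed attempt; every later call skips the bad server until a ping marks it recovered.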

     

    3. Every authentication call to the Policy Server consists of an LDAP ping, bind and search, and the Policy Server establishes persistent connections to LDAP. Is this correct?

     

    Ujwol => No. Not every authentication call to the Policy Server results in creating those 3 connections.

    Yes the connections are persistent.

    Please refer to this blog to understand more on how the connections are managed :

    Tech Tip - CA Single Sign-On: Policy Server :: Policy Server Hung if LDAP User Directory is unresponsive/slowly performing. 

     

    4. If yes, does that mean that when one directory server goes down, all ongoing operations to that node will fail until failover to the other node has happened?

    Ujwol => As I said before, the first thread which detects the connection is bad will mark it bad and no other thread will use it any more.

     

     

    As for the suggestion about configuring an external LB for LDAP, you can certainly do that.

    What that means is that we are basically offloading load balancing and failover of LDAP to the LB.

     

    Regards,

    Ujwol



  • 3.  Re: CA SSO & CA Directory failover

    Posted Feb 15, 2018 04:43 PM

    Hi Ujjwal,
    Thanks for sharing such useful links. It has provided great insight on this concept.

    I have a few more questions to clear up my doubts.

     

    1. Are all connections (LDAP Ping, Search and User) made under one thread?
    If yes, does that mean every new thread the Policy Server opens to handle more requests will have its own LDAP connections?

    2. I believe there will be only one ping connection for every thread, but can a Policy Server thread have more than one LDAP Search and User connection?
    My purpose in asking is to understand login concurrency. If my authentication TPS is 200, how will the Policy Server distribute 200 login calls across the LDAP User connection?
    How many user logins can one LDAP User connection handle concurrently? Or is concurrency achieved through a larger number of worker threads?

    3. What is the fallback mechanism for Ping threads to learn that a node is back in a running state? Will the ping thread still validate the connection status of a bad node every 30 sec?

    4. Which connection is responsible for first detecting that a node is down? Is it the LDAP User or the LDAP Ping connection?

    I am thinking of the impact here:
    If the authentication TPS is 200 and I reduce the LDAPPingTimeout value to 2 sec,
    will there be any difference in the time the Policy Server takes to find out that a node is down?

     

    Regards,

    Neeraj Tati



  • 4.  Re: CA SSO & CA Directory failover

    Posted Feb 15, 2018 11:08 PM

    Hi Neeraj,

     

    My comments below.

     

    1. Are all connections (LDAP Ping, Search and User) made under one thread?
    If yes, does that mean every new thread the Policy Server opens to handle more requests will have its own LDAP connections?

     

    Ujwol => No, it doesn't work like that. LDAP connections are shared by all the worker threads.

    The number of LDAP connections created by the Policy Server depends on the number of LDAP banks.

    If you have 1 LDAP bank, the maximum number of connections to that LDAP server is just 3 (1 Ping, 1 Search, 1 User connection).

    These "search" and "user" connections are shared by all the worker threads.

    Total number of LDAP connections = Number of LDAP banks * 3

    You can refer to this thread for more details on how to configure LDAP Banks :

    How to configure LDAP banks 

     

     

    2. I believe there will be only one ping connection for every thread, but can a Policy Server thread have more than one LDAP Search and User connection?
    My purpose in asking is to understand login concurrency. If my authentication TPS is 200, how will the Policy Server distribute 200 login calls across the LDAP User connection?
    How many user logins can one LDAP User connection handle concurrently? Or is concurrency achieved through a larger number of worker threads?

     

    Ujwol => The authentication TPS depends on the number of worker threads and also on the number of LDAP banks (and in turn the number of available LDAP connections). For example, if you have 16 worker threads but only 1 Search and 1 User connection, then while an LDAP connection is in use the remaining worker threads must wait until it becomes available. This can create a bottleneck and reduce the TPS for authentication/authorization calls.
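
    The bottleneck described in this answer can be put into rough numbers. The sketch below is a simplified model, not behavior confirmed in this thread: it assumes each shared connection carries one LDAP operation at a time and that every operation takes the same time. The function names and the 20 ms latency used in the example are hypothetical, and a model like this is no substitute for load testing.

```python
def max_tps(connections: int, op_latency_ms: int) -> float:
    """Throughput ceiling if each connection serves one operation at a time."""
    return connections * 1000 / op_latency_ms


def connections_needed(target_tps: int, op_latency_ms: int) -> int:
    """Smallest connection count whose ceiling reaches target_tps (integer ceiling division)."""
    return (target_tps * op_latency_ms + 999) // 1000


latency_ms = 20  # hypothetical per-operation LDAP latency
print(max_tps(1, latency_ms))               # ceiling for a single User connection
print(connections_needed(200, latency_ms))  # connections needed for 200 TPS
```

    With these assumed numbers, a single User connection tops out at 50 operations per second, so 200 TPS would need at least 4 User connections. Since each bank contributes one User connection, that would correspond to at least 4 LDAP banks in this model.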

     

    3. What is the fallback mechanism for Ping threads to learn that a node is back in a running state? Will the ping thread still validate the connection status of a bad node every 30 sec?

     

    Ujwol => Yes. The Ping thread continues to monitor LDAP connectivity every 30 seconds, irrespective of whether the server is down or up.

     

    4. Which connection is responsible for first detecting that a node is down? Is it the LDAP User or the LDAP Ping connection?

    Ujwol => It depends on which one is used first. Mostly it's the Ping connection which determines bad server first.

     

    I am thinking of the impact here:
    If the authentication TPS is 200 and I reduce the LDAPPingTimeout value to 2 sec,
    will there be any difference in the time the Policy Server takes to find out that a node is down?

     

    Ujwol => An LDAPPingTimeout of 2 seconds is quite low. If the LDAP server is under heavy load, the ping search can often take much longer than that. A very low value may give false results, causing unnecessary failover and failback.

     

     

    In general, if you want to increase auth/az throughput, you will need to increase the number of LDAP banks. There isn't any formula to calculate how many LDAP banks are appropriate. You will need to run load tests and keep increasing the LDAP banks and worker threads until the desired throughput is reached.



  • 5.  Re: CA SSO & CA Directory failover

    Posted Feb 16, 2018 12:00 AM

    Thanks Ujwol. We have 10 LDAP banks configured in below fashion.

     

    <LDAP_Host1_IP> a11 a12 a13 a14 a15 a16 a17 a18 a19 a110

    <LDAP_Host2_IP> a21 a22 a23 a24 a25 a26 a27 a28 a29 a210

     

    User directory Object configurations:

    Ldap group 1 : a11:port a21:port

    Ldap group 2 : a12:port a22:port

    ....

    ...

    Ldap group 10: a110:port a210:port

     

    So each bank has two LDAP servers (with failover within the bank), and all banks are defined in load-balanced HA mode.

     

    Q1. So may I know if I have 10 banks, each bank having 2 LDAP server, will there be 30 or 60 connections?

     

    It depends on which one is used first. Mostly it's the Ping connection which determines bad server first.

    Q2. On this, my understanding is that on high-traffic systems the LDAP User connection will always be busy and, as you said earlier, the Policy Server should fail over as soon as an authentication bind stops working.

    This makes me think it should always be the LDAP User connection that detects the failure, unless no authentication calls are coming to the Policy Server. Is that correct?

     

    Q3. We have CA Dir used as User store and we have below setting:

             set max-op-time = 5 (seconds)

           So, CA Dir will time out any operation after 5 seconds. Do you think setting PingTimeout=5 sec would align it with the LDAP connection timeout value? And would it help the Policy Server determine faster whether an LDAP node is down?

     

    Regards,
    Neeraj Tati

     

     

     



  • 6.  Re: CA SSO & CA Directory failover

    Posted Feb 19, 2018 12:57 AM

    Q1. So may I know if I have 10 banks, each bank having 2 LDAP server, will there be 30 or 60 connections?

    Ujwol => Only 30. It's only 3 connections per failover group. So even if you have X LDAP servers defined under one failover group, the PS will at any time establish only 3 connections.
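
    As a worked example of the rule in this answer, the sketch below counts connections for the ten-bank layout posted earlier in the thread. The function and constant names are hypothetical; the factor of 3 (1 Ping, 1 Search, 1 User) comes from the thread.

```python
from typing import Dict, List

CONNECTIONS_PER_BANK = 3  # 1 Ping + 1 Search + 1 User, as stated in the thread


def total_ldap_connections(banks: Dict[str, List[str]]) -> int:
    """Count connections per bank, not per server: extra failover servers add none."""
    return len(banks) * CONNECTIONS_PER_BANK


# Ten banks, each with two failover servers, mirroring the layout posted in the thread.
banks = {f"group{i}": [f"a1{i}:port", f"a2{i}:port"] for i in range(1, 11)}
print(total_ldap_connections(banks))  # prints 30
```

    Adding a third server to any failover group would leave the total unchanged; only adding banks raises it.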

     

    It depends on which one is used first. Mostly it's the Ping connection which determines bad server first.

    Q2. On this, my understanding is that on high-traffic systems the LDAP User connection will always be busy and, as you said earlier, the Policy Server should fail over as soon as an authentication bind stops working.

    This makes me think it should always be the LDAP User connection that detects the failure, unless no authentication calls are coming to the Policy Server. Is that correct?

     

    Ujwol=> Yes.

     

    Q3. We have CA Dir used as User store and we have below setting:

             set max-op-time = 5 (seconds)

           So, CA Dir will time out any operation after 5 seconds. Do you think setting PingTimeout=5 sec would align it with the LDAP connection timeout value? And would it help the Policy Server determine faster whether an LDAP node is down?

     

    Ujwol => Yes, makes sense. 



  • 7.  Re: CA SSO & CA Directory failover

    Posted Feb 20, 2018 09:42 AM

    Great. Thank you for such a clear explanation. I have one last thing to clarify on this topic. I want to understand the need for sticky/persistent sessions if there is a load balancer between

    1. Siteminder & CA Directory User Store

    2. Application Server & CA Directory User Store

     

    Is it mandatory to have sticky sessions enabled on the load balancer?

     

    I am thinking of a use-case:

    a. App server instance 1 makes an update to LDAP user store instance 1, and user store instance 1 goes down before it can replicate the update to instance 2.

    b. If the next request from the app server then goes to user store instance 2, it would fail.

     

    Regards,

    Neeraj tati



  • 8.  Re: CA SSO & CA Directory failover

    Posted Feb 20, 2018 04:24 PM

    Hi Neeraj,


    Would you mind posting this question in the CA Directory forum? They are best placed to answer it:


    https://communities.ca.com/community/ca-security/content?filterID=contentstatus%5Bpublished%5D~category%5Bca-directory%5D


    Regards,

    Ujwol



  • 9.  Re: CA SSO & CA Directory failover

    Posted Feb 22, 2018 02:15 PM

    As Ujwol suggested, it is best to discuss this in the CA Directory forum. But here's a basic design question for you, NeerajChase, as far as CA Directory is concerned: have you worked with the CA Directory Router DSA and the CA Directory Data DSA? You are talking about a physical load balancer (external to CA Directory) and all the jazz that comes with the overhead of an external LB, but have you really explored what is available within the product, minus the jazz?

     

    Lastly, please open a new thread. But be prepared to be cross-questioned on the use of an external load balancer with CA Directory, because nowhere do you reference what the product documentation recommends as best practice.

     

    https://docops.ca.com/ca-directory/12-6/en/administrating/performance-and-tuning

     

    Regards

    Hubert



  • 10.  Re: CA SSO & CA Directory failover

    Posted Mar 15, 2018 01:21 PM

    Ujwol 

    We have two Session Stores running on different boxes. Can you tell me whether the setting below in SMCONSOLE uses round-robin or failover? Note: we don't have a router configured.

     



  • 11.  Re: CA SSO & CA Directory failover

    Posted Mar 15, 2018 01:27 PM

    Vipul vkaneriya

     

    It is always recommended to open a new thread for a new question. If needed we can reference an old thread within the new thread.

     

    That said, it is failover. Requests will always go to server01. When server01 is unavailable, requests will fail over to server02. Hence, if you do not have replication configured within CA Directory, there will be no session information in server02 when a request does fail over.