Working (non-F5) load balancer setup in an AA configuration

Question asked by JussiValkonen on Aug 11, 2017
Latest reply on Aug 21, 2017 by JussiValkonen

CA officially supports the F5 load balancer, and an example configuration can be found in the Wiki article and its comments. However, we don't have an F5, so we're looking into alternative solutions. So far every solution we've tried has had shortcomings that make it a poor fit for our needs. The implementation guide has a promising-sounding chapter, "Configure the Load Balancer", but it falls miles short: the only part that actually talks about load balancing is "Configure the session persistence on each load balancer. For more information, see your load balancer document. This process ensures that a request coming from one application server is routed back to the same application server." The Wiki article and comments I mention above complement this, but they all share one major gap: what does the user see when an application server is quiesced?

I understand it's impossible to provide an example configuration for every software and hardware load balancer out there. What I'm looking for is whether anyone has managed to configure load balancing in a way that meets our needs. Below are the key details of our environment and our requirements/wishes for the load balancer.

Environment

  • Windows environment
  • 14.1 AA configuration with
    • 1 BG server
    • 1 standby server
    • 4 application servers
    • Cisco ACE for load balancing

Current scenario

The Cisco ACE answers requests on a single host name and transparently forwards the traffic to the chosen app server, so the users only see the hostname of the LB. The LB is configured to monitor the HealthServlet response on each of the app servers and to remove an app server from the load-balancing pool when the HealthServlet no longer returns an HTTP 200 response. The LB has sticky sessions configured to make sure users always stay on the same app server.
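
To make the probe behaviour concrete, this is roughly the check the LB performs against each app server, sketched in Java (the servlet path and port are placeholders, not our actual values):

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Rough sketch of the LB's health probe: an app server stays in the pool
    // only while its HealthServlet answers HTTP 200. Anything else (a 5xx, or
    // no answer at all) takes it out of rotation.
    public class HealthProbe {

        public static boolean isHealthy(String appServerHost) {
            try {
                // Placeholder path for wherever HealthServlet is mounted.
                URL url = new URL("http://" + appServerHost + ":8080/healthservlet");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false; // unreachable counts as down
            }
        }
    }

This bluntness is the root of the challenges below: a quiesced server returning a 5xx looks identical to a dead one.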

Challenges

  • As you might have noticed from the scenario description, the LB health probes aren't the smartest. Whenever an app server is quiesced, the LB instantly throws out every connected user instead of allowing them to gracefully log out and have their connection re-balanced.
  • Because of the first item, the possibilities the AA configuration theoretically grants us are lost; we can't do maintenance during service hours without disrupting users in an unacceptable way. The quiesce mode, with its "please log out" notification, goes halfway, but none of the LB solutions we've seen so far play nicely with it.
  • The HealthServlet returns a 5xx response even when the app server is still perfectly able to serve its existing users, so the current LB setup can't tell a quiesced app server from a shut-down one.
  • Even if the above issues were resolved, sticky sessions and logout would remain a problem. For the duration of the quiesce, a logged-out user would be unable to log back in, yet still stuck to the server being quiesced, which leads us to the last item...
  • Re-balancing the connection seems rather hard: the jsessionid (which most load balancers use for sticky sessions) is set somewhere inside the webengine. Creating a filter for Tomcat to remove the cookie from the response might address this (a rough sketch follows this list), but it would still leave IIS unresolved.
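
To make the Tomcat idea from the last item concrete, here is a minimal, untested sketch of such a filter. The class name is mine, the web.xml mapping to the logout URL is assumed, and it only works if the response hasn't been committed by the time the filter chain returns:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical filter, mapped in web.xml to the logout URL only: it expires
    // the JSESSIONID cookie on the logout response so the LB loses its sticky
    // key and the user's next login gets re-balanced.
    public class LogoutCookieFilter implements Filter {

        @Override
        public void init(FilterConfig config) { }

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            chain.doFilter(req, res); // let the normal logout processing run first

            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            if (response.isCommitted()) {
                return; // too late to touch headers on this response
            }

            // Overwrite JSESSIONID with an already-expired copy so the browser
            // drops it and the LB has no sticky key on the next request.
            Cookie cookie = new Cookie("JSESSIONID", "");
            cookie.setPath(request.getContextPath().isEmpty() ? "/" : request.getContextPath());
            cookie.setMaxAge(0); // expire immediately
            response.addCookie(cookie);
        }

        @Override
        public void destroy() { }
    }

As said, though, this would only cover the Tomcat side; IIS would still need its own equivalent.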

Goal

We hope to retain a single host name towards the users, but so far none of the solutions capable of that has met the other requirements, so we're OK with letting that requirement slide. So in essence the requirements are these:

  • Sticky sessions until logout, with re-balancing of the connection on logout. This would spread the load evenly among the remaining app servers when one is quiesced.
  • Existing sessions must remain in place instead of being instantly re-balanced (with data lost) when the app server is quiesced. It seems nginx might be able to do this with the "drain" setting on an upstream node, but I haven't seen it done yet, so I'm a bit sceptical.
  • Single host name towards the users (negotiable)

So, do you have, or do you know of someone who has, a setup like this? If we let the single-host-name requirement go, even a simple round-robin balancer that doesn't channel the subsequent traffic through itself would work for us; we just haven't found a working solution yet. A rough sketch of that last idea follows.
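
For what it's worth, this is the kind of thing I mean by a balancer that stays out of the data path, sketched with the JDK's built-in HttpServer. It answers on one host name but 302-redirects each new visitor to an app server in round-robin order, so all subsequent traffic goes directly to that app server. Host names are placeholders, and it does no health checking at all:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import com.sun.net.httpserver.HttpServer;

    // Hypothetical "redirect balancer": one host name towards the users, but
    // each new visitor is bounced straight to an app server, so the balancer
    // never sits in the data path afterwards.
    public class RedirectBalancer {

        private static final List<String> APP_SERVERS = Arrays.asList(
                "http://app1.example.local", "http://app2.example.local",
                "http://app3.example.local", "http://app4.example.local");
        private static final AtomicInteger next = new AtomicInteger();

        public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(80), 0);
            server.createContext("/", exchange -> {
                // Pick the next app server in the ring and send the browser there.
                String target = APP_SERVERS.get(
                        Math.floorMod(next.getAndIncrement(), APP_SERVERS.size()));
                exchange.getResponseHeaders().add("Location",
                        target + exchange.getRequestURI());
                exchange.sendResponseHeaders(302, -1);
                exchange.close();
            });
            server.start();
        }
    }

The obvious trade-offs: users end up seeing the app server host names, and because the redirector is out of the picture after the first request, quiesce handling would have to live entirely on the app servers.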
