RCA_Agile_Central-2017-11-08

File uploaded by JohnStreeter Employee on Nov 14, 2017Last modified by JohnStreeter Employee on Nov 15, 2017
Version 2Show Document
  • View in full screen mode

Incident: Agile Central customers experienced extremely slow-loading pages, rendering the application mostly unusable.  

Root Cause: On November 8, 2017 a large number of requests degraded in performance between 9:38AM MDT and 10:03AM MDT. This occurred due to a misconfiguration in a load balancer that served requests to our authentication service.  When maintenance was performed on nodes running our Platform as a Service and a pod of the authentication service was destroyed, the remaining authentication service pods did not pick up the remaining production traffic due to the misconfiguration.  A redeploy of the authentication service resolved the issue at the time and the incorrect configuration in the load balancer has also since been resolved.

Outcomes