
Lessons Learned from the Spectre/Meltdown Response

Blog Post created by chaje22 Employee on Mar 5, 2018

The Spectre and Meltdown vulnerabilities hit the industry and consumers equally hard when they were announced in a blog post by the Google Project Zero team on Wednesday, Jan 3, 2018, around 2:30 PM PST. They affected essentially all computing devices and platforms, because nearly all modern processors were vulnerable. CA API Gateway appliances were not immune, and our Gateway and its sister teams knew right away that this was a "big deal".

 

Importance of establishing early communication with customers

 

Whenever a vulnerability of such magnitude comes out, customers have the right to demand prompt patching as well as early communication. Even when your product is not affected, you still owe your customers a statement explaining why it is not affected. Organizations that communicate promptly increase their customers’ confidence in their ability to deal with security issues effectively. Organizations without a swift and clear communication plan risk confusing their customers, who will reach out for information and remediation plans. Good communication is a win for both sides.

 

Knowing the above, our organization issued proactive notifications to our customers the next day, after gathering all the facts. We let customers know that our product was affected and what our game plan for patching was. Even letting customers know when the patch will arrive helps them (i.e. IT admins) schedule their patching cycle and plan resources. We also published a Knowledge Base article so that customers could learn in one place how Spectre and Meltdown affect the other CA API Management products that are typically sold with CA API Gateway.

 

We decided to issue an expedited January Gateway Monthly Platform Patch because the next scheduled monthly patch was 20 calendar days away. We knew Mr. Spectre Meltdown deserved a patch of its own.

 

Importance of automation and internal communication

 

Our Monthly Platform Patch preparation is mostly automated. However, someone still needs to kick it off, monitor packaging, and of course examine the QA test results. Lastly, someone needs to publish the patch on our Solutions and Patches site. Knowing that small things add up and that our customers were counting on us, we wanted to get our ducks in a row as much as possible before Red Hat published its patch.

 

Dev managers, the support team, the publication team, product management, and developers alike got together and agreed on the patch schedule we wanted to achieve. Red Hat issued the kernel update around noon the following Friday, and it was time to execute. Many unsung heroes worked hard to prepare the expedited January Gateway Monthly Platform Patch, which became available by 10 PM PST that Friday. The turnaround time was 10 hours from the time Red Hat released its patch. We wanted our customers to have the patch before the weekend, so we were committed as a team.

 

Even though most of the process was automated, we realized that a few things (more than we would have liked) required manual intervention, so it was a lesson learned for us. Usually such intervention is not a big deal, because we start the process at least a few days before the patch date, but doing it "on demand" made us look at where we can further automate and improve.

 

In retrospect, if we had not had good internal communication, we probably would have missed the Friday evening target and a few people would have had to work that weekend. It was really good to see our dev management team resist that option at all costs, and it felt that much better to have prepared in advance and gotten people home for the weekend.

 

Customer's Perspective

 

I think it is crucial for a software company to think from the customers' perspective whenever it responds to a security vulnerability. Customers cannot have the same in-depth technical knowledge of the product that the development teams do, so they naturally have a hard time assessing the true severity of a vulnerability they come across. A long wait for a patch also makes customers (i.e. IT admins) feel very vulnerable, especially when they use the product as a security control. So if your organization does not have an in-depth communication plan with all stakeholders identified, it is never too late to start planning for the next big vulnerability event.
