Production setup for Audit Sinks, Internal Audit Sink Policy, and Log Sinks

Question asked by gianuy_ma on Nov 30, 2016
Hey guys,
Our production CA Gateway cluster (2 nodes) has a problem with Audit Sinks and the Internal Audit Database. We use Audit Details heavily so that we can debug and troubleshoot problems. That includes emitting audit messages purely for debugging, such as:
  • Raw requests
  • Raw responses
  • Message transformation results for an entire message
  • Variable values
  • Messages indicating that a folder assertion has been executed

 

Audit Details Examples

 

We have also modified our Internal Audit Sink Policy to send us alerts via email. This makes it more convenient to notice when services are failing and gives us one central policy to process those failures.

 

Internal Audit Sink Policy Customizations

 

Although traffic on that cluster isn’t that significant (we process only 2,000,000+ requests per day, which translates to about 28 requests/sec on average, or 14/sec per node), we managed to fill up the Internal Audit Database, and requests started failing. We tried a cron job to clean up the Internal Audit Database and expanded the database’s logical volume, but that only degraded the performance of our gateway cluster, and we were back to the same problem whenever traffic went a little higher.
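As a rough sanity check on those figures, here is a minimal sketch of the arithmetic. An even 2,000,000 requests/day works out to about 23 requests/sec, and 28 requests/sec corresponds to roughly 2.4 million requests/day, which fits "2,000,000+". The storage estimate at the end is illustrative only: the number of audit details per request and the average detail size are hypothetical placeholders, not values measured on our cluster.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(requests_per_day: float, nodes: int = 1) -> float:
    """Average requests per second handled by each node."""
    return requests_per_day / SECONDS_PER_DAY / nodes


def requests_per_day_for(rate_per_sec: float) -> float:
    """Daily request count implied by a sustained average rate."""
    return rate_per_sec * SECONDS_PER_DAY


def daily_audit_bytes(requests_per_day: float,
                      details_per_request: float,
                      avg_detail_bytes: float) -> float:
    """Rough daily audit volume; the last two inputs are assumptions."""
    return requests_per_day * details_per_request * avg_detail_bytes


print(round(average_rate(2_000_000), 1))           # 23.1 requests/sec cluster-wide
print(round(average_rate(2_000_000, nodes=2), 1))  # 11.6 requests/sec per node
print(requests_per_day_for(28))                    # 2419200 requests/day at 28/sec

# Hypothetical inputs: 5 audit details per request, 2 KiB each.
print(daily_audit_bytes(2_000_000, 5, 2048) / 1e9)  # 20.48 (GB per day)
```

Even with modest assumed values, raw requests and responses written as audit details add up to tens of gigabytes per day, which is why a periodic cleanup job alone struggles to keep up.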
I know the recommended way is to go through a log sink (or audit through JMS, or even log files), but we still want a central place to capture all audit events emitted by the standard (and even custom) Gateway assertions and policies. In summary, our requirements are:
  • Send all audit messages (those emitted by assertions and policies) to a central location that we can search effectively when troubleshooting issues.
  • If possible, also send our debug information along with the rest of the audit messages.
  • If not, we’re willing to put it on a log sink, although that would be one more place for us to look and would probably slow us down in finding the issue.
  • Whatever the implementation, the Internal Audit Database must not fill up its allocated logical volume, because that causes other Gateway functions to fail, which is effectively a DoS.
What is the recommended setup given these requirements? I’m sure other shops with bigger CA API Gateway clusters and much higher transaction volumes have the same issues.
Thanks,
Gian

Outcomes