When I talk with our clients in the API Management space, I often hear requirements for analytics and metrics in the same sentence. The two have different intents, but they are often built on similar data items, and data capture is what they have in common. It is far too easy to lose that value by simply adding more and more data capture. Without context, a captured data item cannot really be understood. Without a unit, and without a minimum, maximum, expected or reference value, it’s a number that you’ve committed to capturing and storing but have no way to put to use.
Context defines relationships: humans don’t really process raw data directly; we process data in relationships. A CPU usage value reported in megahertz, as many hypervisor systems do, doesn’t help much if you don’t know what was expected, what was allocated, or what load was presented. Similarly, a counter that increases by one per second could just as easily be the current time as a request counter. Context changes everything. This is why many dashboards plot multiple data values together on multiple scales: to let the viewer derive some kind of context.
That kind of reference frame for the relationships provides an additional tier of context that can be crucial. As a very concrete example, in a product we used to encounter fairly often, a thread count exceeding a well-known threshold was a primary indicator of an impending crash. When we encountered that product, we sometimes had hard requirements to capture the thread count as a crucial data item. But if the product in question doesn’t crash at high thread counts, the thread count is not particularly crucial, and capturing and storing it is effort that might be better spent elsewhere.
Successfully contextualizing data, in the API space especially, involves some obvious and some not-so-obvious information. Clearly the requested URL matters. But without the potentially wildcard-based URL of the listening service, or some other clear way to identify which service was in use, you would have to derive the service from a priori knowledge built into the visualization tools. Capturing both means that even when the set of available APIs changes, the captured data is still valid years later.
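As a rough illustration of capturing both pieces, here is a minimal Python sketch. The routing table, the service names and the `CaptureRecord` type are all hypothetical, invented for this example, and the wildcard matching uses the standard library’s `fnmatchcase`, whose `*` also matches across `/`. A real gateway will have its own matching rules.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical routing table: wildcard listener pattern -> service name.
ROUTES = {
    "/orders/*": "orders-v1",
    "/customers/*/invoices": "billing-v2",
}


@dataclass(frozen=True)
class CaptureRecord:
    request_path: str             # the URL path the client actually sent
    route_pattern: Optional[str]  # the listener pattern that matched, if any
    service: Optional[str]        # the service behind that pattern, if any


def capture(request_path: str) -> CaptureRecord:
    """Record the request path together with the route that matched it."""
    for pattern, service in ROUTES.items():
        if fnmatchcase(request_path, pattern):
            return CaptureRecord(request_path, pattern, service)
    # No listener matched: keep the path, but mark the service as unknown.
    return CaptureRecord(request_path, None, None)


record = capture("/orders/12345")
```

Because the record carries the matched pattern and service name alongside the raw path, a report written years later does not need to know what the routing table looked like at the time.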
Similarly, the IP addresses of requester and server seem like they would matter a lot, but in the extremely common case of multiple layers of load balancers on the front end, and load balancers and containers on the back end, the IP addresses are far less definitive. When looking at last month’s capture period, for instance, it would be useful to know that the back end called was one of the “east coast production nodes in cloud PaaS A”, rather than some address that is no longer in use in your system due to scaling. On their own, the IP addresses are nearly useless.
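One way to picture this is to resolve the address to logical labels at capture time and store the labels with the record. The sketch below assumes a hypothetical inventory lookup; the address, the label names and the `BackendContext` type are made up for illustration, and in practice the labels would come from whatever your platform exposes.

```python
from dataclasses import dataclass

# Hypothetical inventory snapshot at capture time: address -> logical labels.
INVENTORY = {
    "10.0.3.17": {"pool": "east-coast-production", "platform": "cloud-paas-a"},
}


@dataclass(frozen=True)
class BackendContext:
    address: str   # still recorded, but no longer the only identifier
    pool: str      # logical group the node belonged to when it was called
    platform: str  # where that group was hosted


def describe_backend(address: str) -> BackendContext:
    """Attach the logical labels known at capture time to a back-end address."""
    labels = INVENTORY.get(address, {})
    return BackendContext(
        address=address,
        pool=labels.get("pool", "unknown"),
        platform=labels.get("platform", "unknown"),
    )


backend = describe_backend("10.0.3.17")
```

The labels stay meaningful after the address has been recycled by scaling, which is the point of storing them in the capture rather than looking them up later.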
The conclusion I draw is that designing data capture for metrics and analytics requires a clear and intentional focus on context. Context turns “1000” into “1000 current in-progress requests compared to the daytime average of 2000 and maximum of 40,000, implying the system is performing well”.
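To make that last example concrete, here is a small Python sketch of a reading that carries its own unit and reference values. The `Reading` type and its field names are assumptions for illustration only, and judging whether the system is “performing well” is deliberately left to the reader of the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    value: float            # the raw captured number
    unit: str               # what the number counts
    daytime_average: float  # reference value for a typical day
    maximum: float          # highest value observed or expected

    def describe(self) -> str:
        """Express the raw value relative to its reference values."""
        share_of_average = self.value / self.daytime_average
        share_of_maximum = self.value / self.maximum
        return (
            f"{self.value:,.0f} {self.unit} "
            f"({share_of_average:.0%} of the daytime average, "
            f"{share_of_maximum:.1%} of the maximum)"
        )


reading = Reading(1000, "in-progress requests", 2000, 40000)
summary = reading.describe()
# summary == "1,000 in-progress requests (50% of the daytime average, 2.5% of the maximum)"
```

The bare number never travels alone here: anything that stores or displays the reading also has the unit and the reference values needed to interpret it.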