I am working with our AppDev team to implement CA APM monitoring in our third-party cloud environments. Our solution uses scripted creation and destruction of VMs to meet website demand dynamically. Each VM is assigned a unique server name and is monitored by an EPAgent; the Docker containers running in each VM are also assigned unique names and are monitored by a Java agent. Under both naming conventions, a name is used once and, after the VM or container is destroyed, will most likely never be used again. Our live metric counts remain comparable to those in our internal static server environment, but the historical metric count increases significantly with each Docker container restart because of the one-time-use names. For example, when we perform a code deploy, we destroy 100 old-code containers that generate ~10k metrics each and spawn 100 new-code containers that generate the same ~10k metrics each, and the historical metric count spikes by ~1 million metrics. Given the number of containers and the metrics per container, it would take only a small number of code deployments to reach a historical metric clamp state.
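To put rough numbers on the growth described above, here is a back-of-the-envelope sketch in Python. The function names, the clamp limit, and the starting historical count are hypothetical values chosen for illustration, not settings read from our Enterprise Manager; only the 100 containers and ~10k metrics per container come from our environment.

```python
def historical_growth_per_deploy(containers: int, metrics_per_container: int) -> int:
    """New historical metrics added by one deploy when every
    replacement container reports under a never-before-seen name."""
    return containers * metrics_per_container


def deploys_until_clamp(current_historical: int, clamp_limit: int,
                        growth_per_deploy: int) -> int:
    """Whole deploys that still fit before the historical count
    would exceed the clamp limit."""
    if growth_per_deploy <= 0:
        raise ValueError("growth_per_deploy must be positive")
    headroom = clamp_limit - current_historical
    if headroom <= 0:
        return 0
    return headroom // growth_per_deploy


if __name__ == "__main__":
    # Figures from the post: 100 containers, ~10k metrics each.
    growth = historical_growth_per_deploy(100, 10_000)
    print(growth)  # 1000000

    # Hypothetical values for illustration only: a clamp limit of
    # 5 million and 1 million historical metrics already stored.
    print(deploys_until_clamp(1_000_000, 5_000_000, growth))  # 4
```

With those assumed values, four deploys would use up the remaining headroom. If container names were recycled instead, and the metric paths stayed the same, the per-deploy growth would in principle drop to roughly zero.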
Is anyone else seeing this issue in their APM-monitored Docker environments? Does your Docker automation recycle container names to prevent it? Is it possible to force the Java agent to report metrics under a specified host name rather than the Docker container name? Or do you just live with the issue and prune metrics regularly?