We have 3 enterprise managers with the following "Metrics with Data":
What should I do for CA APM to handle this workload?
This is a clear indication of slow performance. I think the first one is the MOM and the other two are collector metrics. Per CA recommendations, "Metrics with Data" should stay below 300 K; otherwise SmartStor is holding a huge amount of data, which in turn affects collector performance.
1) First, do a SmartStor cleanup immediately to remove the unwanted metrics. SQL, Sockets and JMX metrics are usually of no use, so remove those first. There is a script for the cleanup under the /tools folder of your installation directory, but you have to follow certain steps to run it. The attached document created by Sergio will help you with the cleanup. The wiki link below also explains in detail how the SmartStor tools work.
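As a rough illustration of that guideline, the sketch below flags any Enterprise Manager whose "Metrics with Data" count is above 300,000. The EM names and counts are made-up placeholders, not the values from this thread, and the function name is hypothetical.

```python
# Guideline quoted above: keep "Metrics with Data" below 300 K per EM.
METRICS_WITH_DATA_LIMIT = 300_000

def over_limit(readings, limit=METRICS_WITH_DATA_LIMIT):
    """Return {em_name: excess} for every EM whose count is above the limit."""
    return {name: count - limit for name, count in readings.items() if count > limit}

# Hypothetical readings; replace them with the values shown for your own EMs.
sample = {"MOM": 850_000, "Collector1": 420_000, "Collector2": 280_000}

for name, excess in over_limit(sample).items():
    print(f"{name}: {excess} metrics above the guideline")
```

An EM that shows up here is a candidate for the cleanup and agent tuning steps.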
Configure and Manage SmartStor Data - CA Application Performance Management - 10.0 - CA Wiki
Running the test_regex command in the SmartStor tools gives you a better understanding of the metric counts in the historical store. Then decide which metrics the application teams do not need and remove those.
2) Second, check whether any agents are hitting the metric clamp. If you have hundreds of agents, the easiest way to find them is to create a metric grouping and pull a report (or set an alert that fires when the clamp metric is greater than 0). Once you know which agents are sending too much data, click on the agent node and check the pie chart of individual metric counts. This is a lot of work if many agents are sending a lot of data. Once you have the individual metric count details, go back to the agent configuration and logs to check how the agent is instrumenting the code. There you will find the probes that are capturing too much data. Start fine-tuning the PBDs.
3) If you still want to cut down unnecessary metrics, run the Agent Summary report (I believe it is available under the supportability module) to see the live metric count for each agent. I always suggest that an agent should not send more than 5000 metrics, so look for agents that are sending more than 5000 metrics. Once you have the agent list, go back to the agent node in Investigator and check the individual metric counts again. Follow the same steps as above to fine-tune the agents once you find which metrics account for most of the data.
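The 5000-metric rule of thumb from step 3 can be sketched as a small filter over an exported report. This is only an illustration: the CSV layout (columns `agent` and `metric_count`), the agent names and the function name are assumptions, not the actual format of the Agent Summary report.

```python
import csv
import io

AGENT_METRIC_LIMIT = 5000  # rule of thumb from step 3

def agents_over_limit(csv_text, limit=AGENT_METRIC_LIMIT):
    """Parse 'agent,metric_count' rows and return the offenders, largest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["agent"], int(row["metric_count"])) for row in reader]
    offenders = [(agent, count) for agent, count in rows if count > limit]
    return sorted(offenders, key=lambda item: item[1], reverse=True)

# Hypothetical export; the real report will have different names and columns.
sample_report = """agent,metric_count
host1|Tomcat|OrdersAgent,40000
host2|Tomcat|WebAgent,3200
host3|JBoss|BillingAgent,7100
"""

for agent, count in agents_over_limit(sample_report):
    print(f"{agent}: {count} metrics")
```

Sorting largest first puts the agents worth tuning first at the top of the list.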
On the other hand, if you still need these metrics and they cannot be removed, please increase your SmartStor storage or add a new collector. I am enclosing the document created by Sergio.
One of the agents sends 40,000 metrics. This is the bottleneck.
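For the cleanup in step 1, it can help to try a candidate regular expression on a handful of metric paths before passing it to the SmartStor tools. The sketch below is a stand-in for that kind of preview, not the tool itself: the metric paths are invented examples, the function name is hypothetical, and Python's regex syntax may differ in details from what the SmartStor tools accept.

```python
import re
from collections import Counter

def count_matches(metric_paths, pattern):
    """Count the paths matching the pattern, grouped by top-level metric node."""
    regex = re.compile(pattern)
    matched = [path for path in metric_paths if regex.search(path)]
    by_top_node = Counter(path.split("|")[0] for path in matched)
    return len(matched), by_top_node

# Invented sample paths, for illustration only.
sample_paths = [
    "Backends|OrdersDB|SQL|Dynamic|Query|SELECT * FROM ORDERS:Average Response Time (ms)",
    "Sockets|Client|Port 8080:Concurrent Readers",
    "JMX|java.lang|type=Memory:HeapMemoryUsage",
    "Frontends|Apps|Orders:Average Response Time (ms)",
]

# Candidate pattern: anything under JMX or Sockets, or any path with a SQL node.
total, by_top_node = count_matches(sample_paths, r"^(JMX|Sockets)\||\|SQL\|")
print(total, dict(by_top_node))
```

If the pattern catches metrics the application teams still need, such as the Frontends path here, tighten it before running any removal.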
But how will we re-configure CA APM to handle this workload?
There is a monolithic application that has thousands of stored procedures.
This app alone creates such a big workload.
Please look at the agent configuration files, especially the PBL and PBD files. I expect this is instrumented by the sql.pbd file. If you don't need SQL and stored procedure metrics, remove sql.pbd from the PBL files. Otherwise, rewrite the tracers in sql.pbd so that they stop collecting, or collect only the stored procedures you need.
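To see which PBD files a PBL actually loads, a quick listing like the one below may help. It is a sketch that assumes a PBL is a plain-text list with one PBD file name per line and `#` marking a comment. The sample contents are hypothetical; only sql.pbd is a name mentioned in this thread.

```python
def active_pbds(pbl_text):
    """Return the PBD entries that are not blank and not commented out with '#'."""
    active = []
    for line in pbl_text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            active.append(entry)
    return active

# Hypothetical PBL contents, for illustration only.
sample_pbl = """# example directive list
j2ee.pbd
sql.pbd
#jmx.pbd
"""

print(active_pbds(sample_pbl))
```

Commenting out the sql.pbd line would drop it from the list that this function returns.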
I cannot remove sql.pbd file because all the stored procedures are in use.
I need to collect performance data from those SPs.
I think we are at the limits of the CA APM product.
I could not get any solution from CA.
In that case, you need to increase SmartStor storage or add a new collector to handle that agent. What is your current SmartStor storage size?
Can you also think about removing other unwanted metrics from historical store?
SmartStor is 35 GB.
Please use the disk space calculator to review your SmartStor DB size and set it accordingly.
If there are more agents, please consider adding another collector.
>I think we are at the limits of the CA APM product.
>I could not get any solution from CA.
There is only so much you can resolve in a community posting. These are meant for questions that do not require detailed analysis. Having someone onsite, such as SWAT or Professional Services, or working with the Support organization is the fastest way to resolve the multiple complex issues that you seem to be having.