We are not happy with the performance of CA APM.
It's almost unusable in our environment.
I opened a support case with CA (00151073), but all I received was a list of things to check.
We did those steps 8 months ago and nothing changed.
CPU usage is 1%, there is 8 GB of free memory, there is no disk I/O, and yet CA APM response time is more than 3 minutes.
Why?
Hello,
I can see the case. To investigate further, support will need the information below; please attach it to the case:
- A copy of the logs directory from all the EMs (MoM and Collectors), with logging set to VERBOSE.
- A screenshot of the "Custom Metric Host (Virtual) | Custom Metric Process (Virtual) | Custom Metric Agent (Virtual) | Enterprise Manager | Data Store | Smartstor | Metadata | Metrics with Data" supportability metric from all Collectors.
- A screenshot of the MoM "Status console".
- A series of thread dumps from the MoM and the Collectors, to help find the root cause.
Also, please confirm whether you are using a physical or a virtual environment.
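For the thread dumps, what helps is several dumps taken a few seconds apart, so that threads stuck in the same place stand out. Below is a minimal Python sketch of that, assuming a JDK's `jstack` is on the PATH, that you run it as the same OS user as the EM's JVM, and that you know that JVM's PID. The function name, file naming, count and 10-second interval are hypothetical choices for illustration, not a CA-documented procedure.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(pid, count=5, interval_s=10.0,
                         out_dir="threaddumps", cmd=("jstack", "-l")):
    """Take `count` thread dumps of the JVM with the given PID.

    Hypothetical helper: runs `jstack -l <pid>` (thread dump with lock
    info) once per iteration and writes each dump to its own file.
    Returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = []
    for i in range(count):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        # The index keeps file names unique even within the same second
        target = out / f"threaddump-{pid}-{i:02d}-{stamp}.txt"
        result = subprocess.run([*cmd, str(pid)],
                                capture_output=True, text=True, check=False)
        # Keep stderr too, so a failed attach is visible in the file
        target.write_text(result.stdout + result.stderr)
        files.append(target)
        if i < count - 1:
            time.sleep(interval_s)  # spacing between consecutive dumps
    return files
```

For example, `collect_thread_dumps(12345)` would write five dumps of PID 12345 into `./threaddumps`, which can then be zipped and attached to the case. Run it on the MoM and on each Collector.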
I had a quick look at the logs; here are some observations:
From the MoM perflog, I see that Performance.MetricDataManager.QueryMemory is ~27,151,588.
From the MoM log, I see:
- A huge number of these errors: "data seems to be OK - not sure what to do with append inputs".
If possible, restart the MoM with a fresh SmartStor database.
- [WARN] [PO:client_main Mailman 1][Manager.AsyncQueryResultStateMachine] Received tardy historical data from slow collector
This warning indicates a performance issue in the cluster (MoM-to-Collector communication).
- [WARN] [Collector ***@5001] [Manager.Cluster] Collector clock is too far skewed from MOM. Collector clock is skewed from MOM clock by 56,085 ms. The maximum allowed skew is 3,000 ms. Please change the system clock on the collector EM.
Make sure the clocks of all EMs are in sync; you should configure an NTP server for them.
- [Collector xxxxx@5001] [Manager.Cluster] The Introscope Enterprise Manager will continue to attempt to re-connect to the Introscope Enterprise Manager at xxxx@5001. Further failures will not be logged
- [WARN] [pool-1-thread-1] [Manager.Cluster] The Collector 10.0.6.122@5001 is responding slower than 10000ms and may be hung
The above indicates that the MoM keeps disconnecting from the Collectors.
Check how the individual Collectors are performing by reviewing their logs. Keywords to search for: outgoing, capacity, WARN, ERROR, reached, CancelledKeyException, java.io.IOException, outofmemory.
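To get a quick per-Collector overview before reading the logs line by line, the keyword search can be scripted. This is a minimal Python sketch, assuming each Collector's logs directory has been copied locally as plain-text files. The function names are hypothetical, and matching is case-insensitive (so "outofmemory" also catches "OutOfMemoryError", at the cost of "WARN" also matching "warning").

```python
from collections import Counter
from pathlib import Path

# Keywords from the support reply above; matched case-insensitively
KEYWORDS = ("outgoing", "capacity", "WARN", "ERROR", "reached",
            "CancelledKeyException", "java.io.IOException", "outofmemory")


def scan_lines(lines, keywords=KEYWORDS):
    """Count keyword hits in an iterable of log lines.

    Returns (counts, hits): a Counter of lines matching each keyword,
    and the list of lines that matched at least one keyword.
    """
    lowered = [(k, k.lower()) for k in keywords]
    counts = Counter()
    hits = []
    for line in lines:
        low = line.lower()
        matched = [k for k, kl in lowered if kl in low]
        for k in matched:
            counts[k] += 1
        if matched:
            hits.append(line.rstrip("\n"))
    return counts, hits


def scan_log_dir(log_dir, pattern="*", keywords=KEYWORDS):
    """Sum keyword counts over every regular file in one logs directory."""
    total = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" avoids aborting on stray non-UTF-8 bytes
        with path.open(encoding="utf-8", errors="replace") as fh:
            counts, _ = scan_lines(fh, keywords)
        total.update(counts)
    return total
```

Running `scan_log_dir(...)` on each Collector's copied logs directory and comparing the counters side by side makes it easier to spot which Collector is reporting capacity, IOException or out-of-memory problems.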
Also, I noticed you are using 914. If possible, I suggest upgrading to the latest release, as many issues affecting the EM, clustering, the load-balancing mechanism, etc. have been fixed in later releases. Below is a link to the master list:
http://www.ca.com/us/support/ca-support-online/product-content/knowledgebase-articles/tec1075326.aspx?intcmp=searchresultclick&resultnum=4
I hope this helps,
Regards,
Sergio