Back again.
Does anyone have a way to detect and alert on an agent that is no longer reporting in APM?
I created two alerts on the agent ConnectionStatus metric:
Test 1: ConnectionStatus, 15-second resolution, Greater Than, Whenever Severity Changes, combination All, threshold 1 for 8 of 8 periods, caution threshold 1 for 4 of 4 periods.
Test 2: ConnectionStatus, 15-second resolution, Not Equal To, Whenever Severity Changes, combination All, threshold 1 for 8 of 8 periods, caution threshold 1 for 4 of 4 periods.
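The two alert definitions above can be modelled as an "N of M periods" check. The sketch below is a simplified model, not the APM alert engine: it assumes that a 15-second period with no data (agent grayed out) counts as not breaching, and it uses a hypothetical value of 2 for a non-connected ConnectionStatus. The function names are made up for illustration. If the real engine treats missing data this way, an alert on the metric's value cannot fire once the metric stops reporting at all.

```python
from typing import Optional, Sequence


def period_breaches(value: Optional[float], threshold: float, comparison: str) -> bool:
    """Return True if one period's value breaches the threshold.

    Assumption under test: a period with no data (None) does NOT breach.
    """
    if value is None:
        return False
    if comparison == "greater_than":
        return value > threshold
    if comparison == "not_equal_to":
        return value != threshold
    raise ValueError(f"unsupported comparison: {comparison}")


def alert_triggers(values: Sequence[Optional[float]], threshold: float,
                   comparison: str, periods_over: int, window: int) -> bool:
    """True if at least `periods_over` of the last `window` periods breach."""
    recent = list(values)[-window:]
    if len(recent) < window:
        return False  # not enough periods observed yet
    breaches = sum(period_breaches(v, threshold, comparison) for v in recent)
    return breaches >= periods_over


# Hypothetical ConnectionStatus samples, one per 15-second period.
disconnected = [2.0] * 8   # agent still reports, but with a non-connected value
went_silent = [None] * 8   # agent stopped; metric grayed out, no data
print(alert_triggers(disconnected, 1, "greater_than", 8, 8))  # True
print(alert_triggers(went_silent, 1, "greater_than", 8, 8))   # False
```

In this model the "no data" case is indistinguishable from "healthy", which is why detecting a silent agent usually needs a check on the absence of data rather than on the metric's value.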
ADS cron: 90/14/14/4/8/?/2016, i.e., 2:14 PM on 8/4/2016 for 90 minutes.
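To double-check the downtime window, here is a small parser for the cron string above. The field order (duration/minute/hour/day/month/weekday/year) is an assumption inferred from the example and from Quartz-style cron, where "?" means "no specific day of week"; because minute and hour are both 14 in this example, their order cannot be confirmed from it. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def parse_ads_cron(expr: str) -> Tuple[datetime, datetime]:
    """Parse 'duration/minute/hour/day/month/weekday/year' into (start, end).

    Assumed field order; only fixed-date expressions ('?' weekday) are handled.
    """
    duration, minute, hour, day, month, weekday, year = expr.split("/")
    if weekday != "?":
        raise ValueError("this sketch only handles a fixed date ('?' weekday)")
    start = datetime(int(year), int(month), int(day), int(hour), int(minute))
    return start, start + timedelta(minutes=int(duration))


start, end = parse_ads_cron("90/14/14/4/8/?/2016")
print(start, end)  # 2016-08-04 14:14:00 2016-08-04 15:44:00
```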
1. Set unmount to 10 minutes.
2. Restarted the Enterprise Managers.
3. Stopped the EPAgent to ensure that an alert would be generated.
4. Started the agent; it switched from collector 1 to collector 2.
5. Set the ADS for 90 minutes.
6. Waited until the ADS was active.
7. Stopped the EPAgent on the target server.
8. Waited until the ADS ended.
9. Result: the agent jumped between the two collectors while I was preparing the test. On the first collector the agent was unmounted, but after more than 12 hours it still has not unmounted from the second collector. There the agent is grayed out with no metrics, in particular no agent ConnectionStatus.
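A rough timeline of the test helps check whether the alerts ever had data to evaluate once the ADS ended. This sketch uses a hypothetical stop time of 2:20 PM (the exact time the EPAgent was stopped is not recorded above), and the helper name is made up. Under those assumptions, a 10-minute unmount would remove the agent well inside the 90-minute ADS, so by the time the window closed there may have been no ConnectionStatus data left for an 8 x 15 s alert to evaluate.

```python
from datetime import datetime, timedelta


def expected_timeline(stop_at: datetime, ads_end: datetime,
                      unmount_after: timedelta, period: timedelta,
                      periods: int) -> dict:
    """Key moments of the test, given the agent stop time and ADS end."""
    unmount_at = stop_at + unmount_after
    return {
        "unmount_at": unmount_at,
        "unmounted_during_ads": unmount_at <= ads_end,
        # Earliest moment an N-of-N alert could fire after the ADS,
        # *if* breaching data were still arriving.
        "earliest_alert": ads_end + period * periods,
    }


# Hypothetical stop time: the post only says the agent was stopped
# after the ADS became active.
timeline = expected_timeline(
    stop_at=datetime(2016, 8, 4, 14, 20),
    ads_end=datetime(2016, 8, 4, 15, 44),
    unmount_after=timedelta(minutes=10),
    period=timedelta(seconds=15),
    periods=8,
)
print(timeline)
```

On the second collector the agent never unmounted, but it was grayed out with no metrics, which in practice leaves the alert in the same no-data situation.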
I did not receive an alert from either ConnectionStatus alert.
This was my best attempt at making APM aware that an agent is down, stopped, or unreachable after an ADS.
We have 577 agents and 734.8k metrics with a MOM and seven collectors. We do not see any performance issues.
We have tried building JavaScript calculators to scan the agent ConnectionStatus metrics, but with this many agents the calculator stops before it reports on all of them.
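One alternative to a value-based alert is to check for absence directly: compare each agent's last-reported time against a maximum silence and group the results by collector, so the work is split into seven smaller scans instead of one pass over all 577 agents. This is not an APM feature. It is a sketch that assumes you can export a last-seen timestamp per agent from somewhere (how to obtain it is outside this sketch), and the agent and collector names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical input: agent name -> (collector name, last time any metric was seen).
LastSeen = Dict[str, Tuple[str, datetime]]


def stale_agents(last_seen: LastSeen, now: datetime,
                 max_silence: timedelta) -> Dict[str, List[str]]:
    """Group agents that have been silent longer than max_silence by collector."""
    stale: Dict[str, List[str]] = {}
    for agent, (collector, seen) in last_seen.items():
        if now - seen > max_silence:
            stale.setdefault(collector, []).append(agent)
    for agents in stale.values():
        agents.sort()  # stable, readable output per collector
    return stale


now = datetime(2016, 8, 5, 3, 0)
sample = {
    "host-a|EPAgent": ("collector-1", now - timedelta(minutes=1)),
    "host-b|EPAgent": ("collector-2", now - timedelta(hours=12)),
    "host-c|EPAgent": ("collector-2", now - timedelta(minutes=30)),
}
print(stale_agents(sample, now, timedelta(minutes=10)))
# {'collector-2': ['host-b|EPAgent', 'host-c|EPAgent']}
```

Keying on "time since last data" rather than on the ConnectionStatus value means a grayed-out agent is reported instead of silently looking healthy.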
APM version 10.0
Agent: Environment Performance Agent (EPAgent) version 10.0.0.12