Thank you junwah,
We don't have rsync and are not able to ssh between servers under our APM admin account, so we have to depend on each server's own scheduling setup.
We created adapter shell scripts (MOM-EMCtrl.sh, MOM-WVCtrl.sh, Collector-EMCtrl.sh) to contain our sleeps so that we don't pause or stall the server's init process. Init launches each adapter script as a separate, independent process.
Stop
00 MOM
05 Collectors
10 APM DB
So we are giving each one 5 minutes to shut down.
Start
5:00 MOM - MOM-EMCtrl.sh ; MOM-WVCtrl.sh
  Sleep 35 minutes
  EMCtrl.sh start
  WVCtrl.sh start
5:05 Collectors - Collector-EMCtrl.sh
  Sleep 15 minutes
  EMCtrl.sh start
5:10 APM DB
  pg_ctl start . . .
5:20 APM DB should have started
5:30 Collectors should all have started
5:35 MOM EM and WebView started
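For readers who want to see the shape of such an adapter script, here is a minimal sketch of the sleep-then-start pattern described above. It is not the poster's actual script: the helper name `delayed_start`, the `DELAY_SECONDS` and `EM_HOME` variables, and the install path are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of an adapter script such as MOM-EMCtrl.sh.
# DELAY_SECONDS and EM_HOME are assumed names; adjust to the real install.

DELAY_SECONDS="${DELAY_SECONDS:-2100}"   # 35 minutes, the MOM delay in the schedule
EM_HOME="${EM_HOME:-/opt/apm/em}"        # assumed install path, not from the post

# delayed_start SECONDS CMD...
# Sleep for SECONDS, then run CMD with its arguments.
delayed_start() {
    delay="$1"
    shift
    sleep "$delay"
    "$@"
}

# Example invocation (left commented out so the sketch has no side effects):
# delayed_start "$DELAY_SECONDS" "$EM_HOME/bin/EMCtrl.sh" start
```

Because the sleep lives inside the adapter, init only has to launch the adapter in the background (for example with a trailing `&`) and can carry on immediately.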
I did look into watching the log files for the "started" message, or even probing port 5001 on the collectors, so the MOM knows when to start, but ran into a question: if a collector fails, should the MOM start anyway?
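The port-probing alternative could be sketched roughly as below. This is an illustration only, not something the post says was deployed: `wait_until` and `port_open` are hypothetical helper names, the host name `collector01` is made up, and `port_open` assumes an `nc` build that supports the `-z` connect-only option. The timeout is what answers the "what if a collector never comes up" case: the wait gives up instead of blocking the MOM forever.

```shell
#!/bin/sh
# Hypothetical helpers; names and timeouts are illustrative only.

# wait_until TIMEOUT INTERVAL CMD...
# Re-run CMD every INTERVAL seconds until it succeeds (return 0)
# or until TIMEOUT seconds of waiting have accumulated (return 1).
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# port_open HOST PORT
# Assumes an nc build that supports -z (connect without sending data).
port_open() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# Example (not run here): wait up to 20 minutes, probing every 30 seconds.
# wait_until 1200 30 port_open collector01 5001 || echo "collector01 not up" >&2
```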
If the MOM starts, there is a management module with an alert on a calculator for the number of collectors, so if one didn't start, we will get an error email every 6 minutes. This message also goes to the NOC, so we will get a wake-up call to log in and diagnose what is going on.
Then, what if the MOM does not start? We created an HP SiteScope alert on the MOM EM port 5001, and it checks every 10 minutes. If there is an error, SiteScope checks back every two minutes. After 3 total errors, an email alert is sent to our NOC and we get a wake-up call.
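As a rough worked example of what those SiteScope settings imply for detection time, the arithmetic can be written out as follows. The variable names are made up for illustration, and the calculation assumes each check completes instantly, which real monitors will not quite do.

```shell
#!/bin/sh
# Values taken from the description above; variable names are hypothetical.
check_interval=10   # minutes between normal checks
retry_interval=2    # minutes between re-checks after an error
errors_needed=3     # total errors before the email is sent

# Time from the first failed check to the alert: two re-checks, 2 minutes apart.
retry_window=$(( (errors_needed - 1) * retry_interval ))

# Worst case: the MOM goes down just after a successful check.
worst_case=$(( check_interval + retry_window ))

echo "first error to alert: ${retry_window} min"
echo "worst case, outage to alert: ${worst_case} min"
```

Under these assumptions the alert fires 4 minutes after the first failed check, and at most about 14 minutes after the MOM actually goes down.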
I've got another discussion thread about the APM DB and the APM status console, on how to alert when the EMs are having issues with the APM DB. It looks like I'll need to create HP SiteScope alerts for the Postgres database as well.
Hopefully we won't have more than one or two collectors fail to start at the same time. The restart happens outside of our typical high-load times, so if we lost 2 of 7 collectors, the APM cluster should be able to handle the load until we are able to get the two failed collectors running.
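If the "2 of 7" tolerance were ever turned into an automated start condition for the MOM, it might be expressed as a simple threshold check like the one below. This is a hypothetical sketch, not part of the setup described here: `enough_collectors` is an invented name, and the count of running collectors would have to come from whatever probe is available.

```shell
#!/bin/sh
# enough_collectors UP TOTAL MAX_FAILED
# Succeeds (exit 0) when no more than MAX_FAILED of TOTAL collectors are down.
enough_collectors() {
    up="$1"
    total="$2"
    max_failed="$3"
    [ $((total - up)) -le "$max_failed" ]
}

# Example (values from the post: 7 collectors, tolerate 2 failures):
# if enough_collectors 5 7 2; then echo "OK to start MOM"; fi
```

With 7 collectors and a tolerance of 2, the check passes with 5 running and fails with 4.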
I wish APM would do more in terms of its own infrastructure alerting, but I guess APM isn't an infrastructure monitoring system.