NMS_Colorado

Maintaining full robot monitoring during DR failover

Discussion created by NMS_Colorado on Jul 20, 2012
Latest reply on Jul 27, 2012 by 1_keithk

We are starting to look at options for maintaining full robot monitoring of customer servers after a VMware SRM failover to a different site. The replicated VMs on the DR side will keep the same internal IP and host name, but the NAT address will be different after failover to the DR site, so it seems robot.cfg will need to be updated as part of the process. Note that no dedicated customer hub will be part of the DR environment: customer robots are monitored from shared hubs, which use NATs to reach the robots (no hubs or NMS infrastructure are moving in this scenario, just customer robots).

One thing we've considered is developing a mechanism that dynamically creates robot.cfg at boot with the correct robotname and robotip_alias, based on which site is coming online. On the DR side, the robot name would be made unique by appending "-DR", and the correct NAT address would also be entered.
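To make the idea concrete, here is a rough Python sketch of the rewrite step only. It is not a tested procedure: the per-site table, the placeholder NAT addresses, the function names, and the assumption that both keys sit in a <controller> section as "key = value" lines are all ours for illustration. How the boot script decides which site it is on, writes the file back, and restarts the robot is left out.

```python
import re

# Hypothetical per-site settings; the NAT addresses are placeholders
# from the documentation ranges, not real values.
SITES = {
    "primary": {"suffix": "", "nat_ip": "192.0.2.10"},
    "dr": {"suffix": "-DR", "nat_ip": "198.51.100.10"},
}


def set_key(cfg_text, key, value):
    """Replace an existing 'key = value' line, or add one after <controller>."""
    pattern = re.compile(r"^([ \t]*)" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        # Keep the original indentation, swap only the value.
        return pattern.sub(lambda m: f"{m.group(1)}{key} = {value}", cfg_text, count=1)
    # Assumption: the key belongs in the <controller> section.
    return cfg_text.replace("<controller>", f"<controller>\n   {key} = {value}", 1)


def render_robot_cfg(cfg_text, base_name, site):
    """Return robot.cfg text with robotname and robotip_alias set for the site."""
    settings = SITES[site]
    cfg_text = set_key(cfg_text, "robotname", base_name + settings["suffix"])
    return set_key(cfg_text, "robotip_alias", settings["nat_ip"])
```

The boot-time script would call render_robot_cfg with the current file contents, the server's base robot name, and "primary" or "dr", then write the result back before the robot starts. Keeping the site table in one place means the replicated VM image can be identical at both sites.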

If both sites are up at the same time (such as during testing), this should help keep the QoS data and alarms clearly differentiated. Since the internal IP of the server would be the same while the robot name and NAT would be different, are there any issues we would need to watch out for over time with QoS or alarms?

Does anyone have other ideas on the best (simplest, most maintainable, most reliable, etc.) way to handle this type of scenario? Has anyone had success or failure attempting anything like this? Is there any value in exploring passive robots to help accomplish the end goal (we haven't had a chance to play with passive robots yet)?



