In APM 10.5, does the MOM copy loadbalancing.xml to the Collectors? I know it did that in v9.7, but I am not sure about 10.5.1.
If I update loadbalancing.xml on the MOM, should I expect to see the updates in the Collectors' loadbalancing.xml?
I only had one rule as below. After re-testing based on inputs from Sergio and Lynn I was able to see it match correctly.
<loadbalancing xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="loadbalancing0.1.xsd">
<agent-collector name="agents to core collectors only">
    <agent-specifier>.*\|Jetty.*\|.*</agent-specifier>
    <exclude>
        <collector host="jp110" port="5001"/>
        <collector host="jp111" port="5001"/>
    </exclude>
</agent-collector>
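To see why a rule like the one above matches (or fails to match) a given agent, it can help to test the agent-specifier regex by hand. The sketch below assumes the specifier is matched against the full agent identity in the form "host|process|agentName"; the sample agent names are hypothetical, not from the poster's environment.

```python
import re

# The agent-specifier from the rule above: matches any agent whose
# process segment starts with "Jetty".
specifier = r".*\|Jetty.*\|.*"

# Hypothetical agent identity strings in "host|process|agentName" form.
agents = [
    "webhost01|JettyProcess|TradeAgent",   # process is Jetty* -> rule applies
    "webhost02|Tomcat|OrderAgent",         # different process -> rule skipped
]

for agent in agents:
    if re.fullmatch(specifier, agent):
        print(f"{agent}: rule applies (excluded from jp110/jp111)")
    else:
        print(f"{agent}: rule does not apply")
```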
Dear CA APM Customers and Partners:
Samira welcomes your contributions to these questions. Can you help?
CA Support and others may help if no answer is provided within a few hours
As far as I am aware, the MOM has only ever sent the updated rules from loadbalancing.xml to the Collectors every 10 minutes (the default Load Balancer interval on the MOM), rather than the file itself. That should be the same in 10.5, per these links:
Configure loadbalancing.xml for Allowed and Disallowed Agents by Enterprise Manager - CA Application Performance Managem…
Agent - Enterprise Manager Network Topology Overview - CA Application Performance Management - 10.5 - CA Technologies Do…
I will run some tests to confirm.
Hope that helps
Hi Lynn, Samira,
This is related to the load balancing / Agent Controllability feature we added in 9.1. The agent controllability configuration is the in-memory data structure of the agent controllability directive information: it holds, for each agent, the allowed and disallowed Collectors. This configuration is populated by reading loadbalancing.xml on the MOM and is transferred to the Collectors over the Isengard communication channel to maximize consistency of this information across the cluster, but it is kept in memory only.
The Collectors maintain this information so they can apply it to agents, and so they can pass the relevant Collector information to an agent if the agent restarts or the MOM is unavailable.
Lynn - I suggest the doc team refer to the DDS_AgentControlability doc.
Based on the information you provided: if an agent is already connected to a Collector and new load balancing rules now apply to that agent/Collector, then we would need to restart either the agent/Collector, or possibly the MOM (depending on the scenario), for the new rules to be applied. For new incoming agents, or during rebalancing, the restarts may not be needed. Would that be a correct understanding?
No, that is not needed. The LB info in Collectors and agents will refresh upon:
a. Reconnection with the MOM, or
b. When loadbalancing.xml changes on the MOM.
The LB xml data is sent to Collectors within a minute, but agents are transferred/balanced according to the new configuration only during the next load rebalancing.
Once again, the LB xml data is kept only in the Collector's memory and is NOT persisted to disk, hence NOT preserved across restarts. On restart, the Collector will download the configuration from the MOM when it reconnects. If the MOM goes down, the Collector will continue to use the last downloaded configuration until it reconnects with the MOM.
I hope this answers your question,
You definitely answered my initial question. In my case, since there aren't many agents onboarded into this environment yet, there isn't enough load for the MOM to rebalance, so agents are not moved around (I only have about 12 agents so far and 6 Collectors). Because of that, the agents continue to connect to the same Collector they are already on when the MOM sends down the updates to loadbalancing.xml, even though those updates contain an exclude tag that invalidates the current agent-collector connections. The allowed-collectors list continues to show the excluded Collectors unless I restart the excluded Collector or the agent, forcing the agents to find another Collector via the MOM, at which point the rules are applied.
In summary, when rebalancing does not result in any agents being transferred across Collectors, we end up in an invalid state where the agents continue to connect to an excluded Collector unless a restart breaks that connection.
I just tried to recreate the scenario: I stopped all Collectors and started again from step 1 with debug-level logs. I can now see that, at the rebalance interval, it did pick up the correct list and transferred the agents over to the appropriate Collectors. Thank you all for taking the time and having the patience to explain.
So far I found 2 sections that need to be corrected and I have created a Defect for the Tech Info team to address:
Troubleshoot Agent - Enterprise Manager Network Topology - CA Application Performance Management - 10.5 - CA Technologie…
How Agent Connection Information Propagates Across Introscope - CA Application Performance Management - 10.5 - CA Techno…
Thank you Lynn, Sergio and Hal for taking the time to provide the detailed explanation. This topic is an interesting read. This is the first time I am configuring load balancing to keep certain agents tied to a specific Collector. I am still trying to correlate what I understand from all the information you have provided with what I am seeing in my environment.
So I have 1 MOM and 6 Collectors. Collectors 5 and 6 are new and are only meant to accept agents from a specific data center, DC1. We don't have any new agents in DC1 yet, so I added an agent-collector tag to ensure agents in other DCs don't connect to those Collectors, as below:
<agent-collector name="rsa -core collectors">
    <agent-specifier>.*dpweb.*\|.*\|.*</agent-specifier>
    <exclude>
        <collector host="collector5" port="5001"/>
        <collector host="collector6" port="5001"/>
    </exclude>
</agent-collector>
The plan is that when we have the new agents in DC1, I will add one more agent-collector tag with an include and a latched property, as below:
<agent-collector name="agents -dc1 collectors">
    <agent-specifier>.*newagent.*\|.*\|.*</agent-specifier>
    <include>
        <collector host="collector5" port="5001" latched="true"/>
        <collector host="collector6" port="5001"/>
    </include>
</agent-collector>
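Putting the two rules together, the full loadbalancing.xml might look like the sketch below. This is only an illustration of how the fragments could be combined: the rule ordering shown is an assumption, and since the two agent-specifiers match disjoint sets of agents here, either order should behave the same.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<loadbalancing xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="loadbalancing0.1.xsd">
    <!-- DC1 agents are pinned to collector5/collector6 -->
    <agent-collector name="agents -dc1 collectors">
        <agent-specifier>.*newagent.*\|.*\|.*</agent-specifier>
        <include>
            <collector host="collector5" port="5001" latched="true"/>
            <collector host="collector6" port="5001"/>
        </include>
    </agent-collector>
    <!-- All other (dpweb) agents are kept off the DC1 collectors -->
    <agent-collector name="rsa -core collectors">
        <agent-specifier>.*dpweb.*\|.*\|.*</agent-specifier>
        <exclude>
            <collector host="collector5" port="5001"/>
            <collector host="collector6" port="5001"/>
        </exclude>
    </agent-collector>
</loadbalancing>
```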
However, what I am seeing is that the dpweb.* agents ended up connecting to collector5, which is in the exclude list. This is probably due to the sequence of steps I followed: I added the new Collector to the cluster, after which the dpweb agent got rebalanced onto collector5 while I was still updating loadbalancing.xml.
For the new load balancing rules to apply, I had to bring collector5 down so the agents would have to reconnect and the MOM would then honor the exclude rules; alternatively, I could have restarted the agents so that they reconnect to the MOM and have the exclude rules applied.
What I was expecting was that, once I added the new load balancing rules, the dpweb agents would get moved over to another Collector, since collector5 was now excluded for those agents.
When I check the collector5 log, it still shows all Collectors in the list passed to the agent. Hence my question: if an agent is already connected to a Collector, and that Collector now gets an update to loadbalancing.xml from the MOM, shouldn't it read the update and apply the new rules to move the agent over to an appropriate Collector (irrespective of the load balancing threshold, weight, etc., since the Collector is excluded completely)? Or are the load balancing rules only honored on initial connect and during rebalancing?
I apologize for the long explanation. I can open a support ticket on it, but am not exactly sure if support will be able to answer this easily. Thanks again for taking the time to help understand and answer my question. I really appreciate it.
One thing to remember about loadbalancing.xml: it uses a top-down design. The very first rule the agent matches is the rule that is used; a rule further down only applies if nothing above it matched.
Can you upload your loadbalancing.xml file to here? It would help to see the whole picture.