DavidLeDeaux

Discovery Correlation

Blog Post created by DavidLeDeaux Employee on Jan 23, 2018

Correlation is the process by which we receive information about a device from two different sources and need to make a determination as to whether these are the same devices or not. In UIM it is a very common practice to learn about a single device from multiple sources. For example, a vmware probe may be running on one robot and discover a virtual machine, while a net_connect probe may be running on another robot that might be pinging that same virtual machine. It is discovery_server's job to examine information provided by each of these probes and decide if these are the same machines or not.

 

Correlation Rule Logic

It's important to understand that discovery correlation proceeds in order of priority: once a rule produces a correlation, discovery_server stops evaluating the weaker rules below it. As we move down the list of correlation rules, our confidence that two records describe the same device drops. For that reason there are also decorrelation rules, which can separate two records that appear to be the same device into two distinct devices. The full logic is too complex to cover in detail here, but the general rule of thumb is that mismatched high-priority information can cause a device to be decorrelated. For example, if two devices would match on IP and origin but present two different MAC addresses, we would unmatch those devices.
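
To make the decorrelation example concrete, here is a minimal sketch in Python. The Device record, its field names and the helper functions are all hypothetical, and the choice to treat a missing MAC address as "no conflict" is an assumption made for illustration, not a statement of how discovery_server behaves.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Hypothetical record; field names are illustrative, not UIM's schema.
    ip: str
    origin: str
    macs: set = field(default_factory=set)


def matches_ip_and_origin(a: Device, b: Device) -> bool:
    # The low-confidence rule: same IP and same origin.
    return a.ip == b.ip and a.origin == b.origin


def is_decorrelated(a: Device, b: Device) -> bool:
    # Assumption: a conflict exists only when both sides report MAC
    # addresses and none of them overlap.
    return bool(a.macs) and bool(b.macs) and a.macs.isdisjoint(b.macs)


def same_device(a: Device, b: Device) -> bool:
    # A match on IP and origin is undone by mismatched MAC addresses.
    return matches_ip_and_origin(a, b) and not is_decorrelated(a, b)
```

With this sketch, two records sharing an IP and origin but carrying different MAC addresses are kept apart, while a record with no MAC information still matches.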

Because of the changes to how correlation is processed in 8.51, this article is split into two parts, one for each correlation algorithm.  Some of the information overlaps.


UIM 8.51 and higher

Beginning with UIM 8.51, the correlation rules are highly configurable. These are the rules that are provided out of the box.

  1. Name, MAC addresses and IP matches
  2. Name and MAC addresses matches
  3. Name and IP matches
  4. MAC address and IP matches
  5. MAC addresses match
  6. Fully qualified domain names match
  7. Name and origin matches
  8. IP and origin matches

 

Method descriptions

In 8.51 and higher, we do away with the concept of strong and weak correlators and introduce the concept of confidence instead. As we move down the list of correlation rules, our confidence in a match drops and fewer matches are likely to occur.

Because the rule names are self-explanatory and many of the attributes are repeated across different rules, it's better to explain how we derive the information for each attribute.

 

With UIM 8.51 we introduced the concept of target and source attributes to further increase the granularity of the configuration.  A target is a device that already exists in the database and that we compare against, whereas a source is the information coming in from a probe.

 

Name

The name attribute has been greatly expanded from the previous discovery definition of "simple name".  Name now covers a large list of qualifiers, including virtual machine names, robot names and probe profile labels.

 

MAC address

MAC address is the "unique" value that is assigned to each network card. In some cases this value is not actually unique, particularly in the case of virtual machines. Because of this, we exclude certain vendors' MAC addresses from correlating so that distinct VMs do not begin to correlate as a single device. MAC addresses can be determined in a number of ways, but one way that discovery_agent determines them is to check its ARP cache. If the discovery_agent is on the same subnet as the device it is scanning, it will be able to find this information.

 

MAC address matching includes the "primary" MAC address as well as any discovered secondary MAC addresses.  We match against both primary and secondary entries in the database.  So a new incoming secondary MAC address can be matched against an existing primary MAC address in the database.
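
As a rough sketch, MAC matching can be thought of as a set intersection after excluded vendor prefixes are filtered out. The function names are hypothetical and the prefix in EXCLUDED_PREFIXES is a placeholder, not one of the vendor prefixes that UIM actually excludes.

```python
# Placeholder exclusion list; the real excluded vendor prefixes are
# not shown here.
EXCLUDED_PREFIXES = {"00:00:00"}


def normalize(mac: str) -> str:
    # Lowercase and use ':' separators so formats compare equal.
    return mac.strip().lower().replace("-", ":")


def usable_macs(primary, secondary=()):
    # Pool primary and secondary addresses, then drop excluded vendors.
    candidates = ([primary] if primary else []) + list(secondary)
    macs = {normalize(m) for m in candidates}
    return {m for m in macs if m[:8] not in EXCLUDED_PREFIXES}


def macs_match(source_macs: set, target_macs: set) -> bool:
    # Any shared address counts, whether primary or secondary.
    return bool(source_macs & target_macs)
```

For example, usable_macs("AA-BB-CC-00-00-01", ["aa:bb:cc:00:00:02"]) on the incoming side matches a database device whose primary address is aa:bb:cc:00:00:02, because the incoming secondary address equals the existing primary one.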

 

 

IP address

IP addresses include the "primary" IP address as well as any discovered secondary IP addresses.  IP matching is deliberately asymmetric.  On the incoming side we consider only the primary IP address and the discovery_agent's target address, and we compare those against both the primary and secondary IP addresses of existing devices.
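
The asymmetry is easier to see in code. In this sketch (function and parameter names are hypothetical) the incoming side contributes only its primary IP and the discovery_agent target address, while the database side contributes its primary and secondary addresses.

```python
def source_ips(primary_ip=None, agent_target_ip=None):
    # Incoming side: only the primary IP and the discovery_agent
    # target address are considered; secondary IPs are left out.
    return {ip for ip in (primary_ip, agent_target_ip) if ip}


def target_ips(primary_ip=None, secondary_ips=()):
    # Database side: primary and secondary addresses are both eligible.
    ips = set(secondary_ips)
    if primary_ip:
        ips.add(primary_ip)
    return ips


def ips_match(source: set, target: set) -> bool:
    return bool(source & target)
```

So an incoming primary IP of 10.0.1.5 matches an existing device that lists 10.0.1.5 as a secondary address, but an incoming secondary address is never offered for comparison.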

 

Fully qualified domain name (FQDN)

This definition is a subset of the "name" category.  The name category is very broad in that it does not require a specific format to be present.  The FQDN category has a special "type" flag that signals the discovery_server to parse the name values against an FQDN pattern matcher.  So if a probe is presenting "server1" as a name, it will not be considered an FQDN, but if it is presenting "server1.abc.com", it will meet this definition.
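
The exact pattern that discovery_server uses is not shown here, so the following is only an assumed approximation: a name counts as an FQDN when it has two or more dot-separated DNS labels and is not an IPv4 address.

```python
import re

# Assumed pattern, not the one shipped with discovery_server.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
FQDN_RE = re.compile(rf"^{LABEL}(?:\.{LABEL})+\.?$")
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def looks_like_fqdn(name: str) -> bool:
    # Require at least one dot between labels and rule out bare IPs.
    return bool(FQDN_RE.match(name)) and not IPV4_RE.match(name)
```

Under this sketch "server1" is rejected and "server1.abc.com" is accepted, which mirrors the example above.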

 

Origin

Each robot reports to a hub.  Each hub advertises an origin.  This origin is the name of the hub by default, but can be overridden in the hub configuration.  When a robot starts up, it checks in with the hub and grabs the hub's origin to tag outgoing messages with.  (The origin can be overridden at the robot level as well.)

 

The origin is important because it gives us one last failsafe in determining if two devices are the same or not.  In a multitenant environment, an MSP for example, it's possible that the MSP could be monitoring two customer networks that have devices with the same IPs.

 

Because it is possible for "Customer A" to have a router with an IP of 192.168.0.1 and "Customer B" to have a router with an IP of 192.168.0.1, we can't assume that all 192.168.0.1 devices are the same.  Let's assume that each customer has a hub named Customer_A and Customer_B respectively.  When discovery_server comes down to the last couple of rules, the confidence is very low and we must consider origin.  In this scenario discovery_server would be comparing "192.168.0.1+Customer_A" and "192.168.0.1+Customer_B" and see that it is not a match and correctly represent those devices as distinct.

 

Because of this behavior, we sometimes see duplicate devices when we don't have much information to go on other than IP address.  Consider the following scenario where a non-MSP network is being monitored.  For redundancy, two net_connect probes are set up to monitor the same device from two different parts of the network; perhaps a WAN router that is being monitored from both sides of the WAN with two robots reporting to Hub_A and Hub_B.

 

When discovery_server sees the incoming topology messages, it sees "192.168.0.1+Hub_A" and "192.168.0.1+Hub_B".  In this scenario, discovery_server doesn't know that these aren't two different customer devices so it errs on the side of caution and represents them as two different devices.
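
Both scenarios come down to comparing a combined "IP+origin" key. The sketch below groups incoming reports by that key; the function names and the (ip, origin, robot) tuple layout are hypothetical and only illustrate the comparison described above.

```python
def ip_origin_key(ip: str, origin: str) -> str:
    # The combined value that the lowest-priority rule compares.
    return f"{ip}+{origin}"


def correlate_by_ip_and_origin(reports):
    # reports: iterable of (ip, origin, robot) tuples.
    # Reports that share a key collapse into one device entry.
    devices = {}
    for ip, origin, robot in reports:
        devices.setdefault(ip_origin_key(ip, origin), []).append(robot)
    return devices
```

Two reports for 192.168.0.1 tagged Customer_A and Customer_B produce two entries, which is correct for an MSP. Two reports for the same WAN router tagged Hub_A and Hub_B also produce two entries, which is the duplicate-device case. Assuming origin is the only differing value, giving both hubs the same origin would collapse them into one entry.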

 

Additional Reading

Device Correlation Configuration

Device Correlation Troubleshooting


UIM 8.5 and lower

These rules are largely hardcoded and cannot be modified beyond adding exclusions and deactivating a category.

  1. Device UUIDs match
  2. Virtual machine IDs match
  3. Robot device IDs match
  4. MAC addresses match
  5. Fully qualified domain names match
  6. Simple name and origin matches
  7. IP address and origin matches

 

Method Descriptions

There are two categories of correlators in UIM 8.5 and earlier: strong and weak. Strong correlators are categories that we have a high degree of confidence in and require only a single attribute to match.

 

Strong Correlators

Device UUID

Device UUID corresponds to the cs_key that is found in the CM_DEVICE_ATTRIBUTE table. You'll rarely encounter this attribute; however, it can be used with the cm_data_import probe when defining an XML import template.

Virtual Machine ID

Virtual Machine ID is a reference to a unique ID that is assigned to a virtual machine. This ID is useful for tracking virtual machines when they vMotion from one host to another or when their IPs change. The discovery_agent is also able to detect this value.

 

Robot Device ID

Robot device ID is the hash value of the robot, which can be found in the niscache folder. This is a generated value based on IP address + robot name. If either of these values changes (even a case change from uppercase to lowercase), a new robot ID is created.
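
To illustrate why even a case change produces a new ID, here is a small sketch. The hash function is a stand-in: SHA-1 over the concatenated IP address and robot name is an assumption chosen for illustration, not the algorithm the robot actually uses.

```python
import hashlib


def robot_device_id(ip: str, robot_name: str) -> str:
    # Stand-in hash (assumption): the real algorithm is not shown here.
    # Any hash of the raw strings is case-sensitive in the same way.
    return hashlib.sha1(f"{ip}{robot_name}".encode("utf-8")).hexdigest()
```

With any hash of this kind, robot_device_id("10.0.0.5", "Server1") and robot_device_id("10.0.0.5", "server1") differ, so the renamed robot would no longer correlate on this attribute.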

MAC Address

MAC address is the "unique" value that is assigned to each network card. In some cases this value is not actually unique, particularly in the case of virtual machines. Because of this, we exclude certain vendors' MAC addresses from correlating so that distinct VMs do not begin to correlate as a single device. MAC addresses can be determined in a number of ways, but one way that discovery_agent determines them is to check its ARP cache. If the discovery_agent is on the same subnet as the device it is scanning, it will be able to find this information.

 

Fully Qualified Domain Name

Fully qualified domain names (FQDN) are the last strong correlator.  Very few probes will present an FQDN.  An FQDN can be found via reverse DNS lookups, as the Discovery Wizard (discovery_agent) does.  Some probes will present an FQDN depending on how they are configured, and other times it is derived from information we already know about the system. For example, the vmware probe will report the FQDN if the virtual machine has VMware Tools installed on it.

 

Weak Correlators

Methods 6 and 7 above are considered "weak correlators" due to their inclusion of origins.
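
The difference between the two categories can be sketched as follows. The attribute keys and function names are hypothetical, and each attribute is simplified to a single value per device.

```python
# Hypothetical attribute keys, in the priority order listed earlier.
STRONG_CORRELATORS = ("device_uuid", "vm_id", "robot_device_id", "mac", "fqdn")
WEAK_CORRELATORS = ("simple_name", "ip")


def same_value(source: dict, target: dict, attr: str) -> bool:
    # Both sides must report the attribute with an equal value.
    value = source.get(attr)
    return value is not None and value == target.get(attr)


def correlate(source: dict, target: dict):
    # Strong correlators need only a single matching attribute.
    for attr in STRONG_CORRELATORS:
        if same_value(source, target, attr):
            return attr
    # Weak correlators count only when the origin matches as well.
    for attr in WEAK_CORRELATORS:
        if same_value(source, target, attr) and same_value(source, target, "origin"):
            return f"{attr}+origin"
    return None
```

In this sketch a virtual machine that keeps its ID still correlates after its IP and origin change, while two records that share only an IP correlate solely when their origins agree.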

 

Each robot reports to a hub.  Each hub advertises an origin.  This origin is the name of the hub by default, but can be overridden in the hub configuration.  When a robot starts up, it checks in with the hub and grabs the hub's origin to tag outgoing messages with.  (The origin can be overridden at the robot level as well.)

 

The origin is important because it gives us one last failsafe in determining if two devices are the same or not.  In a multitenant environment, an MSP for example, it's possible that the MSP could be monitoring two customer networks that have devices with the same IPs.

 

Because it is possible for "Customer A" to have a router with an IP of 192.168.0.1 and "Customer B" to have a router with an IP of 192.168.0.1, we can't assume that all 192.168.0.1 devices are the same.  Let's assume that each customer has a hub named Customer_A and Customer_B respectively.  When discovery_server comes down to the last couple of rules, the confidence is very low and we must consider origin.  In this scenario discovery_server would be comparing "192.168.0.1+Customer_A" and "192.168.0.1+Customer_B" and see that it is not a match and correctly represent those devices as distinct.

 

Because of this behavior, we sometimes see duplicate devices when we don't have much information to go on other than IP address.  Consider the following scenario where a non-MSP network is being monitored.  For redundancy, two net_connect probes are set up to monitor the same device from two different parts of the network; perhaps a WAN router that is being monitored from both sides of the WAN with two robots reporting to Hub_A and Hub_B.

 

When discovery_server sees the incoming topology messages, it sees "192.168.0.1+Hub_A" and "192.168.0.1+Hub_B".  In this scenario, discovery_server doesn't know that these aren't two different customer devices so it errs on the side of caution and represents them as two different devices.

 

Simple Name

A simple name is classified as a name that isn't an FQDN.  In UIM 8.5 and earlier, discovery_server regarded very few attributes as a name; for example, a virtual machine name as defined in the vSphere client was not considered a name.  So this category offered very little value in correlation since it was rarely matched against.  It was considered weak because it is very easy for devices to have the same name across multiple customer environments in an MSP scenario.
