Change the default handling of SNMP counter rollovers in CAPC back to the way it was pre-2.5.0

Idea created by BobM233 on Aug 27, 2015
    Under review


     

    Shortly after we upgraded CAPC from version 2.4.1 to 2.5.0, we started to see a lot of data gaps in our network data, specifically for Utilization In/Out.  At first I thought it was a polling problem, but after opening a support case and digging deeper we found out that CAPC changed its default handling of counter rollovers in 2.5.0.  In version 2.4.1 and earlier, if a counter value was less than the previous value, CAPC would just assume the counter had rolled over and do the proper math to figure out the correct delta value.  Now in 2.5.0 it apparently assumes the counter was reset, waits 2 more polling cycles before it can calculate a delta again, and leaves you with a 10 minute data gap.
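
To make the "proper math" concrete, here is a minimal sketch of a rollover-aware delta for a 32-bit counter. This is only my illustration of the behavior described above, not CAPC's actual code; the function name `rollover_delta` is made up for the example.

```python
# A Counter32 wraps from 4,294,967,295 back to 0, so values are modulo 2**32.
COUNTER32_MODULUS = 2 ** 32


def rollover_delta(previous: int, current: int,
                   modulus: int = COUNTER32_MODULUS) -> int:
    """Delta between two polls, assuming any decrease is one rollover."""
    if current >= previous:
        # Normal case: the counter simply increased.
        return current - previous
    # Counter went backwards: assume it wrapped once past the maximum.
    return current + modulus - previous


if __name__ == "__main__":
    print(rollover_delta(100, 250))            # no wrap
    print(rollover_delta(4_294_967_000, 704))  # wrapped once
```

Note that this simple version cannot tell a rollover from a reset, which is exactly the trade-off discussed at the end of this post.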

     

    CAPC will use the low speed 32-bit counters (ifEntry) instead of the high speed 64-bit interface counters (ifXEntry) for any interface that has a speed of 20M or less.  So now let's do a little math to see how often the 32-bit ifInOctets or ifOutOctets counters could roll over on a 20M interface.

     

    Highest value on a 32-bit Octet counter = 4,294,967,295

    In bits that's 8 * 4,294,967,295 = 34,359,738,360

     

    If your 20M interface is running at 100%, your counter will roll over in 34,359,738,360 / 20,000,000 = 1717 seconds

     

    1717 / 60 = 28.6 minutes

     

    Who thinks a 10 minute data gap every 29 minutes is a good idea?

     

    Now I'll admit that's an extreme example, but even if our 20M interface were running at 50% there would be a 10 minute data gap roughly every hour.  Even on our lower speed interfaces in the 1.5M-6M range we were seeing far too many gaps in the utilization charts.
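
If you want to run the same arithmetic for your own interface speeds, something like the sketch below should do it. The function name and the list of speeds and utilization levels are just my choices for illustration; the only inputs taken from the math above are the 32-bit counter maximum and the 8 bits per octet.

```python
# Highest value on a 32-bit octet counter (ifInOctets / ifOutOctets).
COUNTER32_MAX_OCTETS = 4_294_967_295


def minutes_to_rollover(speed_bps: float, utilization: float = 1.0) -> float:
    """Minutes for a 32-bit octet counter to wrap at a steady utilization."""
    bits_until_wrap = 8 * COUNTER32_MAX_OCTETS
    seconds = bits_until_wrap / (speed_bps * utilization)
    return seconds / 60


if __name__ == "__main__":
    # Example speeds in bits per second: 1.5M, 6M and 20M.
    for speed in (1_500_000, 6_000_000, 20_000_000):
        for utilization in (0.5, 1.0):
            minutes = minutes_to_rollover(speed, utilization)
            print(f"{speed / 1e6:>5.1f}M at {utilization:.0%}: "
                  f"rollover every {minutes:.1f} minutes")
```

For the 20M case this gives about 28.6 minutes at 100% and about 57 minutes at 50%, matching the numbers above.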

     

    The good news is that it's easy to put it back to the way it worked in 2.4.1.  The bad news is CA omitted this change from the 2.5.0 Release Notes, so we had to find out about it the hard way.

     

    Here's how to change it back -

    On your Data Collector, create a file called com.ca.im.dm.snmp.collector.SnmpCollector.cfg in the <Data_Collector_installation_directory>/apache-karaf-2.3.0/etc directory.

    Add the following line to the file:

    showGapsOnCounterRollover=false

     

     

    I'm told by support that the reason for the change was to avoid the large data spikes that can occur when a counter resets.  I would argue that counter rollovers are going to occur a lot more often than counter resets.  I also don't think customers should have to choose between data spikes or data gaps.  I keep hearing how CAPC is supposed to be a "carrier class" product.  I would hope that a carrier class SNMP collector would be able to handle both counter rollovers and counter resets in its normalization calculations.
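
As a rough idea of what handling both cases could look like, here is a sketch of one common heuristic: when a counter goes backwards, compute the delta as if it rolled over, and only treat it as a reset if that delta is more than the interface could possibly have carried in one polling interval. This is purely my assumption of a possible approach, not how CAPC works; the function name, the return labels, and the 300 second interval in the example (implied by 2 polling cycles making a 10 minute gap) are all illustrative.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 4,294,967,295


def classify_decrease(previous, current, interval_s, speed_bps,
                      modulus=COUNTER32_MODULUS):
    """Return (kind, delta_octets), where kind is 'normal', 'rollover' or 'reset'."""
    if current >= previous:
        return "normal", current - previous
    # Delta if the counter wrapped exactly once since the last poll.
    wrapped_delta = current + modulus - previous
    # Most octets the interface could carry at line rate in one interval.
    max_octets = speed_bps * interval_s / 8
    if wrapped_delta <= max_octets:
        return "rollover", wrapped_delta
    # A wrap would imply an impossible rate, so assume the counter reset.
    return "reset", None


if __name__ == "__main__":
    # 20M interface polled every 300 seconds.
    print(classify_decrease(4_200_000_000, 100_000_000, 300, 20_000_000))
    print(classify_decrease(1_000_000_000, 5_000, 300, 20_000_000))
```

The heuristic is not perfect: a reset that happens while the counter is already near its maximum looks like a plausible rollover, and it cannot detect more than one wrap per interval. It would still avoid both the spike and the gap in the common cases.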