DX NetOps

  • 1.  CAPC STATUS FAILED

    Posted Jan 14, 2019 06:38 AM

    Hi,

     

    I have a severe issue with my CAPC. I'll explain the whole scenario in case you have a solution. I have been working this case with CA Support for approximately the last 30 days and have received no RCA or solution.

     

    My CAPC health status fluctuates between Failed and Active at random times. Sometimes the status looks good, but when I open the dashboards the data is missing for the last 4 or 10 hours.

    When I check the events, there are many status-failed events around the time the data goes missing from the dashboards. When I stop and then restart the dadaemon service, the data is populated again after a few minutes.

    So far CA Support has shared no findings on this issue.

     

    From my own observation, it may be due to a connectivity issue: our DA, DRs, and PC are on one subnet and the DC is on another, because we need a public IP to communicate with devices located far away.

     

    I am very upset, as this is a production environment, it is not stable, and we have been suffering from this for approximately 1.5 months. CA Support has given no proper explanation for this issue.

     



  • 2.  Re: CAPC STATUS FAILED

    Broadcom Employee
    Posted Jan 15, 2019 02:16 PM

    Hello Akash,

     

    Can you share the case numbers (you can send them privately to michael.poller@broadcom.com if you wish)? It appears this matter has already been raised with support.

     

    If that isn't correct and no support case has been opened, one should be opened as soon as possible, with logs gathered via the re.sh script packages attached for analysis.

     

    Thanks,

    Michael



  • 3.  Re: CAPC STATUS FAILED

    Posted Jan 16, 2019 05:38 AM

    Hi Michael,

     

    I sent you all the case numbers by email. You can also check these events: for the last 4 hours the DA has been continuously fluctuating.

    Regards

    Akash



  • 4.  Re: CAPC STATUS FAILED
    Best Answer

    Broadcom Employee
    Posted Jan 16, 2019 04:24 PM

    It's clear from the case notes that capacity issues are involved. The latest updates in the case about the DA sync failure and in the one about degraded DA health are the same, because the two cases describe the same problem.

     

    Those should be one support case, for the same problem.

     

    It appears from the research in the cases that the systems are either:

    • Not currently resourced appropriately for the volume of items they're working with.
    • Maintaining too many inactive and/or retired items which are causing memory usage to be excessive as they are loaded into the DA memory for use when needed.

    The log messages noted in the cases clearly show memory-related problems occurring on a regular basis.

     

    I suspect you may have a large number of inactive and/or retired items in the environment being loaded into memory and causing the issue. This is often seen when the DA data source is set to sync inactive items. Excessive numbers of QoS-related items can cause similar symptoms.

     

    I'd recommend following the steps provided in those cases, working with support to clean up the system of old deleted and/or retired items and components. Once some of the load is decreased or resources are increased I suspect the problems will start to dissipate.

     

    If they don't, once the system is cleaned up and not showing capacity or memory problems, then whatever issue remains will be more readily found and resolved.

     

    These are VM systems according to the notes. It is also possible that the resources for the VMs are shared, not dedicated. If they are shared, other systems using the same resources may be causing an issue. VM resources for these systems in production environments should always be dedicated to avoid resource contention with other VMs.



  • 5.  Re: CAPC STATUS FAILED

    Posted Jan 23, 2019 06:18 AM

    Hi Michael,

     

    Yes, we deleted many items and shared the results on the ticket, but there are still many items, including filtered items, that show up in CAPC. Is there any way to remove these? We are not currently using QoS at all.

    Before:
    ----------------------------------------------------------------------------+---------
    {http://im.ca.com/inventory}Pollable                                       | 2438830
    {http://im.ca.com/inventory}DeviceComponent                                | 2438455
    {http://im.ca.com/inventory}DiscoveryInfo                                  | 2438172
    {http://im.ca.com/inventory}Port                                           | 1724035
    {http://im.ca.com/inventory}AlternatePort                                  |  430991
    {http://im.ca.com/inventory}Hierarchy                                      |  230774
    {http://im.ca.com/inventory}QoSQueuing                                     |   91610
    {http://im.ca.com/inventory}QoSRED                                         |   87453
    {http://im.ca.com/inventory}QoSClassMap                                    |   22141
    {http://im.ca.com/inventory}ChassisTemperatureEnvironmentalSensor          |   17957
    {http://im.ca.com/inventory}QoSContract                                    |   14825
    {http://im.ca.com/inventory}QoSPolicer                                     |   14745
    {http://im.ca.com/inventory}ChassisTemperatureEnvironmentalSensorAlternate |    9314
    {http://im.ca.com/inventory}SystemEnvSensor                                |    5227
    {http://im.ca.com/inventory}ChassisPowerSupplyEnvironmentalSensor          |    4467
    {http://im.ca.com/inventory}TempEnvSensor                                  |    3214
    {http://im.ca.com/inventory}SysEnvAmpSensor                                |    3116
    {http://im.ca.com/inventory}Memory                                         |    2452
    {http://im.ca.com/inventory}CPU                                            |    1791
    {http://im.ca.com/inventory}ChassisFanEnvironmentalSensor                  |    1778
    {http://im.ca.com/inventory}ResponsePathTest                               |    1226
    {http://im.ca.com/inventory}ResponsePathJitter                             |     821
    {http://im.ca.com/inventory}SysEnvVolSensor                                |     654
    {http://im.ca.com/inventory}SysEnvVolACSensor                              |     422
    {http://im.ca.com/inventory}ResponsePathIcmp                               |     382
    {http://im.ca.com/inventory}MetricFamilyDiscoveryHistory                   |     375
    {http://im.ca.com/inventory}ConsolidatedAndDiscoveredMetricFamilyHistory   |     375
    {http://im.ca.com/inventory}Device                                         |     375
    {http://im.ca.com/inventory}AccessibleDevice                               |     374
    {http://im.ca.com/inventory}ManageableDevice                               |     364
    {http://im.ca.com/inventory}Router                                         |     341
    {http://im.ca.com/inventory}Switch                                         |     264
    {http://im.ca.com/inventory}GenericSystem                                  |     224
    {http://im.ca.com/inventory}SwitchingEngine                                |     215
    {http://im.ca.com/inventory}AAASubscriber                                  |     194
    {http://im.ca.com/inventory}RollUp                                         |     153
    {http://im.ca.com/inventory}DiscoveryInstance                              |      88
    {http://im.ca.com/inventory}Baseline                                       |      72
    {http://im.ca.com/inventory}Partition                                      |      68
    {http://im.ca.com/inventory}DataLoader                                     |      48
    (40 rows)

    dauser=> delete FROM item WHERE item_id IN (select item_id FROM v_poll_item WHERE is_filtered = 1 AND device_item_id IN (select item_id from device) AND device_item_id IN (select item_id FROM v_item_facet WHERE facet_qname like '%DataAggregatorInfo%'));
    OUTPUT
    --------
          0
    (1 row)
    dauser=> delete FROM item WHERE item_id IN (select item_id from v_item_facet WHERE facet_qname like '%}Retired');
    OUTPUT
    ---------
    1361935
    (1 row)
    dauser=> select count(*) FROM item;
      count
    ---------
    1122693
    (1 row)

    dauser=> select facet_qname, count(*) FROM v_item_facet WHERE facet_qname like '%/inventory}%' group by 1 order by 2 desc limit 40;
                                    facet_qname                                 |  count
    ----------------------------------------------------------------------------+---------
    {http://im.ca.com/inventory}Pollable                                       | 1076895
    {http://im.ca.com/inventory}DeviceComponent                                | 1076520
    {http://im.ca.com/inventory}DiscoveryInfo                                  | 1076237
    {http://im.ca.com/inventory}AlternatePort                                  |  417112
    {http://im.ca.com/inventory}Port                                           |  390174
    {http://im.ca.com/inventory}Hierarchy                                      |  218123
    {http://im.ca.com/inventory}QoSQueuing                                     |   88058
    {http://im.ca.com/inventory}QoSRED                                         |   84215
    {http://im.ca.com/inventory}QoSClassMap                                    |   18746
    {http://im.ca.com/inventory}ChassisTemperatureEnvironmentalSensor          |   16935
    {http://im.ca.com/inventory}QoSPolicer                                     |   13808
    {http://im.ca.com/inventory}QoSContract                                    |   13296
    {http://im.ca.com/inventory}ChassisTemperatureEnvironmentalSensorAlternate |    9040
    {http://im.ca.com/inventory}SystemEnvSensor                                |    5217
    {http://im.ca.com/inventory}ChassisPowerSupplyEnvironmentalSensor          |    4451
    {http://im.ca.com/inventory}TempEnvSensor                                  |    3208
    {http://im.ca.com/inventory}SysEnvAmpSensor                                |    3104
    {http://im.ca.com/inventory}Memory                                         |    2397
    {http://im.ca.com/inventory}CPU                                            |    1744
    {http://im.ca.com/inventory}ChassisFanEnvironmentalSensor                  |    1732
    {http://im.ca.com/inventory}ResponsePathTest                               |    1225
    {http://im.ca.com/inventory}ResponsePathJitter                             |     821
    {http://im.ca.com/inventory}SysEnvVolSensor                                |     652
    {http://im.ca.com/inventory}SysEnvVolACSensor                              |     422
    {http://im.ca.com/inventory}ResponsePathIcmp                               |     381
    {http://im.ca.com/inventory}ConsolidatedAndDiscoveredMetricFamilyHistory   |     375
    {http://im.ca.com/inventory}Device                                         |     375
    {http://im.ca.com/inventory}MetricFamilyDiscoveryHistory                   |     375
    {http://im.ca.com/inventory}AccessibleDevice                               |     374
    {http://im.ca.com/inventory}ManageableDevice                               |     364
    {http://im.ca.com/inventory}Router                                         |     341
    {http://im.ca.com/inventory}Switch                                         |     264
    {http://im.ca.com/inventory}GenericSystem                                  |     224
    {http://im.ca.com/inventory}SwitchingEngine                                |     207
    {http://im.ca.com/inventory}RollUp                                         |     153
    {http://im.ca.com/inventory}AAASubscriber                                  |     149
    {http://im.ca.com/inventory}DiscoveryInstance                              |      88
    {http://im.ca.com/inventory}Baseline                                       |      72
    {http://im.ca.com/inventory}Partition                                      |      68
    {http://im.ca.com/inventory}DataLoader                                     |      48
    (40 rows)

    [root@Data-Repository1 scripts]# ./cleanupDeletedItems.sh -U dauser -w --------
    Count of table entries where item_id NOT in item table anymore:
    Devices: 158
    Poll Items: 1367085
    Item Facets: 8219381
    Item Relationships: 2785
    Visible Tenants: 2398
    Removing stale entries from database...
    Deleted 158 device entries
    Deleted 1367085 poll_item entries
    Deleted 8219381 item_facet entries
    Deleted 2785 item_relationship entries
    Deleted 2398 visible_tenant entries
    Running DB purge to physically remove deleted data...
    Completed purge.
    elapsed time:  0 minutes 7 seconds
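
    Editorial sketch (not part of the original post): the before/after facet counts above can be compared with a few lines of Python. The numbers are copied from the two listings; the dictionary and function names are only illustrative. Under that reading, the drop in Pollable equals the 1361935 rows reported by the Retired delete, most of the reduction comes from Port items, and the QoS facets shrink by only about 5%.

```python
# Facet counts copied from the "Before" listing and the later query output.
# Only a subset of facets is included; the names below are illustrative.
before = {
    "Pollable": 2438830,
    "Port": 1724035,
    "AlternatePort": 430991,
    "QoSQueuing": 91610,
    "QoSRED": 87453,
    "QoSClassMap": 22141,
    "QoSContract": 14825,
    "QoSPolicer": 14745,
}
after = {
    "Pollable": 1076895,
    "Port": 390174,
    "AlternatePort": 417112,
    "QoSQueuing": 88058,
    "QoSRED": 84215,
    "QoSClassMap": 18746,
    "QoSContract": 13296,
    "QoSPolicer": 13808,
}


def removed(facet):
    """Number of items carrying this facet that disappeared between the listings."""
    return before[facet] - after[facet]


def percent_removed(facet):
    """Reduction for this facet as a percentage of its 'before' count."""
    return 100.0 * removed(facet) / before[facet]


# Totals across the five QoS facets listed in the post.
qos_before = sum(count for name, count in before.items() if name.startswith("QoS"))
qos_after = sum(count for name, count in after.items() if name.startswith("QoS"))

if __name__ == "__main__":
    for facet in before:
        print(f"{facet:<14} {before[facet]:>8} -> {after[facet]:>8}  "
              f"removed {removed(facet):>8} ({percent_removed(facet):.1f}%)")
    print(f"QoS total      {qos_before:>8} -> {qos_after:>8}")
```

    Read this way, the cleanup so far appears to have removed mainly retired port items, while the QoS items asked about above are still largely in place.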


  • 6.  Re: CAPC STATUS FAILED

    Broadcom Employee
    Posted Jan 30, 2019 07:37 PM

    Did you break the connection between any QoS Metric Families and their associations to devices via the Monitor Profile-to-Collection associations?