DX NetOps

  • 1.  DA stopped after running 6-10 hours everyday

    Posted Mar 20, 2015 01:21 AM

    Hi all,

     

    I do not know why the Data Aggregator (DA) keeps stopping. I have to start the DA service again every morning.

     

    Our environment has 220 devices.

    CAPM : 2.3.3

    DA memory: 16 GB (14 GB used by the DA, 2 GB left for the machine).

     

    Could you please share your experience with the DA stopping? What causes it to stop, and how can this issue be solved?

     

    Thank you,

     

    Best regards,

    Borworn 



  • 2.  Re: DA stopped after running 6-10 hours everyday

    Broadcom Employee
    Posted Mar 27, 2015 12:30 PM

    Hi Borworn:

     

    You should review /opt/IMDataAggregator/apache-karaf-2.3.0/shutdown.log on the DA to gain insight into why the DA might have stopped.  Assuming the DR is running, maybe there are issues communicating with it?  Do the machines have the appropriate resources per our sizing documentation?
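
    As a rough aid for reviewing that file, the sketch below pulls only the WARN and ERROR entries out of shutdown.log. It assumes each entry is a header line (level, thread, timestamp, ...) followed by one or more message lines starting with "|", which is an assumption about the layout rather than a documented format. The function names are hypothetical.

```python
import re
from pathlib import Path

# Path taken from the reply above; adjust it if your installation differs.
LOG_PATH = Path("/opt/IMDataAggregator/apache-karaf-2.3.0/shutdown.log")

# Assumed header layout: "LEVEL | thread | YYYY-MM-DD HH:MM:SS,mmm | ..."
HEADER = re.compile(
    r"^\s*(INFO|WARN|ERROR)\s*\|\s*(.*?)\s*\|\s*"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*\|"
)

def parse_entries(lines):
    """Group header lines with the message lines that follow them."""
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines between entries
        match = HEADER.match(line)
        if match:
            current = {
                "level": match.group(1),
                "thread": match.group(2),
                "time": match.group(3),
                "message": "",
            }
            entries.append(current)
        elif current is not None:
            # Continuation line: drop the leading "|" and append the text.
            text = line.strip().lstrip("|").strip()
            current["message"] = (current["message"] + " " + text).strip()
    return entries

def problems(entries):
    """Keep only the entries logged at WARN or ERROR level."""
    return [e for e in entries if e["level"] in ("WARN", "ERROR")]

if __name__ == "__main__":
    if LOG_PATH.exists():
        lines = LOG_PATH.read_text(errors="replace").splitlines()
        for entry in problems(parse_entries(lines)):
            print(entry["time"], entry["level"], entry["message"])
```

    Run on the DA host, this prints one line per warning or error with its timestamp, which makes it easier to see what happened just before each shutdown.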

     

    HTH,

    Joe



  • 3.  Re: DA stopped after running 6-10 hours everyday

    Posted Apr 18, 2015 02:32 PM

    Hi Joe,

     

    I am sorry for the slow reply. These are the messages in shutdown.log:

     

     

    INFO  | -toHost:DATA-REP | 2015-04-17 07:58:12,024 | shutdown | ase.heartbeat.DBStateManagerImpl  756 | ase.heartbeat.DBStateManagerImpl  756 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from INIT to OK

    INFO  | xtenderThread-81 | 2015-04-17 07:58:12,034 | shutdown | ces.shutdown.ShutdownManagerImpl   61 | ces.shutdown.ShutdownManagerImpl   61 | ommon.core.services.impl |

          | Shutdown Manager initialized with: [DATA-REP=OK]

    Current State:Available Hosts: [DATA-REP]

    Down Hosts: []

    INFO  | xtenderThread-81 | 2015-04-17 07:58:12,042 | shutdown | tTolerantDBConnectionManagerImpl  395 | tTolerantDBConnectionManagerImpl  395 | ommon.core.services.impl |

          | The primary host for database transactions is now set to DATA-REP

    ERROR | t Monitor Thread | 2015-04-17 08:43:10,844 | shutdown | ase.heartbeat.DBStateManagerImpl  411 | ase.heartbeat.DBStateManagerImpl  411 | ommon.core.services.impl |

          | DB heartbeat to host DATA-REP execeeded max non-success time of 300000

    WARN  | t Monitor Thread | 2015-04-17 08:43:10,844 | shutdown | ase.heartbeat.DBStateManagerImpl  752 | ase.heartbeat.DBStateManagerImpl  752 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from OK to DOWN

    ERROR | Manager-thread-2 | 2015-04-17 08:43:10,849 | shutdown | tTolerantDBConnectionManagerImpl  240 | tTolerantDBConnectionManagerImpl  240 | ommon.core.services.impl |

          | No DB host name available.

    ERROR | Manager-thread-5 | 2015-04-17 08:43:10,849 | shutdown | ces.shutdown.ShutdownManagerImpl  150 | ces.shutdown.ShutdownManagerImpl  150 | ommon.core.services.impl |

          | Shutting down the data aggregator.It was detected that no data repository nodes were contactable. The uncontactable hosts are:[DATA-REP]

    INFO  | Manager-thread-2 | 2015-04-17 08:43:10,849 | shutdown | tTolerantDBConnectionManagerImpl  395 | tTolerantDBConnectionManagerImpl  395 | ommon.core.services.impl |

          | The primary host for database transactions is now set to null

    ERROR | Manager-thread-2 | 2015-04-17 08:43:10,850 | shutdown | tTolerantDBConnectionManagerImpl  198 | tTolerantDBConnectionManagerImpl  198 | ommon.core.services.impl |

          | The primary data repository host 'DATA-REP' is no longer available, and there are no available secondary hosts. Current Host Status: {DATA-REP=DOWN}

    WARN  | -toHost:DATA-REP | 2015-04-17 08:48:23,198 | shutdown | ase.heartbeat.DBStateManagerImpl  813 | ase.heartbeat.DBStateManagerImpl  813 | ommon.core.services.impl |

          | DB heartbeat to host DATA-REP successful, but the response time of 0:04:16.399 was longer then a threshold of 20000 ms.

    INFO  | -toHost:DATA-REP | 2015-04-17 08:48:23,199 | shutdown | ase.heartbeat.DBStateManagerImpl  756 | ase.heartbeat.DBStateManagerImpl  756 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from INIT to DEGRADED

    INFO  | xtenderThread-44 | 2015-04-17 08:48:23,210 | shutdown | ces.shutdown.ShutdownManagerImpl   61 | ces.shutdown.ShutdownManagerImpl   61 | ommon.core.services.impl |

          | Shutdown Manager initialized with: [DATA-REP=DEGRADED]

    Current State:Available Hosts: [DATA-REP]

    Down Hosts: []

    INFO  | xtenderThread-44 | 2015-04-17 08:48:23,222 | shutdown | tTolerantDBConnectionManagerImpl  395 | tTolerantDBConnectionManagerImpl  395 | ommon.core.services.impl |

          | The primary host for database transactions is now set to DATA-REP

    INFO  | -toHost:DATA-REP | 2015-04-17 08:48:33,273 | shutdown | ase.heartbeat.DBStateManagerImpl  756 | ase.heartbeat.DBStateManagerImpl  756 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from DEGRADED to OK

    ERROR | t Monitor Thread | 2015-04-17 09:15:51,941 | shutdown | ase.heartbeat.DBStateManagerImpl  411 | ase.heartbeat.DBStateManagerImpl  411 | ommon.core.services.impl |

          | DB heartbeat to host DATA-REP execeeded max non-success time of 300000

    WARN  | t Monitor Thread | 2015-04-17 09:15:51,941 | shutdown | ase.heartbeat.DBStateManagerImpl  752 | ase.heartbeat.DBStateManagerImpl  752 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from OK to DOWN

    ERROR | Manager-thread-4 | 2015-04-17 09:15:51,943 | shutdown | ces.shutdown.ShutdownManagerImpl  150 | ces.shutdown.ShutdownManagerImpl  150 | ommon.core.services.impl |

          | Shutting down the data aggregator.It was detected that no data repository nodes were contactable. The uncontactable hosts are:[DATA-REP]

    ERROR | Manager-thread-7 | 2015-04-17 09:15:51,943 | shutdown | tTolerantDBConnectionManagerImpl  240 | tTolerantDBConnectionManagerImpl  240 | ommon.core.services.impl |

          | No DB host name available.

    INFO  | Manager-thread-7 | 2015-04-17 09:15:51,944 | shutdown | tTolerantDBConnectionManagerImpl  395 | tTolerantDBConnectionManagerImpl  395 | ommon.core.services.impl |

          | The primary host for database transactions is now set to null

    ERROR | Manager-thread-7 | 2015-04-17 09:15:51,945 | shutdown | tTolerantDBConnectionManagerImpl  198 | tTolerantDBConnectionManagerImpl  198 | ommon.core.services.impl |

          | The primary data repository host 'DATA-REP' is no longer available, and there are no available secondary hosts. Current Host Status: {DATA-REP=DOWN}

    WARN  | -toHost:DATA-REP | 2015-04-17 09:16:40,766 | shutdown | ase.heartbeat.DBStateManagerImpl  813 | ase.heartbeat.DBStateManagerImpl  813 | ommon.core.services.impl |

          | DB heartbeat to host DATA-REP successful, but the response time of 0:00:34.099 was longer then a threshold of 20000 ms.

    INFO  | -toHost:DATA-REP | 2015-04-17 09:16:40,766 | shutdown | ase.heartbeat.DBStateManagerImpl  756 | ase.heartbeat.DBStateManagerImpl  756 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from INIT to DEGRADED

    INFO  | xtenderThread-62 | 2015-04-17 09:16:40,779 | shutdown | ces.shutdown.ShutdownManagerImpl   61 | ces.shutdown.ShutdownManagerImpl   61 | ommon.core.services.impl |

          | Shutdown Manager initialized with: [DATA-REP=DEGRADED]

    Current State:Available Hosts: [DATA-REP]

    Down Hosts: []

    INFO  | xtenderThread-62 | 2015-04-17 09:16:40,798 | shutdown | tTolerantDBConnectionManagerImpl  395 | tTolerantDBConnectionManagerImpl  395 | ommon.core.services.impl |

          | The primary host for database transactions is now set to DATA-REP

    INFO  | -toHost:DATA-REP | 2015-04-17 09:16:50,810 | shutdown | ase.heartbeat.DBStateManagerImpl  756 | ase.heartbeat.DBStateManagerImpl  756 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from DEGRADED to OK

     

    We are only monitoring 220 devices.

     

    Best regards,

    Borworn 



  • 4.  Re: DA stopped after running 6-10 hours everyday

    Posted Apr 23, 2015 12:53 PM

    When there is an issue with the DR, the DA stops automatically.

     

    ERROR | t Monitor Thread | 2015-04-17 09:15:51,941 | shutdown | ase.heartbeat.DBStateManagerImpl  411 | ase.heartbeat.DBStateManagerImpl  411 | ommon.core.services.impl |

          | DB heartbeat to host DATA-REP execeeded max non-success time of 300000

    WARN  | t Monitor Thread | 2015-04-17 09:15:51,941 | shutdown | ase.heartbeat.DBStateManagerImpl  752 | ase.heartbeat.DBStateManagerImpl  752 | ommon.core.services.impl |

          | DB state for host DATA-REP changing from OK to DOWN

    ERROR | Manager-thread-4 | 2015-04-17 09:15:51,943 | shutdown | ces.shutdown.ShutdownManagerImpl  150 | ces.shutdown.ShutdownManagerImpl  150 | ommon.core.services.impl |

          | Shutting down the data aggregator.It was detected that no data repository nodes were contactable. The uncontactable hosts are:[DATA-REP]

    ERROR | Manager-thread-7 | 2015-04-17 09:15:51,943 | shutdown | tTolerantDBConnectionManagerImpl  240 | tTolerantDBConnectionManagerImpl  240 | ommon.core.services.impl |

          | No DB host name available.
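
    The two numbers in these messages (a 20000 ms slow-response threshold and a 300000 ms maximum non-success time) can be read as a simple state rule. The sketch below is only an interpretation of the log text, not the product's actual implementation; the function names and the rule itself are assumptions.

```python
# Values copied from the log messages quoted in this thread.
SLOW_THRESHOLD_MS = 20_000     # "longer then a threshold of 20000 ms"
MAX_NON_SUCCESS_MS = 300_000   # "max non-success time of 300000"

def duration_to_ms(text):
    """Convert 'H:MM:SS.mmm', as printed in the log, to milliseconds."""
    hours, minutes, seconds = text.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])  # pad or trim to three digits
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(whole)
    return total_seconds * 1000 + millis

def classify(last_response_ms, ms_since_last_success):
    """Assumed rule: DOWN after too long without success, DEGRADED if slow."""
    if ms_since_last_success > MAX_NON_SUCCESS_MS:
        return "DOWN"
    if last_response_ms > SLOW_THRESHOLD_MS:
        return "DEGRADED"
    return "OK"

if __name__ == "__main__":
    # Heartbeat response times reported in the log above.
    for reported in ("0:04:16.399", "0:00:34.099"):
        ms = duration_to_ms(reported)
        print(reported, ms, classify(ms, 0))
```

    Under this reading, a heartbeat that took 0:04:16.399 is slow enough to mark the DR as DEGRADED, and no successful heartbeat for more than 300000 ms marks it DOWN, which is what triggers the shutdown when there is only one DR node.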

     

    How does the performance (CPU, memory) of the DA and DR machines look over time? (You evidently have a single-node DR installation.) Is there a firewall between the DA and DR that might be causing communication issues between them?
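
    One way to look for communication issues is to time repeated TCP connections from the DA host to the DR. The sketch below is a generic probe, not a product tool. The host name DATA-REP comes from the log, while port 5433 is an assumption (the usual Vertica default) that should be checked against your installation.

```python
import socket
import time

def tcp_connect_ms(host, port, timeout=5.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None

def probe(host, port, attempts=5, pause=1.0):
    """Try several connections in a row and collect the results."""
    results = []
    for i in range(attempts):
        results.append(tcp_connect_ms(host, port))
        if i < attempts - 1:
            time.sleep(pause)
    return results

# Example call, run from the DA host (host and port are assumptions):
#   probe("DATA-REP", 5433)
```

    Failed attempts show up as None. Occasional failures or connect times of several seconds would point at the network path (interface, cabling, switch, firewall) rather than at the DR itself.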



  • 5.  Re: DA stopped after running 6-10 hours everyday

    Posted Jun 30, 2015 01:18 AM

    Hi All,

     

    Thank you for responding. We fixed the physical network interface on the DA server, and it works now.

     

    Thank you

    Borworn
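
    The thread does not say what was wrong with the interface. For anyone hitting the same symptom, one common check on a Linux host is the per-interface error counters in /proc/net/dev. The sketch below assumes the DA runs on Linux and uses hypothetical function names.

```python
from pathlib import Path

def parse_proc_net_dev(text):
    """Extract error-related counters per interface from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields = [int(value) for value in rest.split()]
        if len(fields) < 16:
            continue  # unexpected layout, skip this line
        # Columns: 8 receive counters followed by 8 transmit counters.
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
            "tx_carrier": fields[14],
        }
    return stats

def suspicious(stats):
    """Return only the interfaces with at least one non-zero counter."""
    return {name: c for name, c in stats.items() if any(c.values())}

if __name__ == "__main__":
    path = Path("/proc/net/dev")
    if path.exists():
        for name, counters in suspicious(parse_proc_net_dev(path.read_text())).items():
            print(name, counters)
```

    Counters that keep growing between two runs on the interface facing the DR would be a hint to inspect the NIC, cable, or switch port.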