DX NetOps

  • 1.  Status of DCMD Service incorrect

    Posted Jul 20, 2018 02:48 AM

    Dear Team,

     

    When checking the status of the IMDataCollector service 'dcmd', the status shows as active (running) even though one of its processes (/opt/IMDataCollector/jre/bin/java) is not running. Refer to the sample below.

     

    systemctl status dcmd

    dcmd.service - Data Collector

       Loaded: loaded (/etc/systemd/system/dcmd.service; enabled; vendor preset: disabled)

       Active: active (running) since Thu 2018-07-19 13:30:49 IST; 2h 38min ago

      Process: 11031 ExecStop=/opt/IMDataCollector/scripts/dcmd stop sysd (code=exited, status=0/SUCCESS)

      Process: 15162 ExecStart=/opt/IMDataCollector/scripts/dcmd start sysd (code=exited, status=0/SUCCESS)

        Tasks: 4

       Memory: 2.5M

       CGroup: /system.slice/dcmd.service

               └─15192 /opt/IMDataCollector/ICMPD/IcmpDaemon --start

     

     

    I think the script that checks the status should verify both the ICMP daemon and the Java process, and only then report the status as active or dead.
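
A minimal sketch of the combined check suggested above, not the actual logic of /opt/IMDataCollector/scripts/dcmd. The function name dc_health and its output wording are hypothetical. It reads one command line per row on standard input and reports active only when both the IcmpDaemon and the Karaf Java process (main class org.apache.karaf.main.Main, as seen in the ps output later in this thread) are present.

```shell
# Hypothetical helper: derive a verdict from a process listing supplied on stdin.
dc_health() {
  dc_listing="$(cat)"
  dc_missing=""
  if ! printf '%s\n' "$dc_listing" | grep -q '/opt/IMDataCollector/ICMPD/IcmpDaemon'; then
    dc_missing="$dc_missing IcmpDaemon"
  fi
  # Match the Karaf main class so the ActiveMQ broker's java process does not count.
  if ! printf '%s\n' "$dc_listing" | grep -q '/opt/IMDataCollector/jre/bin/java .*org\.apache\.karaf\.main\.Main'; then
    dc_missing="$dc_missing karaf-java"
  fi
  if [ -z "$dc_missing" ]; then
    echo "active: IcmpDaemon and Karaf java both running"
    return 0
  fi
  echo "dead: missing$dc_missing"
  return 1
}

# Example use on a collector: ps -eo args | dc_health
```

With only the IcmpDaemon in the listing, as in the sample above, this sketch would report the service as dead rather than active.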

     



  • 2.  Re: Status of DCMD Service incorrect

    Broadcom Employee
    Posted Jul 20, 2018 08:58 AM

    Chetan,

     

    What does this command return?

     

    ps -ef | grep apache

     

    Troy



  • 3.  Re: Status of DCMD Service incorrect

    Posted Jul 20, 2018 09:09 AM

    Hi Troy,

     

    We have 3 DCs, of which 1 is currently showing as 'not connected' in the PC console GUI. The output of service dcmd status and ps -ef | grep apache is pasted below for reference.

     

    [pmadmin@inp44vpehcol1 apache-karaf-2.4.3]$ ps -ef | grep apache

    pmadmin  19641     1  2 Jul19 ?        00:46:53 /opt/IMDataCollector/jre/bin/java -Xms500M -Xmx1000M -Xmn250M -server -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:TargetSurvivorRatio=90 -XX:InitialTenuringThreshold=15 -XX:MaxTenuringThreshold=15 -XX:+ScavengeBeforeFullGC -XX:+ExplicitGCInvokesConcurrent -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -Djava.util.logging.config.file=logging.properties -Djava.security.auth.login.config=/opt/IMDataCollector/broker/apache-activemq-5.15.2/conf/login.config -Dcom.sun.management.jmxremote -Djava.awt.headless=true -Djava.io.tmpdir=/opt/IMDataCollector/broker/apache-activemq-5.15.2//tmp -Dactivemq.classpath=/opt/IMDataCollector/broker/apache-activemq-5.15.2//conf:/opt/IMDataCollector/broker/apache-activemq-5.15.2//../lib/: -Dactivemq.home=/opt/IMDataCollector/broker/apache-activemq-5.15.2/ -Dactivemq.base=/opt/IMDataCollector/broker/apache-activemq-5.15.2/ -Dactivemq.conf=/opt/IMDataCollector/broker/apache-activemq-5.15.2//conf -Dactivemq.data=/opt/IMDataCollector/broker/apache-activemq-5.15.2//data -jar /opt/IMDataCollector/broker/apache-activemq-5.15.2//bin/activemq.jar start

    pmadmin  25714     1 99 Jul19 ?        3-07:38:06 /opt/IMDataCollector/jre/bin/java -Xms1024M -Xmx5252M -server -Xms1024M -Xmx5252M -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass -Dcom.sun.management.jmxremote -XX:NewRatio=3 -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:TargetSurvivorRatio=50 -XX:InitialTenuringThreshold=15 -XX:MaxTenuringThreshold=15 -XX:+ScavengeBeforeFullGC -XX:+ExplicitGCInvokesConcurrent -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -Djava.endorsed.dirs=/opt/IMDataCollector/jre/jre/lib/endorsed:/opt/IMDataCollector/jre/lib/endorsed:/opt/IMDataCollector/apache-karaf-2.4.3/lib/endorsed -Djava.ext.dirs=/opt/IMDataCollector/jre/jre/lib/ext:/opt/IMDataCollector/jre/lib/ext:/opt/IMDataCollector/apache-karaf-2.4.3/lib/ext -Dkaraf.instances=/opt/IMDataCollector/apache-karaf-2.4.3/instances -Dkaraf.home=/opt/IMDataCollector/apache-karaf-2.4.3 -Dkaraf.base=/opt/IMDataCollector/apache-karaf-2.4.3 -Dkaraf.data=/opt/IMDataCollector/apache-karaf-2.4.3/data -Dkaraf.etc=/opt/IMDataCollector/apache-karaf-2.4.3/etc -Dda.data.home=/opt/IMDataCollector/apache-karaf-2.4.3/da_data -Dda.version=1.0.0.0 -Djava.io.tmpdir=/opt/IMDataCollector/apache-karaf-2.4.3/data/tmp -Djava.util.logging.config.file=/opt/IMDataCollector/apache-karaf-2.4.3/etc/java.util.logging.properties -XX:+HeapDumpOnOutOfMemoryError -Dorg.apache.activemq.SERIALIZABLE_PACKAGES=* -XX:OnOutOfMemoryError=/opt/IMDataCollector/apache-karaf-2.4.3/bin/restart -Dkaraf.startLocalConsole=false -Dkaraf.startRemoteShell=true -classpath /opt/IMDataCollector/apache-karaf-2.4.3/lib/karaf-jaas-boot.jar:/opt/IMDataCollector/apach-karaf-2.4.3/lib/karaf.jar:/opt/IMDataCollector/apache-karaf-2.4.3/lib/karaf-wrapper.jar org.apache.karaf.main.Main

    pmadmin  40813 40781  0 18:33 pts/1    00:00:00 grep --color=auto apache

     

    [pmadmin@inp44vpehcol1 ~]$ service dcmd status

    Redirecting to /bin/systemctl status dcmd.service

    dcmd.service - Data Collector

       Loaded: loaded (/etc/systemd/system/dcmd.service; enabled; vendor preset: disabled)

       Active: active (running) since Thu 2018-07-19 16:35:17 IST; 1 day 1h ago

      Process: 25569 ExecStop=/opt/IMDataCollector/scripts/dcmd stop sysd (code=exited, status=0/SUCCESS)

      Process: 25661 ExecStart=/opt/IMDataCollector/scripts/dcmd start sysd (code=exited, status=0/SUCCESS)

        Tasks: 781

       Memory: 5.6G

       CGroup: /system.slice/dcmd.service

               ├─25691 /opt/IMDataCollector/ICMPD/IcmpDaemon --start

               └─25714 /opt/IMDataCollector/jre/bin/java -Xms1024M -Xmx5252M -server -Xms1024M -Xmx5252M -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass -Dcom.sun.management.jmxremote -XX:NewRatio=3 -XX:Survi...

    [pmadmin@inp44vpehcol1

     

    In this situation, how can we tell whether the DC is really collecting data?



  • 4.  Re: Status of DCMD Service incorrect

    Broadcom Employee
    Posted Jul 20, 2018 09:19 AM

    Chetan,

     

    Is this the 1 Data Collector that is showing as Not Connected?

     

    There is more to polling than just collecting the data; you also need somewhere to store it.  I would validate that the DC can connect to the DA on ports 61616, 61618, 61620, and 61622. If the DC shows as Not Connected while its processes seem to be running, it is likely unable to connect to its own ActiveMQ service or to the DA.  You can also check any of these three logs for more information:

     

    /opt/IMDataCollector/broker/apache-activemq-5.15.2/data/activemq.log

    /opt/IMDataCollector/apache-karaf-2.4.3/shutdown.log

    /opt/IMDataCollector/apache-karaf-2.4.3/data/log/karaf.log
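
A rough sketch of the port validation mentioned above, to be run on the DC. The function names probe_port and report_ports and the host da-host.example.com are placeholders, not part of the product. The probe assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available. The example takes the third DA port as 61620, since TCP port numbers cannot exceed 65535.

```shell
# Returns 0 if a TCP connection to host $1, port $2 opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
probe_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage: report_ports HOST PORT...
# Prints one line per port; returns 1 if any port could not be reached.
report_ports() {
  rp_host="$1"
  shift
  rp_rc=0
  for rp_port in "$@"; do
    if probe_port "$rp_host" "$rp_port"; then
      echo "$rp_port reachable"
    else
      echo "$rp_port UNREACHABLE"
      rp_rc=1
    fi
  done
  return "$rp_rc"
}

# Example, run on the DC (da-host.example.com is a placeholder):
#   report_ports da-host.example.com 61616 61618 61620 61622
```

A reachable port only shows that a TCP connection can be opened; it does not prove that the DC is delivering data, so the logs listed above are still worth checking.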

     

    If you are not able to discern where the fault lies, I would recommend opening a support case and including Remote Engineer tar files from both the Data Collector and the Data Aggregator.  For more information on Remote Engineer, please see this KB article:

     

    How to Run the CA Remote Engineer (CARE) diagnosti - CA Knowledge 

     

    Note: You will need to run the re.sh command on the DA and the DC separately; you cannot collect logs from both systems by running it on a single system.

     

    Troy



  • 5.  Re: Status of DCMD Service incorrect

    Posted Jul 20, 2018 10:45 AM

    Hi Troy,

     

    Yes, this is for the collector which is showing 'not connected'.

     

    I am seeing the log entries below on the DA, where the DC is disconnected and then reconnects on its own. I just want to understand how the communication between the DC and DA works. Is it heartbeat based, and is there a timeout value set?

     

    ERROR | atTimer-thread-2 | 2018-07-20 17:32:54,877 | DCHeartBeatLog | impl.DCMContactStatusManagerImpl  116 | ager.core.collector.impl |       | Lost contact to DC inp44vpehcol1:312c1a43-6684-4a04-a5fe-87f6ec7a839f.  State changed from RUNNING to CONTACT_LOST.  The last heartbeat was received 52389 ms ago

     

     

    INFO  | atTimer-thread-2 | 2018-07-20 19:47:14,877 | DCHeartBeatLog | impl.DCMContactStatusManagerImpl  126 | ager.core.collector.impl |       | Contact established to DC inp44vpehcol1:312c1a43-6684-4a04-a5fe-87f6ec7a839f. State changed from CONTACT_LOST to RUNNING.  The last heartbeat was received 7151 ms ago
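
The two DCHeartBeatLog entries above can be condensed into a timeline of contact state changes. The following is only a sketch: the function name dc_contact_events is hypothetical, the pattern is inferred from these two lines alone, and the DA log file they come from is not named in this thread, so the input is read from standard input.

```shell
# Hypothetical helper: print "timestamp OLD -> NEW (last heartbeat N ms ago)"
# for every DCHeartBeatLog state-change line read from stdin.
dc_contact_events() {
  sed -n -E 's/^[A-Z]+ *\| *[^|]*\| *([0-9-]+ [0-9:,]+) *\| *DCHeartBeatLog *\|.*State changed from ([A-Z_]+) to ([A-Z_]+)\..*received ([0-9]+) ms ago.*$/\1 \2 -> \3 (last heartbeat \4 ms ago)/p'
}

# Example use (substitute the DA log file that contains these entries):
#   dc_contact_events < the-da-log-file
```

For the two entries above this prints one line for the change from RUNNING to CONTACT_LOST at 17:32:54 and one for the change back to RUNNING at 19:47:14, which makes the length of each outage easier to read off.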

     

    I will raise a separate case with the CARE logs.

     

    Rgds,

    Chetan