DX Unified Infrastructure Management

  • 1.  Robot / Hub Communication Error

    Broadcom Employee
    Posted Jul 09, 2018 04:09 PM

    Team,

     

    While moving a robot from one hub to another today, we ran into some problems. There is no tunnel involved. As part of troubleshooting, we modified the required settings in robot.cfg to point the robot to the new hub, but we still faced communication issues.

     

    Below is a snippet from the hub logs, followed by logs from the controller probe.

     

    Jul  9 15:51:24:476 [5628] hub:  head  mtype=100 cmd=heartbeat seq=125766 ts=1531165884 frm=10.224.24.111/48002 tout=30 addr= sid=i7rKhzzxE+CysjROPTPhIwADMTUzMTE2NjQ4MgFDT01fUmVtb3RlX0hVQl8xATAB

    Jul  9 15:51:24:476 [5628] hub: RREPLY: status=OK(0) <-10.224.24.110/59111  h=68 d=0

    Jul  9 15:51:24:476 [5628] hub:  head  mtype=200 charset=windows-1252 seq=125766 status=0

    Jul  9 15:51:24:476 [5628] hub: Sent heartbeat on queue route 't_11'

    Jul  9 15:51:25:260 [1772] hub: sockAccept: new socket fd=1960

    Jul  9 15:51:25:260 [1772] hub: sockAccept: new session nims=0000000000C7C2B0 new=00000000032F26E0 fd=1960 0.0.0.0/48002<-10.15.9.228/63319

    Jul  9 15:51:25:260 [1772] hub: sockCloseErr: 00000000032F26E0 10.15.9.228/63319 socket disconnected

    Jul  9 15:51:25:260 [1772] hub: sockClose:00000000032F26E0:0.0.0.0/48002

    Jul  9 15:51:25:260 [1772] hub: sockClose - closesocket 1960

    Jul  9 15:51:26:276 [1772] hub: sockAccept: new socket fd=2024

    Jul  9 15:51:26:276 [1772] hub: sockAccept: new session nims=0000000000C7C2B0 new=00000000032F9910 fd=2024 0.0.0.0/48002<-10.15.9.228/63326

    Jul  9 15:51:26:276 [1772] hub: sockCloseErr: 00000000032F9910 10.15.9.228/63326 socket disconnected

    Jul  9 15:51:26:276 [1772] hub: sockClose:00000000032F9910:0.0.0.0/48002

    Jul  9 15:51:26:276 [1772] hub: sockClose - closesocket 2024

    Jul  9 15:51:26:292 [5628] hub: SREQUEST: heartbeat ->10.224.24.110/59111

    Jul  9 15:51:26:292 [5628] hub:  head  mtype=100 cmd=heartbeat seq=125767 ts=1531165886 frm=10.224.24.111/48002 tout=30 addr= sid=i7rKhzzxE+CysjROPTPhIwADMTUzMTE2NjQ4MgFDT01fUmVtb3RlX0hVQl8xATAB

    Jul  9 15:51:26:292 [5628] hub: RREPLY: status=OK(0) <-10.224.24.110/59111  h=68 d=0

    Jul  9 15:51:26:292 [5628] hub:  head  mtype=200 charset=windows-1252 seq=125767 status=0

    Jul  9 15:51:26:292 [5628] hub: Sent heartbeat on queue route 't_11'

    Jul  9 15:51:27:274 [1772] hub: sockAccept: new socket fd=1932

    Jul  9 15:51:27:274 [1772] hub: sockAccept: new session nims=0000000000C7C2B0 new=00000000032F26E0 fd=1932 0.0.0.0/48002<-10.15.9.228/63327

    Jul  9 15:51:27:274 [1772] hub: medium timeout

    Jul  9 15:51:27:274 [1772] hub: sockCloseErr: 00000000032F26E0 10.15.9.228/63327 socket disconnected

    Jul  9 15:51:27:274 [1772] hub: sockClose:00000000032F26E0:0.0.0.0/48002

    Jul  9 15:51:27:274 [1772] hub: sockClose - closesocket 1932

    Jul  9 15:51:28:104 [5628] hub: SREQUEST: heartbeat ->10.224.24.110/59111

    Jul  9 15:51:28:104 [5628] hub:  head  mtype=100 cmd=heartbeat seq=125768 ts=1531165888 frm=10.224.24.111/48002 tout=30 addr= sid=i7rKhzzxE+CysjROPTPhIwADMTUzMTE2NjQ4MgFDT01fUmVtb3RlX0hVQl8xATAB

    Jul  9 15:51:28:104 [5628] hub: RREPLY: status=OK(0) <-10.224.24.110/59111  h=68 d=0

    Jul  9 15:51:28:104 [5628] hub:  head  mtype=200 charset=windows-1252 seq=125768 status=0

     

    Controller logs

    Jul  9 15:50:27:518 [9372] Controller:  robotip_alias=

    Jul  9 15:50:27:518 [9372] Controller: 'robotup' to COM_Remote_HUB_1(10.224.24.111) for tr0235casm002

    Jul  9 15:50:27:520 [9372] Controller: SREQUEST: robotup ->10.224.24.111/48002

    Jul  9 15:50:27:520 [9372] Controller:  head  mtype=100 cmd=robotup seq=0 ts=1531165827 frm=10.15.9.228/63075 tout=15 addr=

    Jul  9 15:50:27:520 [9372] Controller:  data  robotname=tr0235casm002 robotip=10.15.9.228 robotport=48000 version=7.93HF5 [Build 7.93HF5.9817, Apr  7 2018] flag=assign ssl_mode=0 os_major=Windows os_minor=Windows Server 2008 R2 Standard Edition, 64-bit os_description=Service Pack 1 Build 7601 os_user1=EntMon os_user2=CA Spectrum    offline=2 

    Jul  9 15:50:27:520 [9372] Controller: sockCloseErr: 00000000006BC7E0 10.224.24.111/48002 socket disconnected

    Jul  9 15:50:27:520 [9372] Controller: nimSessionWaitMsg: got error on client session: 10054

    Jul  9 15:50:27:520 [9372] Controller: sockClose:00000000006BC7E0:10.15.9.228/63075

    Jul  9 15:50:27:520 [9372] Controller: hub COM_Remote_HUB_1(10.224.24.111) NO CONTACT (communication error)

    Jul  9 15:50:27:520 [9372] Controller: Try secondary hub

    Jul  9 15:50:27:520 [9372] Controller: No secondary hub configured

    Jul  9 15:50:27:520 [9372] Controller: Try first hub answering (temporary)

    Jul  9 15:50:27:521 [9372] Controller: Access [4] = 0 -> 0

    Jul  9 15:50:27:521 [9372] Controller: MyPutEnv NIMCPRID

    Jul  9 15:50:28:521 [9372] Controller: medt_timeout -> 17 seconds is under interval 60. Plugins will not be run.

    Jul  9 15:50:28:521 [9372] Controller: Try primary hub

    Jul  9 15:50:28:521 [9372] Controller: dorobotup -

    Jul  9 15:50:28:521 [9372] Controller:  hubdomain=xyz

    Jul  9 15:50:28:521 [9372] Controller:  hubname=COM_Remote_HUB_1

    Jul  9 15:50:28:521 [9372] Controller:  hubip=10.224.24.111, hubport=48002

    Jul  9 15:50:28:521 [9372] Controller:  hub_dns_name=

    Jul  9 15:50:28:521 [9372] Controller:  robotip_alias=

    Jul  9 15:50:28:521 [9372] Controller: 'robotup' to COM_Remote_HUB_1(10.224.24.111) for tr0235casm002

    Jul  9 15:50:28:523 [9372] Controller: SREQUEST: robotup ->10.224.24.111/48002

    Jul  9 15:50:28:523 [9372] Controller:  head  mtype=100 cmd=robotup seq=0 ts=1531165828 frm=10.15.9.228/63078 tout=15 addr=

    Jul  9 15:50:28:523 [9372] Controller:  data  robotname=tr0235casm002 robotip=10.15.9.228 robotport=48000 version=7.93HF5 [Build 7.93HF5.9817, Apr  7 2018] flag=assign ssl_mode=0 os_major=Windows os_minor=Windows Server 2008 R2 Standard Edition, 64-bit os_description=Service Pack 1 Build 7601 os_user1=EntMon os_user2=CA Spectrum    offline=2 

    Jul  9 15:50:28:523 [9372] Controller: sockCloseErr: 00000000006BC7E0 10.224.24.111/48002 socket disconnected

    Jul  9 15:50:28:523 [9372] Controller: nimSessionWaitMsg: got error on client session: 10054

    Jul  9 15:50:28:523 [9372] Controller: sockClose:00000000006BC7E0:10.15.9.228/63078

    Jul  9 15:50:28:523 [9372] Controller: hub COM_Remote_HUB_1(10.224.24.111) NO CONTACT (communication error)

    Jul  9 15:50:28:524 [9372] Controller: Try secondary hub

    Jul  9 15:50:28:524 [9372] Controller: No secondary hub configured

    Jul  9 15:50:28:524 [9372] Controller: Try first hub answering (temporary)

    Jul  9 15:50:28:524 [9372] Controller: Access [4] = 0 -> 0

    Jul  9 15:50:28:524 [9372] Controller: MyPutEnv NIMCPRID

     

    Can you please help isolate this issue? What other avenues should we look at?

     

    Thanks

    Balkar



  • 2.  Re: Robot / Hub Communication Error
    Best Answer

    Broadcom Employee
    Posted Jul 09, 2018 05:15 PM

    It looks like you can communicate from the robot on 10.15.9.228 to the hub on 10.224.24.111,

    but the reply is not making it back.

    Do you have the Windows firewall enabled?

    Can you telnet from the hub to the robot on port 48000 (the robotport shown in the controller log), and from the robot to the hub on port 48002?

    These look to be on two different networks.

    Does traceroute take the same path in both directions?
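
If telnet is not installed on the servers, the same reachability check can be sketched in a few lines of Python. This is only an illustration: the function name `can_connect` and the 3-second timeout are assumptions, and the addresses and ports in the usage note come from the logs in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, equivalent to a successful 'telnet host port'.
    """
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (refused, unreachable, timed out) when it cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the robot, `can_connect("10.224.24.111", 48002)` tests the path to the hub. On the hub, `can_connect("10.15.9.228", 48000)` tests the path back to the robot controller. A `True` result only shows that the TCP handshake completes. The hub log above shows sockets being accepted and then disconnected, so a successful connect does not rule out a device that drops the session afterwards.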



  • 3.  Re: Robot / Hub Communication Error

    Broadcom Employee
    Posted Jul 10, 2018 11:25 AM

    The network team has confirmed that ports 48000 – 48050 are open from hub to all servers.  They see traffic over those ports but are reporting that the agent is resetting the connection.
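
The reset seen by the network team lines up with the controller log earlier in the thread, where `nimSessionWaitMsg` reports error 10054. On Windows that is the Winsock code WSAECONNRESET (connection reset by peer). As a rough sketch, the session errors in a controller log could be tallied like this. The function name and the small lookup table are illustrative assumptions, and the regular expression matches only the message format shown in this thread.

```python
import re
from collections import Counter

# Standard Winsock error codes; only 10054 appears in this thread's logs.
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection timed out)",
    10061: "WSAECONNREFUSED (connection refused)",
}

# Matches lines such as:
#   Controller: nimSessionWaitMsg: got error on client session: 10054
SESSION_ERROR = re.compile(r"got error on client session: (\d+)")


def count_session_errors(lines):
    """Count client-session error codes found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SESSION_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def describe(code: int) -> str:
    """Return a readable name for a Winsock code, or a generic fallback."""
    return WINSOCK_ERRORS.get(code, f"unknown error {code}")
```

A log dominated by 10054 points to something actively resetting the connection, while timeouts (10060) would point more towards silently dropped packets.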



  • 4.  Re: Robot / Hub Communication Error

    Posted Jul 10, 2018 02:32 PM

    Have the network team verify that traffic is open bidirectionally between the hub and the servers.  You, or they, may need to do some packet captures on the hub and a server to truly verify traffic is actually traversing back and forth between the hub and the server.  I have seen network admins put the firewall rules in place, but they only had one direction enabled and forgot the other.  I have also seen the Windows firewall settings done the same way.  So, you will need to verify those as well.   
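
Short of a full packet capture, one way to confirm that data really travels in both directions is a small echo test: listen on one side, send a payload from the other, and check that the same bytes come back. The sketch below is an assumption-laden illustration. The function names are hypothetical, and it needs a spare port inside the range the firewall is supposed to allow, since the hub and robot already occupy their own ports.

```python
import socket


def serve_echo_once(server: socket.socket) -> None:
    """Accept one connection on an already-listening socket and echo its data."""
    conn, _peer = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def round_trip(host: str, port: int, payload: bytes = b"ping",
               timeout: float = 3.0) -> bool:
    """Send payload to host:port and return True only if it is echoed back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            # A reset or timeout on the return path raises OSError here.
            return sock.recv(1024) == payload
    except OSError:
        return False
```

Run `serve_echo_once` on one machine with a socket bound to the spare port, call `round_trip` from the other, then swap roles. If the connect succeeds but `round_trip` returns `False`, traffic is being cut after the handshake, which is the pattern the logs in this thread suggest.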



  • 5.  Re: Robot / Hub Communication Error

    Broadcom Employee
    Posted Jul 16, 2018 05:57 PM

    Thanks Chris



  • 6.  Re: Robot / Hub Communication Error

    Broadcom Employee
    Posted Jul 11, 2018 07:23 AM

    @Balkar Singh Kang, did you make any progress after Chris's reply?



  • 7.  Re: Robot / Hub Communication Error

    Broadcom Employee
    Posted Jul 16, 2018 05:58 PM

    Yes, this was a network issue.