DX Unified Infrastructure Management

  • 1.  Several Devices with random timed out alarm CA UIM

    Posted Nov 27, 2017 02:42 PM

    Hi there, how are you?

     

    We have several CIs that, at random times during the day, raise this alarm:

     

    Connection to ****** timed out

     

     

    So far, this is the scenario I have been able to identify:

     

    AIX or LINUX devices.

    The same server where the rsp is installed (I have two servers with rsp).

    The alarm is raised and then clears after a while (or not).

    Sometimes restarting the probe works to clear the alarms.

    If I check the devices in IM > RSP, sometimes I can see metrics and sometimes not.

    I can connect to the device (PuTTY) while the alarm is active.

    The device responds to ping properly from the rsp box, from the IM server, and from my personal computer (it is not shut down or being rebooted).

     

    This is what I see in the rsp log when the alarms are created:

     

    Nov 27 16:23:11:001 [6568] rsp: run_command_Others: Command Fired. Reading Data
    Nov 27 16:23:11:033 [5748] rsp: run_command_Others: - Command Timeout
    Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout
    Nov 27 16:23:11:033 [5564] rsp: run_command_Others: - Command Timeout
    Nov 27 16:23:11:033 [8496] rsp: run_command_Others: Received an error - 3. Raising Alarm
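
    To see whether the timeouts cluster at particular times of day, the log can be bucketed per minute. This is only a rough sketch: it assumes every log line has the same shape as the excerpt above, and the function name `count_timeouts_per_minute` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, for example:
# "Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout"
LINE_RE = re.compile(
    r"^\s*(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}:\d{3}\s+"
    r"\[(?P<thread>\d+)\]\s+rsp:\s+(?P<message>.*)$"
)


def count_timeouts_per_minute(lines):
    """Count 'Command Timeout' log lines, grouped by 'Mon DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "Command Timeout" in match.group("message"):
            key = "{} {} {}:{}".format(
                match.group("month"),
                match.group("day"),
                match.group("hour"),
                match.group("minute"),
            )
            counts[key] += 1
    return counts


# The five lines from the excerpt above, used as sample input.
sample = [
    "Nov 27 16:23:11:001 [6568] rsp: run_command_Others: Command Fired. Reading Data",
    "Nov 27 16:23:11:033 [5748] rsp: run_command_Others: - Command Timeout",
    "Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout",
    "Nov 27 16:23:11:033 [5564] rsp: run_command_Others: - Command Timeout",
    "Nov 27 16:23:11:033 [8496] rsp: run_command_Others: Received an error - 3. Raising Alarm",
]
timeouts = count_timeouts_per_minute(sample)
```

    For a real run, the `sample` list would be replaced by the lines of the probe's log file (its path depends on the installation). Minutes with many timeouts can then be compared against scheduled jobs such as backups or scans.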

     

    For now, I changed the "Command timeout" value from 120 to 240 just to see if it happens again, but it probably will.

     

    Any tips on how to troubleshoot this?

     

    Regards.



  • 2.  Re: Several Devices with random timed out alarm CA UIM

    Broadcom Employee
    Posted Nov 28, 2017 11:41 AM

    Do you have full SUDO permissions for this probe?

    I've seen this error before with AIX when the probe was not running with full SUDO permissions.

     

    Also, make sure you've reviewed the installation considerations.

     

    Hope that helps!



  • 3.  Re: Several Devices with random timed out alarm CA UIM

    Posted Nov 28, 2017 12:11 PM

    I do, for all the commands listed in the Installation Considerations.

     

    But then again, this alarm happens randomly throughout the day: the alarms are raised at some point, and then after a few seconds or minutes they are cleared automatically.

     

    rsp (Remote System Probe) Release Notes - CA Unified Infrastructure Management Probes - CA Technologies Documentation

    • AIX
      • /bin/uname
      • /usr/bin/pagesize 
      • /usr/bin/hostname
      • /usr/sbin/sar
      • /usr/bin/df
      • /usr/sbin/bootinfo
      • /usr/bin/vmstat
      • /usr/bin/uptime
      • /usr/bin/vmstat
      • /usr/sbin/swap (AIX 5.x)
      • ps -efo


  • 4.  Re: Several Devices with random timed out alarm CA UIM
    Best Answer

    Broadcom Employee
    Posted Nov 28, 2017 01:15 PM

    I'd recommend running some kind of network packet capture, e.g. with Wireshark.

    Since it's intermittent, I wonder if something is scanning or inspecting packets and slowing down the transfer rates, causing a timeout. I've seen similar situations with antivirus programs, security software (firewalls), or backup software.

     

    If you can't find the culprit with that, I suggest opening a ticket with Support so we can help with the investigation.
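
    As a lighter-weight complement to a packet capture (not a substitute for one), connection latency can be sampled from the rsp server to see whether it spikes when the alarms fire. The sketch below assumes the probe reaches the AIX/Linux devices over SSH on TCP port 22; the function names and the one-second "slow" threshold are illustrative assumptions, not part of the product.

```python
import socket
import time


def time_tcp_connect(host, port=22, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def summarize(samples, slow_threshold=1.0):
    """Count failed and slow connection attempts in a list of samples."""
    failures = sum(1 for s in samples if s is None)
    slow = sum(1 for s in samples if s is not None and s > slow_threshold)
    return {"total": len(samples), "failures": failures, "slow": slow}


# Example loop (hypothetical host name, sampling every 30 seconds):
#
#     samples = []
#     for _ in range(120):
#         samples.append(time_tcp_connect("aix-device.example.com"))
#         time.sleep(30)
#     print(summarize(samples))
```

    If failures or slow connects line up with the alarm times, that points toward something on the network path or on the rsp host; if connects stay fast, the delay is more likely in the command execution itself.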



  • 5.  Re: Several Devices with random timed out alarm CA UIM

    Posted Dec 05, 2017 02:04 PM

    Hi Philip, how are you?

     

    I went with the antivirus real-time scan theory. I asked our AV admin to create a rule that excludes from real-time scanning the folders where the rsp probe and the main hub are installed.

     

    I'm still seeing timed-out alarms, but I have now narrowed them down to a few devices, and apparently they are always the same ones.
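
    To confirm that it really is the same few devices, the alarm messages can be tallied per device. This is a minimal sketch that assumes the messages follow the "Connection to ... timed out" wording quoted at the start of the thread and have been exported as plain text; the device names below are invented for the example.

```python
import re
from collections import Counter

# Matches the alarm text quoted earlier: "Connection to <device> timed out".
ALARM_RE = re.compile(r"Connection to (?P<device>\S+) timed out")


def timeouts_by_device(messages):
    """Return (device, count) pairs, most frequently alarming device first."""
    counts = Counter()
    for message in messages:
        match = ALARM_RE.search(message)
        if match:
            counts[match.group("device")] += 1
    return counts.most_common()


# Hypothetical alarm messages; real ones would come from an alarm export.
alarms = [
    "Connection to aix01 timed out",
    "Connection to linux02 timed out",
    "Connection to aix01 timed out",
    "Some unrelated alarm",
]
ranking = timeouts_by_device(alarms)
```

    A short, stable list at the top of the ranking would suggest a device-side cause (load, sudo configuration, slow commands) rather than something on the rsp server.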

     

    Regards