Several devices with random "timed out" alarms in CA UIM

Question asked by Earamendi on Nov 27, 2017
Latest reply on Dec 5, 2017 by Earamendi

Hi there, how are you?

 

We have several CIs that randomly raise the following alarm during the day:

 

Connection to ****** timed out

 

 

So far, this is the scenario I have been able to identify:

 

AIX or Linux devices.

All of them are on the same server where rsp is installed (I have two servers with rsp).

The alarm appears and then clears after a while (or not).

Sometimes restarting the probe clears the alarms.

If I check the devices in IM > RSP, sometimes I can see metrics and sometimes not.

I can connect to the device (PuTTY) while the alarm is active.

The device responds to ping properly from the rsp box, from the IM server, and from my personal computer (it is not shut down or being rebooted).
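
Ping and a manual PuTTY session only show that the host is up at that moment. A rough way to watch the connection itself from the rsp box is a timed TCP connect, sketched below in Python. This assumes rsp reaches these AIX/Linux hosts over SSH on port 22, which I have not confirmed; `check_tcp` is my own helper, not part of the probe.

```python
import socket
import time


def check_tcp(host, port=22, timeout=5.0):
    """Try one TCP connect and return (ok, elapsed_seconds, error_text).

    Port 22 is an assumption (SSH); change it if the probe profile
    uses a different port or protocol.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return False, time.monotonic() - start, str(exc)
```

Calling `check_tcp("my-aix-host")` (a placeholder name) every minute from the rsp server and logging the result would show whether the connect time spikes around the moments the alarm is raised.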

 

This is what I see in the rsp log when the alarms are created:

 

Nov 27 16:23:11:001 [6568] rsp: run_command_Others: Command Fired. Reading Data
Nov 27 16:23:11:033 [5748] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [5564] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [8496] rsp: run_command_Others: Received an error - 3. Raising Alarm
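
To look for a pattern across a whole day of these entries, a small Python sketch like the one below can count "Command Timeout" lines per thread id and measure the gap between the first timeout and the last "Command Fired" line before it. It assumes the line layout shown above and that the last time field is milliseconds; `parse_line` and `summarize` are names I made up for illustration. For the excerpt above the gap comes out at 32 ms, but the thread ids differ, so the "Command Fired" line may belong to a different command.

```python
import re
from collections import Counter

# Layout assumed from the excerpt, e.g.
# "Nov 27 16:23:11:033 [5748] rsp: run_command_Others: - Command Timeout"
LINE_RE = re.compile(
    r"^(?P<month>\w{3}) (?P<day>\d{1,2}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}):(?P<ms>\d{3}) "
    r"\[(?P<thread>\d+)\] rsp: (?P<func>\w+): (?P<msg>.*)$"
)

SAMPLE = """
Nov 27 16:23:11:001 [6568] rsp: run_command_Others: Command Fired. Reading Data
Nov 27 16:23:11:033 [5748] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [5564] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [8496] rsp: run_command_Others: Received an error - 3. Raising Alarm
"""


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Assumption: the last time field is milliseconds.
    ms_of_day = (
        (int(m["h"]) * 60 + int(m["m"])) * 60 + int(m["s"])
    ) * 1000 + int(m["ms"])
    # Drop the leading "- " that some messages carry.
    msg = m["msg"].lstrip("- ").strip()
    return {"thread": m["thread"], "ms_of_day": ms_of_day, "msg": msg}


def summarize(lines):
    """Return (timeouts per thread id, ms from last 'Command Fired' to first timeout)."""
    events = [e for e in (parse_line(l) for l in lines) if e is not None]
    timeouts = [e for e in events if e["msg"] == "Command Timeout"]
    per_thread = Counter(e["thread"] for e in timeouts)
    gap_ms = None
    if timeouts:
        first_timeout = min(e["ms_of_day"] for e in timeouts)
        fired = [
            e["ms_of_day"]
            for e in events
            if e["msg"].startswith("Command Fired")
            and e["ms_of_day"] <= first_timeout
        ]
        if fired:
            gap_ms = first_timeout - max(fired)
    return per_thread, gap_ms
```

Running `summarize(open(path))` on the full rsp log (with `path` pointing at wherever the log lives on the rsp server) would show whether the timeouts cluster at certain times or on certain threads. Note that the sketch compares times of day only, so it should be run on one day of log at a time.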

 

For now, I changed the "Command timeout" value from 120 to 240 just to see whether it happens again, but it probably will.

 

Any tips on how to troubleshoot this?

 

Regards.
