Hi there, how are you?
Several of our CIs raise the following alarm at random times during the day:
Connection to ****** timed out
So far, this is what I have been able to determine about the scenario:
The affected devices run AIX or Linux.
The affected devices are all monitored by the same server where RSP is installed (I have two servers running RSP).
The alarm appears and then sometimes clears on its own after a while, and sometimes does not.
Sometimes restarting the probe clears the alarms.
When I check the devices in IM > RSP, sometimes I can see metrics and sometimes I cannot.
I can connect to the device with PuTTY while the alarm is active.
The device responds to ping from the RSP box, from the IM server, and from my own computer, so it is not shut down or rebooting.
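Ping only shows that the host answers ICMP. A quick way to check whether the remote login service also answers promptly from the RSP box is to time repeated TCP connections to it. The sketch below assumes RSP reaches the AIX/Linux devices over SSH on port 22 (an assumption, adjust to your setup). The function names are hypothetical helpers, and a fast TCP connect still does not prove that the remote command completes within the probe's timeout.

```python
import socket
import time
from typing import List, Optional


def tcp_connect_time(host: str, port: int = 22, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None if it fails.

    Assumption: port 22 (SSH) is what RSP uses to reach AIX/Linux devices.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Timeouts, refused and unreachable connections are all OSError subclasses.
        return None


def sample_connect_times(host: str, port: int = 22, attempts: int = 10,
                         interval: float = 1.0, timeout: float = 5.0) -> List[Optional[float]]:
    """Probe the same host several times so intermittent slowness becomes visible."""
    results: List[Optional[float]] = []
    for i in range(attempts):
        results.append(tcp_connect_time(host, port, timeout))
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

Running `sample_connect_times("device-hostname")` (placeholder host name) from the RSP server while an alarm is active, and comparing it with a run from another machine, would show whether connections from the RSP box specifically are slow or failing.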
In the RSP log, this is what I see when the alarms are created:
Nov 27 16:23:11:001 [6568] rsp: run_command_Others: Command Fired. Reading Data
Nov 27 16:23:11:033 [5748] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [5564] rsp: run_command_Others: - Command Timeout
Nov 27 16:23:11:033 [8496] rsp: run_command_Others: Received an error - 3. Raising Alarm
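Because the alarms appear at random times, it may help to count the "Command Timeout" entries per hour across the whole RSP log and look for a pattern (for example, backups or batch jobs running at the same times). The sketch below assumes every log entry has the same shape as the lines above (`Mon DD HH:MM:SS:mmm [thread] rsp: function: message`); `parse_line` and `timeouts_per_hour` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Matches lines such as:
# Nov 27 16:23:11:033 [8496] rsp: run_command_Others: - Command Timeout
# Assumption: all entries in the RSP log follow this format.
LINE_RE = re.compile(
    r"^(?P<month>\w{3}) (?P<day>\d{1,2}) "
    r"(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2}):(?P<ms>\d{3}) "
    r"\[(?P<thread>\d+)\] rsp: (?P<func>[^:]+): (?P<msg>.*)$"
)


class Entry(NamedTuple):
    month: str
    day: int
    hour: int
    minute: int
    thread: int
    func: str
    msg: str


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry for a matching log line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["month"], int(m["day"]), int(m["hh"]), int(m["mm"]),
                 int(m["thread"]), m["func"], m["msg"])


def timeouts_per_hour(lines: Iterable[str]) -> Counter:
    """Count 'Command Timeout' entries per (month, day, hour) bucket."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and "Command Timeout" in entry.msg:
            counts[(entry.month, entry.day, entry.hour)] += 1
    return counts
```

Feeding it the open log file (`timeouts_per_hour(open(path))`, with `path` pointing at the RSP log) returns a count per hour; for the five lines above it reports three timeouts in the 16:00 hour on Nov 27.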
For now, I have increased the "Command timeout" from 120 to 240 to see whether it happens again, but it probably will.
Any tips on how to troubleshoot this?
Regards.