I had a situation where probes were showing as red in the Infrastructure Manager (IM) console, and I had to restart the Nimbus agent to make them green again. Is there any way to monitor probe status?
My NAS version:
GitHub - fraxken/checkconfig_lua: CA UIM Checkconfig LUA for NAS
Alternatively, you can run a SQL query against the discovery_server tables.
Please see TEC000003617: How to view all probes that are either in a deactivated or errored state in a domain
Please note that the controller will normally generate an alarm if any probes are in an error state.
Please also refer to:
Thank you very much. You mentioned that an alert is generated when probes go down, but in our case no alert was generated, so it may be missing from our configuration. Since I am completely new to Nimsoft, where should I check whether alerting is set up? Can you please help?
Here is some further clarification:
Using the robot controller callback 'probe_list' helps to distinguish probes that were deactivated before the local robot was restarted from probes that stopped while the robot was running. The second group includes both probes that ran into an error state (red in IM) and probes that were deactivated after the robot started.
For the first category, probes deactivated before the robot restart, the returned 'process_state' has the value 'none'.
For the second category, 'process_state' has the value 'stopped'. For problem probes, 'last_started' also holds the time the probe was last started, in seconds since 1970 (Unix time). Here is example output from the pu command; RESTful calls return the same values in a different format.
-- db2 shows a red icon in IM; a pu call to controller/probe_list with the probe name as a parameter returns the following:
db2 PDS_PDS 516
  name PDS_PCH 4 db2
  description PDS_PCH 12 db2 monitor
  group PDS_PCH 9 Database
  active PDS_I 2 0
  type PDS_I 2 2
  command PDS_PCH 12 db2_monitor
  arguments PDS_PCH 1
  config PDS_PCH 16 db2_monitor.cfg
  datafile PDS_PCH 1
  logfile PDS_PCH 16 db2_monitor.log
  workdir PDS_PCH 20 probes/database/db2
  timespec PDS_PCH 1
  times_activated PDS_I 2 1
  last_action PDS_I 11 1481189432
  pid PDS_I 3 -1
  times_started PDS_I 3 12
  last_started PDS_I 11 1481189432
  pkg_name PDS_PCH 4 db2
  expires_at PDS_I 11 1506105000
  pkg_version PDS_PCH 5 4.10
  pkg_build PDS_PCH 3 18
  process_state PDS_PCH 8 stopped
  port PDS_I 3 -1
  is_marketplace PDS_I 2 0
  marketpl_block PDS_I 2 0
-- iostat was deactivated before the robot restart:
iostat PDS_PDS 496
  name PDS_PCH 7 iostat
  description PDS_PCH 46 Generate disk QoS based on output from iostat
  group PDS_PCH 7 System
  active PDS_I 2 0
  type PDS_I 2 2
  command PDS_PCH 18 ../../../bin/perl
  arguments PDS_PCH 10 iostat.pl
  config PDS_PCH 11 iostat.cfg
  logfile PDS_PCH 11 iostat.log
  workdir PDS_PCH 21 probes/system/iostat
  timespec PDS_PCH 1
  times_activated PDS_I 2 0
  last_action PDS_I 2 0
  pid PDS_I 3 -1
  times_started PDS_I 2 0
  last_started PDS_I 2 0
  pkg_name PDS_PCH 7 iostat
  pkg_version PDS_PCH 5 1.10
  pkg_build PDS_PCH 3 01
  process_state PDS_PCH 5 none
  port PDS_I 3 -1
  is_marketplace PDS_I 2 0
  marketpl_block PDS_I 2 0
-- processes was deactivated without restarting the robot:
processes PDS_PDS 478
  name PDS_PCH 10 processes
  description PDS_PCH 25 Process monitoring probe
  group PDS_PCH 7 System
  active PDS_I 2 0
  type PDS_I 2 2
  command PDS_PCH 10 processes
  arguments PDS_PCH 1
  config PDS_PCH 14 processes.cfg
  logfile PDS_PCH 14 processes.log
  workdir PDS_PCH 24 probes/system/processes
  timespec PDS_PCH 1
  times_activated PDS_I 2 0
  last_action PDS_I 2 0
  pid PDS_I 3 -1
  times_started PDS_I 2 2
  last_started PDS_I 2 0
  pkg_name PDS_PCH 10 processes
  pkg_version PDS_PCH 5 4.31
  pkg_build PDS_PCH 4 227
  process_state PDS_PCH 8 stopped
  port PDS_I 3 -1
  is_marketplace PDS_I 2 0
  marketpl_block PDS_I 2 0
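To post-process this output in a script, the key/type/length/value layout can be parsed into one dict per probe. This is a minimal Python sketch. It assumes that pu prints one field per line as `key PDS_TYPE length value`, and that each probe starts with a `name PDS_PDS size` header line. `parse_probe_list` is a hypothetical helper, not part of the UIM SDK. PDS_I values are converted to int and PDS_PCH values are kept as strings.

```python
def parse_probe_list(text):
    """Parse pu probe_list text output into {probe_name: {field: value}}.

    Assumed layout (hypothetical parser, not a UIM API):
      <probe> PDS_PDS <size>        starts a new probe section
      <key> PDS_PCH <len> [text]    string field (text may contain spaces)
      <key> PDS_I <len> <number>    integer field
    Banner, log and separator lines are skipped.
    """
    probes = {}
    current = None
    for raw in text.splitlines():
        parts = raw.split(None, 3)  # key, type, length, rest-of-line
        if len(parts) < 3 or not parts[1].startswith("PDS_") or not parts[2].isdigit():
            continue  # not a PDS field line
        key, kind = parts[0], parts[1]
        value = parts[3].rstrip() if len(parts) == 4 else ""
        if kind == "PDS_PDS":
            current = probes.setdefault(key, {})
            continue
        if current is None:
            continue  # field line before any probe header
        if kind == "PDS_I":
            value = int(value)
        current[key] = value
    return probes
```

Splitting each line at most three times keeps multi-word values such as "db2 monitor" intact, and empty values such as `arguments PDS_PCH 1` become "".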
Here is an example of the pu command calling the probe_list callback (I've replaced sensitive information with placeholders):
C:\>"C:\Program Files (x86)"\Nimsoft\bin\pu.exe -u administrator -p <password> <UIM_domain>/<UIM_hub>/<UIM_Robot>/controller probe_list controller
Jan 25 13:41:43:879 pu: SSL - init: mode=0, cipher=DEFAULT, context=OK
Jan 25 13:41:43:880 pu: nimCharsetSet() - charset=
======================================================
Address: <UIM_domain>/<UIM_hub>/<UIM_Robot>/controller probe_list controller
Request: probe_list
======================================================
controller PDS_PDS 465
  name PDS_PCH 11 controller
  description PDS_PCH 34 Robot process and port controller ~
  group PDS_PCH 15 Infrastructure
  active PDS_I 2 1
  type PDS_I 2 0
  command PDS_PCH 15 controller.exe
  config PDS_PCH 10 robot.cfg
  logfile PDS_PCH 15 controller.log
  workdir PDS_PCH 6 robot
  timespec PDS_PCH 1
  times_activated PDS_I 2 0
  last_action PDS_I 2 0
  pid PDS_I 5 3304
  times_started PDS_I 2 1
  last_started PDS_I 11 1484088410
  pkg_name PDS_PCH 13 robot_update
  pkg_version PDS_PCH 5 7.80
  process_state PDS_PCH 8 running
  port PDS_I 6 48000
  is_marketplace PDS_I 2 0
  marketpl_block PDS_I 2 0
In this last case, notice that the process_state is 'running'.
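Putting the three states together, a monitoring script could classify each probe from its probe_list fields. This is a minimal Python sketch under stated assumptions. The rule "'stopped' with a non-zero last_started means a possible error probe" is a heuristic read from the db2 and processes samples in this thread, not a documented controller rule. `classify_probe` and `problem_probes` are hypothetical helpers.

```python
from datetime import datetime, timezone


def classify_probe(fields):
    """Return (category, last_started as a UTC datetime or None).

    Heuristic from the sample output in this thread (an assumption):
      running                   -> probe is up
      none                      -> deactivated before the robot restarted
      stopped, last_started > 0 -> ran and then stopped (possible error, red in IM)
      stopped, last_started = 0 -> deactivated after the robot started
    """
    state = fields.get("process_state")
    last_started = int(fields.get("last_started", 0))  # seconds since 1970
    started_at = (datetime.fromtimestamp(last_started, tz=timezone.utc)
                  if last_started > 0 else None)
    if state == "running":
        return "running", started_at
    if state == "none":
        return "deactivated before robot restart", started_at
    if state == "stopped":
        if last_started > 0:
            return "stopped after running (possible error)", started_at
        return "deactivated after robot start", started_at
    return "unknown", started_at


def problem_probes(probes):
    """Sorted names of probes that look like error probes, from {name: fields}."""
    return sorted(name for name, fields in probes.items()
                  if classify_probe(fields)[0] == "stopped after running (possible error)")
```

With the db2 fields above (process_state 'stopped', last_started 1481189432), the result is the "possible error" category with a last start of 2016-12-08 09:30:32 UTC. The iostat and processes samples fall into the two deactivated categories.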
I think this is a good question. I believe we generally do receive alerts for probe issues.
I use a Perl script to check communication between the agent and the hub. I usually run it once a day to find communication issues. This way I have found a couple of cases where the agent showed green in the console, but checking revealed a communication problem between the agent and the hub. Sometimes the cause was a misconfiguration in robot.cfg, and sometimes it was a firewall.
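A very basic version of such a daily check is a TCP connection test against each robot's controller port. This Python sketch assumes port 48000, the controller port shown in the probe_list output earlier in this thread. `port_reachable` is a hypothetical helper. A successful connect only proves that the port is open from the machine running the script. It does not prove that hub and robot can reach each other in both directions, so it will not catch every robot.cfg or firewall problem.

```python
import socket


def port_reachable(host, port=48000, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    48000 is assumed to be the robot controller port (as in the probe_list
    output in this thread); adjust it if your robots use another port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

Looping it over your robot host names and raising an alarm on False gives a simple daily report to compare against what the console shows.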
We have Lua scripts that check communication with the robot and ping the device as well. UIM wasn't giving us all the alarms when a probe had failed.
Thanks Jason, I downloaded the Lua script, but I'm not sure how to execute it. Can you please help me?
There are two ways to run the Lua.
1. Within NAS: open the nas probe and go to the Scheduler tab.
2. nsa.exe: you can deploy the nsa compiler from your archive to any robot in your environment. This lets you run the script from the command line with \Nimsoft\sdk\nsa\nsa.exe.
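For the command-line route, a small Python wrapper can call nsa.exe on a schedule. This is a sketch under these assumptions: nsa.exe accepts the Lua script path as its first argument, and the robot is installed under C:\Program Files (x86)\Nimsoft, the prefix used in the pu example in this thread. `nsa_command`, `run_lua`, and the script name are hypothetical placeholders.

```python
import subprocess

# Assumed install location; adjust to your robot's Nimsoft directory.
DEFAULT_NSA = r"C:\Program Files (x86)\Nimsoft\sdk\nsa\nsa.exe"


def nsa_command(script_path, nsa_exe=DEFAULT_NSA):
    """Build the argument list for running one Lua script with nsa.

    Assumes nsa takes the script path as its only argument.
    """
    return [nsa_exe, script_path]


def run_lua(script_path, nsa_exe=DEFAULT_NSA, timeout=300):
    """Run the script and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the script runs longer than timeout seconds.
    """
    result = subprocess.run(nsa_command(script_path, nsa_exe),
                            capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout
```

Using check=True makes a failing script raise an exception instead of silently returning partial output, which suits an unattended daily job.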
If you are looking for a complete solution, I recently created these probes to address serious self-monitoring needs in CA UIM.
GitHub - fraxken/selfmonitoring: CA UIM Self monitoring probe
GitHub - fraxken/robots_checker: CA UIM Robots_checker (check probes, and do callback on it)
New updates are coming soon (Supp_key, alarm enrichment, automatic clear when resolved).
It just takes some work to install the Perl library and configure the framework for your system:
Release Light R4.0 · fraxken/perluim · GitHub
Starter guide · fraxken/perluim Wiki · GitHub
It is currently running well in an environment of 20,000+ servers (and ~40 hubs).
Could you please help me with how to use robot_checker.cfg? Does it need to be deployed to the local hub, or is there another way to use it?
Thanks in advance.
Yes, sure! I'm busy with another UIM project, but I'm available full time in April.
If you have any questions or need help installing the probe, let me know.
Thanks Thomas. I have deployed the robot_checker.cfg file, and it says it is installed. I just created a package and added this cfg file. How can I tell whether it is working?
Don't forget to take the version from the "release" tab on my GitHub. The latest is 1.5:
Release Sariah (R1.5) · fraxken/robots_checker · GitHub
The Perluim framework is bundled with the probe. If you work on a *nix system, you will have to reconfigure the package (at the time it was only set up for Windows).
To learn more about the cfg file and its keys, see the README on GitHub.
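UIM .cfg files use nested `<section> ... </section>` blocks containing `key = value` lines. If you want to inspect a probe's cfg from a script, a minimal Python parser could look like the sketch below. It is a simplification that keeps every value as a string and lets a repeated key overwrite the earlier one. `parse_uim_cfg` is a hypothetical helper, and the section and key names in the sample are made up for illustration. They are not the probe's real keys, which are documented in the README.

```python
def parse_uim_cfg(text):
    """Parse nested <section> ... </section> blocks into dicts of strings."""
    root = {}
    stack = [("", root)]  # (section name, dict) pairs, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            name = line[2:-1]
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected closing tag {line!r}")
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            name = line[1:-1]
            section = {}
            stack[-1][1][name] = section
            stack.append((name, section))
        elif "=" in line:
            key, value = line.split("=", 1)  # values may themselves contain '='
            stack[-1][1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError(f"unclosed section {stack[-1][0]!r}")
    return root


if __name__ == "__main__":
    # Made-up sample; see the robots_checker README for the real keys.
    sample = """
<setup>
   loglevel = 3
   <hubs>
      hub1 = /DOMAIN/HUB1
   </hubs>
</setup>
"""
    print(parse_uim_cfg(sample)["setup"]["hubs"]["hub1"])
```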