HW monitoring

Question asked by jorgen.lewitzki on Jan 11, 2008
Latest reply on Feb 19, 2013 by cegeka-nv

Hi There



A while ago I put a question to Nimsoft support regarding HW monitoring on servers:






What is the best way to monitor HW according to Nimsoft? I chose the cim_trap component, since this seems to be one way to do it on HP Compaq servers. I haven't tried it yet, though. But in general, since we might use other brands:
Are there any other effective ways to do this and still keep the integration with NimBUS?
We need to detect RAID errors, disk problems, fans etc.






This was the answer:



The best way to monitor H/W devices according to Nimsoft, is to use the snmp based probes. The probes using snmp are SNMPGET, SNMPGTW, SNMPTD and INTERFACE_TRAFFIC. You may also use cim_trap, which converts traps from Compaq messages into NimBUS alarms.









Well, I have now tried the NimBUS way of HW monitoring: snmptd with the cim_trap extension!



OK, I must say that the problem descriptions added to the alarms are a bit short!! :(



For example, I removed a disk from the RAID on a lab server... and yes, an alarm was created!

It was a critical alarm saying: "Status is now 3" !!! ??? Hmm, OK!






Let's say this alarm is received by a “stressed tech” or viewed by the operations NOC: what is status 3? On what?






After a bit of investigation, and with one hint from the alarm as seen through the NimBUS manager (Suppression key: snmptd/cpqDa6LogDvrStatusChange..), they might identify this as drive or storage related.



Hmm, OK, let's look at the Event Viewer.

This is what I can see:



“Drive Array Logical Drive Status Change. Logical drive number 1 on the array controller in Slot 4 has a new status of 3.

(Logical Drive status values: 1=other, 2=ok, 3=failed, 4=unconfigured, 5=recovering, 6=readyForRebuild, 7=rebuilding, 8=wrongDrive, 9=badConnect, 10=overheating, 11=shutdown, 12=expanding, 13=notAvailable, 14=queuedForExpansion)”
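
To make the point concrete, here is a minimal sketch of the kind of translation I would like the monitoring software to do for me. It only uses the status values quoted in the event log text; the function and parameter names (`describe_logical_drive_status`, `alarm_text`, `drive`, `slot`) are made up for illustration and are not part of snmptd, cim_trap or any Nimsoft API.

```python
# Status codes copied from the event log text quoted in this post.
LOGICAL_DRIVE_STATUS = {
    1: "other", 2: "ok", 3: "failed", 4: "unconfigured", 5: "recovering",
    6: "readyForRebuild", 7: "rebuilding", 8: "wrongDrive", 9: "badConnect",
    10: "overheating", 11: "shutdown", 12: "expanding", 13: "notAvailable",
    14: "queuedForExpansion",
}


def describe_logical_drive_status(code: int) -> str:
    """Return 'name (code)' for a known status, or flag an unknown code."""
    name = LOGICAL_DRIVE_STATUS.get(code)
    if name is None:
        return f"unknown ({code})"
    return f"{name} ({code})"


def alarm_text(drive: int, slot: int, code: int) -> str:
    """Build a readable alarm line; the parameter names are illustrative only."""
    return (
        f"Logical drive {drive} on the array controller in slot {slot} "
        f"is now {describe_logical_drive_status(code)}"
    )


if __name__ == "__main__":
    print(alarm_text(drive=1, slot=4, code=3))
```

An alarm worded like that would tell the NOC both what failed and where, instead of just "Status is now 3".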









This is a bit more informative 






Well, I know the MIB itself doesn't provide all the detailed info found in the event log, and from what I can see there are some more variables that can be added to the alarm text.

But to interpret this you really have to go through each and every possible MIB and manually edit and add every profile in order to get an understandable alarm text.
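
Roughly what I have in mind is one table of message templates per trap type, filled in from the trap's variables. The sketch below is only an assumption of how that could look outside the product: the trap key, the placeholder names and `render_alarm` are hypothetical and do not correspond to real snmptd profile settings.

```python
from collections import defaultdict

# Hypothetical per-trap templates; keys and placeholder names are assumptions,
# not real snmptd profile settings.
TEMPLATES = {
    "logical_drive_status_change": (
        "Logical drive {drive} on controller in slot {slot} "
        "has a new status of {status}"
    ),
}


def render_alarm(trap_name: str, variables: dict) -> str:
    """Fill the template for trap_name; fall back to dumping raw variables."""
    template = TEMPLATES.get(trap_name)
    if template is None:
        raw = ", ".join(f"{k}={v}" for k, v in sorted(variables.items()))
        return f"{trap_name}: {raw}"
    # Missing variables render as '?' instead of raising KeyError.
    values = defaultdict(lambda: "?", {k: str(v) for k, v in variables.items()})
    return template.format_map(values)


if __name__ == "__main__":
    print(render_alarm(
        "logical_drive_status_change",
        {"drive": 1, "slot": 4, "status": "failed (3)"},
    ))
```

Traps without a template still produce an alarm listing their raw variables, so nothing is lost while the table is being filled in.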






If we were to use another server brand, what then?

Using the event log instead isn't going to help us on a Linux server either.






OK, this is better than nothing, but :/






Well, I'm a bit of a novice at interpreting the traps, so I rely on the monitoring software to do this for me. Please share your knowledge.









Any tips and tricks? Other tools or gadgets that still integrate with NimBUS to create alarms? Has someone else already done this?

Created your own extensions to snmptd, or …?

What do you guys out there use to secure HW monitoring?