HW monitoring

Question asked by jorgen.lewitzki on Jan 11, 2008
Latest reply on Feb 19, 2013 by cegeka-nv

Hi there,

A while ago I asked Nimsoft support about HW monitoring on servers:

Hello,
What is the best way to monitor HW according to Nimsoft? I chose the cim_trap component, since this seems to be one way to do it on HP Compaq servers. I haven't tried it yet, though. But in general, since we might use other brands: is there any other effective way to do this and still cooperate with Nimbus?
We need to detect RAID errors, disk problems, fans, etc.

This was the answer:

The best way to monitor H/W devices, according to Nimsoft, is to use the SNMP-based probes. The probes using SNMP are SNMPGET, SNMPGTW, SNMPTD and INTERFACE_TRAFFIC. You may also use cim_trap, which converts Compaq trap messages into NimBUS alarms.

Well, I have now tried the Nimbus way of doing HW monitoring: snmptd with the cim_trap extension!

OK, I must say that the problem descriptions added to the alarms are a bit short! :(

For example, I removed a disk from the RAID on a lab server... and yes! An alarm was created, a critical one, saying: "Status is now 3". Hmm, OK!

Let's say this alarm is received by a “stressed tech” or viewed by the operations NOC.

What is status 3? On what?

After a bit of investigation, with one hint from the alarm as seen through the Nimbus Manager:

Suppression key snmptd/cpqDa6LogDvrStatusChange..

OK! They might work out that this is drive- or storage-related.

Hmm, OK, let's look at the Event Viewer.

This is what I can see:

“Drive Array Logical Drive Status Change. Logical drive number 1 on the array controller in Slot 4 has a new status of 3.

(Logical Drive status values: 1=other, 2=ok, 3=failed, 4=unconfigured, 5=recovering, 6=readyForRebuild, 7=rebuilding, 8=wrongDrive, 9=badConnect, 10=overheating, 11=shutdown, 12=expanding, 13=notAvailable, 14=queuedForExpansion)”
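As a rough illustration of what a friendlier alarm text could look like, here is a minimal Python sketch that decodes the status number using the value list from the event log text above. The names `LOGICAL_DRIVE_STATUS`, `describe_status` and `logical_drive_alarm_text` are hypothetical; this is not how snmptd or cim_trap build their messages, and it assumes the drive number, slot and status have already been pulled out of the trap.

```python
# Hypothetical lookup table: the values are copied from the event log
# text ("Logical Drive status values: 1=other, 2=ok, ...").
LOGICAL_DRIVE_STATUS = {
    1: "other",
    2: "ok",
    3: "failed",
    4: "unconfigured",
    5: "recovering",
    6: "readyForRebuild",
    7: "rebuilding",
    8: "wrongDrive",
    9: "badConnect",
    10: "overheating",
    11: "shutdown",
    12: "expanding",
    13: "notAvailable",
    14: "queuedForExpansion",
}


def describe_status(code: int) -> str:
    """Return e.g. '3 (failed)' instead of a bare number."""
    name = LOGICAL_DRIVE_STATUS.get(code, "unknown")
    return f"{code} ({name})"


def logical_drive_alarm_text(drive: int, slot: int, status: int) -> str:
    """Build a readable alarm text from values assumed to come from the trap."""
    return (
        f"Logical drive {drive} on the array controller in Slot {slot} "
        f"has a new status of {describe_status(status)}"
    )


if __name__ == "__main__":
    print(logical_drive_alarm_text(drive=1, slot=4, status=3))
    # -> Logical drive 1 on the array controller in Slot 4 has a new status of 3 (failed)
```

Unknown codes fall back to "unknown" rather than raising, so a new firmware status value would still produce an alarm.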

This is a bit more informative.

Well, I know the MIB itself doesn't provide all the detailed info that the event log does, and from what I can see there are some more variables that can be added to the alarm text. But to interpret them, you really have to go through each and every MIB and manually edit and add every profile in order to get an understandable alarm text.
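One way to keep that per-trap work in one place, sketched here in Python under loose assumptions, is a single table of readable texts keyed by trap name, with a generic fallback for traps nobody has described yet (for example from another server brand). `TrapProfile`, `PROFILES`, `alarm_text` and the variable names `drive`, `slot` and `status` are all hypothetical; the only name taken from the alarm is the trap name in the suppression key, and how the decoded values would be fed back into a NimBUS alarm is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping


@dataclass
class TrapProfile:
    """Hypothetical per-trap profile: a text template plus optional
    decoders that turn raw variable values into readable words."""
    template: str
    decoders: Dict[str, Callable[[object], str]] = field(default_factory=dict)

    def render(self, values: Mapping[str, object]) -> str:
        # Decode the variables that have a decoder; stringify the rest.
        readable = {
            key: self.decoders[key](val) if key in self.decoders else str(val)
            for key, val in values.items()
        }
        return self.template.format(**readable)


# Abbreviated status table (the full list is in the event log text).
_LD_STATUS = {2: "ok", 3: "failed", 7: "rebuilding"}

PROFILES: Dict[str, TrapProfile] = {
    # Key taken from the suppression key seen in the alarm; the variable
    # names are assumptions for illustration only.
    "cpqDa6LogDvrStatusChange": TrapProfile(
        template="Logical drive {drive} in Slot {slot} is now {status}",
        decoders={"status": lambda v: f"{_LD_STATUS.get(v, 'unknown')} ({v})"},
    ),
}


def alarm_text(trap_name: str, values: Mapping[str, object]) -> str:
    """Use the profile if one exists; otherwise fall back to the raw
    values so an undescribed trap still produces a usable alarm text."""
    profile = PROFILES.get(trap_name)
    if profile is None:
        raw = ", ".join(f"{k}={v}" for k, v in sorted(values.items()))
        return f"{trap_name}: {raw}"
    return profile.render(values)
```

This does not remove the need to read each MIB; it only keeps the result in one place, and an undescribed trap such as `alarm_text("someOtherVendorTrap", {"status": 3})` (a made-up name) still yields `someOtherVendorTrap: status=3` instead of nothing.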

If we were to use another server brand, what then? Using the event log instead isn't going to help us on a Linux server either.

OK, this is better than nothing, but :/

Well, I'm a bit of a novice at interpreting the traps, so I rely on the monitoring software to do this for me... so please share your knowledge!

Any tips and tricks? Other tools or gadgets that still cooperate with Nimbus to create alarms? Has someone else already done this, or created your own extensions to snmptd, or …?

What do you guys out there use to secure HW monitoring?