Monitoring needs differ from company to company. In some cases it is optimal to keep CPU usage low and ensure room for expansion, while other organizations prefer to use their equipment efficiently and keep CPU usage above 70%. Lower is therefore not always better.
This document gives some best practices for monitoring CPU utilization and easily detecting CPU bottlenecks.
While average CPU utilization for the device as a whole is important for detecting how busy the system is, it is also necessary to check CPU utilization for individual processors. A single-threaded application can take up to 100% of a single core, and this can be missed when looking only at the total average CPU usage.
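To illustrate why the average alone can hide a saturated core, here is a minimal sketch in Python. It is not part of the CDM probe. The function name, the 90% per-core threshold, and the sample readings are assumptions chosen for the example.

```python
from statistics import mean


def find_saturated_cores(per_core_pct, core_threshold=90.0):
    """Return the indices of cores at or above the threshold.

    per_core_pct: utilization percentage per core (hypothetical readings).
    core_threshold: illustrative value, not a CDM probe default.
    """
    return [i for i, pct in enumerate(per_core_pct) if pct >= core_threshold]


# Hypothetical four-core sample: a single-threaded process pins core 0.
sample = [100.0, 5.0, 8.0, 7.0]

average = mean(sample)                    # overall average looks healthy
saturated = find_saturated_cores(sample)  # but one core is fully busy

print(f"average={average:.1f}% saturated_cores={saturated}")
```

In this sample the device-wide average is 30%, which would not trigger a typical threshold, yet core 0 is at 100%.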
A high CPU queue length (system load on Unix systems) indicates that processes are waiting for CPU time, which is a clear indicator of problems. Note that this queue can develop when utilization is well below 90%, so CPU queue length should be part of any CPU monitoring setup, as reported in several studies.
A basic rule (valid for several OS flavors) for detecting a CPU bottleneck is to check whether the CPU queue length is at least twice the number of processors. The CDM probe can handle this condition: on a multi-CPU system, the queued processes are shared among the processors, so the Max Queue Length threshold applies per processor. For example, on a system with four processors and the default Max Queue Length value (4), alarm messages are generated if the number of queued processes exceeds 16.
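The arithmetic above can be sketched as follows. This is an illustration of the two rules as described in the text, not the CDM probe's implementation. The function names are hypothetical, and the use of the 1-minute load average as a stand-in for queue length is an assumption (on some systems the load average also counts processes that are not waiting for CPU).

```python
import os


def queue_alarm_threshold(num_cpus, max_queue_length=4):
    """Effective alarm threshold: per-processor Max Queue Length times CPUs."""
    return num_cpus * max_queue_length


def is_queue_alarm(queue_length, num_cpus, max_queue_length=4):
    """True when the queued processes exceed the effective threshold."""
    return queue_length > queue_alarm_threshold(num_cpus, max_queue_length)


def is_cpu_bottleneck(queue_length, num_cpus):
    """Rule of thumb: queue length at least twice the number of processors."""
    return queue_length >= 2 * num_cpus


def current_load_or_none():
    """Return the 1-minute load average on Unix, or None if unavailable."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None


# Worked example from the text: four processors, default Max Queue Length 4.
print(queue_alarm_threshold(4))   # effective threshold
print(is_queue_alarm(16, 4))      # exactly at the threshold: no alarm
print(is_queue_alarm(17, 4))      # above the threshold: alarm
```

On a Unix host, `is_cpu_bottleneck(current_load_or_none(), os.cpu_count())` gives a rough live check, provided the load value is not None.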
Use the built-in predictive alarms (TTT, Time To Threshold) to detect CPU bottlenecks proactively before they happen, and TOT (Time Over Threshold) to filter out spikes and focus on problematic situations.
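The ideas behind TOT and TTT can be sketched in a few lines of Python. How the product computes these values internally is not described here, so the code below is only one plausible approach: a sliding-window count for TOT and a least-squares linear trend for TTT. All function names, parameters, and sample values are assumptions made for illustration.

```python
def time_over_threshold(samples, threshold, window, min_over):
    """True if at least min_over of the last `window` samples exceed threshold.

    A single spike inside the window does not trigger the condition.
    """
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough history yet
    return sum(1 for s in recent if s > threshold) >= min_over


def time_to_threshold(samples, threshold, interval_s):
    """Estimate seconds until the threshold is reached from a linear trend.

    Returns 0.0 if the latest sample is already at or over the threshold,
    and None if there are fewer than two samples or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    if samples[-1] >= threshold:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    steps_to_cross = (threshold - intercept) / slope - (n - 1)
    return max(steps_to_cross, 0.0) * interval_s


# Hypothetical CPU readings (percent), one sample every 60 seconds.
spike = [40, 95, 40, 40, 40]
sustained = [92, 95, 91, 40, 93]
rising = [50, 55, 60, 65, 70]

print(time_over_threshold(spike, 90, window=5, min_over=3))
print(time_over_threshold(sustained, 90, window=5, min_over=3))
print(time_to_threshold(rising, 90, interval_s=60))
```

With these readings, the single spike is ignored, the sustained load is flagged, and the steadily rising series is projected to reach 90% a few minutes ahead, which is the kind of early warning a predictive alarm is meant to give.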
Identify the top CPU-consuming processes on a server by configuring the CpuErrorProcesses and CpuWarningProcesses metrics in the CDM probe. This feature is fundamental for determining which applications have the greatest impact on performance.
These guidelines can be implemented using the CDM probe and easily deployed with UIM MCS to achieve accurate and efficient CPU monitoring.