
Identity Manager Health Checks

Question asked by Eric Laney on Feb 18, 2015
Latest reply on Nov 22, 2017 by KevinMurphyCA

What kinds of health metrics is everyone using for Identity Manager?  I'm looking for any good ideas that I might have overlooked.

 

All of our servers have an automated log maintenance job that runs daily to zip log files older than 24 hours and delete log files older than 90 days.  I check the health of our Identity Manager infrastructure through a series of metrics, which I run interactively (via script) every day at about 9:30am, right after the log maintenance job finishes.  Many of these metrics are also plugged into our company's official monitoring system, so it alerts me if a service goes down.  The log thresholds come from watching how many files each service typically generates, since some are more prolific than others.
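For anyone who wants to reproduce the log maintenance job, here is a minimal sketch in Python, not the actual job described above. It assumes gzip compression (the original job may well use a different zip tool), uses file modification time as the log's age, and the function name `maintain_logs` and its parameters are hypothetical.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

DAY = 24 * 60 * 60  # seconds


def maintain_logs(log_dir, now=None, zip_after=DAY, delete_after=90 * DAY):
    """Compress files older than zip_after seconds and delete files older
    than delete_after seconds. Returns (zipped_names, deleted_names)."""
    now = time.time() if now is None else now
    zipped, deleted = [], []
    # sorted() materializes the listing, so files created below are not revisited
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        age = now - mtime
        if age > delete_after:
            path.unlink()
            deleted.append(path.name)
        elif age > zip_after and path.suffix != ".gz":
            target = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamp so the 90-day rule still applies later
            os.utime(target, (mtime, mtime))
            path.unlink()
            zipped.append(path.name)
    return zipped, deleted
```

Passing `now` explicitly makes the job easy to dry-run against a copy of a log directory before scheduling it.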

 

  • Automation server: # uncompressed log files older than today <= 3
  • Directory server: # DXserver services configured == # running
  • Directory server: Latest .zdb (binary) backup date == today
  • Directory server: Latest .ldif (text) backup date == Latest .zdb (binary) backup date
  • Directory server: # .zdb (binary) backups >= # non-Router DXservers
  • Directory server: # uncompressed .ldif (text) backups older than today <= # non-Router DXservers
  • Directory server: SNMP monitoring of each DXserver
  • JBoss server: # services configured == # running
  • JBoss server: IME status web page at /iam/im/status.jsp reports "OK"
  • JBoss server: JBoss status web page at /status?XML=true
    • memory used % < 80%
    • request error % (# request errors / # requests) < 40%
    • current thread % (# current threads / # max. threads) < 80%
    • busy thread % (# current busy threads / # current threads) < 80% (unless there is only 1 current thread, which is the status page request itself)
  • Provisioning server: # services configured == # running
  • Provisioning server: # uncompressed log files older than today <= # services * 4
  • Connector server: # services configured == # running
  • Connector server: # uncompressed log files older than today <= # services * 6
  • Reporting server: # services configured == # running
  • Reporting server: # uncompressed log files older than today <= # services + 20
  • Web Proxy server: # services configured == # running
  • Web Proxy server: # uncompressed log files older than today <= # services
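The JBoss status page thresholds in the list above could be evaluated along these lines. This is only a sketch: it takes the raw counters as plain numbers and leaves fetching and parsing /status?XML=true to the caller, and the function and parameter names are hypothetical rather than anything defined by JBoss or Identity Manager.

```python
def pct(part, whole):
    """Percentage of part in whole; 0 when whole is 0 to avoid dividing by zero."""
    return 100.0 * part / whole if whole else 0.0


def check_jboss_status(memory_used, memory_max, requests, request_errors,
                       current_threads, max_threads, busy_threads):
    """Return the names of the metrics that breach the thresholds from the list."""
    failures = []
    if pct(memory_used, memory_max) >= 80:
        failures.append("memory used %")
    if pct(request_errors, requests) >= 40:
        failures.append("request error %")
    if pct(current_threads, max_threads) >= 80:
        failures.append("current thread %")
    # Exception from the list: a single current thread is just the status page request
    if current_threads > 1 and pct(busy_threads, current_threads) >= 80:
        failures.append("busy thread %")
    return failures
```

An empty list means every threshold passed, so the result can feed directly into a monitoring alert or a daily summary line.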
