DX Unified Infrastructure Management

  • 1.  When we could receive multiple alerts for one issue?

    Posted Nov 15, 2016 06:16 PM

    Hi Guys,

    We have received multiple tickets for one problem. In Nimsoft we have set up the cfg to generate or update an alert after 2 samples, and the poll interval is 3-5 minutes. However, we received 17 tickets for one problem. In NAS I am not able to find whether there was a new alert with a new NIM ID.

    Kindly suggest how to find the root cause, because logically Nimsoft should not generate 17 new alerts for one problem.

    The logs have been overwritten, so there is nothing to check there.

    Below is a sample of the tickets.

    Number      Summary  Created
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:11:13+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:11:16+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:08:25+01:00  2016-11-15 12:09
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:08+01:00  2016-11-15 12:10
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:10:19+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:18+01:00  2016-11-15 12:11
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:58+01:00  2016-11-15 12:11
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:10:09+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:11+01:00  2016-11-15 12:10
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:11:22+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:11:09+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:49+01:00  2016-11-15 12:11
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:10:47+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:10+01:00  2016-11-15 12:10
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:11:45+01:00  2016-11-15 12:12
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:39+01:00  2016-11-15 12:11
    INC0******  ServerName | Average (2 samples) disk free on C:\ is now 15%, which is below the warning threshold (15%) out of total size 68.3 GB | 2016-11-15T12:09:13+01:00  2016-11-15 12:11
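
    As a rough cross-check of the sample, the timestamps embedded in the ticket summaries can be compared with the 3-5 minute poll interval. This is only a minimal sketch: it assumes the timestamp in each summary is the time the alarm message was generated, and the helper name `spacing` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the 17 ticket summaries in the sample.
STAMPS = [
    "2016-11-15T12:11:13+01:00", "2016-11-15T12:11:16+01:00",
    "2016-11-15T12:08:25+01:00", "2016-11-15T12:09:08+01:00",
    "2016-11-15T12:10:19+01:00", "2016-11-15T12:09:18+01:00",
    "2016-11-15T12:09:58+01:00", "2016-11-15T12:10:09+01:00",
    "2016-11-15T12:09:11+01:00", "2016-11-15T12:11:22+01:00",
    "2016-11-15T12:11:09+01:00", "2016-11-15T12:09:49+01:00",
    "2016-11-15T12:10:47+01:00", "2016-11-15T12:09:10+01:00",
    "2016-11-15T12:11:45+01:00", "2016-11-15T12:09:39+01:00",
    "2016-11-15T12:09:13+01:00",
]


def spacing(stamps):
    """Return (total span in seconds, gaps in seconds between consecutive stamps)."""
    times = sorted(datetime.fromisoformat(s) for s in stamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return (times[-1] - times[0]).total_seconds(), gaps


span, gaps = spacing(STAMPS)
print(f"{len(STAMPS)} tickets within {span:.0f} s, largest gap {max(gaps):.0f} s")
```

    All 17 timestamps fall within 200 seconds and no two consecutive ones are more than 43 seconds apart, so the tickets are packed far more tightly than one per 3-5 minute poll cycle.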


  • 2.  Re: When we could receive multiple alerts for one issue?

    Posted Nov 16, 2016 02:36 AM

    Query the alarm history from the IM Console for the period when this alarm was generated; that can help with the investigation.

    Also check the "Message counter" value set in the AO profile.



  • 3.  Re: When we could receive multiple alerts for one issue?
    Best Answer

    Posted Nov 16, 2016 03:06 AM

    If you have activated transaction logging in the nas, you could use a query like:

    ---

    -- Transaction-log rows from the last hour whose message mentions "threshold"
    select l.time, l.type, l.nimid, l.level, l.severity, l.message, l.subsys, l.source,
           l.hostname, l.prid, l.robot, l.hub, l.nas, l.domain,
           l.suppcount as log_suppcount, a.suppcount as alarm_suppcount,
           l.visible, a.supp_key, l.nimts,
           -- difference between l.nimts and l.time, in days and as hh:mi:ss
           DATEDIFF(DAY, l.nimts, l.time) as durationdd,
           CONVERT(varchar(8), DATEADD(minute, DATEDIFF(minute, l.nimts, l.time), 0), 114) as durationhh,
           l.sid, l.acknowledged_by, l.user_tag1, l.user_tag2, l.origin, l.assigned_by, l.assigned_to
    from nas_transaction_log l
    left outer join nas_alarms a on l.nimid = a.nimid   -- matching row in nas_alarms, if any
    where l.time >= DATEADD(hh, -1, GETDATE())
      and l.message like '%threshold%'
    order by l.time, l.suppcount

    ---

    This will show you the exact details of the incoming messages.

    Adapt the LIKE pattern to something unique in your message and check:

    - severity: do you see a clear between the messages? Perhaps somebody acknowledged the alarm.

    - suppcount: presumably you only create a ticket when suppcount = 2 (you didn't post the definition, so I assume it's "equal to")
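
    The two checks above can be sketched in a few lines once the query result is exported. This is a minimal illustration, not nas behaviour: the rows in `SAMPLE` are invented, the function name `summarize` and the `ticket_at` parameter are hypothetical, and it assumes that level 0 means "clear" and that a ticket is raised whenever the suppcount condition matches.

```python
from collections import defaultdict

# Invented rows standing in for an export of the query result (hypothetical data).
SAMPLE = [
    {"time": "2000-01-01 00:00:01", "nimid": "HYPO-001", "level": 2, "suppcount": 0},
    {"time": "2000-01-01 00:05:01", "nimid": "HYPO-001", "level": 2, "suppcount": 1},
    {"time": "2000-01-01 00:10:01", "nimid": "HYPO-001", "level": 2, "suppcount": 2},
    {"time": "2000-01-01 00:15:01", "nimid": "HYPO-001", "level": 2, "suppcount": 3},
    {"time": "2000-01-01 00:20:01", "nimid": "HYPO-002", "level": 2, "suppcount": 0},
]


def summarize(rows, ticket_at=2):
    """Group rows by nimid and count what each ticket condition would match."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["nimid"]].append(row)

    report = {}
    for nimid, items in by_id.items():
        counts = [r["suppcount"] for r in items]
        report[nimid] = {
            "transactions": len(items),
            # Assumption: level 0 marks a clear message.
            "saw_clear": any(r["level"] == 0 for r in items),
            # Condition "equal to": matches once per alarm at most.
            "tickets_if_equal": sum(c == ticket_at for c in counts),
            # Condition "greater than or equal": matches on every later update.
            "tickets_if_at_least": sum(c >= ticket_at for c in counts),
        }
    return report


for nimid, info in summarize(SAMPLE).items():
    print(nimid, info)
```

    Many distinct nimid values for the same message would point to new alarms being created (for example after a clear), while one nimid with many matching rows would point to the ticket condition matching on every update.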