DX Unified Infrastructure Management


NAS - Duplicate Tickets for same NimID

  • 1.  NAS - Duplicate Tickets for same NimID

    Posted Dec 28, 2018 08:53 AM

    Hi 

     

    We currently integrate with ServiceNow through sdgtw, and we have observed that duplicate tickets are being created for the same nimid.

     

    In the cdm probe, the disk check interval is set to 5 minutes with 3 samples, so an alarm should only be generated when the average of the 3 samples breaches the threshold. However, this is not working as expected: several alarms arrive within seconds of each other.
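To make the expected behaviour concrete, here is a minimal Python sketch of the check as described above. It is a hypothetical model, not the cdm probe's code: `should_alarm` and its parameters are illustrative names, and it assumes the probe alarms when the average of the last 3 readings is below the threshold.

```python
from statistics import mean

INTERVAL_S = 300   # disk check interval from the post: 5 minutes
SAMPLES = 3        # number of samples averaged before alarming

def should_alarm(disk_free_pct, threshold_pct, samples=SAMPLES):
    """Hypothetical model: alarm only if the average of the last `samples`
    disk-free readings is below the threshold."""
    if len(disk_free_pct) < samples:
        return False  # not enough readings collected yet
    return mean(disk_free_pct[-samples:]) < threshold_pct

# Readings are taken INTERVAL_S apart, so under this model a new average
# (and therefore at most one new alarm) can only appear every 300 seconds.
print(should_alarm([12.0, 9.0, 8.0], 10.0))   # True: average is about 9.67
print(should_alarm([12.0, 11.0, 8.0], 10.0))  # False: average is about 10.33
```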

     

    Example -

    2018-12-07 02:46:14 Message: Disk free on /orafiles/ is 10% , which is below critical threshold (10%) AlarmID: BO55079028-26297 
    2018-12-07 02:46:25 Message: Disk free on /orafiles/ is 10% , which is below critical threshold (10%) AlarmID: BO55079028-26297 
    2018-12-07 02:46:35 Message: Disk free on /orafiles/ is 10% , which is below critical threshold (10%) AlarmID: BO55079028-26297 
    2018-12-07 02:46:47 Message: Disk free on /orafiles/ is 10% , which is below critical threshold (10%) AlarmID: BO55079028-26297 
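The spacing between these messages can be checked with a short Python snippet. The timestamps are copied from the example above; the 5-minute interval is the cdm setting quoted in the post.

```python
from datetime import datetime

# Timestamps of the four messages for AlarmID BO55079028-26297 above
stamps = [
    "2018-12-07 02:46:14",
    "2018-12-07 02:46:25",
    "2018-12-07 02:46:35",
    "2018-12-07 02:46:47",
]
EXPECTED_INTERVAL_S = 5 * 60  # cdm disk check interval

times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
# Seconds between each message and the one before it
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

print(gaps)                                      # [11.0, 10.0, 12.0]
print(all(g < EXPECTED_INTERVAL_S for g in gaps))  # True
```

The messages are 10 to 12 seconds apart, far shorter than the check interval, so they are unlikely to be separate averaged cdm samples.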

     

    I see the same issue with alarms from other subsystem IDs, and duplicate tickets are created for those as well.

     

    UIM 8.51

    nas 9.0HF4

    sdgtw 2.0

     

    I have also tried changing the AO profile overdue timings, but with no luck.

     

    Regards - Ripple



  • 2.  Re: NAS - Duplicate Tickets for same NimID

    Broadcom Employee
    Posted Dec 28, 2018 09:40 AM

    What version of the cdm probe are you running?

    Do you have the cdm probe set to send an alarm on each sample?

    This might explain what you are seeing.

     

    Usually, when using sdgtw, you have an AO profile that assigns the alarm to the user name that triggers ticket creation.

    You can set that AO profile to run on overdue 1 minute instead; this might help as well.

     

    It could be complicated to track down exactly where the problem is, so you might want to open a support case.



  • 3.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 01:17 AM

    Hi

     

    What version of CDM probe?  - CDM 6.30

    Do you have the cdm probe set to send alarm on each sample? No

     

    I am already working with support, but no luck so far.

     

    Regards - Ripple



  • 4.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 05:31 AM

    - Do you have AO profiles on these messages? (Each AO rule can duplicate the alarm.)

    - Do you have multiple nas probes? (Routing alarms between several nas probes can create a ping-pong effect.)



  • 5.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 05:37 AM

    We have AO profile rules, but they filter only on message or probe, so no two rules are the same.



  • 6.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 05:53 AM

    Perhaps you can try a query like the following to determine why, how, and where the duplicate is coming from:

    select * from NAS_TRANSACTION_LOG where message like '%orafiles%' order by time desc
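A self-contained sketch of that query, using an in-memory SQLite table as a stand-in for the UIM database. The table keeps only a handful of the NAS_TRANSACTION_LOG columns that appear later in this thread, and the rows are invented placeholders. The second, grouped query is an added suggestion (not from the thread) for spotting nimids that appear in more than one transaction.

```python
import sqlite3

# In-memory stand-in for the UIM database (assumption: a real install
# typically uses SQL Server, Oracle or MySQL). Only a few columns are
# modelled and the rows below are invented placeholders, not real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NAS_TRANSACTION_LOG ("
    "time TEXT, type INTEGER, nimid TEXT, message TEXT, "
    "assigned_by TEXT, assigned_to TEXT)"
)
conn.executemany(
    "INSERT INTO NAS_TRANSACTION_LOG VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("02:46:14", 1, "nimid-A", "Disk free on /orafiles/ is 10%", None, None),
        ("02:46:25", 2, "nimid-A", "Disk free on /orafiles/ is 10%", "auto-operator", "Database"),
        ("02:46:35", 2, "nimid-A", "Disk free on /orafiles/ is 10%", "auto-operator", "Database"),
        ("02:50:00", 1, "nimid-B", "CPU usage is 95%", None, None),
    ],
)

# The query suggested above
detail = conn.execute(
    "SELECT * FROM NAS_TRANSACTION_LOG WHERE message LIKE '%orafiles%' ORDER BY time DESC"
).fetchall()

# Added variant: nimids that appear in more than one transaction
repeats = conn.execute(
    "SELECT nimid, COUNT(*) FROM NAS_TRANSACTION_LOG "
    "GROUP BY nimid HAVING COUNT(*) > 1"
).fetchall()

print(len(detail), repeats)  # 3 [('nimid-A', 3)]
```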



  • 7.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 06:13 AM
    nimid | level | severity | message | subsys | sid | assigned_by | assigned_to
    PG02427742-33844 | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | auto-operator | Database
    PG02427742-33844 | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | auto-operator | Database
    PG02427742-33844 | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | auto-operator | Database
    PG02427742-33844 | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | auto-operator | Database
    PG02427742-33844 | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | auto-operator | Database
    PG02427742-33844 | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | NULL | NULL


  • 8.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 06:16 AM

    Can you send me the full output with all columns (in CSV) to luc.christiaens10@telenet.be, or attach it here?



  • 9.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 06:50 AM

    Sorry, I am not able to attach it, and I'm not sure why.

     

    time | type | nimid | nimts | cm_id | level | severity | message | subsys | sid | prid | user_tag1 | user_tag2 | suppcount | assigned_by | assigned_to | acknowledged_by | tz_offset | visible | i18n_token | i18n_dsize | i18n_data
    59:58.0 | 2 | PG02427742-33844 | 59:30.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 336 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JBLTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVyaW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMARkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2ADkAMC4wMDAwMDAA
    59:58.0 | 2 | PG02427742-33844 | 59:30.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 336 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JBLTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVyaW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMARkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2ADkAMC4wMDAwMDAA
    59:58.0 | 2 | PG02427742-33844 | 59:30.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 336 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JBLTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVyaW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMARkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2ADkAMC4wMDAwMDAA
    59:48.0 | 2 | PG02427742-33844 | 59:30.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 336 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JBLTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVyaW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMARkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2ADkAMC4wMDAwMDAA
    59:39.0 | 8 | PG02427742-33844 | 59:30.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 336 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JBLTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVyaW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMARkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2ADkAMC4wMDAwMDAA
    59:30.0 | 1 | PG02427742-33844 | 59:30.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-16198: Timeout incurred on internal channel during remote archival). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | NULL | NULL | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 336 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JBLTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVyaW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMARkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2ADkAMC4wMDAwMDAA
    53:47.0 | 2 | PG02427742-31614 | 47:29.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-03113: end-of-file on communication channel). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 308 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANDgAT1JBLTAzMTEzOiBlbmQtb2YtZmlsZSBvbiBjb21tdW5pY2F0aW9uIGNoYW5uZWwAY2hlY2sANwAxNwBkYXRhZ3VhcmRfc3RhdHVzAEZBSUxfREFURQA3ADIwADIwMTgvMTIvMzEgMTQ7NDI7MjEAU1RBVFVTX05VTQAxNgA5ADAuMDAwMDAwAA==
    53:25.0 | 8 | PG02427742-31614 | 47:29.0 | NULL | 4 | major | Profile DB, instance PRD-PRDDR has status ERROR. (ORA-03113: end-of-file on communication channel). | Oracle | 1.1.13.2 | oracle | ORACLE Database | UNIX | 0 | auto-operator | Database | NULL | -19800 | 1 | as#database.oracle.dataguard_status_error | 308 | U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABEVFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANDgAT1JBLTAzMTEzOiBlbmQtb2YtZmlsZSBvbiBjb21tdW5pY2F0aW9uIGNoYW5uZWwAY2hlY2sANwAxNwBkYXRhZ3VhcmRfc3RhdHVzAEZBSUxfREFURQA3ADIwADIwMTgvMTIvMzEgMTQ7NDI7MjEAU1RBVFVTX05VTQAxNgA5ADAuMDAwMDAwAA==
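The i18n_data column appears to be base64. Decoding one value with a short Python sketch shows the alarm variables behind the message. The NUL-separated name/type/length/value layout assumed by the hypothetical `parse_i18n` helper is inferred from this single sample, not from product documentation.

```python
import base64

# i18n_data copied from the ORA-16198 rows above (336 characters,
# matching the i18n_dsize column)
I18N_DATA = (
    "U1RBVFVTADcANgBFUlJPUgBpbnN0YW5jZQA3ADQAUFJEAHByb2ZpbGUANwAxMABE"
    "VFZTUFJEREIAREJfVU5JUVVFX05BTUUANwA2AFBSRERSAEVSUk9SADcANzEAT1JB"
    "LTE2MTk4OiBUaW1lb3V0IGluY3VycmVkIG9uIGludGVybmFsIGNoYW5uZWwgZHVy"
    "aW5nIHJlbW90ZSBhcmNoaXZhbABjaGVjawA3ADE3AGRhdGFndWFyZF9zdGF0dXMA"
    "RkFJTF9EQVRFADcAMjAAMjAxOC8xMi8zMSAxNDs1NDsyOQBTVEFUVVNfTlVNADE2"
    "ADkAMC4wMDAwMDAA"
)

def parse_i18n(encoded):
    """Decode i18n_data into {name: value}.

    Assumed layout (inferred from this sample only): NUL-separated groups
    of name, type code, length, value.
    """
    parts = base64.b64decode(encoded).decode("ascii").split("\0")
    if parts and parts[-1] == "":
        parts.pop()  # the trailing NUL leaves an empty last element
    return {parts[i]: parts[i + 3] for i in range(0, len(parts) - 3, 4)}

fields = parse_i18n(I18N_DATA)
print(fields["ERROR"])      # ORA-16198: Timeout incurred on internal channel during remote archival
print(fields["FAIL_DATE"])  # 2018/12/31 14;54;29
```

All six ORA-16198 rows carry this same i18n_data value (including the same FAIL_DATE), which is consistent with them describing one failure event rather than several.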


  • 10.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 07:32 AM

    Do you have integrations, like Spectrum, that can manipulate alarms?

    Do you have another instance of the nas running?

    Otherwise, I would suspect an AO rule. (Can you send or attach the AO rule that assigns alarms to Database?)



  • 11.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 08:13 AM

    Do you have integrations, like Spectrum, that can manipulate alarms?   No

    Do you have another instance of the nas running?   No



  • 12.  Re: NAS - Duplicate Tickets for same NimID

    Posted Dec 31, 2018 09:44 AM

    I can see that a second alarm is generated for the same condition within seconds. Because the first alarm had not yet been acknowledged by ServiceNow, the rule passed the second one as well, and as a result we got two tickets.

     

    But I cannot understand why we see the same alarm again within seconds when the polling interval is 5 minutes.



  • 13.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 01, 2019 04:39 AM

    Please attach (or mail) your AO rules (copy them in text format from nas.cfg).



  • 14.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 05:31 AM

    <auto_operator>
       <setup>
          interval = 5min
          active = yes
          ignore_import = no
       </setup>
       <definitions>
          <EMAIL critical (repost)>
             active = no
             action = repost EMAIL
             overdue = 5m
             level = critical
             order = 1
          </EMAIL critical (repost)>
          <Automatic cleanup of low-severity messages (3 days)>
             active = yes
             action = close
             overdue = 3d
             level = information
             visible = 0
             order = 2
             break = no
          </Automatic cleanup of low-severity messages (3 days)>
          <Test_duplicPQRation_Dev>
             active = yes
             action = assign ABC
             overdue = on_arrival
             level = information,warning,minor,major,critical
             source = 10.10.96.129|10.11.17.132
             visible = 0
             order = 3
             break = no
          </Test_duplicPQRation_Dev>
          <Test_duplicPQRation_Dev2>
             active = yes
             action = assign ABC
             overdue = on_arrival
             level = warning,minor
             source = 10.11.63.135
             counter = eq 1
             visible = 0
             order = 3
             break = no
          </Test_duplicPQRation_Dev2>
          <ABC_FS_Email>
             active = yes
             action = EMAIL
             overdue = 2m
             level = warning,minor,major,critical
             subsystems = Disk
             counter = eq 1
             probe = cdm
             visible = 0
             order = 4
             origin = ABC-UIM
             break = no
          </ABC_FS_Email>
          <AutoAssigXYZUserSNOW>
             active = yes
             action = assign Database-Team/database
             overdue = on_arrival
             level = minor,major,critical
             counter = eq 1
             probe = db2|mysql|oracle
             visible = 0
             order = 5
             origin = ABC-UIM|XYZ-UIM|PQR-UIM
             break = no
          </AutoAssigXYZUserSNOW>
          <AutoAssigXYZFilesystems>
             active = yes
             action = assign Database-Team/database
             overdue = on_arrival
             message = /.*mysql|treasury|oracle|orafiles.*/
             level = minor,major,critical
             subsystems = Disk
             source = 10.11.64.225|10.10.52.225|10.10.96.129
             counter = eq 1
             probe = cdm
             visible = 0
             invert = 2
             order = 6
             origin = ABC-UIM|XYZ-UIM|PQR-UIM
             break = no
          </AutoAssigXYZFilesystems>
          <Email alarms for DBA group>
             active = yes
             action = EMAIL DBAGroup
             overdue = 1m
             level = information,warning
             counter = eq 1
             probe = db2|mysql|oracle
             visible = 0
             category = Email
             order = 7
             origin = ABC-UIM|XYZ-UIM|PQR-UIM
             break = no
          </Email alarms for DBA group>
          <AutoAssignWindowsUserSNOW>
             active = yes
             action = assign Windows-Team/windows_team
             overdue = on_arrival
             level = minor,major,critical
             counter = eq 1
             visible = 0
             order = 8
             origin = ABC-UIM|XYZ-UIM|PQR-UIM
             break = no
             user_tag2 = WINDOWS
          </AutoAssignWindowsUserSNOW>
          <AutoAssignUNIXUserSNOW>
             active = yes
             action = assign Unix-Team/unix
             overdue = on_arrival
             message = /.*mysql|treasury|oracle|orafiles.*/
             level = minor,major,critical
             counter = eq 1
             probe = cdm|processes
             visible = 0
             invert = 16
             order = 9
             origin = ABC-UIM|XYZ-UIM|PQR-UIM
             break = no
             user_tag2 = UNIX
          </AutoAssignUNIXUserSNOW>
          <Email alarms for Windows group for XYZ Client>
             active = yes
             action = EMAIL WindowsGroup
             overdue = on_arrival
             level = warning,minor,major,critical
             counter = eq 1
             probe = cdm
             visible = 0
             order = 10
             origin = XYZ-UIM
             break = no
          </Email alarms for Windows group for XYZ Client>
          <Email alarms for Unix group [cdm]>
             active = no
             action = EMAIL UnixGroup
             overdue = on_arrival
             level = warning
             probe = cdm
             visible = 0
             category = Email
             order = 11
             break = no
             user_tag2 = *UNIX*
          </Email alarms for Unix group [cdm]>
          <Test for ticket creation>
             active = yes
             action = assign ABC
             overdue = 3m
             message = */orafiles/QOS*
             level = information
             robot = QOS
             visible = 0
             order = 13
             break = no
          </Test for ticket creation>
       </definitions>
       <triggers>
          <example.network>
             active = no
             level = major,critical
             subsystems = Network
             category = example
          </example.network>
          <example.system>
             active = no
             level = major,critical
             subsystems = Disk,CPU,Memory,Filesystem,Process,NT-Services,Security,Application,System
             category = example
          </example.system>
          <example.SLM>
             active = no
             sid = 1.3|1.3.*
             category = example
          </example.SLM>
       </triggers>
    </auto_operator>
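To check which profiles could each issue an assignment, the definitions block can be scanned with a short Python sketch. It is a simplified parser written for the layout shown above (tag-delimited blocks of `key = value` lines), not a general nas.cfg reader; `parse_profiles` and `active_assigns_without_break` are hypothetical helper names, and `SAMPLE` is a trimmed excerpt of the configuration above.

```python
import re

SAMPLE = """
<auto_operator>
   <definitions>
      <EMAIL critical (repost)>
         active = no
         action = repost EMAIL
      </EMAIL critical (repost)>
      <AutoAssigXYZUserSNOW>
         active = yes
         action = assign Database-Team/database
         overdue = on_arrival
         break = no
      </AutoAssigXYZUserSNOW>
      <AutoAssignUNIXUserSNOW>
         active = yes
         action = assign Unix-Team/unix
         overdue = on_arrival
         break = no
      </AutoAssignUNIXUserSNOW>
   </definitions>
</auto_operator>
"""

def parse_profiles(cfg):
    """Return {profile_name: {key: value}} for the <definitions> block."""
    defs = cfg.split("<definitions>", 1)[1].split("</definitions>", 1)[0]
    profiles = {}
    # \1 makes the closing tag match the opening profile name literally
    for m in re.finditer(r"<([^/>][^>]*)>(.*?)</\1>", defs, re.S):
        settings = {}
        for line in m.group(2).splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
        profiles[m.group(1)] = settings
    return profiles

def active_assigns_without_break(profiles):
    """Active profiles whose action is assign and that do not set break = yes."""
    return [
        name for name, s in profiles.items()
        if s.get("active") == "yes"
        and s.get("action", "").startswith("assign")
        and s.get("break", "no") != "yes"
    ]

print(active_assigns_without_break(parse_profiles(SAMPLE)))
# ['AutoAssigXYZUserSNOW', 'AutoAssignUNIXUserSNOW']
```

In the full section above, every active profile whose action is assign also has `break = no`, so nothing stops a later assign profile from firing after an earlier one has matched; whether their filters actually overlap still has to be checked by hand.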



  • 15.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 05:45 AM

    - Is it possible to attach or mail the output of the previous SQL query? I'm missing some important columns, like source and host (so use * in the select statement, as proposed).

    - Is it your goal that an alarm can match multiple AO rules, or that it has to stop after the first match?



  • 16.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 06:26 AM




  • 17.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 06:38 AM

    This was produced with the same limited SQL statement as your previous post. Please use something like the following (the * is important so that all fields are returned):

    select * from NAS_TRANSACTION_LOG where message like '%orafiles%' order by time desc



  • 18.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 06:39 AM

    Is it your goal that an alarm can match multiple AO rules, or that it has to stop after the first match?

     

    It should stop right after the first match.

    But what we noticed is that a second alarm is generated for the same condition within a few seconds, and because the workflow has not completed yet, it is sent again.



  • 19.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 06:44 AM

    The problem with AO rules can be:

    - By default, an alarm tries to match every rule, unless you set break = yes.

    - If you have an AO action like assign, UIM will regenerate the message and, depending on your rules, it can match again.

    (Perhaps you can add a NOT clause for "assigned to Database" in your assign rule?)
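The first point can be illustrated with a simplified Python model of sequential rule evaluation. It is a sketch, not nas code: the `Profile` class, the `brk` flag standing in for `break = yes`, and the two filters are hypothetical, and they deliberately ignore the `invert` settings in the posted configuration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Profile:
    name: str
    matches: Callable[[dict], bool]  # hypothetical filter on alarm fields
    action: str
    brk: bool = False                # stands in for `break = yes`

def evaluate(alarm, profiles):
    """Check profiles in order; every match fires unless a match has brk set."""
    fired = []
    for p in profiles:
        if p.matches(alarm):
            fired.append(p.action)
            if p.brk:
                break  # stop evaluating later profiles
    return fired

alarm = {"probe": "cdm", "message": "Disk free on /orafiles/ is 10%", "user_tag2": "UNIX"}

def build(first_breaks):
    return [
        Profile("db_filesystems", lambda a: "orafiles" in a["message"],
                "assign Database-Team/database", brk=first_breaks),
        Profile("unix", lambda a: a["user_tag2"] == "UNIX", "assign Unix-Team/unix"),
    ]

print(evaluate(alarm, build(False)))  # both assign actions fire
print(evaluate(alarm, build(True)))   # only the first assign fires
```

With `break = no` on both profiles, one alarm produces two assign actions, which is the situation the reply below describes for the sdgtw queue.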



  • 20.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 02, 2019 08:41 AM

    The AO profile's message counter is currently set to 1, so the second alarm should not be forwarded to sdgtw, but in our case the rule still seems to be bypassed.



  • 21.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 03, 2019 03:06 PM

    Hi, I think the fact that your AO profiles are configured as "on_arrival" negates the counter = 1 criterion. When an alarm comes into the nas (on_arrival), the counter is always 1. The on_arrival mode precedes the alarm suppression process.

    Also, I can see in the AO profiles that they are not mutually exclusive. A quick scan shows that there are at least two that will execute an assignment for the same alarm: one assigns to the database group and the other to the Unix group. Since the nas is single-threaded, the profiles are evaluated sequentially: one fires, putting an alarm_assign message into the sdgtw queue, and then the other fires, putting a subsequent alarm_assign message in the queue for the same alarm.
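A toy Python model of the first paragraph above, contrasting a `counter = eq 1` filter evaluated on arrival (where, per this reply, the counter is always 1) with a hypothetical filter that sees the running suppression count. The function name and both modes are illustrative assumptions, not nas internals.

```python
def times_fired(keys, mode):
    """Count how often a `counter = eq 1` assign profile fires for a stream
    of messages identified by `keys` (hypothetical toy model).

    mode "on_arrival": the profile runs before suppression, so it always
        sees counter == 1, as described in the reply above.
    mode "after_suppression": hypothetical contrast in which the profile
        sees the running count for that key.
    """
    counts = {}
    fired = 0
    for key in keys:
        counts[key] = counts.get(key, 0) + 1  # suppression bookkeeping
        counter = 1 if mode == "on_arrival" else counts[key]
        if counter == 1:
            fired += 1
    return fired

burst = ["orafiles"] * 4  # four repeats of the same alarm, as in the first post
print(times_fired(burst, "on_arrival"))         # 4
print(times_fired(burst, "after_suppression"))  # 1
```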



  • 22.  Re: NAS - Duplicate Tickets for same NimID

    Posted Jan 03, 2019 11:41 PM

    "AO profiles are configured as on_arrival negates the counter = 1 criterion": Yes, but since we were getting duplicate alarms, I added the counter to see whether I could block the second occurrence of the same alarm.

     

    "There are at least two that will execute an assignment for the same alarm - one assigns to the database group and the other to the Unix group": Yes, per our requirements we send DB-related alarms to the database team and OS-related alarms to the Unix team, further filtered on user tag, message, etc., so that the next profile will not execute.



  • 23.  Re: NAS - Duplicate Tickets for same NimID

    Broadcom Employee
    Posted Jan 02, 2019 06:40 AM

    Hello, I would run Dr Nimbus (the message sniffer on the hub) on the subject that your tickets queue subscribes to (is it just alarm_assign, or more?), so that you can see whether your AO profiles or the sdgtw probe is causing the problem.

    If you are getting duplicate messages, then sdgtw is just doing its job and you need to check your AO profiles.

    HTH