DX Unified Infrastructure Management

  • 1.  NAS - Scheduler not working sometimes

    Posted Oct 10, 2017 08:21 AM

    CA Folks,

     

    We are facing a small issue with the NAS probe in CA UIM: the Scheduler sometimes fails to apply our exclusion rules.

     

    For example, we have the following rules:

     

    rule 1:

     

    rule 2:

     

    Both rules are added in the following scheduler:

     

    In short, we should not receive alarms from items whose hostname starts with FRQ or whose IP starts with 10.106 between 9:00 PM and 7:00 AM.
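To make the intended behaviour concrete, here is a minimal Python sketch of the condition described above. It is only an illustration of the expected logic, not how the NAS probe evaluates rules; the function names are hypothetical, and treating "IP 10.106" as the prefix `10.106.` is an assumption.

```python
from datetime import time

# Window taken from the post: 9:00 PM to 7:00 AM, which wraps past midnight.
WINDOW_START = time(21, 0)
WINDOW_END = time(7, 0)


def in_window(t: time, start: time = WINDOW_START, end: time = WINDOW_END) -> bool:
    """Return True if t is inside [start, end), handling windows that cross midnight."""
    if start <= end:
        # Same-day window, e.g. 09:00 to 17:00.
        return start <= t < end
    # Overnight window, e.g. 21:00 to 07:00.
    return t >= start or t < end


def matches_filter(hostname: str, ip: str) -> bool:
    """Hypothetical filter: hostname starts with FRQ or IP starts with 10.106."""
    return hostname.startswith("FRQ") or ip.startswith("10.106.")


def should_be_excluded(hostname: str, ip: str, t: time) -> bool:
    """An alarm is expected to be dropped only when both conditions hold."""
    return matches_filter(hostname, ip) and in_window(t)
```

With the values from the alarm reported below (FRQ_9119, 10.106.101.110, 02:15), `should_be_excluded` returns True, which is why the alarm was unexpected.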

     

    However, we got an alarm last night.

     

     

    HOSTNAME: FRQ_9119

    IP: 10.106.101.110

    alarm time origin: 02h15m

    alarm id: JJ24247294-84254

    dev_id = DAF0FD27DAFD4814434CE932952BAA636

    probe: cisco_monitor

     

    Below are the NAS logs at log level 5:

     

     

    2h09m - First alarm, excluded by rule

    Oct 10 02:09:05:614 [18564] nas: Device_Approver APPROVED:  dev_id: 'DAF0FD27DAFD4814434CE932952BAA636' from '/Logicalis-Infrastructure-Management/HS1B-Dia/br5oimsnmdia001/cisco_monitor'

    Oct 10 02:09:05:614 [18564] nas: maint: entering inMaintenanceMode function

    Oct 10 02:09:05:614 [18564] nas: maint: Entered getMaintenanceMode function

    Oct 10 02:09:05:614 [18564] nas: maint: getMaintModeChecker passed

    Oct 10 02:09:05:614 [18564] nas: maint: validateMaintenanceIntervalsIncludeTime passed

    Oct 10 02:09:05:614 [18564] nas: maint: dev_id 'DAF0FD27DAFD4814434CE932952BAA636' from 'subscriber' '10.106.101.110' is NOT in maintenance.

    Oct 10 02:09:05:614 [18564] nas: EXCLUDED BY RULE 'RITM0432408 - Exclude de alarmes Franquias, 21h - 7h _ 2' - msg:The SNMP Agent at '10.106.101.110' in group 'Franquias' is not responding. [FRQ_9119],src:10.106.101.110,sev:5

     

    2h15 – second event, not excluded by the rule, so an alarm was generated

    Oct 10 02:15:03:647 [43192] nas: dbsRun committed 1 requests. 0 remaining in queue...

    Oct 10 02:15:05:062 [82536] nas: Scheduler rescheduled profile:'RITM0362702 - Citrix Lefosse Reboot EventID', next run Tue Oct 10 02:15, 2017

    Oct 10 02:15:05:062 [82536] nas: Scheduler rescheduled profile:'RITM0358210 - Exclude Backup Spo49 e brsmcpr54', next run Tue Oct 10 02:15, 2017

    Oct 10 02:15:05:062 [82536] nas: Scheduler rescheduled profile:'RITM0347960 - Janela de backup, SRVMBX01', next run Tue Oct 10 02:15, 2017

    Oct 10 02:15:05:062 [82536] nas: Scheduler rescheduled profile:'RITM0182412 - Horario de Backup - BR1SAP', next run Tue Oct 10 02:15, 2017

    Oct 10 02:15:05:062 [82536] nas: Scheduler rescheduled profile:'RITM0432408 - Exclude de alarmes Franquias', next run Tue Oct 10 02:15, 2017

    Oct 10 02:15:05:154 [14916] nas: SqliteExecuteCallback: sqlite3_finalize returned:0

    Oct 10 02:15:05:154 [14916] nas: SqliteExecuteCallback: sqlite3_finalize returned:0

    Oct 10 02:15:05:155 [102280] nas: dbBeginTransaction actLogRun, OK - rc:0

    Oct 10 02:15:05:199 [102280] nas: dbCommitTransaction actLogRun, OK - rc:0

    Oct 10 02:15:05:284 [18564] nas: RREQUEST: hubpost <-10.55.249.10/48002  h=258 d=846

    Oct 10 02:15:05:284 [18564] nas: Device_Approver APPROVED:  dev_id: 'DAF0FD27DAFD4814434CE932952BAA636' from '/Logicalis-Infrastructure-Management/HS1B-Dia/br5oimsnmdia001/cisco_monitor'

    Oct 10 02:15:05:284 [18564] nas: maint: entering inMaintenanceMode function

    Oct 10 02:15:05:284 [18564] nas: maint: Entered getMaintenanceMode function

    Oct 10 02:15:05:284 [18564] nas: maint: getMaintModeChecker passed

    Oct 10 02:15:05:284 [18564] nas: maint: validateMaintenanceIntervalsIncludeTime passed

    Oct 10 02:15:05:284 [18564] nas: maint: dev_id 'DAF0FD27DAFD4814434CE932952BAA636' from 'subscriber' '10.106.101.110' is NOT in maintenance.

    Oct 10 02:15:05:284 [18564] nas: dbBeginTransaction subscriber, OK - rc:0

    Oct 10 02:15:05:284 [18564] nas: SqliteExecuteCallback: sqlite3_finalize returned:0

    Oct 10 02:15:05:286 [18564] nas: SREPLY: status = 0(OK) ->10.55.249.10/48002

    Oct 10 02:15:05:291 [18564] nas: dbCommitTransaction subscriber, OK - rc:0

    Oct 10 02:15:05:291 [18564] nas: pubCommitMonitor:  subscr_waiting:  'flushUncommitedAlarms'

    Oct 10 02:15:05:291 [18564] nas: pubCommitMonitor:  subscr_released: 'flushUncommitedAlarms'

    Oct 10 02:15:05:291 [18564] nas: RREQUEST: hubpost <-10.55.249.10/48002  h=258 d=948

     

    2h21 – excluded by rule

    Oct 10 02:21:05:498 [18564] nas: Device_Approver APPROVED:  dev_id: 'DAF0FD27DAFD4814434CE932952BAA636' from '/Logicalis-Infrastructure-Management/HS1B-Dia/br5oimsnmdia001/cisco_monitor'

    Oct 10 02:21:05:498 [18564] nas: maint: entering inMaintenanceMode function

    Oct 10 02:21:05:498 [18564] nas: maint: Entered getMaintenanceMode function

    Oct 10 02:21:05:498 [18564] nas: maint: getMaintModeChecker passed

    Oct 10 02:21:05:498 [18564] nas: maint: validateMaintenanceIntervalsIncludeTime passed

    Oct 10 02:21:05:498 [18564] nas: maint: dev_id 'DAF0FD27DAFD4814434CE932952BAA636' from 'subscriber' '10.106.101.110' is NOT in maintenance.

    Oct 10 02:21:05:499 [18564] nas: EXCLUDED BY RULE 'RITM0432408 - Exclude de alarmes Franquias, 21h - 7h _ 2' - msg:The SNMP Agent at '10.106.101.110' in group 'Franquias' is not responding. [FRQ_9119],src:10.106.101.110,sev:5

    Oct 10 02:21:05:499 [18564] nas: SREPLY: status = 0(OK) ->10.55.249.10/48002

     

    2h27 – excluded by rule

    Oct 10 02:27:05:568 [18564] nas: Device_Approver APPROVED:  dev_id: 'DAF0FD27DAFD4814434CE932952BAA636' from '/Logicalis-Infrastructure-Management/HS1B-Dia/br5oimsnmdia001/cisco_monitor'

    Oct 10 02:27:05:568 [18564] nas: maint: entering inMaintenanceMode function

    Oct 10 02:27:05:568 [18564] nas: maint: Entered getMaintenanceMode function

    Oct 10 02:27:05:568 [18564] nas: maint: getMaintModeChecker passed

    Oct 10 02:27:05:568 [18564] nas: maint: validateMaintenanceIntervalsIncludeTime passed

    Oct 10 02:27:05:568 [18564] nas: maint: dev_id 'DAF0FD27DAFD4814434CE932952BAA636' from 'subscriber' '10.106.101.110' is NOT in maintenance.

    Oct 10 02:27:05:568 [18564] nas: EXCLUDED BY RULE 'RITM0432408 - Exclude de alarmes Franquias, 21h - 7h _ 2' - msg:The SNMP Agent at '10.106.101.110' in group 'Franquias' is not responding. [FRQ_9119],src:10.106.101.110,sev:5

    Oct 10 02:27:05:568 [18564] nas: SREPLY: status = 0(OK) ->10.55.249.10/48002

    Oct 10 02:27:05:580 [18564] nas: RREQUEST: hubpost <-10.55.249.10/48002  h=258 d=846

     

    2h33 – excluded by rule

    Oct 10 02:33:05:432 [18564] nas: Device_Approver APPROVED:  dev_id: 'DAF0FD27DAFD4814434CE932952BAA636' from '/Logicalis-Infrastructure-Management/HS1B-Dia/br5oimsnmdia001/cisco_monitor'

    Oct 10 02:33:05:432 [18564] nas: maint: entering inMaintenanceMode function

    Oct 10 02:33:05:432 [18564] nas: maint: Entered getMaintenanceMode function

    Oct 10 02:33:05:432 [18564] nas: maint: getMaintModeChecker passed

    Oct 10 02:33:05:432 [18564] nas: maint: validateMaintenanceIntervalsIncludeTime passed

    Oct 10 02:33:05:432 [18564] nas: maint: dev_id 'DAF0FD27DAFD4814434CE932952BAA636' from 'subscriber' '10.106.101.110' is NOT in maintenance.

    Oct 10 02:33:05:432 [18564] nas: EXCLUDED BY RULE 'RITM0432408 - Exclude de alarmes Franquias, 21h - 7h _ 2' - msg:The SNMP Agent at '10.106.101.110' in group 'Franquias' is not responding. [FRQ_9119],src:10.106.101.110,sev:5

     

    2h39 – last alarm

    Oct 10 02:39:07:109 [18564] nas: Device_Approver APPROVED:  dev_id: 'DAF0FD27DAFD4814434CE932952BAA636' from '/Logicalis-Infrastructure-Management/HS1B-Dia/br5oimsnmdia001/cisco_monitor'

    Oct 10 02:39:07:109 [18564] nas: maint: entering inMaintenanceMode function

    Oct 10 02:39:07:109 [18564] nas: maint: Entered getMaintenanceMode function

    Oct 10 02:39:07:109 [18564] nas: maint: getMaintModeChecker passed

    Oct 10 02:39:07:109 [18564] nas: maint: validateMaintenanceIntervalsIncludeTime passed

    Oct 10 02:39:07:109 [18564] nas: maint: dev_id 'DAF0FD27DAFD4814434CE932952BAA636' from 'subscriber' '10.106.101.110' is NOT in maintenance.

    Oct 10 02:39:07:109 [18564] nas: EXCLUDED BY RULE 'RITM0432408 - Exclude de alarmes Franquias, 21h - 7h _ 2' - msg:The SNMP Agent at '10.106.101.110' in group 'Franquias' is not responding. [FRQ_9119],src:10.106.101.110,sev:5

    Oct 10 02:39:07:109 [18564] nas: SREPLY: status = 0(OK) ->10.55.249.10/48002

    Oct 10 02:39:08:339 [111976] nas: ptNetIpToHost - getaddrinfo failed for HCSAP4AND-11
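For anyone who wants to search a longer log for the same pattern, the following Python sketch lists the events for a given dev_id that have a `Device_Approver APPROVED` line but no `EXCLUDED BY RULE` line from the same thread before that thread's next approval. The line layout is assumed from the excerpts above, the function name is hypothetical, and the check is only a heuristic: it says nothing about why the rule was skipped.

```python
import re

# Layout assumed from the excerpts above: "Oct 10 02:15:05:284 [18564] nas: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}:\d{3}) \[(?P<tid>\d+)\] nas: (?P<msg>.*)$"
)
APPROVED_RE = re.compile(r"Device_Approver APPROVED:\s+dev_id: '(?P<dev>[^']+)'")


def find_unexcluded(lines, dev_id):
    """Return timestamps of approved events for dev_id with no exclusion line."""
    events = []          # every approved event, in log order
    open_by_thread = {}  # thread id -> most recent event seen on that thread
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # blank lines, headings, or anything in another layout
        tid, msg = match["tid"], match["msg"]
        approved = APPROVED_RE.search(msg)
        if approved:
            event = {"ts": match["ts"], "dev_id": approved["dev"], "excluded": False}
            open_by_thread[tid] = event
            events.append(event)
        elif "EXCLUDED BY RULE" in msg and tid in open_by_thread:
            open_by_thread[tid]["excluded"] = True
    return [e["ts"] for e in events if e["dev_id"] == dev_id and not e["excluded"]]
```

Run against the excerpts above with dev_id DAF0FD27DAFD4814434CE932952BAA636, only the 02:15:05:284 event has no exclusion line.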

     

    Has anyone seen this scenario before?

     

    Regards



  • 2.  Re: NAS - Scheduler not working sometimes
    Best Answer

    Broadcom Employee
    Posted Oct 10, 2017 02:06 PM

    I think your setup is wrong.

    I would suggest setting the hour to 21, the minutes to 00, and the duration to 10 hours.

    The way you have it set up, you are enabling and disabling the rule every 5 minutes.

    I would expect some alarms to slip through during each of those switches.
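As a rough illustration of the suggested setup, the Python sketch below models one window that opens at 21:00 and stays open for 10 hours, so there is a single start and a single end per night rather than a switch every 5 minutes. The helper names are hypothetical, and this models only the time arithmetic, not the NAS scheduler itself.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Values from the suggestion above: start at 21:00, duration 10 hours.
START_HOUR, START_MINUTE = 21, 0
DURATION = timedelta(hours=10)


def window_for(day: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the window that opens on the evening of `day`."""
    start = day.replace(hour=START_HOUR, minute=START_MINUTE, second=0, microsecond=0)
    return start, start + DURATION


def is_suppressed(ts: datetime) -> bool:
    """True if ts falls in the window opened the previous evening or the same evening."""
    for day in (ts - timedelta(days=1), ts):
        start, end = window_for(day)
        if start <= ts < end:
            return True
    return False
```

With this model, the window opened on Oct 9 at 21:00 ends on Oct 10 at 07:00, so the event logged at 02:15:05 falls inside it with no enable or disable step anywhere near that time. In the original log, the scheduler rescheduled the profile at 02:15:05:062 and the alarm arrived at 02:15:05:284.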



  • 3.  Re: NAS - Scheduler not working sometimes

    Posted Oct 11, 2017 07:40 AM

    Gene,

     

    Thanks for your feedback; I agree.

    From now on, we will use the operating period for recurring schedule rules.