DX Unified Infrastructure Management

  • 1.  Logmon - Sum of errors found in more than one file logs

    Posted Sep 28, 2018 11:03 AM

    Hello.
    I need to configure logmon (v3.92) alarms on UIM.

    Several log files matching "ars.*.yyyymmdd.*.log" are generated daily:
    ars.38302.SR-WORKAD-AP01.20180927.006.log
    ars.38301.SR-WORKAD-AP01.20180928.001.log
    ars.38301.SR-WORKAD-AP01.20180928.002.log
    ars.38301.SR-WORKAD-AP01.20180928.003.log
    ars.38301.SR-WORKAD-AP01.20180928.004.log
    ars.38301.SR-WORKAD-AP01.20180928.005.log


    I need to search those files for two types of errors:
    "SERVER ERROR"
    "INTERNAL ERROR | STOP Routing"

    According to the logmon log, the search itself works correctly, using time-formatting primitives and regular expressions.

    I have defined two watcher rules, enabled "count matches" on each, and associated a QoS with each one.

    Currently, the generated metrics reflect the errors found in ONLY ONE log file.
    What I need is for the metrics to be THE SUM OF ERRORS found in ALL scanned log files.

    We also need to generate alarms whose messages indicate the number of errors found.

    I have already looked at several documents and examples, and I have not found a way to implement what we need.

     

    Is this possible?

     

    =====================================================================================

     

    Hello.

    I have read several documents from the CA community and tried the examples I found, but so far they have not given me the expected results.

    I have been trying an alternative, using "command" mode in the logmon profile.

    I wrote a bash script that searches each log for the string "SERVER ERROR", counts the matches per file, and then adds those counts to get the overall total. It prints that total with the "echo" command.
    The script is in a local folder on the server where logmon is installed (/home/user_nn).

     

    Running it from the command line:
    [user_nn@SR-WORKAD-AP01 ~]$ ./check_ars_1.sh
    #ErrServer 11938

     

    The logmon configuration is:

    Profile = wfx_command_test
    Mode = command
    File = /home/user_nn/check_ars_1.sh
    Generate Quality of Service


    Watcher Rules = ErrSrv
    Standard Tag - Match Expression = "#ErrServer"
    Message to Send on Match = "TRY ${ErrSrv}"


    Label Variables define variable ErrSrv
    Source Line = 1
    Source FROM Position = Column 2
    Source TO Position = Ignore 'To'

    QoS tag
    Name = ServerError
    Description = "Critical errors quantity"
    Target = sr-workad-ap01
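
    To make the variable definition above concrete: assuming logmon splits the matched line into whitespace-separated columns (an assumption about the probe's default separator, not something confirmed in this thread), column 2 of the script's output line is the error count. A rough shell analogue of that extraction, using the output line shown earlier:

```shell
#!/bin/sh
# Sample line as printed by check_ars_1.sh in this thread.
line="#ErrServer 11938"

# Column 1 is the tag, column 2 is the count (whitespace-separated).
tag=$(printf '%s\n' "$line" | awk '{print $1}')
count=$(printf '%s\n' "$line" | awk '{print $2}')

echo "tag=$tag count=$count"
```

    If the probe resolves ${ErrSrv} the same way, the alarm message would carry the count rather than the whole line.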

     

    Logmon log:

    Oct 1 10:47:27:502 [140152144738080] logmon: ****************[ Re-starting ]****************
    Oct 1 10:47:28:569 [140152144738080] logmon: Adding to gQoSdefs: default_qos
    Oct 1 10:47:28:569 [140152144738080] logmon: Adding to gQoSdefs: ServerError
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: format start
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: FORMAT END START
    Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [#str_log="SERVER ERROR"],[ERCP_],[ISO-8859-1],[-1]
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: Read File
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: read the line: [cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')]
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: check format start..[0]
    Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')],[ERCP_],[ISO-8859-1],[-1]
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: read the line: []
    Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [],[ERCP_],[ISO-8859-1],[-1]
    Oct 1 10:48:01:876 [140152144738080] logmon: lgm: read the line: [echo "#ErrServer $cant"]
    Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [echo "#ErrServer $cant"],[ERCP_],[ISO-8859-1],[1]
    Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] MATCH [ErrSrv] on line 0
    Oct 1 10:48:01:876 [140152144738080] logmon: returning message: echo "#ErrServer $cant"
    Oct 1 10:48:01:885 [140152144738080] logmon: lgm: Read File
    Oct 1 10:48:01:886 [140152144738080] logmon: lgm: read returned null
    Oct 1 10:48:01:886 [140152144738080] logmon: lgm: p->format: 0 format: 0 mode 0x201
    Oct 1 10:48:01:886 [140152144738080] logmon: [BF9511DE] used 14 ms scanning 266 bytes
    Oct 1 10:48:04:886 [140152060811008] logmon: (mysystem) - executed (pid:24745): /home/user_nn/check_ars_1.sh

     

    Any help?

    Thanks in advance.



  • 2.  Re: Logmon - Sum of errors found in more than one file logs

    Posted Oct 01, 2018 11:48 AM

    You may want to rephrase your question, as it seems self-contradictory. At the very least, verify that the correct pieces of script and log were included in the original post.

     

    If you are running a command, then all your log file selection rules would be within that command, not within UIM/logmon. You didn't supply the script being run, so I can't comment on its content.

     

    Your snippet of logmon log shows logmon reading what looks like lines of code:

     

    cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')

     

    That looks like you have the logmon probe set to something other than command mode and have specified the script file as the file to read.

     

    -Garin
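
    The difference described above can be reproduced outside logmon. The sketch below uses a hypothetical stand-in script (the value 42 is made up) and compares searching the script file itself with searching the script's output. The first yields the echo source line, as in the log excerpt; the second yields the "#ErrServer <n>" line that a command-mode profile would be expected to see.

```shell
#!/bin/sh
# Hypothetical stand-in for check_ars_1.sh; the count is hard-coded.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
cant=42
echo "#ErrServer $cant"
EOF

# Reading the script as a text file: the match is a line of source code.
as_file=$(grep '#ErrServer' "$script")

# Running the script and reading its output: the match is the result line.
as_command=$(sh "$script" | grep '#ErrServer')

echo "read as file:   $as_file"
echo "run as command: $as_command"
rm -f "$script"
```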



  • 3.  Re: Logmon - Sum of errors found in more than one file logs

    Posted Oct 01, 2018 01:52 PM

    Hi Garin, thanks for your feedback.

     

    Here are the details of my configuration, along with new log messages.

     

    Logmon Profile

     

    Watcher Rules

     

    Watcher Rules - Variables

     

    Watcher Rules - QoS

     

     

     

    Script:

    [user_nn@SR-WORKAD-AP01 ~]$ cat check_ars_1.sh
    #!/bin/bash

    path_log="/srv/wss/output/wfx/7.0.6/cabv1/logs/ars"
    file_log="ars.*.20180929.*.log"
    str_log="SERVER ERROR"
    cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')

    echo "#ErrServer $cant"
    [user_nn@SR-WORKAD-AP01 ~]$
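
    A possible hardening of the script above, offered as a sketch under assumptions. When only one file matches the glob, grep -c prints a bare count with no "file:" prefix, so the awk step finds no second field and the total comes out as 0. Concatenating the files before counting gives one total whether one file or many match. The helper name count_matches and the throwaway demo files are illustrative only, and replacing the hard-coded date with $(date +%Y%m%d) is an assumption about the intent.

```shell
#!/bin/sh
# Print the number of lines containing a fixed string across all given files.
# count_matches is a hypothetical helper name, not part of logmon.
count_matches() {
    needle="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo 0
        return 0
    fi
    # grep -c on the concatenated stream prints a single total;
    # "|| true" hides grep's non-zero exit status when the total is 0.
    cat -- "$@" 2>/dev/null | grep -cF -- "$needle" || true
}

# Demo with throwaway files instead of the real ars logs.
demo_dir=$(mktemp -d)
printf 'SERVER ERROR a\nok\nSERVER ERROR b\n' > "$demo_dir/ars.1.HOST.20180929.001.log"
printf 'ok\nSERVER ERROR c\n' > "$demo_dir/ars.1.HOST.20180929.002.log"

day="20180929"   # assumption: a live script might use day=$(date +%Y%m%d)
total=$(count_matches "SERVER ERROR" "$demo_dir"/ars.*."$day".*.log)
echo "#ErrServer $total"
rm -rf "$demo_dir"
```

    The output keeps the same "#ErrServer <n>" shape, so the existing watcher rule and variable definition would not need to change.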


    Log, every 2 minutes:

    Oct 1 14:13:57:107 [139902164789024] logmon: ****************[ Re-starting ]****************
    Oct 1 14:13:58:158 [139902164789024] logmon: Adding to gQoSdefs: default_qos
    Oct 1 14:13:58:158 [139902164789024] logmon: Adding to gQoSdefs: ServerError
    Oct 1 14:14:01:213 [139902164789024] logmon: Regex from external file not used for watcher
    Oct 1 14:14:02:223 [139901958518528] logmon: [wfx_command_test] start scanning '/home/user_nn/check_ars_1.sh'
    Oct 1 14:14:02:223 [139901958518528] logmon: lgm: Read File
    Oct 1 14:14:02:281 [139901958518528] logmon: lgm: read the line: [#ErrServer 11938]
    Oct 1 14:14:02:281 [139901958518528] logmon: lgm: check format start..[0]
    Oct 1 14:14:02:281 [139901958518528] logmon: lgm: format start
    Oct 1 14:14:02:281 [139901958518528] logmon: lgm: FORMAT END START
    Oct 1 14:14:02:282 [139901958518528] logmon: [wfx_command_test] In WithI18n section [#ErrServer 11938],[ERCP_],[ISO-8859-1],[1]
    Oct 1 14:14:02:282 [139901958518528] logmon: [wfx_command_test] MATCH [ErrSrv] on line 0
    Oct 1 14:14:02:283 [139901958518528] logmon: lgm: Read File
    Oct 1 14:14:02:283 [139901958518528] logmon: lgm: read returned null
    Oct 1 14:14:02:283 [139901958518528] logmon: lgm: p->format: 0 format: 0 mode 0x220
    Oct 1 14:14:02:283 [139901958518528] logmon: [wfx_command_test] used 60 ms scanning 16 bytes
    Oct 1 14:15:58:356 [139901958518528] logmon: [wfx_command_test] start scanning '/home/user_nn/check_ars_1.sh'
    Oct 1 14:15:58:356 [139901958518528] logmon: lgm: Read File
    Oct 1 14:15:58:412 [139901958518528] logmon: lgm: read the line: [#ErrServer 11938]
    Oct 1 14:15:58:412 [139901958518528] logmon: lgm: check format start..[0]
    Oct 1 14:15:58:413 [139901958518528] logmon: lgm: format start
    Oct 1 14:15:58:413 [139901958518528] logmon: lgm: FORMAT END START
    Oct 1 14:15:58:413 [139901958518528] logmon: [wfx_command_test] In WithI18n section [#ErrServer 11938],[ERCP_],[ISO-8859-1],[1]
    Oct 1 14:15:58:413 [139901958518528] logmon: [wfx_command_test] MATCH [ErrSrv] on line 0
    Oct 1 14:15:58:414 [139901958518528] logmon: lgm: Read File
    Oct 1 14:15:58:414 [139901958518528] logmon: lgm: read returned null
    Oct 1 14:15:58:414 [139901958518528] logmon: lgm: p->format: 0 format: 0 mode 0x220
    Oct 1 14:15:58:414 [139901958518528] logmon: [wfx_command_test] used 58 ms scanning 16 bytes
    Oct 1 14:17:58:392 [139901958518528] logmon: [wfx_command_test] start scanning '/home/user_nn/check_ars_1.sh'
    Oct 1 14:17:58:393 [139901958518528] logmon: lgm: Read File
    Oct 1 14:17:58:462 [139901958518528] logmon: lgm: read the line: [#ErrServer 11938]
    Oct 1 14:17:58:462 [139901958518528] logmon: lgm: check format start..[0]
    Oct 1 14:17:58:462 [139901958518528] logmon: lgm: format start
    Oct 1 14:17:58:462 [139901958518528] logmon: lgm: FORMAT END START
    Oct 1 14:17:58:462 [139901958518528] logmon: [wfx_command_test] In WithI18n section [#ErrServer 11938],[ERCP_],[ISO-8859-1],[1]
    Oct 1 14:17:58:462 [139901958518528] logmon: [wfx_command_test] MATCH [ErrSrv] on line 0
    Oct 1 14:17:58:463 [139901958518528] logmon: lgm: Read File
    Oct 1 14:17:58:463 [139901958518528] logmon: lgm: read returned null
    Oct 1 14:17:58:463 [139901958518528] logmon: lgm: p->format: 0 format: 0 mode 0x220
    Oct 1 14:17:58:463 [139901958518528] logmon: [wfx_command_test] used 71 ms scanning 16 bytes

     

    Test Profile:

    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: check format start..[0]
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: format start
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: FORMAT END START
    Oct 1 14:20:25:561 [139902164789024] logmon: [BF9511DE] In WithI18n section [#str_log="SERVER ERROR"],[ERCP_],[ISO-8859-1],[-1]
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: Read File
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: read the line: [cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')]
    Oct 1 14:20:25:561 [139902164789024] logmon: [BF9511DE] In WithI18n section [cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')],[ERCP_],[ISO-8859-1],[-1]
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: read the line: []
    Oct 1 14:20:25:561 [139902164789024] logmon: [BF9511DE] In WithI18n section [],[ERCP_],[ISO-8859-1],[-1]
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: read the line: [echo "#ErrServer $cant"]
    Oct 1 14:20:25:561 [139902164789024] logmon: [BF9511DE] In WithI18n section [echo "#ErrServer $cant"],[ERCP_],[ISO-8859-1],[1]
    Oct 1 14:20:25:561 [139902164789024] logmon: [BF9511DE] MATCH [ErrSrv] on line 0
    Oct 1 14:20:25:561 [139902164789024] logmon: returning message: echo "#ErrServer $cant"
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: read returned null
    Oct 1 14:20:25:561 [139902164789024] logmon: lgm: p->format: 0 format: 0 mode 0x201
    Oct 1 14:20:25:561 [139902164789024] logmon: [BF9511DE] used 1 ms scanning 266 bytes