
Logmon - Sum of errors found in more than one log file

Question asked by jemilla on Sep 28, 2018
Latest reply on Oct 1, 2018 by jemilla

Hello.
I need to configure logmon (v3.92) alarms on UIM.

Several log files matching "ars.*.yyyymmdd.*.log" are generated every day, for example:
ars.38302.SR-WORKAD-AP01.20180927.006.log
ars.38301.SR-WORKAD-AP01.20180928.001.log
ars.38301.SR-WORKAD-AP01.20180928.002.log
ars.38301.SR-WORKAD-AP01.20180928.003.log
ars.38301.SR-WORKAD-AP01.20180928.004.log
ars.38301.SR-WORKAD-AP01.20180928.005.log
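To check locally which files should be included for a given day, here is a minimal bash sketch. It assumes all the ars logs are in a single directory. The function name list_ars_logs and its two arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the ars.*.<yyyymmdd>.*.log files for one day.
# $1 = log directory (assumption: all ars logs live in one directory)
# $2 = day as yyyymmdd; defaults to today via date +%Y%m%d
list_ars_logs() {
    local dir=$1
    local day=${2:-$(date +%Y%m%d)}
    local f
    for f in "$dir"/ars.*."$day".*.log; do
        # If nothing matches, bash leaves the pattern unexpanded; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
}
```

For example, running list_ars_logs /path/to/ars/logs 20180928 should print the five 20180928 files listed above. The directory is a placeholder.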


In those files I must search for two types of errors:
"SERVER ERROR"
"INTERNAL ERROR | STOP Routing"

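The second string contains "|". Since the search uses regular expressions, it is worth checking how that character is interpreted. In a regular expression, "|" means alternation, so "INTERNAL ERROR | STOP Routing" matches any line that contains either "INTERNAL ERROR " or " STOP Routing". The small bash sketch below contrasts a fixed-string search (grep -F) with an extended-regex search (grep -E). The function names are hypothetical. Whether logmon's regex engine treats "|" the same way is an assumption and has not been verified here.

```shell
#!/bin/bash
# Count stdin lines containing $1 taken literally (-F: no regex metacharacters).
count_literal() {
    grep -c -F -- "$1"
}

# Count stdin lines matching $1 read as an extended regex (-E),
# where '|' splits the pattern into alternatives.
count_ere() {
    grep -c -E -- "$1"
}
```

With made-up sample lines such as "INTERNAL ERROR | STOP Routing", "INTERNAL ERROR in parser" and "request STOP Routing done", count_literal matches only the first line, while count_ere matches all three. In an extended or Perl-style regular expression, escaping the bar as "\|" makes it match literally.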
According to the logmon log, the search itself works correctly, using time-formatting primitives and regular expressions.

I have defined two watcher rules, each with "count matches" enabled and its own QoS.

Currently, the generated metrics reflect only the errors found in ONE log file.
What I need is for the metrics to be THE SUM OF ERRORS found in ALL scanned log files.
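Outside logmon, you can add up the per-file counts with grep and awk. Here is a minimal sketch using a hypothetical helper named sum_matches. Note that grep -c counts matching lines, not occurrences, so a line containing two errors counts once.

```shell
#!/bin/bash
# Hypothetical helper: total number of lines containing a literal string
# across any number of files.
# $1 = string to search for; remaining arguments = files to scan.
sum_matches() {
    local pattern=$1
    shift
    # No files given: report 0 instead of letting grep read standard input.
    if [ "$#" -eq 0 ]; then
        echo 0
        return
    fi
    # -h drops file names, so grep prints one bare count per file whether
    # it was given one file or many; awk adds the counts together.
    grep -h -c -F -- "$pattern" "$@" | awk '{ total += $1 } END { print total + 0 }'
}
```

For example, sum_matches "SERVER ERROR" /path/to/ars/logs/ars.*.$(date +%Y%m%d).*.log should print a single total for today's files. The directory is a placeholder.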

We also need to generate alarms whose messages indicate the number of errors found.

I have already looked at several documents and examples, but I have not found a way to implement this.

 

Is this possible?

 

=====================================================================================

 

Hello.

I have read several documents in the CA Community and tried the examples I found, but so far they have not given the expected results.

I have been trying an alternative approach: the "command" mode of the logmon profile.

I wrote a bash script that searches each log for the string "SERVER ERROR", counts the matches, and adds those counts together to get an overall total, which it prints with "echo".
The script is in a local folder on the server where logmon is installed (/home/user_nn).

 

Running it from the command line:
[user_nn@SR-WORKAD-AP01 ~]$ ./check_ars_1.sh
#ErrServer 11938

 

This is the configuration set up in logmon:

Profile = wfx_command_test
Mode = command
File = /home/user_nn/check_ars_1.sh
Generate Quality of Service


Watcher Rules = ErrSrv
Standard Tag - Match Expression = "#ErrServer"
Message to Send on Match = "TRY ${ErrSrv}"


Label Variables define variable ErrSrv
Source Line = 1
Source FROM Position = Column 2
Source TO Position = Ignore 'To'

QoS tag
Name = ServerError
Description = "Critical errors quantity"
Target = sr-workad-ap01

 

Logmon log:

Oct 1 10:47:27:502 [140152144738080] logmon: ****************[ Re-starting ]****************
Oct 1 10:47:28:569 [140152144738080] logmon: Adding to gQoSdefs: default_qos
Oct 1 10:47:28:569 [140152144738080] logmon: Adding to gQoSdefs: ServerError
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: format start
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: FORMAT END START
Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [#str_log="SERVER ERROR"],[ERCP_],[ISO-8859-1],[-1]
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: Read File
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: read the line: [cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')]
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: check format start..[0]
Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [cant=$(grep -c "$str_log" ${path_log}/$file_log | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }')],[ERCP_],[ISO-8859-1],[-1]
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: read the line: []
Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [],[ERCP_],[ISO-8859-1],[-1]
Oct 1 10:48:01:876 [140152144738080] logmon: lgm: read the line: [echo "#ErrServer $cant"]
Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] In WithI18n section [echo "#ErrServer $cant"],[ERCP_],[ISO-8859-1],[1]
Oct 1 10:48:01:876 [140152144738080] logmon: [BF9511DE] MATCH [ErrSrv] on line 0
Oct 1 10:48:01:876 [140152144738080] logmon: returning message: echo "#ErrServer $cant"
Oct 1 10:48:01:885 [140152144738080] logmon: lgm: Read File
Oct 1 10:48:01:886 [140152144738080] logmon: lgm: read returned null
Oct 1 10:48:01:886 [140152144738080] logmon: lgm: p->format: 0 format: 0 mode 0x201
Oct 1 10:48:01:886 [140152144738080] logmon: [BF9511DE] used 14 ms scanning 266 bytes
Oct 1 10:48:04:886 [140152060811008] logmon: (mysystem) - executed (pid:24745): /home/user_nn/check_ars_1.sh
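There is one more thing worth checking in the grep/awk line shown in the log. With a single file, grep -c prints only the count, without the "name.log:" prefix. Because FS = "log:", there is then no $2, and the total comes out as 0. This could happen early in the day, for example, when only the .001 file exists. This is a side issue and not necessarily the cause of the behavior shown in the log above. The sketch below compares the original pipeline with a variant that adds -H (always print the file name). The function names are hypothetical.

```shell
#!/bin/bash
# The pipeline from the script, wrapped in a function for comparison.
# $1 = string to search for; remaining arguments = files to scan.
# With one file, grep -c prints just "N", so $2 is empty and SUM ends up 0.
count_original() {
    grep -c "$1" "${@:2}" | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM }'
}

# Same pipeline with -H, so grep always prints "file.log:N", even for one file.
# "SUM + 0" prints 0 instead of an empty line when grep prints nothing.
count_with_H() {
    grep -H -c "$1" "${@:2}" | awk 'BEGIN {FS = "log:"} ; {SUM += $2} END { print SUM + 0 }'
}
```

With several files, both functions return the same total. They differ only when a single file is passed.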

 

Any help?

Thanks in advance.
