
Limitations to Agent Based Syslog Forwarding

Blog Post created by ketro01 Employee on Oct 16, 2018

As mentioned in an earlier post, the agent-based Syslog forwarding that Spectrum supports has limitations.  The first limitation is scale.  Agent-based log monitors typically scan, on a polling interval, a file that a syslog server is writing to.  For CA SystemEDGE, the default and minimum interval is 60 seconds.  For CA NSM, the default is 120 seconds and the minimum is 30 seconds.  In either case, the matching messages that accumulated during the interval are forwarded together, so you wind up with bursts of SNMP traffic that far exceed the raw Syslog message rate.
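
To put rough numbers on the burst effect, here is a minimal Python sketch.  It assumes every Syslog message matches a rule and produces exactly one trap, and the rate of 5 messages per second is a made-up figure for illustration only.  The function name burst_size is hypothetical and is not part of any agent.

```python
def burst_size(msgs_per_second: float, poll_interval_s: int) -> int:
    """Traps sent in one burst if every message logged during the
    polling interval is forwarded when the agent next scans the file.

    Assumption: each message yields exactly one trap.
    """
    return round(msgs_per_second * poll_interval_s)


# Hypothetical steady rate of 5 matching messages per second.
RATE = 5

# Polling intervals taken from the post.
systemedge_burst = burst_size(RATE, 60)    # SystemEDGE default and minimum
nsm_default_burst = burst_size(RATE, 120)  # NSM default
nsm_minimum_burst = burst_size(RATE, 30)   # NSM minimum

print(systemedge_burst, nsm_default_burst, nsm_minimum_burst)
```

Under these assumptions a steady trickle of 5 messages per second reaches Spectrum as a single burst of several hundred traps once per interval, which is the kind of pattern that trap storm detection is designed to react to.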

Depending on how you've configured Spectrum, you may find that messages are missed because of trap storm detection (How does Spectrum determine a trap storm has been - CA Knowledge).  A better approach is to have Syslog messages become alerts in Spectrum with minimal delay, avoiding the latency introduced by writing to disk and reading on an interval.  Using rsyslog with the omsnmp module, traps can be sent to Spectrum at nearly the same rate that Syslog messages are received, eliminating the bursts caused by interval-based file scanning.

 

The second limitation with agent-based Syslog forwarding has to do with filtering.  Most agent-based log monitors share the concept of using Perl Compatible Regular Expressions and/or pattern matching rules to find matches in a log file.  Multiple rules can be implemented to look for log entries of interest and then take an action, such as sending a trap.  With SystemEDGE, and perhaps other agents, these matching rules are all executed even if there has already been a match.  Here is an example where someone wanted some error messages categorized as major and all others ("Catch all") categorized as critical:

 

watch logfile 1 0x00000041 all_devices.log '%.*(ERROR.*|PKTBUFFERFAIL|PM-SP-4-ERR-DISABLE|PM-SP-4-ERR_DISABLE).*' 'Production Switches in data centers' '' 1 major

watch logfile 2 0x00000041 all_devices.log '%PNNI-4-CONFIG_ERROR:.*' 'ATM related message' '' 1 major

watch logfile 3 0x00000041 all_devices.log '%.*(ERROR|ERR_DISABLE).*' 'Catch all' '' 1 critical
     

The problem with this configuration is that SystemEDGE will generate duplicate traps for certain messages because they match more than one rule.  A filtering configuration that behaves like an Access Control List works better, since evaluation can stop after the first successful match:

 

# Production Switches in data centers
if (re_match($msg,"%.*(ERROR.*|PKTBUFFERFAIL|PM-SP-4-ERR-DISABLE|PM-SP-4-ERR_DISABLE).*")==1) then {
    action(type="omsnmp" transport="udp" server="127.0.0.1"
           trapoid="1.3.6.1.4.1.19406.1.2.1" port="162" version="1"
           messageoid="1.3.6.1.4.1.19406.1.1.2.1" community="public" template="SysEDGE_major")
    stop
}

# ATM related message
if (re_match($msg,"%PNNI-4-CONFIG_ERROR:.*")==1) then {
    action(type="omsnmp" transport="udp" server="127.0.0.1"
           trapoid="1.3.6.1.4.1.19406.1.2.1" port="162" version="1"
           messageoid="1.3.6.1.4.1.19406.1.1.2.1" community="public" template="SysEDGE_major")
    stop
}

# Catch all
if (re_match($msg,"%.*(ERROR|ERR_DISABLE).*")==1) then {
    action(type="omsnmp" transport="udp" server="127.0.0.1"
           trapoid="1.3.6.1.4.1.19406.1.2.1" port="162" version="1"
           messageoid="1.3.6.1.4.1.19406.1.1.2.1" community="public" template="SysEDGE_critical")
    stop
}
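
The difference between the two evaluation styles can be sketched in a few lines of Python.  This is only an illustration, not how SystemEDGE or rsyslog are implemented: it uses Python's re module with unanchored searches as a stand-in for both engines, the three patterns and severities are copied from the rules above, and the function names and sample messages are made up.

```python
import re

# Patterns and severities copied from the three rules in the post.
RULES = [
    (r"%.*(ERROR.*|PKTBUFFERFAIL|PM-SP-4-ERR-DISABLE|PM-SP-4-ERR_DISABLE).*", "major"),
    (r"%PNNI-4-CONFIG_ERROR:.*", "major"),
    (r"%.*(ERROR|ERR_DISABLE).*", "critical"),
]


def evaluate_all(msg):
    """Run every rule, as described for SystemEDGE: one trap per matching rule."""
    return [severity for pattern, severity in RULES if re.search(pattern, msg)]


def first_match(msg):
    """ACL-style evaluation: stop at the first matching rule (at most one trap)."""
    for pattern, severity in RULES:
        if re.search(pattern, msg):
            return [severity]
    return []


# Hypothetical sample messages, for illustration only.
samples = [
    "%PM-SP-4-ERR_DISABLE: bpduguard error detected on Gi1/1",
    "%PNNI-4-CONFIG_ERROR: bad config",
    "%LINK-3-UPDOWN: Interface up",
]

for msg in samples:
    print(msg, evaluate_all(msg), first_match(msg))
```

In this sketch the first sample produces two traps under evaluate-all and one under first-match.  The ATM sample matches all three patterns, and under first-match it is caught by the first rule before it reaches the ATM rule, so rule order matters when you stop after a match.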

 

Additional information on Rsyslog filtering can be found here.
