DX Unified Infrastructure Management

  • 1.  The Execute request failed: communication error LUA Script

    Posted Jan 25, 2013 06:27 PM

    I've created a script that calls the controller's probe_config_get to build 3 CSV files from the CDM probe configuration (CPU, disk, memory), giving an administrative overview of the thresholds set on each host in Infrastructure Manager.

     

    The script works to a certain extent...

    It starts writing the CSV files (accurately) but after a few minutes I get an error message pop up:

    "The execute request failed: communication error"

     

    This is puzzling because the script keeps running and writing to the files after the message appears. Eventually, though, it stops and never makes it through the entire hub list.

     

    I have investigated the hosts it seems to stop on, but I cannot determine what the problem is: they are live and accessible through the controller, and I am able to log in to them.

     

    Below is the code. Any help is greatly appreciated.

     

    --
    -- Extract the CDM configuration file from the controller as a table and
    -- access parts of the configuration, and print the contents sorted by section.
    --

    fname1 = "hublist/CPU.csv"
    fname2 = "hublist/Memory.csv"
    fname3 = "hublist/Disk.csv"

    --create file and create headers
    --CPU = fname1
    file.create (fname1)
    file.write(fname1, "Facility,Hostname,Interval,Samples,QoS,AlarmActive,ErrorActive,ErrorThreshold,WarningActive,Warning(Theshold)\n")

    --Memory = fname2
    file.create     (fname2)
    file.write(fname2,"Facility,Hostname,Interval,Samples,QoS,AlarmActive,(PF)ErrorActive,(PF)ErrorThreshold,(PF)WarningActive,(PF)WarningThreshold,(PH)ErrorActive,(PH)ErrorThreshold,(PH)WarningActive,(PH)WarningThreshold,(SW)ErrorActive,(SW)ErrorThreshold,(SW)WarningActive,(SW)WarningThreshold\n")

    --Disk = fname3
    file.create     (fname3)
    file.write(fname3,"Facility,Hostname,Drive,Active,Percent,QoSDiskUsage,QoSDiskUsagePerc,InodePerc,QoSInodeUsage,QoSInodeUsagePerc,DeltaPerc,DeltaCalcAll,DeltaType,QoSDiskDelta,CriticalActive,CriticalThreshold,MajorActive,MajorThreshold,InodeCriticalActive,InodeCriticalThreshold,InodeMajorActive,InodeMajorThreshold,DeltaCriticalActive,DeltaCriticalThreshold,DeltaMajorActive,DeltaMajorThreshold\n")

    args = pds.create()
    pds.putString(args,"name","cdm")
    domain = "CHS"
    hl = nimbus.request ("hub","gethubs",args)
    --Iterate through each Hub
    for h_row,h_entry in pairs(hl.hublist) do
         if h_entry.domain == domain then
              local mypds = pds.create()
              local r_resp,rc = nimbus.request(h_entry.addr, "getrobots")

              --There's an else at the bottom if the RC is not 0 but I never get the error printout
              if rc == 0 then

              --Iterate through each Host
              for r_row,r_entry in pairs(r_resp.robotlist) do
                   local mypds2 = pds.create()
                   cfg,rc = nimbus.request ("/CHS/"..h_entry.name.."/"..r_entry.name.."/controller", "probe_config_get",args)
                   -- Extract the named section '/cpu'

                   --There's an else at the bottom if the RC is not 0 but I never get the error printout
                   if rc == 0 then

                        --CPU Leafs
                        cpu = cfg["/cpu"]
                        cpu_alarm = cfg["/cpu/alarm"]
                        cpu_alarm_error = cfg["/cpu/alarm/error"]
                        cpu_alarm_warning = cfg["/cpu/alarm/warning"]

                        --CPU
                        --Headers: Facility, Hostname, Interval, Samples, QoS, AlarmActive, ErrorActive, ErrorThreshold, WarningActive, Warning(Theshold)
                        ---------------------------------------------------------
                        if cpu ~= nil then
                             file.write(fname1,h_entry.name..",")
                             file.write(fname1,r_entry.name..",")
                             file.write(fname1,cpu.interval..",")
                             file.write(fname1,cpu.samples..",")
                             file.write(fname1,cpu.qos_cpu_usage..",")
                        end
                        if cpu_alarm ~= nil then
                             file.write(fname1,cpu_alarm.active..",")
                        end
                        if cpu_alarm_error ~= nil then
                             file.write(fname1,cpu_alarm_error.active..",")
                             file.write(fname1,cpu_alarm_error.threshold..",")
                        end
                        if cpu_alarm_warning ~= nil then
                             file.write(fname1,cpu_alarm_warning.active..",")
                             file.write(fname1,cpu_alarm_warning.threshold.."\n")
                        end

                        --Memory
                        --Headers:      Hub,Host,Interval, Samples, QoS, AlarmActive, (PF)ErrorActive, (PF)ErrorThreshold, (PF)WarningActive, (PF)WarningThreshold, (PH)ErrorActive,
                        --(PH)ErrorThreshold, (PH)WarningActive, (PH)WarningThreshold, (SW)ErrorActive, (SW)ErrorThreshold, (SW)WarningActive, (SW)WarningThreshold
                        ---------------------------------------------------------

                        --Memory Leafs
                        memory = cfg["/memory"]
                        memory_alarm = cfg["/memory/alarm"]
                        memory_alarm_pagefile_error = cfg["/memory/alarm/pagefile error"]
                        memory_alarm_pagefile_warning = cfg["/memory/alarm/pagefile warning"]
                        memory_alarm_physical_error = cfg["/memory/alarm/physical error"]
                        memory_alarm_physical_warning = cfg["/memory/alarm/physical warning"]
                        memory_alarm_swap_error = cfg["/memory/alarm/swap error"]
                        memory_alarm_swap_warning = cfg["/memory/alarm/swap warning"]

                        --Interval
                        if memory ~= nil then
                             file.write(fname2,h_entry.name..",")
                             file.write(fname2,r_entry.name..",")
                             file.write(fname2,memory.interval..",")
                             file.write(fname2,memory.samples..",")
                             file.write(fname2,memory.qos_memory_usage..",")
                        else
                             file.write(fname2,",")
                        end

                        --Overall Alarm Status
                        if memory_alarm ~= nil then
                             file.write(fname2,memory_alarm.active..",")
                        else
                             file.write(fname2,",")
                        end
                        --Pagefile Error
                        if memory_alarm_pagefile_error ~= nil then
                             file.write(fname2,memory_alarm_pagefile_error.active..",")
                             file.write(fname2,memory_alarm_pagefile_error.threshold..",")
                        else
                             file.write(fname2,",")
                        end
                        --Pagefile Warning
                        if memory_alarm_pagefile_error ~= nil then
                             file.write(fname2,memory_alarm_pagefile_warning.active..",")
                             file.write(fname2,memory_alarm_pagefile_warning.threshold..",")
                        else
                             file.write(fname2,",")
                        end
                        --Physical Error
                        if memory_alarm_physical_error ~= nil then
                             file.write(fname2,memory_alarm_physical_error.active..",")
                             file.write(fname2,memory_alarm_physical_error.threshold..",")
                        else
                             file.write(fname2,",")
                        end
                        --Physical Warning
                        if memory_alarm_physical_warning ~= nil then
                             file.write(fname2,memory_alarm_physical_warning.active..",")
                             file.write(fname2,memory_alarm_physical_warning.threshold..",")
                        else
                             file.write(fname2,",")
                        end
                        --Swap Error
                        if memory_alarm_swap_error ~= nil then
                             file.write(fname2,memory_alarm_swap_error.active..",")
                             file.write(fname2,memory_alarm_swap_error.threshold..",")
                        else
                             file.write(fname2,",")
                        end
                        --Swap Warning
                        if memory_alarm_swap_warning ~= nil then
                             file.write(fname2,memory_alarm_swap_warning.active..",")
                             file.write(fname2,memory_alarm_swap_warning.threshold.."\n")
                        else
                             file.write(fname2,",\n")
                        end

                        --DISK
                        --Headers:
                        --Drive
                        --Active
                        --Percent
                        --QoSDiskUsage
                        --QoSDiskUsagePerc
                        --InodePerc
                        --QoSInodeUsage
                        --QoSInodeUsagePerc
                        --DeltaPerc
                        --DeltaCalcAll
                        --DeltaType
                        --QoSDiskDelta
                        --CriticalActive
                        --CriticalThreshold
                        --MajorActive
                        --MajorThreshold
                        --InodeCriticalActive
                        --InodeCriticalThreshold
                        --InodeMajorActive
                        --InodeMajorThreshold
                        --DeltaCriticalActive
                        --DeltaCriticalThreshold
                        --DeltaMajorActive
                        --DeltaMajorThreshold

                        ---------------------------------------------------------------------------------

                        --Disk Leafs
                        disk = cfg["/disk"]
                        disk_alarm = cfg["/disk/alarm"]
                        disk_alarm_fixed = cfg["^/disk/alarm/fixed/([^/]+)$"]

                        local filesystems = {}
                        for section,conf in pairs(cfg) do
                             fs = string.match(section, "^/disk/alarm/fixed/([^/]+)$")
                             if fs ~= nil then
                                  table.insert(filesystems, fs)
                             end
                        end
                        table.sort(filesystems)
                        for i,fs in pairs(filesystems) do

                             local fs_name = string.gsub(fs, "#", "/")
                             local disk = cfg["/disk/alarm/fixed/"..fs..""]
                             local disk_error = cfg["/disk/alarm/fixed/"..fs.."/error"]
                             local disk_warning = cfg["/disk/alarm/fixed/"..fs.."/warning"]
                             local disk_inode_error = cfg["/disk/alarm/fixed/"..fs.."/inode_error"]
                             local disk_inode_warning = cfg["/disk/alarm/fixed/"..fs.."/inode_warning"]
                                  local disk_delta_error = cfg["/disk/alarm/fixed/"..fs.."/delta_error"]
                                  local disk_delta_warning = cfg["/disk/alarm/fixed/"..fs.."/delta_warning"]

                             --Disk/Alarm/Fixed/DiskLetter

                             --I noticed that not every config file has each one of these fields so I check first to see if it exists
                             --I'm sure there's a better way to do this but I'm new to LUA

                             file.write(fname3,h_entry.name..",")
                             file.write(fname3,r_entry.name..",")
                             if disk.description ~= nil then
                                  file.write(fname3,disk.description..",")
                                  else file.write(fname3,",") end
                             if disk.active ~= nil then
                                  file.write(fname3,disk.active..",")
                                  else file.write(fname3,",") end
                             if disk.percent ~= nil then
                                  file.write(fname3,disk.percent..",")
                                  else file.write(fname3,",") end
                             if disk.qos_disk_usage ~= nil then
                                  file.write(fname3,disk.qos_disk_usage..",")
                                  else file.write(fname3,",") end
                             if disk.qos_disk_usage_perc ~= nil then
                                  file.write(fname3,disk.qos_disk_usage_perc..",")
                                  else file.write(fname3,",") end
                             if disk.inode_percent ~= nil then
                                  file.write(fname3,disk.inode_percent..",")
                                  else file.write(fname3,",") end
                             if disk.qos_inode_usage ~= nil then
                                  file.write(fname3,disk.qos_inode_usage..",")
                                  else file.write(fname3,",") end
                             if disk.qos_inode_usage_perc ~= nil then
                                  file.write(fname3,disk.qos_inode_usage_perc..",")
                                  else file.write(fname3,",") end
                             if disk.delta_percent ~= nil then
                                  file.write(fname3,disk.delta_percent..",")
                                  else file.write(fname3,",") end
                             if disk.delta_calculate_all ~= nil then
                                  file.write(fname3,disk.delta_calculate_all..",")
                                  else file.write(fname3,",") end
                             if disk.delta_type ~= nil then
                                  file.write(fname3,disk.delta_type..",")
                                  else file.write(fname3,",") end
                             if disk.qos_disk_delta ~= nil then
                                  file.write(fname3,disk.qos_disk_delta..",")
                                  else file.write(fname3,",") end

                             ----Disk/Alarm/Fixed/DiskLetter/Error
                             if disk_error.active ~= nil then
                                  file.write(fname3,disk_error.active..",")
                                  else file.write(fname3,",") end
                             if disk_error.threshold ~= nil then
                                  file.write(fname3,disk_error.threshold..",")
                                  else file.write(fname3,",") end

                             ----Disk/Alarm/Fixed/DiskLetter/Warning
                             if disk_warning.active ~= nil then
                                  file.write(fname3,disk_warning.active..",")
                                  else file.write(fname3,",") end
                             if disk_warning.threshold ~= nil then
                                  file.write(fname3,disk_warning.threshold..",")
                                  else file.write(fname3,",") end

                             ----Disk/Alarm/Fixed/DiskLetter/InodeError
                             if disk_inode_error.active ~= nil then
                                  file.write(fname3,disk_inode_error.active..",")
                                  else file.write(fname3,",") end
                             if disk_inode_error.threshold ~= nil then
                                  file.write(fname3,disk_inode_error.threshold..",")
                                  else file.write(fname3,",") end

                             ----Disk/Alarm/Fixed/DiskLetter/InodeWarning
                             if disk_inode_warning.active ~= nil then
                                  file.write(fname3,disk_inode_warning.active..",")
                                  else file.write(fname3,",") end
                             if disk_inode_warning.threshold ~= nil then
                                  file.write(fname3,disk_inode_warning.threshold..",")
                                  else file.write(fname3,",") end

                             ----Disk/Alarm/Fixed/DiskLetter/DeltaError
                             if disk_delta_error.active ~= nil then
                                  file.write(fname3,disk_delta_error.active..",")
                                  else file.write(fname3,",") end
                             if disk_delta_error.threshold ~= nil then
                                  file.write(fname3,disk_delta_error.threshold..",")
                                  else file.write(fname3,",") end

                             ----Disk/Alarm/Fixed/DiskLetter/DeltaWarning
                             if disk_delta_warning.active ~= nil then
                                  file.write(fname3,disk_delta_warning.active..",")
                                  else file.write(fname3,",") end
                             if disk_delta_warning.threshold ~= nil then
                                  file.write(fname3,disk_delta_warning.threshold.."\n")
                                  else file.write(fname3,",") end
                        end
                   else
                        print("Received error", rc, " on request to ", r_entry.name)
                   end

              end
              print("Received error", rc, " on request to ", h_entry.name)
              end
         end
    end


  • 2.  Re: The Execute request failed: communication error LUA Script

    Posted Jan 25, 2013 08:50 PM

    I am guessing you are running this script in the NAS by opening it in the editor window and clicking the execute button. Right? The error you are getting back as a popup would be from the NAS GUI (of which the script editor is part) timing out on the connection to the NAS. I think that connection typically times out around 30 seconds, and your script probably takes longer than that to run. It might work better as a standalone script that you can run with the NSA, but you would probably need to add a call to nimbus.login(). It might work better in the NAS as a scheduled job.

     

    I am not sure why the script does not seem to complete, although it might complete but just not give you any updates that you can see.



  • 3.  Re: The Execute request failed: communication error LUA Script

    Posted Jan 28, 2013 10:45 PM

    Keith is correct here. I ran your script in a small environment and it returned all of the information you are looking for in the three CSV files.



  • 4.  Re: The Execute request failed: communication error LUA Script

    Posted Jan 29, 2013 09:34 PM

    You are correct that I'm running it from the NAS editor.

    I scheduled the job and I suppose that took care of that popup - but that wasn't necessarily the problem as it still continued to run after that message.

     

    Essentially, every time the script hits a CDM probe that doesn't have all 13 fields under /disk/alarm/fixed/DriveLetter:\ it seems to get hung up in the middle of writing the line in the CSV file. A complete section looks like this:

     

     /disk/alarm/fixed/C:\ -
        active           yes
        description      File system C:\
        disk             \Device\HarddiskVolume2
        percent          yes
        qos_disk_usage   yes
        qos_disk_usage_perc yes
        inode_percent    yes
        qos_inode_usage  no
        qos_inode_usage_perc no
        delta_percent    no
        delta_calculate_all yes
        delta_type       both
        qos_disk_delta   no

     

     

    So it makes me think that my "if field is null" checks are not doing what I'm wanting them to do. BUT on the other hand when it doesn't find all the fields it seems to execute the 'else' part and just write a comma - which makes me scratch my head even more.

     

    Also, I'm not sure why some of the CDM probes have missing fields, since they are all on the latest revision (4.70, at least to my knowledge).

     

    So what I've been doing is this: after it stops writing, I delete the CDM probe it stopped on and redeploy it. That seems to work, because if I don't, it stops on the same one time and time again.

     

    So basically I've written a giant CDM error finder haha



  • 5.  Re: The Execute request failed: communication error LUA Script

    Posted Jan 30, 2013 08:39 AM

    Can you provide an example of both a good and a bad config section in cdm.cfg? And show how the failure looks in the CSV file? It is a bit hard to picture...



  • 6.  Re: The Execute request failed: communication error LUA Script

    Posted Jan 30, 2013 10:31 PM

    Sure:

    I've attached the good and bad .cfg files; below is what I think is causing the issue, though I'm not 100% sure.

     

    CSV File (stops after hitting an incomplete .cfg file - I've left it overnight to verify it indeed has stopped)

     

    GranburyTX_0149_A,tx0149moneue02,File system C:\,yes,yes,no,no,yes,no,no,no,yes,both,no,yes,2,yes,10,no,10,no,20,no,10,no,8
    GranburyTX_0149_A,tx0149wts01,File system C:\,yes,yes,yes,yes,yes,no,no,no,yes,both,no,yes,2,yes,10,no,10,no,20,no,10,no,8
    GranburyTX_0149_A,tx149wdom,File system C:\,yes,yes,yes,yes,yes,no,no,no,yes,both,no,yes,2,yes,10,no,10,no,20,no,10,no,8
    GranburyTX_0149_A,tx149wdom,File system D:\,yes,yes,yes,yes,yes,no,no,no,yes,both,no,yes,2,yes,10,no,10,no,20,no,10,no,8
    GranburyTX_0149_A,tx0149wfs01,File system C:\,yes,yes,yes,yes,yes,no,no,

     Portion of cdm.cfg file that I think is causing the problem:

    Fields are missing

     

    <C:\>                                                       ***Fields are missing*** Only 9 fields (the 4 delta fields are absent)
                active = yes
                description = File system C:\
                disk = \Device\HarddiskVolume1
                percent = yes
                qos_disk_usage = yes
                qos_disk_usage_perc = yes
                inode_percent = yes
                qos_inode_usage = no
                qos_inode_usage_perc = no
                <error>
                   active = yes
                   threshold = 2
                   message = DiskError
                </error>
                <warning>
                   active = yes
                   threshold = 10
                   message = DiskWarning
                </warning>
                <inode_error>
                   active = no
                   threshold = 10
                   message = InodeError
                </inode_error>
                <inode_warning>
                   active = no
                   threshold = 20
                   message = InodeWarning
                </inode_warning>
                <missing>
                   message = DiskMissing
                </missing>
             </C:\>

     

    Healthy cdm.cfg file

     

       <alarm>
          active = yes
          <fixed>
             <C:\>                                              ***I've found that there are 13 fields in healthy .cfg's***
                active = yes
                description = File system C:\
                disk = \Device\HarddiskVolume2
                percent = yes
                qos_disk_usage = yes
                qos_disk_usage_perc = yes
                inode_percent = yes
                qos_inode_usage = no
                qos_inode_usage_perc = no
                delta_percent = no
                delta_calculate_all = yes
                delta_type = both
                qos_disk_delta = no
                <error>
                   active = yes
                   threshold = 5
                   message = DiskError
                </error>
                <warning>
                   active = yes
                   threshold = 10
                   message = DiskWarning
                </warning>
                <inode_error>
                   active = no
                   threshold = 10
                   message = InodeError
                </inode_error>
                <inode_warning>
                   active = no
                   threshold = 20
                   message = InodeWarning
                </inode_warning>
                <missing>
                   active = yes
                   message = DiskMissing
                </missing>
                <delta_error>
                   active = no
                   threshold = 10
                   message = DeltaError
                </delta_error>
                <delta_warning>
                   active = no
                   threshold = 8
                   message = DeltaWarning
                </delta_warning>
             </C:\>
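
    Comparing the two listings, the short one lacks the four delta keys and also the whole <delta_error> and <delta_warning> sections. In standard Lua a missing key simply yields nil, but indexing a missing section (a nil table) raises an error, which is one possible reason a script would stop partway through a row; the thread does not confirm that this is what happened here. A minimal sketch of a lookup that tolerates both cases, using a hand-built table in place of a real probe_config_get reply (the `field` helper and the sample data are hypothetical):

```lua
-- Hypothetical stand-in for a probe_config_get reply: section path -> key/value table.
local cfg = {
  ["/disk/alarm/fixed/C:\\"]       = { active = "yes", percent = "yes" },
  ["/disk/alarm/fixed/C:\\/error"] = { active = "yes", threshold = "2" },
  -- no ".../delta_error" section, as in the short listing
}

-- Return conf[section][key] as a string, or "" when the section or the key is absent.
local function field(conf, section, key)
  local sec = conf[section]
  if sec == nil or sec[key] == nil then
    return ""
  end
  return tostring(sec[key])
end

local base = "/disk/alarm/fixed/C:\\"
print(field(cfg, base, "active"))                    -- key present
print(field(cfg, base, "delta_percent"))             -- key missing
print(field(cfg, base .. "/delta_error", "active"))  -- section missing

-- Indexing the missing section directly raises an error instead of returning nil.
local ok = pcall(function() return cfg[base .. "/delta_error"].active end)
print(ok)
```

    With a helper like this, every cell can be fetched the same way whether the key or its whole section is absent.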

     

     



  • 7.  Re: The Execute request failed: communication error LUA Script

    Posted Feb 01, 2013 05:51 PM

    The script as you provided it worked pretty well for me, but I encountered a few issues. I ran it via the NSA rather than the NAS, so I could see error messages, most of which were not legitimate. I suspect the script has been generating error messages when you run it, but you miss them because they are buried in nas.log. You might want to write errors to a standalone log file rather than using print(). The only change I needed to make to use the NSA was to add a call to nimbus.login() near the top.
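
    A standalone error log could be as small as the sketch below. It uses the standard Lua io library, which is an assumption: the NAS or NSA interpreter may not expose it, in which case the file.write call the script already uses could serve the same purpose. The `log_error` name and the temporary path are placeholders.

```lua
-- Append one timestamped line to a log file (standard Lua io library).
local function log_error(path, message)
  local fh, err = io.open(path, "a")
  if not fh then
    return false, err
  end
  fh:write(os.date("%Y-%m-%d %H:%M:%S"), " ", message, "\n")
  fh:close()
  return true
end

local log_path = os.tmpname()  -- placeholder path for this example only
local logged = log_error(log_path, "Received error 2 on request to robot1")
print(logged)
```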

     

    The big issue is that you are missing an "else" before the last print() statement in the script. It looks like line 297 (currently blank) should be an "end" and line 298 (currently "end") should be an "else". That eliminated the spurious error messages for me. I am not sure if it will help with the issue you are having; I did not experience an issue like that. The script seemed to handle missing disk options just fine when I ran it. (That was something of a relief; the error checking code looked very good to me.)
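
    The effect of the else can be reproduced without a hub. In this sketch a stub `request` function (hypothetical, standing in for nimbus.request) returns a reply and a return code, and the error message is only recorded when the code is non-zero:

```lua
-- Hypothetical stub standing in for nimbus.request: returns reply, rc.
local function request(addr)
  if addr == "bad-hub" then
    return nil, 2
  end
  return { robotlist = { { name = addr .. "-robot" } } }, 0
end

local robots, errors = {}, {}
for _, hub in ipairs({ "hub-a", "bad-hub", "hub-b" }) do
  local resp, rc = request(hub)
  if rc == 0 then
    for _, robot in ipairs(resp.robotlist) do
      robots[#robots + 1] = robot.name
    end
  else
    -- The posted script has no else here, so its message sits in the success branch.
    errors[#errors + 1] = "Received error " .. rc .. " on request to " .. hub
  end
end

print(#robots, #errors)
```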

     

    You also need to update line 292 to print a newline rather than a comma. Otherwise you end up with two disks on the same line if there is no delta warning threshold.

     

    This is just a style/format thing, but line 37 (and many of those following it) should probably be indented. The script is long enough that it might not matter from a readability perspective. I guess that means you should break things out into functions, which would probably eliminate the long list of if...then...else...end statements.
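
    Breaking things out into functions could look roughly like the sketch below: one helper turns a section and a list of keys into CSV cells, so each row is assembled in memory and ends with exactly one newline. The helper names (`cells`, `disk_row`), the shortened key lists, and the sample tables are hypothetical; the key names themselves come from the script.

```lua
-- Collect the listed keys from a section (which may be nil) as CSV cells,
-- using an empty string for anything that is missing.
local function cells(section, keys)
  local out = {}
  for i, key in ipairs(keys) do
    local value = section and section[key]
    out[i] = (value ~= nil) and tostring(value) or ""
  end
  return out
end

local DISK_KEYS  = { "description", "active", "percent" }
local ALARM_KEYS = { "active", "threshold" }

-- Build one CSV line for a disk; the line always ends with a single newline.
local function disk_row(hub, robot, disk, disk_error)
  local row = { hub, robot }
  for _, cell in ipairs(cells(disk, DISK_KEYS)) do row[#row + 1] = cell end
  for _, cell in ipairs(cells(disk_error, ALARM_KEYS)) do row[#row + 1] = cell end
  return table.concat(row, ",") .. "\n"
end

local line = disk_row("hub1", "robot1",
  { description = "File system C:\\", active = "yes" },  -- no "percent" key
  nil)                                                    -- no error section
io.write(line)
```

    In the NAS script the returned string would be passed to file.write(fname3, line) in a single call, which keeps the column count constant even when options are missing.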

     

    BTW, you don't need to manually build the controller address (line 39); you can just use r_entry.addr.

     

    Let us know if the changes make the script work better.



  • 8.  Re: The Execute request failed: communication error LUA Script

    Posted Feb 04, 2013 07:21 PM

    Thanks for all your help, Keith. I don't know why it didn't like that handful of config files, but I finally got through them (by reinstalling the cdm probes) and got a relatively comprehensive audit of everything.

     

    Currently going through the processes probe and NOT experiencing the same thing, phew!

     

     



  • 9.  Re: The Execute request failed: communication error LUA Script

    Posted Feb 26, 2013 02:43 AM

    This is good stuff. I started doing the same type of thing, trying to get the threshold setup per robot the way ProbeReport.exe (in the bin folder) does, but for all probes, since ProbeReport.exe only covers CDM, Processes, Ntevl, and Ntservices. Instead of writing all of this data to a file, I created a SQL table in my DB with columns for the section, probe, robot, etc., and just dumped every section and its key/value pairs into SQL. My hope was to then write a SQL query to pull the data back out, but I haven't accomplished that yet.
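
    The idea of dumping every section and key/value pair into rows can be sketched in plain Lua as below. The `flatten` helper and the sample values are hypothetical, and the step that inserts the rows into a database is left out because the thread does not show it.

```lua
-- Flatten a config table (section path -> key/value table) into a sorted list
-- of { section, key, value } rows, one row per key.
local function flatten(cfg)
  local rows = {}
  for section, keys in pairs(cfg) do
    for key, value in pairs(keys) do
      rows[#rows + 1] = { section = section, key = key, value = tostring(value) }
    end
  end
  -- pairs() has no defined order, so sort for stable output.
  table.sort(rows, function(a, b)
    if a.section ~= b.section then
      return a.section < b.section
    end
    return a.key < b.key
  end)
  return rows
end

-- Sample data only; real values would come from probe_config_get.
local rows = flatten({
  ["/cpu"]       = { interval = "5min", samples = "5" },
  ["/cpu/alarm"] = { active = "yes" },
})
for _, r in ipairs(rows) do
  print(r.section, r.key, r.value)
end
```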

     

     

    Jarrod Hinson

    MModal