DX Unified Infrastructure Management

  • 1.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Oct 09, 2008 06:40 AM
    Hi,

    I wrote a Restart-PWSDS1_2.bat file that lives on the hub.  This batch file contains a line

    ""D:\Program Files\Putty\plink.exe" -i D:\WIT\jobs\wit\.ssh2\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds6 /www/PWSDS1_2/BOUNCESERVER &"

    This line uses plink to log in to my remote server 'ds6' (an x86 host running Solaris 10) and then executes the BOUNCESERVER script to restart my web server.

    I can run Restart-PWSDS1_2.bat directly on the local host where the hub resides, and my remote web server restarts successfully.  Afterward, I added the code below to nas->Script->Restart-PWSDS1_2

    ret = os.execute("D:\\WIT\\Commands\\Restart\\Restart-PWSDS1_2.bat")
    if (ret == 0) then
       nimbus.alarm (1,"Restarted server PWSDS1_2 ... SUCCESS")
       print ("Success")
    else
       nimbus.alarm (5,"Restarted server PWSDS1_2 ... FAILURE!")
       print ("Failed")
    end

    I ran the script via nas->Scripts->Restart-PWSDS1_2->right click (Run).  A 'Success' message appeared, but when I checked my remote web server, it had not been restarted; the timestamp of the WS pid confirmed that.

    I also send the output of Restart-PWSDS1_2.bat to a log file.  The log shows that running the Restart-PWSDS1_2 script in the nas did call Restart-PWSDS1_2.bat, but the remote web server was not restarted at all.

    Let me know if you need more info to help me with this one. 

    Thanks,
    Hoang


  • 2.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method
    Best Answer

    Posted Oct 10, 2008 09:08 AM

    Remember that all probes run in a different context than your user login.  I would first try a simple plink "echo Hello World" and capture its output with an action.command() statement.

    This would tell you whether the problem lies in the authentication needed to run even this simple command.  If you get data back, you should be able to traverse the output of action.command() like this:

    out = action.command ("netstat -an")
    for i=1,#out do
       printf("%d: %s",i,out[i])
    end
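
    A minimal sketch of that "Hello World" check, assuming action.command() returns a table of output lines as the loop above implies.  The helper name find_line, the paths, the user, and the host are placeholders, not values from this thread.  The NAS-only part is guarded so the helper can also be tried in a plain Lua interpreter.

```lua
-- Hypothetical helper: index of the first captured line that contains
-- `needle`, or nil. The 4th argument to string.find turns off pattern
-- matching, so characters such as "." or "-" are taken literally.
function find_line(lines, needle)
   if type(lines) ~= "table" then return nil end
   for i = 1, #lines do
      if string.find(tostring(lines[i]), needle, 1, true) then
         return i
      end
   end
   return nil
end

-- Runs only inside the NAS, where the action table is defined.
if action and action.command then
   -- Placeholder paths, user, and host: substitute your own.
   local out = action.command("D:\\path\\to\\plink.exe -batch -i D:\\path\\to\\key.ppk user@host \"echo Hello World\"")
   if find_line(out, "Hello World") then
      print("plink logged in and ran the remote command")
   else
      print("no echo came back; check authentication in the probe's context")
   end
end
```

    If the echo does not come back, printing every captured line with the loop above is the next step.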



  • 3.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Oct 15, 2008 04:49 AM
    Hoang,

    I discovered that the plink command outputs to stderr, so in order to capture any data from plink you should redirect stderr to stdout (2>&1).

    I ran the following over plink to a linux-box:

    buf = action.command ("c:\\case\\bin\\putty\\plink.exe -v -i \\case\\tmp\\casepk2.ppk -batch root@spider 'ls -al' 2>&1")

    for i=1,#buf do
       printf ("%02d -> %s",i,buf[i])
    end

    I didn't get my keys straight (yes, I added the .ssh directory and authorized_keys :-) but never mind.  The output from plink was captured as shown below:


    ----------- Executing script at 14.10.2008 17:42:14 ----------

      01 -> Looking up host "spider"
      02 -> Connecting to 193.71.55.80 port 22
      03 -> Server version: SSH-1.99-OpenSSH_3.4p1
      04 -> We claim version: SSH-2.0-PuTTY_Release_0.60
      05 -> Using SSH protocol version 2
      06 -> Doing Diffie-Hellman group exchange
      07 -> Doing Diffie-Hellman key exchange with hash SHA-1
      08 -> Host key fingerprint is:
      09 -> ssh-rsa 1024 b3:0a:c7:d2:40:12:33:2f:50:f1:3d:1b:52:a4:10:11
      10 -> Initialised AES-256 CBC client->server encryption
      11 -> Initialised HMAC-SHA1 client->server MAC algorithm
      12 -> Initialised AES-256 CBC server->client encryption
      13 -> Initialised HMAC-SHA1 server->client MAC algorithm
      14 -> Reading private key file "\case\tmp\casepk2.ppk"
      15 -> Using username "root".
      16 -> Offered public key
      17 -> Server refused our key
      18 -> Server refused public key
      19 -> Keyboard-interactive authentication refused
      20 -> Disconnected: Unable to authenticate

    Carstein


  • 4.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Oct 16, 2008 12:05 AM
    Found the problem.  plink.exe expects the SSH host keys to reside in the Windows registry.  As I suspected earlier, the NAS runs in the SYSTEM context.  When you run putty.exe against an SSH server, putty stores the host key in the registry (yes, in your user context: HKEY_CURRENT_USER\Software\SimonTatham\PuTTY\SshHostKeys).

    The way around this is to export the hostkeys from regedit to a .reg file, and run the following line in a NAS script:

    action.command ("\\windows\\system32\\regedt32.exe /s \\case\\tmp\\putty-hosts.reg")

    This will load the keys into "this" user's context (SYSTEM).  If your keys are in place (my previous key problem was related to sshd handling the root user differently from an ordinary user), then you can run:

    buf = action.command ("\\case\\bin\\putty\\plink.exe -ssh -i \\case\\tmp\\casepk2.ppk -batch case@spider ls -al 2>&1")

    for i=1,#buf do
       printf ("%02d -> %s",i,buf[i])
    end

    Note the redirect of stderr to stdout (2>&1).

    ----------- Executing script at 15.10.2008 13:03:37 ----------

      01 -> total 36
      02 -> drwx------    3 case     case         4096 Oct 15 09:35 .
      03 -> drwxr-xr-x    7 root     root         4096 Oct 15 09:13 ..
      04 -> -rw-------    1 case     case          320 Oct 15 09:57 .bash_history
      05 -> -rw-r--r--    1 case     case           24 Oct 15 09:13 .bash_logout
      06 -> -rw-r--r--    1 case     case          191 Oct 15 09:13 .bash_profile
      07 -> -rw-r--r--    1 case     case          124 Oct 15 09:13 .bashrc
      08 -> -rw-r--r--    1 case     case          120 Oct 15 09:13 .gtkrc
      09 -> drwx------    2 case     case         4096 Oct 15 09:35 .ssh
      10 -> -rw-------    1 case     case         1005 Oct 15 09:35 .viminfo
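
    To tell the host-key problem apart from a key-authentication problem, the captured plink output can be classified with a small helper.  This is only a sketch: classify_plink is a made-up name, the authentication strings are taken from the -v traces in this thread, and the "host key is not cached" text is an assumption about what plink prints in -batch mode when the key is missing from the registry (the wording may vary between PuTTY versions).

```lua
-- Hypothetical classifier for captured plink output (stderr included via 2>&1).
-- Returns "hostkey", "auth", "ok", or "unknown".
function classify_plink(lines)
   local seen = {}
   -- { marker text, category }; all searches are plain-text
   local markers = {
      { "host key is not cached", "hostkey" },  -- assumed wording
      { "Server refused our key", "auth" },     -- from the -v trace
      { "Unable to authenticate", "auth" },     -- from the -v trace
      { "Access granted",         "ok" },       -- from the -v trace
   }
   for i = 1, #lines do
      for _, m in ipairs(markers) do
         if string.find(lines[i], m[1], 1, true) then
            seen[m[2]] = true
         end
      end
   end
   -- A missing host key takes priority, then an authentication failure.
   if seen.hostkey then return "hostkey" end
   if seen.auth then return "auth" end
   if seen.ok then return "ok" end
   return "unknown"
end
```

    A result of "hostkey" would point at the registry import step, and "auth" at the key pair or the sshd setup.  Without -v, a successful run prints none of these markers and comes back as "unknown".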

    Hope this helps,
    Carstein


  • 5.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Oct 22, 2008 10:42 AM
    Hi Carstein,

    That is really cool.  I got it to work. 

    NAS Script1
    action.command ("C:\\WINDOWS\\system32\\regedt32.exe /s D:\\WIT\\jobs\\wit\\.ssh2\\putty-hosts.reg")

    --------------------------------------------------------------

    NAS Script2

    buf = action.command ("D:\\WIT\\Commands\\Restart\\plink.exe -v -i D:\\WIT\\jobs\\wit\\.ssh2\\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds6 /www/PWSDS1_2/BOUNCESERVER 2>&1")

    for i=1,#buf do
       printf ("%02d -> %s",i,buf[i])
    end

    ----------- Executing script at 10/21/2008 2:38:20 PM ----------

      01 -> Looking up host "ds6"
      02 -> Connecting to 10.22.105.89 port 22
      03 -> Server version: SSH-1.99-OpenSSH_4.3
      04 -> Using SSH protocol version 2
      05 -> We claim version: SSH-2.0-PuTTY_Snapshot_2008_05_13:r7993
      06 -> Doing Diffie-Hellman group exchange
      07 -> Doing Diffie-Hellman key exchange with hash SHA-1
      08 -> Host key fingerprint is:
      09 -> ssh-rsa 2048 d6:30:08:40:3f:e8:65:e5:61:a5:1e:2c:ce:28:62:03
      10 -> Initialised AES-256 SDCTR client->server encryption
      11 -> Initialised HMAC-SHA1 client->server MAC algorithm
      12 -> Initialised AES-256 SDCTR server->client encryption
      13 -> Initialised HMAC-SHA1 server->client MAC algorithm
      14 -> Reading private key file "D:\WIT\jobs\wit\.ssh2\nanimbus5-rsa-key-1024-20080513-x86.ppk"
      15 -> Using username "aiadm".
      16 -> Offered public key
      17 -> Offer of public key accepted
      18 -> Authenticating with public key "rsa-key-20080513"
      19 -> Access granted
      20 -> Opened channel for session
      21 -> Started a shell/command
      22 -> server has been shutdown
      23 -> /usr/lib/lwp:/www/sun/ws61sp5/bin/https/lib:/www/sun/ws61sp5/bin/https/jdk/jre/lib/i386/server:/www/sun/ws61sp5/bin/https/jdk/jre/lib/i386:/www/sun/ws61sp5/bin/https/jdk/jre/lib/i386/native_threads:/usr/lib/lwp:/www/sun/ws61sp5/bin/https/lib:/www/sun/ws61sp5/bin/https/jdk/jre/lib/i386/server:/www/sun/ws61sp5/bin/https/jdk/jre/lib/i386:/www/sun/ws61sp5/bin/https/jdk/jre/lib/i386/native_threads:/usr/lib/mps/secv1:/usr/lib/mps:/usr/lib/mps/sasl2:../../../lib:/usr/lib/mps/secv1:/usr/lib/mps:/usr/lib/mps/sasl2
      24 -> Sun ONE Web Server 6.1SP5 B08/18/2005 00:48
      25 -> info: CORE5076: Using  from
      26 -> info: WEB0100: Loading web module in virtual server  at
      27 -> info: WEB0100: Loading web module in virtual server  at
      28 -> info: HTTP3072:  http://10.22.105.190:80 ready to accept requests
      29 -> startup: server started successfully
      30 -> Server sent command exit status 0
      31 -> Disconnected: All channels closed

    --------------------------------------------------

    Remote server side (check timestamp)

    ds6-28 ~> /usr/bin/ps -ef | grep PWS
        root 15409     1   0 14:38:14 ?           0:00 ./webservd-wdog -r /www/sun/ws61sp5 -d /www/sun/ws61sp5/https-PWSDS1_2/config -
       hthai 15416 15139   0 14:38:38 pts/4       0:00 grep PWS
        root 15410 15409   0 14:38:14 ?           0:00 webservd -r /www/sun/ws61sp5 -d /www/sun/ws61sp5/https-PWSDS1_2/config -n https
       aiadm 15411 15410   2 14:38:14 ?           0:17 webservd -r /www/sun/ws61sp5 -d /www/sun/ws61sp5/https-PWSDS1_2/config -n https


    A big THANK YOU!!!
    -Hoang


  • 6.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Nov 14, 2008 12:13 PM
    Hi,

    I know I've been beating a dead horse on this LUA thing, but I'll go ahead and post the question anyway 8)

    This is a follow-up to the same thread.

    I am trying to improve the script by looking for the line "Server sent command exit status 0".  If it's there, I want to send an alarm saying it's OK; otherwise, FAILED.  I am posting the script below.
    I know it has something to do with the for loop, but I can't put my finger on the solution yet.
    The end result is that I got two alarms: one (with many suppressed alarms) said FAILED, and another (with no suppressed alarms; see the single "Successful Restart" line in the result below) said OK.

    I welcome your suggestion.

    Thanks,
    Hoang

    Script
    local svr_name = "PLSPS5_1"

    buf = action.command ("D:\\WIT\\Commands\\Restart\\plink.exe -v -i D:\\WIT\\jobs\\wit\\.ssh2\\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds11 /www/PLSPS5_1/BOUNCESERVER 2>&1")

    for i=1,#buf do
       if buf[i] == "Server sent command exit status 0" then
          nimbus.alarm(1, "AutoRestart "..svr_name.." OK")
          printf("Successful Restart of "..svr_name)
       else
          nimbus.alarm(4, "AutoRestart "..svr_name.." FAILED")
          printf("Failed to Restart "..svr_name)
          printf("Take a look into this Restart ASAP")
       end
    end


    Result
    ----------- Executing script at 11/13/2008 4:02:13 PM ----------

    Error in line 14: '<eof>' expected near 'end'


    ----------- Executing script at 11/13/2008 4:00:30 PM ----------

      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP
      Successful Restart of PLSPS5_1
      Failed to Restart PLSPS5_1
      Take a look into this Restart ASAP




  • 7.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Nov 15, 2008 01:42 AM
    Hoang,

    I think you have 2 issues here.  First, you are sending alarm messages before you can be sure the restart actually failed (because there are more lines to process).  Second, your clear message is not being suppressed into the same alarm as the failure message.

    Alarm messages only suppress into a single alarm when they either have identical message text or a common suppression key.  Since you are not using a suppression key in your script, both alarm messages would have to match for them to suppress but do not because of the "OK" on the clear message.  The automatic suppression based on message text is not very useful for clear messages, since you normally want the clear to be at least a little different.  I would recommend a suppression key for your script.  In that case, the script would look more like this:
    local svr_name = "PLSPS5_1"

    buf = action.command ("D:\\WIT\\Commands\\Restart\\plink.exe -v -i D:\\WIT\\jobs\\wit\\.ssh2\\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds11 /www/PLSPS5_1/BOUNCESERVER 2>&1")

    supp_key = "nasscript/bounceserver/"..svr_name

    for i=1,#buf do
       if buf[i] == "Server sent command exit status 0" then
          nimbus.alarm(1, "AutoRestart "..svr_name.." OK", supp_key)
          printf("Successful Restart of "..svr_name)
       else
          nimbus.alarm(4, "AutoRestart "..svr_name.." FAILED", supp_key)
          printf("Failed to Restart "..svr_name)
          printf("Take a look into this Restart ASAP")
       end
    end
    Of course, this would not completely solve your problem yet.  This would generate 2 alarms.  The first would clear after several repeats, and the second would not clear because there is a non-matching line after the matching line.  I think you are looking for something more like this:

    local svr_name = "PLSPS5_1"

    buf = action.command ("D:\\WIT\\Commands\\Restart\\plink.exe -v -i D:\\WIT\\jobs\\wit\\.ssh2\\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds11 /www/PLSPS5_1/BOUNCESERVER 2>&1")

    successful = false
    supp_key = "nasscript/bounceserver/"..svr_name

    for i=1,#buf do
       if buf[i] == "Server sent command exit status 0" then
          nimbus.alarm(1, "AutoRestart "..svr_name.." OK", supp_key)
          printf("Successful Restart of "..svr_name)
          successful = true
          break
       end
    end

    if not successful then
       nimbus.alarm(4, "AutoRestart "..svr_name.." FAILED", supp_key)
       printf("Failed to Restart "..svr_name)
       printf("Take a look into this Restart ASAP")
    end
    This should only generate an alarm if the success message is completely missing from the output.  Another alternative is to generate an alarm every time the script needs to restart the server but then clear the alarm when it succeeds.  This could be useful if you wanted to have the option of finding these in the alarm history later.  That could be done like this:
    local svr_name = "PLSPS5_1"
    supp_key = "nasscript/bounceserver/"..svr_name

    nimbus.alarm(4, svr_name.." requires restart", supp_key)

    buf = action.command ("D:\\WIT\\Commands\\Restart\\plink.exe -v -i D:\\WIT\\jobs\\wit\\.ssh2\\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds11 /www/PLSPS5_1/BOUNCESERVER 2>&1")

    successful = false
    for i=1,#buf do
       if buf[i] == "Server sent command exit status 0" then
          nimbus.alarm(1, "AutoRestart "..svr_name.." OK", supp_key)
          printf("Successful Restart of "..svr_name)
          successful = true
          break
       end
    end
    Either of these 2 options should leave you with an open alarm if the restart fails, but they get there in different ways.  The 2nd option would be a bad idea if the alarm message would prompt someone to take immediate action, since the script is still attempting the restart for a little while after creating the alarm.
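
    A variation on the first option, sketched under the assumption that plink -v always reports the remote exit code in a line of the form "Server sent command exit status N" (as in the traces earlier in this thread).  Matching on the number rather than on the whole line would also let the alarm text say which status came back.  exit_status is a made-up helper name; nimbus.alarm is called with the same three arguments used above.

```lua
-- Hypothetical helper: remote exit status reported by plink -v, as a number,
-- or nil if no such line was captured.
function exit_status(lines)
   for i = 1, #lines do
      local status = string.match(lines[i], "Server sent command exit status (%d+)")
      if status then
         return tonumber(status)
      end
   end
   return nil
end

-- Runs only inside the NAS, where action and nimbus are defined.
if action and nimbus then
   local svr_name = "PLSPS5_1"
   local supp_key = "nasscript/bounceserver/"..svr_name
   local buf = action.command("D:\\WIT\\Commands\\Restart\\plink.exe -v -i D:\\WIT\\jobs\\wit\\.ssh2\\nanimbus5-rsa-key-1024-20080513-x86.ppk -batch aiadm@ds11 /www/PLSPS5_1/BOUNCESERVER 2>&1")

   local status = exit_status(buf)
   if status == 0 then
      nimbus.alarm(1, "AutoRestart "..svr_name.." OK", supp_key)
   elseif status then
      -- The remote script ran but reported a failure.
      nimbus.alarm(4, "AutoRestart "..svr_name.." FAILED (exit status "..status..")", supp_key)
   else
      -- plink never got as far as running the remote script.
      nimbus.alarm(4, "AutoRestart "..svr_name.." FAILED (no exit status seen)", supp_key)
   end
end
```

    The two failure branches separate "the bounce script ran and failed" from "plink could not connect or authenticate", which may help whoever picks up the alarm.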

    Regards,
    Keith


  • 8.  LUA won't restart my web server running on Solaris x86 i.e. the restart is using putty and plink method

    Posted Nov 20, 2008 11:54 AM
    Keith,

    I have been migrating my scripts to LUA, which is why I have not had a chance to respond to your post.  I am using option 2 and it works well.  Hopefully I won't run into any new 'gotchas'.

    Thanks again for your pointers,
    Hoang