Symantec Access Management


What are the common cause of the ilusive "Socket Error 32" on Siteminder?

  • 1.  What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 10, 2014 11:26 AM

    Recently we have been getting many Socket Error 32 entries in our smps.log. What general troubleshooting steps do you recommend?

     

    Our current Policy Server version and its OS version:

    Product=Policy Server,

    Platform=SunOS 5.10,

    Version=12.0,

    Update=03.07,

    Label=  460,

    Crypto=   128,

     

    Troubleshooting steps that we’ve done so far:

    *We ran a SiteMinder Trace Analysis report generated with the CA tool (see the attached file).

    *We are trying to reproduce the errors using JMeter stress tests with various user accounts.

     

     

    When the errors occur, customers of various applications that authenticate via SiteMinder experience delays, slow responses, and application freezes while using their software applications after they have successfully authenticated.

     

    Thank You



  • 2.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 10, 2014 08:53 PM

    Hi,

     

    Socket error 32 is a general message. Basically, it tells us that when the Policy Server tried to respond to the request, the client side (i.e., the web agent) had already closed the socket because it had waited long enough. Network delay, a Policy Server performance bottleneck, and backend store delay are common contributors to the issue. To troubleshoot further, a web agent trace and a Policy Server trace that capture the event will help us proceed.
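
    The sequence described above can be reproduced in miniature. This is only a sketch and not SiteMinder code: it assumes a Unix host, uses a local socket pair to stand in for the agent and Policy Server connection, and the names `agent_side` and `server_side` are purely illustrative. It shows the writer receiving an OS error once the peer has already closed its end.

```python
import errno
import socket


def reply_after_peer_closed() -> int:
    """Return the errno raised when replying on a socket whose peer has gone away."""
    # A connected local socket pair stands in for the agent <-> Policy Server link.
    agent_side, server_side = socket.socketpair()
    try:
        # The "agent" gives up waiting and closes its end first.
        agent_side.close()
        try:
            # The "Policy Server" only now tries to send its response.
            server_side.send(b"late response")
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        server_side.close()


if __name__ == "__main__":
    code = reply_after_peer_closed()
    # Print the numeric code and its symbolic name (e.g. EPIPE).
    print(code, errno.errorcode.get(code))
```

    On Linux and Solaris the symbolic name EPIPE corresponds to error number 32, which is the number that ends up in the log.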

     

    Regards,

    Kar Meng



  • 3.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 10, 2014 09:19 PM


  • 4.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 14, 2014 07:56 AM

    This sounds like an OS issue, not a SiteMinder issue. SiteMinder reports it cannot get a socket because the operating system won't let it have one; SiteMinder then reports the OS error number, which on Sun is 32. Remember, SM reports the errors of the underlying systems, so in troubleshooting this type of thing, you must verify the underlying system's error codes too.

     

    -Josh



  • 5.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 14, 2014 09:37 AM

    Which OS-specific configurations do I need to verify?

    I am running SunOS 5.10 here.

    Here is a quick summary of what I have observed so far:

    1. The Policy Server(s) are configured with an "Idle Timeout" of 10 minutes
    2. The Policy Server(s) are configured to do LDAP searches with a 300-second timeout
    3. The suspect Web Agents time out after between 60 and 75 seconds
    4. Active Directory auth is configured in the PS for failover, not "round robin" (we see consistent errors from these directories)
    5. A new finding: one Domain, "weblogicc10-pr01", has 740 policies

    Mind you I am not the original implementer of this system. Thank you for your help, folks....



  • 6.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 14, 2014 09:45 AM

    As Unix (Solaris, Linux, BSD, etc.) operating systems are C-based, the C error codes are useful.

    One site, "errno.h - C Error Codes in Linux", happens to list the numbers.

    In this list we see that 32 is an OS-level "Broken Pipe", which generally means that something on the OS or network side caused the OS to give this error to SiteMinder.
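
    If you would rather check the number on the box itself than rely on a web page, a minimal sketch like the following asks the local C library what it associates with error 32. The helper name `describe_errno` is made up for illustration; `errno.errorcode` and `os.strerror` are standard Python library calls.

```python
import errno
import os


def describe_errno(number: int) -> tuple:
    """Return (symbolic name, message text) for an OS error number on this host."""
    name = errno.errorcode.get(number, "UNKNOWN")
    message = os.strerror(number)
    return name, message


if __name__ == "__main__":
    # 32 is the number SiteMinder wrote to smps.log.
    print(describe_errno(32))
```

    The mapping of numbers to names is platform-specific, so it is worth running this on the same OS as the Policy Server.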

     

    Thus, looking at the SiteMinder application is not necessarily going to resolve your issue. While getting your SiteMinder settings is important, you should also look at the OS settings and the network settings around the functions that have the issue.

     

    If this is an LDAP connection, you want SiteMinder to control the time out.

    What is the OS time out? If it is 10 minutes, the OS can break the connection. If it is less, the OS will break it. You should have it at 11 minutes minimum.

     

    Likewise, look at the network settings. Do any network devices have the same 10-minute timeout? More? Less? You always want more, so that SiteMinder is the one that breaks and reconnects.

     

    Hope this  helps.

     

    -Josh



  • 7.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 17, 2014 11:11 AM

    Yes, it's an LDAP connection and SiteMinder controls the time out.

    I came from SUSE Linux so my Solaris knowledge is rather rudimentary.

    Where can I find all those network settings, OS time out parameters, etc in Solaris?

    I am also looking at any potential issue with the CA Directory LDAP connection but I shouldn't worry about it as long as Siteminder controls the time out, right?

     

    Thank You



  • 8.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 17, 2014 11:36 AM

    It has been a while since I worked with Solaris.

    I'm not positive myself. I would prefer not to accidentally send you to the wrong area. Have you tried searching? Oracle should have the information in the Solaris documentation.

    Sun had it well documented online.



  • 9.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 17, 2014 11:39 AM

    There is no such thing as a wrong area. As of right now, I am shooting in the dark. If I know which area to look in, I can definitely search the Oracle Solaris documentation.



  • 10.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 20, 2014 08:16 AM

    ask CA (in a support case probably) for the CA SiteMinder Reproduction Information Gathering tool

    it should get most, if not all, of the information CA needs to make a scaled version of your server.

     

    some of the information on unix may require "root" level access.

     

    I wonder if anyone has kept that tool up. lol.

     

    it will create a print out with what you seem to seek and a lot of other information you may find useful.

     

    I know for a fact I left copies of the tool with both Jon Taft and Rick Burnham when I left CA 2 years ago.



  • 11.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 21, 2014 01:27 PM

    OK, changing strategy here....

    What's the best network appliance (with some sort of fancy dashboard) out there that can monitor SiteMinder and its activities in real time (preferably with a minimal footprint)?

    Or is there any?

    My strategy is to catch it while it happens instead of second-guessing and going through all these logs and variables.

    Thank You




  • 12.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 21, 2014 02:29 PM

    how about i ignore the virtual slap in the face of your response and try to assist again?
    would you like that?
    probably, so let me point one thing out: when one doesn't get the responses one expects, it's often a sign that there is a communication issue.
    to resolve communication issues, one should re-evaluate their own stance, predisposition and such. because if you have any, which you clearly do, then they will get in the way.

    you seem to think that SiteMinder is broken and people should be rushing to fix it for you.

     

    why don't you bookmark the FAQ on How To Ask Questions The Smart Way and read it later. i first read it in college in 1998 when a TA told me "Josh, you have the wrong idea here. instead of cutting you down i'm going to ask you to tell me what you learn reading a website. i'll email you after office hours with the link. if i like your response you'll have my undivided attention and help outside of office hours today"

     

    the link was: http://www.catb.org/esr/faqs/smart-questions.html

     

    it was a good read. i completely changed my approach. the TA and i became friends, and i did well. Maybe we'll have that impact on you too. who knows.

     

     

    Now let's review your issue. you're seeing "error 32" in your SMPS/Trace logs and can't figure out why.
    Now let's review some facts:

    1: this is not something a monitor can tell you SiteMinder is having
    2: your attempts at resolution are failing because you think SiteMinder is the issue -- it's not.
    3: let's look at the error in terms of a programming language; namely the one Solaris, Linux, BSD and other unix variants are written in: C

     

    okay. C says this, errno 32, means the pipe (communication stream) has been broken.

     

    let's think about this a minute. if the communication stream is broken, the side that will report it is the side that doesn't expect it and is trying to act normal.
    what would you gain checking its logs? nothing. it wouldn't give you hints as to why. but you can try to isolate the broken pipe if there are multiple things being communicated with.

    let's look at siteminder. pipes are used with the Web Agent, User Directories, and the Stores (Policy, Key, Session and Audit).

    next, let's evaluate the pipe flows here.
    Web Agents would not be likely to cause this, they only connect/disconnect as needed.
    directories and stores... those the Policy server uses a lot and keeps open. it wouldn't want them to close.

    wait. it won't want them to close? ok we can rule out 1 of the 3 pipe types.

     

    now let's look closer at the next two.

    stores... this one first because it is less likely to be the issue.
    why?
    1: key store pipe closure results in suspend mode. you'd be complaining about that if it were the key store
    2: policy store would be reopened next time it needs to write/update, and it wouldn't cause a hang

    the other stores might not be used by you, and are likely not going to cause the issue you describe.

     

     

    so let's look at the probably broken pipe source: user directories
    shockingly, this would cause the policy server threads to hang for a bit, then die, backing up the other processes and slowing everything down until your policy server is so bogged down in trying old threads that it stops accepting new ones.

     

    "but wait!", mr ICAMguy is probably going to exclaim "the policy server is functioning according to what you said, just very slow and now missing stuff!"

     

    yeah. that's right. and it causes the web agents to fail too. it's probably what you're seeing.

    what about these pipes? these can be closed by many things not siteminder:
    1: your os; it has a max time to hold the pipe. your solaris admin should be able to get that for you. or it should be nicely formatted by the CA SM Reproduction Information Gatherer tool, a perl script written about 3 years ago by a former CA Support person who wanted an easy way to get all the information needed to make scaled environments for reproduction.
    2: your firewalls
    3: your routers
    4: your user directories

    now, siteminder's default time out on pipes, when it reopens them, is 10 minutes.
    so while your solaris admin gets you the solaris value, remember you want to see 10 minutes 15 seconds minimum from there, so SiteMinder controls these properly

    ok. so now you've pinged your network guys. same deal, you want a 15 second minimum latency allowance for SiteMinder, so 10 min 15 seconds minimum time out

    finally call your user directory admin, and get the same from him.


    what? one is less than 10 minutes 15 seconds?
    well now you know the root.
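
    The comparison described above can be written down as a small check. This is just a sketch: the layer names and the numbers in `reported` are hypothetical placeholders, to be replaced with whatever the Solaris, network and directory admins actually report. The only figures taken from this thread are the 10 minutes and the 15 second allowance.

```python
SITEMINDER_IDLE_SECONDS = 10 * 60  # the 10-minute figure quoted in this thread
LATENCY_ALLOWANCE_SECONDS = 15     # the 15-second allowance quoted in this thread
REQUIRED_SECONDS = SITEMINDER_IDLE_SECONDS + LATENCY_ALLOWANCE_SECONDS


def layers_closing_too_early(timeouts, required=REQUIRED_SECONDS):
    """Return, sorted by name, the layers whose idle timeout is below the required minimum."""
    return sorted(name for name, seconds in timeouts.items() if seconds < required)


if __name__ == "__main__":
    # HYPOTHETICAL values in seconds, for illustration only.
    reported = {
        "solaris_os": 7200,
        "firewall": 300,
        "router": 900,
        "user_directory": 600,
    }
    print(layers_closing_too_early(reported))
```

    Any layer the check returns can close the pipe before SiteMinder does, which makes it a candidate for the root cause.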
    let's move on to resolving the issue so it doesnt happen anymore.

    choices:
    1: get the timeout bumped up. you have to have it over 10 minutes 15 seconds.
    2: adjust the SiteMinder timeout.

    given the way you're going about things, i strongly suggest #1 for you.
    #2 requires manual adjustments that can make things worse if you ***** up, and you don't seem familiar enough with things like registries to be comfortable here.

     

     

    hopefully, mr ICAM, you didn't go ballistic, as i expect, from the early part of this.
    hopefully, mr ICAM, you adjusted your attitude, disposition, and opened your mind.

     

    -Josh



  • 13.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Nov 25, 2014 11:44 PM

    Hi,

    Not sure if you have gotten close to the root cause yet. The following article from my colleague could be helpful for your socket error 32. If the issue happens intermittently and during peak load, it could be a sizing issue in your environment.

    https://communities.ca.com/people/Mark.ODonohue/blog/2014/09/03/siteminder-spiral-of-death-or-why-do-i-have-all-these-socket-32-errors-in-my-policy-server-logs
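
    To see whether the errors cluster around peak load, one option is to count them per hour. The following is a rough sketch only: the timestamp layout matched by `HOUR_PATTERN` and the sample lines are assumptions made up for illustration, not the real smps.log format, so the pattern will need adjusting to match your own log lines.

```python
import re
from collections import Counter

# ASSUMPTION: timestamps look like "2014-11-10 11:26:01"; adjust to your real log format.
HOUR_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def error32_per_hour(lines, needle="socket error 32", hour_pattern=HOUR_PATTERN):
    """Count lines mentioning the error, grouped by the date-and-hour captured by hour_pattern."""
    counts = Counter()
    for line in lines:
        if needle not in line.lower():
            continue
        match = hour_pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # HYPOTHETICAL log lines, for illustration only.
    sample = [
        "2014-11-10 11:26:01 [ERROR] Socket error 32",
        "2014-11-10 11:40:10 [ERROR] Socket error 32",
        "2014-11-10 12:01:00 [ERROR] Socket error 32",
        "2014-11-10 12:02:00 [INFO] some other message",
    ]
    for hour, count in sorted(error32_per_hour(sample).items()):
        print(hour, count)
```

    Against a real file, the open file object can be passed directly, for example `error32_per_hour(open("smps.log", errors="replace"))`. Hours with a high count can then be compared against your known busy periods.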



  • 14.  Re: What are the common cause of the ilusive "Socket Error 32" on Siteminder?

    Posted Feb 19, 2015 09:47 AM

    Have you tried any performance tuning?  Tuning can be done on the web agent side and the policy server side.

     

    I don't have a general CA SSO performance tuning guide (there should be one though), but here are some general parameters to look at:

     

    ACO parameters to look at:

    IgnoreQueryData (setting to yes reduces IsProtected? calls)

    IgnoreExt (make sure worthless file types are not protected)

    DisableDotDotRule (apps with strange directory names can cause unwarranted protection of file types listed in IgnoreExt)

    SessionGracePeriod

    SessionUpdatePeriod

    ResourceCacheTimeOut

    MaxResourceCacheSize

    MaxSessionCacheSize

    IdleTimeout (realm level)

    MaxtimeOut (realm level)

     

    Policy Server:

    User Directory - properly load balanced?

    HCO's - properly clustered?

    MaxConnections

    MaxSocketsperConnection

    Analyze your smaccess logs to see if any unnecessary processing is occurring and tweak the agent parameters above to reduce traffic to the policy servers

    Other PS cache settings?

     

    That's all I can think of off the top of my head for now...