Hello, as UIM 8.2 was released today: if anyone hits any issues or big "WOAH"s, please throw them into this thread. We were looking to upgrade to v8.2 but need feedback from everyone on the bugs and gotchas...
-Installer will fail if you have custom probes that do not have the 'description' field filled in on your primary hub
-message routing seems to be somewhat broken with robot 7.70
-I had an issue where a lot of probes were inactive for a while on my primary hub, claiming unknown socket errors, and I had to do some things to get past that. I'm not crediting this to the update; it could just be a Windows bug, but it's the first time this has ever occurred for me
-Apparently policy-based configuration is not included, even though the release notes might suggest so. Only health_index is currently included in policy-based management
-Verification of successful installation showed some invalid probe versions
UMP went pretty much just fine, except for the bad, bad documentation and release notes.
Altogether I'd say this release doesn't seem worth the trouble. It really feels to me like it was forcibly rushed so it could be released within Q1.
I am in the Technical Information group within CA.
Would you please provide specific information regarding your comment on the UMP documentation? If you provide us specific comments we can update the documentation.
I updated the release notes around lunchtime (MST) today and we are still adding late breaking items.
You can also leave comments on our DocOps site (CA Unified Infrastructure Management - 8.2 - CA U...). We monitor the comments and publish updates.
My general issue was with the release notes, which led one to believe that policy-based management (other than ehealth) is included. This led me to try to chase it down in the UMP for some time. There was also a diagram that included the templates, and a set of instructions referring to them, but as search doesn't seem to turn them up now, I guess they might have been taken down.
PBM is in controlled release state right now - I believe they wanted a smaller test group to focus on issues before releasing it for general availability. From what I understand, it's a fairly big complex thing that they do not want to rush.
The problem with message routing is not directly a routing problem: with 7.70, some new ports are actually being used, so it appears to be a firewall issue. This is, however, changed behaviour from 7.63 and seems like a bug to me, since it's using the wrong ports for this sort of communication.
It's also partly about SELinux: 7.70 doesn't seem to tolerate a restrictive setting in this context anymore, whereas 7.63 had no problem with it.
I believe Jason discovered this same issue - we made a KB article about it. I believe Jason is forwarding the issue to development ASAP.
Here is the KB pasted below.
In some environments, after upgrading to hub 7.70, tunnel connections may appear unstable. Downgrading to 7.63 appears to restore stability.
First, the following article should be checked -- the hub upgrade may have reset some of the settings if you have previously changed them:
Additionally, there has been a change in the way internal tunnel ports are allocated, which could have implications in highly "locked down" or secured environments.
In such environments, it may be that a customer has internal firewalls which block communication even between hubs and/or robots except for the normal Nimsoft port range, e.g. 48000+ so that the robots and hubs can only communicate with each other on that specific range.
In hub 7.63 and prior, a tunnel client/server connection would allocate internal ports to use for the tunnel connections which would fall into the 48*** range unless the "First Tunnel Port" option was explicitly set to "ignore first probe port from controller" and a specific tunnel port designated.
In 7.70, the behavior has changed -- now, if a specific First Tunnel Port is not explicitly set, the internal ports used for the tunnel connections will be on randomly allocated ports instead of automatically falling into the 48*** range.
This change may cause tunnel connections or communication across tunnels to appear to fail, sometimes intermittently.
To resolve this you should set "Ignore first probe port from controller" in the hub GUI under Tunnel->Advanced and set the first tunnel port to 48004.
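For anyone who prefers editing the config directly, the equivalent raw hub.cfg entries should look roughly like the sketch below. This is an assumption based on the key names quoted in the hub 7.71 release notes later in this thread (note those release notes spell the first key "ignore_first_probe_probe", which may itself be a typo for ignore_first_probe_port, so verify against your own hub.cfg before editing):

```
<tunnel>
   ignore_first_probe_probe = 1
   first_tunnel_port = 48004
</tunnel>
```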
To me the problem actually seems to be the controller, not the hub. I also tried altering that setting in the hub, but it didn't seem to fix the issue for me. At times this can be a hard thing to test, since it really does seem to allocate ports randomly within a larger range.
Also, I wonder if this might pose an issue with IM connectivity to the hub? Yesterday I also noticed that connections from my laptop to my hub were unsuccessful. The two are on different networks, with an actual firewall between them. Unfortunately I was too busy troubleshooting the issue in general and didn't have time to take a closer look at that. At least rebooting the hub server didn't solve it.
I also have a case open regarding this issue.
What's the case #?
I'm describing it briefly for everyone's benefit here:
A <- no tunnel -> B <- tunnel -> C
A is my primary, B is tunnel proxy and C is customer.
with 7.70, A can access B, but not C. B can access C, and C can access both A and B. This is with ports 48000-48050 allowed between A and B.
with 7.63, every hub can access all the other hubs.
It seems that when accessing C from A (7.63), B accepts the connection from A on port 48005 (that's what it looks like; I could be wrong). With 7.70, B accepts the connection from A on a random port (it looked like it could go as low as 35000 and as high as 55000 in random testing).
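The "random" ports described above look like ordinary OS-assigned ephemeral ports rather than a configured range. A quick generic illustration (plain Python, not UIM code) of what an unspecified bind produces:

```python
import socket

# Binding to port 0 asks the OS for an ephemeral port. On Linux these
# typically come from the 32768-60999 range, which matches the roughly
# 35000-55000 spread observed above and will usually NOT fall in 48xxx.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
print("OS-assigned port:", s.getsockname()[1])
s.close()
```

If the hub stopped requesting a port in the 48xxx range and just let the OS pick, this would explain exactly the firewall behaviour seen here.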
At this rate we'll never migrate off of NMS server version 7.1
To be more precise about the problem: a too-low port gets assigned to the hub, but every time I run controller 7.63 it works, while with 7.70 it doesn't.
Can you expand on the comment:
What probes had invalid versions?
The first Nimsoft 7.0 release was unable to work with AD authentication for the SQL DB connection.
The first CA UIM 8.0 release was unable to work with AD authentication for the SQL DB connection.
... So I am not surprised that my upgrade to the 8.2 release on our sandbox UIM, configured with AD authentication, failed!!
Fortunately, moving from AD to SQL authentication and restarting the upgrade solved the issue (the basics of Nimsoft are still quite reliable, and the upgrade can be restarted as many times as needed with little impact)... But that means my customers will not be able to migrate quickly!!! 90% of them use AD authentication per standard security policies!
Some people would cry about this lack of lessons learned!! I am one of them!!
I also tried to find how the subsysid filtering capability was reintroduced in the alarm console (a major defect identified as solved in 8.2)... So far I have not found a way to filter on this attribute, and the documentation does not indicate that it can be used as a filter... Maybe in release 8.3... Frustration.
What version of SQL Server?
What was the last version where the SQL DB connection was working for you?
When you say SQL DB connection, you are referring to Windows Authentication within data_engine?
Do you have a case that is open for it; can you PM me the case #?
What is the Nimsoft service running as? (Is it the same as the Windows Authentication?)
Standard SQL 2008 R2 latest SP
It used to work with release 8.1.
Yes, I am talking about Windows authentication in data_engine (Integrated Security=SSPI), using the service account associated with the Nimbus Robot Watcher service on the Nimsoft main server.
Case 00160264 has been raised by my colleague.
We did not investigate further on that topic, since moving from AD authentication to SQL authentication temporarily solved the issue, allowing the upgrade to finally work. To be honest, it is not a surprise for us, as we are aware of the lack of quality control on UIM releases (but how do you do a complete quality check when releasing four versions a year!!!)
So, it seems to have failed during the installation; is that correct?
What about post installation? If you change the following would it connect?
On NMS Server:
- Change the Nimsoft Robot Watcher service's Log On account to the Windows/domain user
- Restart the Nimsoft Robot Watcher service
- Open the data_engine probe and add "Integrated Security=SSPI" to Parameters, so it will look like the following:
Network Library=dbmssocn;Language=us_english;Integrated Security=SSPI
- Restart the data_engine probe
- Verify from the logs and the Status section of the probe that data_engine is connected to the database and working correctly
For full instructions, take a look at this.
Those are instructions for an older UIM; I haven't tested AD on 8.2. It's just a suggestion, since this is a dev environment.
Funny discussion!!!
Yes, it happened during the upgrade process... As I told you, I am nearly 100% sure AD authentication is not tested before a release becomes GA... This is the third time we have experienced such an issue... This is definitely not serious.
Well, I know how to move back to AD authentication, but this is not a scenario we can accept; in most cases, DBAs don't provide SQL credentials. I do hope support will identify the issue and provide a hotfix soon.
We test Windows Authentication (AD) mode in our lab. Thanks for adding the SF case #. We'll review your installer log and data_engine.cfg to see what differences might exist to give unsuccessful installation results.
The Subsystem ID is filterable from the named alarm filters in 8.2.
Do you mean using alarm filters in a dashboard? Or directly in the USM alarm console??
My need is to define custom alarm console portlets with multiple SID-based filters (using regex).
In the USM alarm view there's a new ability to create and persist complex alarm filters which can include the subsys id.
For your use case we have also exposed the subsysid as an option on the URL for standalone or in the portlet preferences for a portlet.
Here's an example that could be used in the portlet preferences for USM:
I must admit that we were a little bit annoyed by this missing feature in the new USM alarm console.
We use custom sid in all deployment project we manage.
Is the documentation updated?? Last week, I was unable to find the info you mentioned.
Unfortunately the docs don't have this information right now. I've pointed out this gap to the documentation team for this and hopefully that can be updated soon.
The documentation has been updated and reference for this feature can be found at:
https://wiki.ca.com/display/UIM82/Launch+a+Standalone+USM or https://wiki.ca.com/display/UIM82/Configure+the+USM+Alarm+View
Anyone have any comments on
* Admin Console speed? I heard it should have better performance
* Discovery Performance?
* Thoughts on snmpcollector 2.1 and Self-Cert portlet?
Admin Console speed: well, at least in my case it went from never working to at least displaying hubs now. service_host takes about 4 hours to reach the point in start-up where the web port goes live. And it still has to go out and touch every single hub, which it appears to do serially. I've got 700 hubs, and that takes a while, especially since it has to wait for the obligatory timeout on unresponsive hubs. The web pages also crash frequently. And the search feature at the top of the list of hubs never seems to finish working; even after waiting 10 minutes it's still grey and you have to reload the page.
Discovery Performance: that's a sore spot. What I can say is that its footprint is smaller: in 8.1 on my system, discovery needed about 3.5 GB. I don't know yet whether it is working in 8.2 (I don't want to look and find out it's not), but in Task Manager it's only using 2 GB.
No thoughts on SNMP
I upgraded from SNMPCollector 2.00 to 2.10. I lost all profile configurations on 1 of the 4 boxes I upgraded; the other 3 completed without an issue. After the upgrades, all my F5s stopped collecting data. That's odd, because my F5 models weren't officially supported in 2.00 yet they were working fine (they are officially supported in 2.10 and they're not working, with no errors in the logs). The EMC and Cisco devices didn't miss a beat. I also had a few alerts come through after the upgrades (due to a prior misconfiguration on my part). That's great, except it means 2.00 wasn't working properly. SNMPCollector has great potential, but so far it's still proving to be unpredictable and unreliable.
I started playing with the Self Cert portlet but it's definitely not for the faint of heart.
I like that the new admin console finally highlights the hub robot (with a blue dot)... although I would have much rather seen a performance increase.
In beta, there were a few known issues with install on Oracle DB. Did those get addressed before release? Definitely seems like this release was rushed out the door.
If you use the performance report scheduling portlet to email reports, this is broken in 8.2. It generates the reports but the integration with email seems to be toast. Support has been working on this for three days and is suspiciously silent on the cause.
Well, at least I found a fix for this.
The upgrade from 8.1 to 8.2 fails to include the two values:
smtp_auth = false
smtp_starttls = false
from the <reportscheduler> section of wasp.cfg.
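For reference, after the fix the <reportscheduler> section of wasp.cfg should contain the two keys. This is a sketch; the "..." stands in for whatever other entries your section already has:

```
<reportscheduler>
   ...
   smtp_auth = false
   smtp_starttls = false
</reportscheduler>
```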
They get added if you run the email configuration tool. Unfortunately that tool was updated, and as such it is subject to being held in the browser cache and thus not refreshed. Chrome on Win 2008 R2 Enterprise also seems unable to always clear its cache successfully (the fix for portlets not updating), so it will sometimes retain old objects.
So, lesson learned: whenever you upgrade UMP, get a new workstation that has never been touched to run UMP.
I would recommend not updating the robot controllers to 7.70. I ran into major issues doing so in a mixed UNIX environment; it looks like there may be issues with the UNIX packages at least. I haven't narrowed it down further, but I rolled back to 7.63.
I think I'm running into some right now; it appears to be a SIGSEGV (signal 11) right after we upgraded the primary hub's controller.
Not sure why yet - did a valgrind check but haven't parsed it just yet.
Same thing I saw, and service_host and other Java-based probes were throwing lots of errors. I couldn't access the web page for the primary hub, and the nis_server probe wasn't getting a port and not starting fully.
What version of Linux were you using? Did you try running ldd against the binaries? Any missing libraries?
Linux 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Dane, it sounds a little like an issue I had for a brief moment shortly after upgrading to 8.2. I was on Windows 2012 R2, though. A lot of probes were unable to start, and some were complaining about something like an unspecified socket error. I rebooted etc. and that didn't help much. After that I saw those errors on some probes, so I figured I'd release some sockets by deactivating non-vital probes. I did so, and after that enabled the core probes in small groups. That worked, and afterwards I was also able to enable the rest of the probes. Later that day there was an issue with the server and I couldn't even log in through the console. I didn't report it as a bug, as I'm not quite certain it wasn't a Windows issue (I couldn't see anything in the event logs, though), but the timing was awfully suspicious.
Didn't take long for hub 7.71 to be dropped. Just showed up in the archive.
going to give it a shot. fingers crossed..
Based on a quick test, it might actually fix the issue. Will see what happens in the long run..
BTW, I browsed to the "hub" release notes in the wiki... funny how it still shows 7.70's state as beta. Here are the release notes, since they're not in the wiki yet:
Hub 7.71 fixes an issue seen with hub 7.70 in assigning ports for tunnel client connections. Prior to 7.70, the tunnel client connections would consistently use the 48*** port range (based on the controller’s default first_probe_port setting of 48000). An issue in hub 7.70 caused the tunnel client connections to use a system-assigned port number, which would not reliably fall in the 48*** range. This caused issues with firewalls where the tunnel ports were explicitly allowed and expected to be in the specific range.
With hub 7.71, the default port range for tunnel client connections will again fall in the 48*** range. The specific ports for tunnel connections can be overridden (as previously) by setting tunnel/ignore_first_probe_probe = 1 and tunnel/first_tunnel_port = in the hub.cfg.
Can anyone from CA comment on whether the hub download in the "Downloads" section for 8.2 was updated to include the fixed hub v7.71, so folks don't hit this tunneling issue when standing up a new hub at a remote site?
Windows Infrastructure (HUB, distsrv and Robot) installation (41 MB, MD5 Signature: 8a7dec457e740ca907433efe21720760 )
Using the link you just provided, I installed it and hooked it up to IM. No go; it shows hub 7.7.
Thanks Phil. I opened a support case asking for them to please update this... Case#: 160567
Jason said it'll be a few days - what's the ticket #?
seems like health_index is sending alarms with empty "origin"
OK, so we have upgraded our lab environments to UIM 8.2 now, and let's say we have run into a problem that prevents the upgrade in production.
It seems robot v7.70 has broken proxy_mode. We use proxy_mode extensively, and when you enable proxy_mode on 7.70, the controller ALWAYS binds to 48001, preventing the spooler (and all other probes) from starting.
Prior to the 7.70 upgrade, you can see the controller usually bound to some random port:
Sep 10 13:04:58:257  Controller: Controller on andersns01.lab.basefarm.net port 47696 started in proxy mode with proxy port 48000
Sep 23 03:02:16:027  Controller: Controller on andersns01.lab.basefarm.net port 45626 started in proxy mode with proxy port 48000
Oct 21 03:08:50:532  Controller: Controller on andersns01.lab.basefarm.net port 44213 started in proxy mode with proxy port 48000
Nov 18 02:57:35:911  Controller: Controller on andersns01.lab.basefarm.net port 45091 started in proxy mode with proxy port 48000
Nov 28 14:42:36:986  Controller: Controller on andersns01.lab.basefarm.net port 47848 started in proxy mode with proxy port 48000
Dec 16 03:40:53:036  Controller: Controller on andersns01.lab.basefarm.net port 50030 started in proxy mode with proxy port 48000
Jan 13 13:59:18:201  Controller: Controller on andersns01.lab.basefarm.net port 43331 started in proxy mode with proxy port 48000
Jan 20 03:41:52:749  Controller: Controller on andersns01.lab.basefarm.net port 44837 started in proxy mode with proxy port 48000
Jan 26 10:02:38:929  Controller: Controller on andersns01.lab.basefarm.net port 42878 started in proxy mode with proxy port 48000
Feb 17 02:58:41:927  Controller: Controller on andersns01.lab.basefarm.net port 40876 started in proxy mode with proxy port 48000
Mar 17 03:20:10:762  Controller: Controller on andersns01.lab.basefarm.net port 60859 started in proxy mode with proxy port 48000
Apr 8 11:34:01:284  Controller: Controller on andersns01.lab.basefarm.net port 48001 started in proxy mode with proxy port 48000
Apr 8 14:30:06:190  Controller: Controller on andersns01.lab.basefarm.net port 48000 started
Apr 8 14:32:48:665  Controller: Controller on andersns01.lab.basefarm.net port 48001 started in proxy mode with proxy port 48000
Then, after we upgraded to 7.70 at 11:34 am today, you can see it bound to 48001. We then tested by setting proxy_mode to 0 and first_probe_port to 48000; as you can see, it then binds to 48000 as expected, and it works.
We then enabled proxy_mode again and removed first_probe_port (which used to cause problems with proxy_mode), and it binds to 48001 and the spooler refuses to start.
So, robot v7.70 seems to break proxy_mode on your robots; if you use it, be careful.
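As a quick sanity check after a robot restart, you can probe which of the default ports are actually bound on the robot host itself. This is a generic sketch in Python (not a UIM tool); the port numbers are just the defaults mentioned in this thread:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

# Defaults from this thread: 48000 (controller / proxy port), and 48001,
# which the spooler normally needs but the 7.70 controller grabs instead.
for p in (48000, 48001, 48002):
    print(p, "in use" if port_in_use(p) else "free")
```

If 48001 shows as in use while the spooler is down, that matches the symptom described above.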
Here's an update I just got.
Just a heads up that there is a new version of 8.2 that has been posted on the downloads page which includes fixes for:
There is also a final hiccup with the current one posted where the installer version didn't match the internal version (it just shows 8.2.1 in the title window), so we're correcting that and will get another one with new checksums posted. They'll also put a note on the page letting folks know about the build version number change, so if they had downloaded an earlier version they can just grab the new hub 7.71 from the archive and don't need to feel like they have to run through the whole installer again with the associated downtime/maintenance windows. DanielBlanco BTW, the 7.71 install_*** packages are now on the web archive as well.
Hi Phil, thanks for the update.
To clarify was the main "UIM Server 8.2 GA Windows installer (1.9GB)" the download that was updated?
What about the "Other Installer Packages" section where the stand alone hub installer is located? This specific download: Windows Infrastructure (HUB, distsrv and Robot) installation (41 MB, MD5 Signature: cdf3d166799f86865c18e532b8c48fd8 )
Was that updated to install hub v7.71?
Also, where is the "note on the page letting us know there was a build # change"? If the note is the little two-word "build 2", that is not at all noticeable. Why not make use of the tab called "ANNOUNCEMENT" to inform UIM end users that there was a change? Maybe put the note on that page, which folks would check.
While I haven't tested it, I assume so; the MD5 signatures are different.
Yes, the Windows Infrastructure (HUB, distsrv and Robot) installation (41 MB, MD5 Signature: cdf3d166799f86865c18e532b8c48fd8 ) is now also updated, similar to build 290, to include hub 7.71 (and robot_update 7.70).