I am having problems with my Primary Hub server and my UMP Portal server: on both, the Java processes started by Nimsoft drive the CPU to 100% every time.
Has anyone had a similar problem?
I hope you can help me.
Do you have those on separate servers or on the same server?
If you are installing Nimsoft on a VM hosted on VMware ESX Server, have you read the "Java on VMWare Virtual Machines" section in the Nimsoft server installation guide?
The company in charge of installing the solution verified the configuration of the ESX server for those machines. They did not find anything wrong.
I am sending you the screenshots they took to validate the configuration, for your review.
I am not sure whether this will help, but try allocating more memory to Java in the dashboard_engine and dap probes and check whether it helps.
The result is the same.
You can review the situation in the attached file.
Which probes are causing the high CPU utilization? There are several that rely on Java.
On the report server (UMP Portal) I found that it is dap, dashboard_engine and wasp (all of which use Java).
On the Primary Hub I don't know yet.
In your screenshot (which I believe shows the processes on the UMP server), 2 of the 3 java.exe processes show 0% utilization. Are you sure you are having problems with all 3?
That was just at the moment I took the snapshot. But I'm sure that all 3 java processes take 100% of the CPU (not exactly at the same time). And yes, that screenshot was from the Reporter.
I am sending you evidence that the java processes associated with the probes used by the UMP portal are taking all the CPU.
You can see all the threads they use and more detailed information.
If I turn off the Controller (which takes down the probes on the UMP server), CPU usage drops to 5%.
It looks like only two of the Java probes are using CPU time; the third appears to be idle. My guess is that the dap probe is the idle one because it typically does not do much.
I suspect that even though two probes are consuming most of the CPU time one of them is probably the root cause. If you disable them one at a time, I think you may find that stopping one of them makes the other one behave better. I would guess the dashboard_engine is probably the culprit, but that is just a guess.
If you can narrow it down to one probe that is causing most of the trouble, you should make sure the loglevel is turned up on that probe and check the log for clues to explain what it is doing. If you have not done so yet, you are probably getting close to the point where a support case with Nimsoft makes sense.
Another wild guess... On your primary hub I would first look at the discovery_server to see if it might be the cause of the high CPU utilization. It is not the only probe that could act that way, but I have seen it cause problems of this sort before.
The probe that uses the most CPU is wasp, followed by dashboard_engine and finally, as you said, dap. I left the server running for about 30 minutes, and after that the CPU usage went down. If I log in to the web portal, the CPU goes up to 100% again for some time and drops once the request has been handled. I think it is a CPU capacity problem: I have only 2 CPUs assigned to the VM. I will try assigning 3 more CPUs to the machine and see what happens. That covers the Reporter server.
On the Primary Hub I will try turning off probes and robots to find out what is using the CPU, because that server has 4 CPUs and 8 GB of RAM. Maybe we need more CPU and RAM for that server too.
Thanks for the tips and for trying to help me.
I assigned 4 CPUs to the virtual machine and the result is the same. The java process takes all the CPU for 40 minutes or more.
I'm going to prepare 2 new servers using Windows 2008 R2 32-bit. I want to rule out problems caused by the 64-bit OS.
I will let you know the result.
Keith, I need to ask you something; I hope you can help me.
I have the production environment, which connects to a SQL Server, and the NMS installation created a database named "***". I don't want to lose that historical data in case we can fix the situation with the java process, so I want to prepare 2 new servers and create a new database with a different name on the SQL Server, without updating or replacing the first one. Is this possible? Is it done at the point where the NMS installation program asks whether you want an automatic or a custom installation?
What versions of the WASP and Dashboard engine do you have?
I will say that I have no issues with the CPU usage of these probes in my 2008 R2 64-bit environment. I really think Keith's recommendation to look at how the discovery_server probe is behaving on the primary hub server will help you determine the root cause of your issue.
To your question to Keith: I am 99% sure you can change the name of the database during the regular install. Worst case, you could always rename the database on the server side.
I agree with Brandon. You can enter the database name during the installation and make it anything you want. I think you get this opportunity even if you choose to do an automatic installation. I might be wrong about this, but I think automatic installation just picks the components for you but still asks for the database setup.
Thanks for the info about the new database instance.
The wasp is version 2.91 and dashboard_engine is 3.30.
Do you have it in virtual machines? What ESX version are you running?
Thanks for the information about the new database instance.
Yes, I'm using UMP version 2.5.1. I will try upgrading to version 2.5.2 and updating the probes to newer versions.
Thanks for the advice.
I ran more tests on this and found the following:
1. With the antivirus installed (Symantec Endpoint Protection 11.0.5), the antivirus takes 25% of the CPU all the time. So I uninstalled the antivirus, and performance is better on both servers.
2. On the UMP server, after uninstalling the antivirus, I stopped and restarted the robot. The CPU stays at 100% for about 21 minutes; after that the UMP portal works fine and the CPU never goes back to 100%.
3. On the Primary Hub server, without the antivirus, the stop and start process causes 10 minutes of 100% CPU, after which usage goes down and never exceeds 60%.
Does anyone else have this antivirus and see this behavior?
Has anyone configured exclusions or something similar to avoid this behavior?
I would imagine that adding exclusions to the anti-virus software would help. I am not exactly sure what you would need to exclude. You could obviously exclude the entire Nimsoft directory, but it seems like you should be able to be more precise than that. I just cannot think of anything big that is likely to cause trouble with anti-virus software and that runs on both the hub server and the portal server. It would be nice if the anti-virus software could tell you what it is doing at any given point in time, but I am not sure that option exists. On the hub server, I would definitely recommend excluding the hub queue files, which are constantly being updated. But those files should not exist on the portal server.
The behavior with the anti-virus software removed still seems rather strange. Those are very long times to have such high CPU utilization. When you restart the hub server, are there any large queue files?
I added an exclusion for the directory where the Nimsoft software is installed. I even added java.exe as an exclusion. But the problem persists. As you said, maybe we need to be more specific in the exclusions. That is why I asked whether anyone else has had this problem, to find out which exclusions to use.
Can you please tell me how to check for large queue files on the Primary Hub? I'm not an expert in the tool; I'm still learning.
You can double-click the hub probe running on your primary server and click the Status tab. Do you see anything listed in the Queued column after you click Update several times? (This number, if it appears at all, should not be trending upwards.) Alternatively, you can look in the file system under Program Files -> Nimsoft -> hub -> q.
I gathered the information you asked for. The Queue column shows 0 most of the time, and for very short periods it shows a value for dashboard_engine.
The snapshots I took are in the attached document.
The last part of the file shows the robots and probes running on the Primary Hub.
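For the file-system check, a small script can list the queue files by size so that a growing one stands out. This is a minimal sketch, assuming the queue files are regular files directly inside the q directory; the default path below is a hypothetical install location built from the description above, and the 10 MB "large" threshold is an arbitrary assumption, not a Nimsoft recommendation.

```python
import os
import sys
from typing import List, Tuple

# HYPOTHETICAL default path; the post only says "Program Files -> Nimsoft -> hub -> q".
# Pass the real queue directory as the first argument if your install differs.
DEFAULT_QUEUE_DIR = r"C:\Program Files\Nimsoft\hub\q"


def list_queue_files(queue_dir: str,
                     threshold_bytes: int = 10 * 1024 * 1024) -> List[Tuple[str, int, bool]]:
    """Return (file name, size in bytes, is_large) for each regular file, largest first."""
    entries = []
    with os.scandir(queue_dir) as it:
        for entry in it:
            if entry.is_file():  # skip subdirectories
                size = entry.stat().st_size
                entries.append((entry.name, size, size >= threshold_bytes))
    entries.sort(key=lambda e: e[1], reverse=True)
    return entries


if __name__ == "__main__":
    queue_dir = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_QUEUE_DIR
    if not os.path.isdir(queue_dir):
        print(f"Queue directory not found: {queue_dir}")
    else:
        for name, size, is_large in list_queue_files(queue_dir):
            marker = "  <-- large" if is_large else ""
            print(f"{size:>14,d}  {name}{marker}")
```

As with the Queued column, run it a few times a minute or so apart: a queue file that keeps growing matters more than a single large reading.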
The screenshot only shows temporary queues. When you checked the queues, did you make sure to look at all of them in the list? I imagine you did, but I wanted to make sure.
When you restart the robot and see high CPU utilization for an extended time, is all of that CPU time going to java.exe processes? Given some of the other information you have provided regarding this issue, I imagine that is probably the case.
Yes, I looked at the whole list. There are 3 queues in green state (audit, data_engine and nas), but they show no queued values (always 0).
Yes, the Java processes take all the CPU at restart, but if the antivirus is present, it takes 20% of the CPU.
That is why I uninstalled the antivirus, and the response is much better. I installed Microsoft's antivirus, Security Essentials, to rule out a problem specific to the Symantec antivirus. But the behavior is the same: the java.exe (Nimsoft) processes take many minutes (about 40) to start up, and after that the CPU goes down. I think that is far too long. The Primary Hub shows the same behavior with this antivirus. I don't know what is happening.
We found the problem. The physical memory of the ESX server was too low for all the virtual machines running on it. We increased it from 8 GB to 64 GB and everything works now. Thanks for the help.
They are on separate servers, running on VMware ESX Server.
Refer to the section mentioned in my earlier post and see if it helps.
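The fix comes down to simple arithmetic. If the memory configured for all VMs on a host exceeds the host's physical RAM, ESX has to reclaim memory (ballooning, host swapping and similar techniques), which can slow the guests down badly. A Java heap partly swapped out by the host is a plausible explanation for the long high-CPU periods above, though the thread does not confirm the exact mechanism. Going by the figures posted here, the primary hub VM alone had 8 GB of RAM on a host with 8 GB of physical memory, so adding the UMP server VM guarantees overcommitment. The sketch below computes the ratio; only the 8 GB hub figure and the 8 GB / 64 GB host figures come from this thread, and the UMP server's 4 GB is a hypothetical placeholder.

```python
from typing import Dict


def memory_overcommit_ratio(vm_memory_gb: Dict[str, float], host_physical_gb: float) -> float:
    """Ratio of total configured VM memory to host physical memory.

    A ratio above 1.0 means the VMs cannot all be backed by physical RAM at once,
    so the hypervisor must reclaim memory under load. Hypervisor overhead is
    ignored here, so the real headroom is somewhat smaller than this suggests.
    """
    if host_physical_gb <= 0:
        raise ValueError("host_physical_gb must be positive")
    return sum(vm_memory_gb.values()) / host_physical_gb


# primary_hub: 8 GB (from the thread). ump_portal: 4 GB is a HYPOTHETICAL placeholder.
vms = {"primary_hub": 8.0, "ump_portal": 4.0}
for host_gb in (8.0, 64.0):
    ratio = memory_overcommit_ratio(vms, host_gb)
    status = "overcommitted" if ratio > 1.0 else "fits in physical RAM"
    print(f"host {host_gb:g} GB -> ratio {ratio:.2f} ({status})")
```

Any value for the UMP server's memory gives a ratio above 1.0 on the 8 GB host, which is why the exact placeholder does not matter for the conclusion.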