How can I get QoS for the msg/sec statistics from the data_engine? I sometimes have issues with users configuring excessive QoS, and I'd like to monitor dramatic spikes in the msg/sec coming into the data_engine.
Does the data_engine callback get_stats give you the information you are looking for?
Yes, it does, and I can add a meter widget to a custom dashboard that queries get_stats. How do I get that value into the NMS QoS database, though?
I know the other forum users are going to tell you that a Lua script will do what you want - if they do, they should give you an example of said script.
I think the following will work: write a script that writes the message rate to a file, use logmon to monitor that file, and, if you build the logmon configuration correctly, I think logmon will let you publish the value as a QoS.
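A minimal sketch of the first step in that chain, assuming the message rate has already been obtained elsewhere (for example from get_stats; that lookup is not shown here). The function name, the file path, and the `msg_per_sec=` line format are all illustrative assumptions, picked so that a logmon watcher could pull the number out with a pattern.

```python
import time
from pathlib import Path


def append_rate(path, msg_per_sec, now=None):
    """Append one 'timestamp msg_per_sec=value' line to the file and return it."""
    # Hypothetical line format; adapt it to whatever pattern logmon is set up to match.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = f"{stamp} msg_per_sec={msg_per_sec:.2f}"
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Run on a schedule, something like `append_rate("msg_rate.log", rate)` gives logmon one new line per sample to match against.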
As I am not a DBA, take my suggestion with a bucket of salt, but...
What if you were to create a stored procedure that ran once an hour, queried the s_qos_data table for a count of distinct records in the QOS column, and inserted that count into a table with a timestamp?
That would give you a number that you can chart against with time.
Or even just create a trigger on the s_qos_data table that would do this so it will only record when there is an insertion into the table.
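To make the idea concrete, here is a rough sketch of the hourly snapshot, written in Python against an in-memory SQLite table rather than as a real SQL Server stored procedure. The stand-in `s_qos_data` table has only the columns needed for the demo, and `qos_count_history`, the function name, and the sample rows are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def record_qos_count(conn, when=None):
    """Store the current number of distinct QOS names together with a timestamp."""
    when = when or datetime.now(timezone.utc).isoformat(timespec="seconds")
    # qos_count_history is a hypothetical table, not part of the Nimsoft schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qos_count_history (taken_at TEXT, qos_count INTEGER)"
    )
    (count,) = conn.execute("SELECT COUNT(DISTINCT qos) FROM s_qos_data").fetchone()
    conn.execute("INSERT INTO qos_count_history VALUES (?, ?)", (when, count))
    conn.commit()
    return count


# Stand-in data: an in-memory SQLite table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s_qos_data (qos TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO s_qos_data VALUES (?, ?)",
    [("QOS_CPU_USAGE", "host-a"), ("QOS_CPU_USAGE", "host-b"), ("QOS_DISK_USAGE", "host-a")],
)
print(record_qos_count(conn))  # prints 2
```

Each call adds one row to the history table, which is the series you would chart or alarm on.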
For monitoring you could use one of the database monitors depending on what database you are using. For SQL Server you could use the sqlserver probe and create a custom metric and monitor it for changes over time or come up with some creative way of monitoring it.
Any DBA want to chime in on whether this is doable or a bad idea?
Steven, I do something very similar to what you suggest, only for the count of the s_qos_definition table. This way I monitor whether my users are creating "rogue" QoS definitions. You could either use the alarming function in the sql_response probe or perhaps create a QoS monitor/alarm in SLM. This method assumes, however, that rgivens's users are creating unique QoS and not doing something like decreasing the polling interval, which would also increase the message rate.
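For the polling-interval case, a check on the message rate itself would be needed. A small sketch, assuming you already have a series of msg/sec samples from somewhere; the function name, window size, and factor are arbitrary choices for illustration, not anything built into Nimsoft.

```python
from collections import deque


def find_spikes(samples, window=5, factor=2.0):
    """Return indexes of samples above `factor` times the mean of the previous `window` samples."""
    recent = deque(maxlen=window)
    spikes = []
    for index, value in enumerate(samples):
        # Only judge a sample once a full window of history is available.
        if len(recent) == window and value > factor * (sum(recent) / window):
            spikes.append(index)
        recent.append(value)
    return spikes
```

A spike stays in the window afterwards and raises the baseline for the next few samples, so a sustained jump is only flagged near its start.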
I have attached a queue_check probe that sends QoS based on the QoS msg/sec.
You can deploy this on the same robot where data_engine resides. Please install nsa in advance, since it uses the Nimsoft Lua extension. You can make a custom page that shows this data to determine how many msg/sec you have in your environment, and trigger an alarm if a high number of QoS messages are stuck in the hub queue.
So is there any special configuration necessary for this probe? I have the nsa installed on the hub where my data_engine is running. I placed this probe on the hub and I get "(timeout) - got kicked from the probe" every 5 seconds.
The problem I'm experiencing is that UMP is completely non-functional. Support indicates that it's a database issue (too much QoS, the Nimsoft database is too big at 180GB, etc.). I would disagree, as the database host and SQL are fine. It's the application not playing well with the database that is the issue. We are on NMS 5.60 and UMP 2.5.2. It's a clustered Windows host with SQL 2005 Enterprise x64. I've basically been told that what would fix this is to upgrade SQL to 2008 R2 and Nimsoft to 5.61 / UMP 2.6.1. UMP 2.6.1 requires SQL 2008 R2. I can't upgrade to SQL 2008 R2, as Nimsoft isn't the only application that uses this database host and I just can't get a window to do it. We have stuff like BES, VMware, and Remedy on the host, which by the way have no issues whatsoever.
So I'm stuck trying to find a way to remove monitoring from our environment to decrease the load of the application so that it can function. We are pushing about 1000 to 1500 messages per minute in the data_engine.
What's frustrating me is that UMP 2.5.2 is not end of life. I should be getting support for my issues, but I'm not. I'm being told to upgrade. If there is a problem with the application, which is clear to me there is, I should get support for the version that I'm on. Upgrading shouldn't be the only form of support that I get. So I'm stuck with a non-functional application which I'm ready to rip out as it's been problematic since it's been installed.
Wow - I can appreciate your frustration, as we went through something very similar with UMP 1.5.x. We do now have a 1TB 2005 MSSQL database inserting up to 80,000 msg/min at peak (if the statistics are to be believed).
And yes, I too get so sick of hearing "just upgrade to the new major version we release every three months or so because I don't really know what is causing your problem". Seriously, I can't upgrade every three months. The version you are running is, what, about 5 months old?
To your issue - what about the UMP is not functioning for you specifically? Why does support think upgrading will help you? Simply upgrading the version of UMP and SQL is supposed to make your issue go away? I would doubt that as well.
The custom dashboards and alarm portlet continually fail with errors. It was happening sporadically with UMP 2.5.1. Our IT Operations Center has been complaining about the sporadic errors and continues to use Enterprise Console and our old dashboards because of it. So I opened a case and was told that these issues would be addressed if I upgraded. So I upgraded as far as I could go, which is NMS 5.60 and UMP 2.5.2, and now the problems are much more severe. Instead of happening sporadically, they are happening continually, to the point where it is just unusable. I'll attach some screenshots of the errors.
I'm in a situation where I need to upgrade to a version that doesn't support Enterprise Console anymore, but I can't use UMP because it doesn't work. So our company has no dashboards. Even if I had SQL 2008 R2 to upgrade to, I couldn't, because I would lose the Enterprise Console dashboards. I'm honestly thinking about looking into other solutions and I'm looking forward to telling our rep.
Support indicates that this is a large database for SQL 2005 at 180GB. We have databases with terabytes of data, so I don't really see that as a valid argument. Maybe it's large for a Nimsoft DB. They also indicate that having 1000 to 1500 messages per minute written to the Nimsoft database is a lot. They say that with the new features and optimizations of SQL 2008 R2, NMS 5.61, and UMP 2.6.1, we won't have the performance issues with the database and UMP will function better because of it.
That's a tough one - do you know if you have to upgrade to 2008 R2, or is that only to take advantage of the SQL performance enhancements? I am just wondering if upgrading to 5.60 (legacy installer to support EC) and UMP 2.6.1 (still running on SQL 2005) is a possibility.
I am also assuming you looked at the database tweaking documents out there to enhance performance - I have attached one from 2010.
I was told by support that it had to be on SQL 2008 R2 to go to UMP 2.6.1, and it's mentioned in the Install Guide as a requirement. You have to upgrade UMP to 2.6.1 prior to upgrading to NMS 5.61. Thanks for the doc. I'll look it over. They seem to be common best practices for setting up a machine and SQL Server, which we follow.
I think we've experienced similar issues with our dashboards, but not to the extent you are seeing. Is this happening on every custom dashboard or only when certain ones are run? The other thing is that it may not be your SQL engine but rather the system running the UMP itself. It simply may need more resources, which was the case with our UMP recently.
I'd be curious about the server setup that is currently running the UMP.
The server running UMP is a dual-processor 3.60GHz Intel Xeon IBM xSeries 366 with 8GB of memory. I've done QoS on the machine, and the memory and CPU are barely touched. I've asked previously about tuning UMP, or whether there is anything special that I need to do to UMP to make it perform better, but haven't gotten any specific answers. The machine exceeds minimum requirements, but I have been thinking about rebuilding it on a bigger machine.
What's the process of moving UMP from one machine to another? Is it like other Nimsoft components, where I can copy the installation directory over to a new machine and run through the installation process to restore the new machine to the state of the old machine?
That is far more than what we have and currently we have our UMP and primary hub together. In fact Unified Reporter runs there too but system resources are around 70% usage. Our environment is pushing around 20k msg/min, with the server specs below. The database is a little over 1 TB in size and is 2008 R1.
VMware virtual server
CPU: 2.93GHz dual core x2
RAM: 4GB
In addition, since I forgot to add it, we are currently running the latest releases of NMS, Unified Reporter and Unified Monitoring Portal. I am starting to wonder if maybe the spread of components is causing more of an issue than anything else. Just trying to compare to what we are running here.
Russ, a few things stick out for me too when I have troubleshot issues with the UMP:
1) Have you installed x64 Java (and I still stick with build 26) on the UMP and forced the controller to use the 64-bit Java jar files? I assume you have with 8GB of memory, but this is something they don't tell you about and it needs to be done.
2) How much memory are you allocating for the Java startup parameters for dashboard_engine and WASP? Maybe you can increase?
3) What are the update intervals in the dashboard_engine? Maybe you can increase as well?
4) In the WASP --> Setup --> Nimpool can you up the Max Active connections and lower the Timeout?
Sorry for the delay in response and thanks for the assistance.
1) I'm using the Java package that comes with Nimsoft, which looks to be 32-bit version 1.6.0_24. How would I force the controller to use a 64-bit version of the Java jar files?
2) Java startup parameters for dashboard_engine are java_mem_max - 1024 and java_mem_init - 256. For wasp it's java_mem_max - 1024, java_mem_init - 512 and max_perm_size - 512.
3) Update intervals for dashboard_engine are: client data update - 30, dynamic views summary data - 60, backend system changes check - 30, variable server data retrieval - 30, nis data retrieval - 30, and dashboard NIS queries - 60.
4) For WASP I have max active connections at 50 for nimpool and a timeout of 180.
To answer the original question in this thread, I have a Lua script I run from the nas. Here you go:
We will be adding the ability to have these metrics reported as a future enhancement to the data_engine. This would let you see when new messages start to be sent, but it won't indicate where they are coming from. I think you would still need to run a SQL query against S_QOS_DATA to see which hosts recently started reporting metrics.
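One way to approximate the "which hosts recently started reporting" check without depending on any particular timestamp column is to compare two snapshots of the series listed in S_QOS_DATA. A sketch, assuming each snapshot is a list of (qos, source) pairs fetched by whatever query fits your schema; the function name and the sample values are invented for illustration.

```python
def new_series(previous, current):
    """Return the (qos, source) pairs present in current but not in previous, sorted."""
    return sorted(set(current) - set(previous))


# Sample snapshots with invented values.
earlier = [("QOS_CPU_USAGE", "host-a"), ("QOS_DISK_USAGE", "host-a")]
later = earlier + [("QOS_CPU_USAGE", "host-b"), ("QOS_PROCESS_CPU", "host-a")]

for qos, source in new_series(earlier, later):
    print(f"new series: {qos} from {source}")
```

Anything this returns right after a jump in msg/sec is a reasonable first place to look.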