DX Unified Infrastructure Management


Large amounts of QOS data - Size and # of Rows are out of control

  • 1.  Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 18, 2017 08:24 AM

    Large amounts of QOS data - Size and # of Rows are out of control. I am new to UIM and we just stood up the solution; we are about 1 month into getting UIM up and migrating our current monitoring into it. The database crashed last week because the disk ran out of space. We run MySQL on RHEL 7.2 and UIM version 8.51. I found the data retention settings in the data_engine probe and changed them from the defaults (delete raw data older than 180 days and historic data older than 720 days). Originally I changed them to 30 and 60 days, then I read that I shouldn't change them that drastically, as it can cause problems with the maintenance process. Anyway, I can't seem to get the maintenance process to run. I set the log level to 3, and when I tick Start Now under the Schedule tab nothing happens: the number of rows and size of certain tables do not change, I see no log entries about cleaning up old data, and the files on disk are still huge. Is there a way to run the maintenance process manually, or am I missing something else?

     

    Thanks for any help, I am pretty new to UIM so I may need some hand holding....



  • 2.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 18, 2017 09:54 AM

    The thing to do is manually delete the rows that are too old via direct query. I only sort of know this stuff and usually have to spend a good amount of time figuring things out.

     

    For deletes I always start with a select, and in this case something like

    select from_unixtime(sampletime) as tstamp, count(*) from RN_QOS_DATA_0001 where sampletime < unix_timestamp() - (3*86400) group by tstamp

     

    (3*86400) is just an arbitrary cutoff (three days in seconds), so you would have to test and adjust it accordingly. Once the select returns what you expect, take a backup and then change the SELECT to a DELETE with the same WHERE clause. The RN_ table name will have to be changed for each table.
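    To make that concrete, here is a hedged sketch of the select-then-delete pair. The table name and the 90-day cutoff are placeholders, and it assumes sampletime is stored as a Unix epoch, as in the select above; back up the database first.

    ```sql
    -- 1) Verify what would be removed: rows older than ~90 days
    SELECT COUNT(*)
    FROM RN_QOS_DATA_0001
    WHERE sampletime < UNIX_TIMESTAMP() - (90 * 86400);

    -- 2) Same predicate as a delete, batched so each transaction stays small
    DELETE FROM RN_QOS_DATA_0001
    WHERE sampletime < UNIX_TIMESTAMP() - (90 * 86400)
    LIMIT 10000;
    ```

    Repeat the DELETE until it affects 0 rows, then move on to the next RN_ table.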

     

    HeidiSQL is a great tool that makes working with MySQL much easier, but I am not sure if there is a Linux version.

     

    Maybe someone else better at SQL has a script or a better suggestion.



  • 3.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 18, 2017 10:17 AM

    Hello,

    Large amounts of data are usually a sign that the polling intervals in the probes are too low, capturing too much data. You mentioned you have only had the system for one month, so at this point data retention should not yet be a factor.

     

    To find out which probes are capturing so much data, so you can change the polling interval, you can run the query below:

     

    Query to determine which probes/sources are generating the most QOS:
    -- (Same results as the data_engine Status button but adjusted for bigint)
    create table #t (
        id bigint, name varchar(max), rname varchar(max),
        hname varchar(max), dname varchar(max), bname varchar(max),
        rrows bigint, hrows int, drows int, brows int,
        rsize int, hsize int, dsize int, bsize int
    )
    insert #t (name, rname, rrows, id, hname, dname, bname)
    SELECT sqd.name, object_name(i.object_id), sum(p.rows) as rowCnt, sqd.qos_def_id,
        replace(object_name(i.object_id), 'RN', 'HN'),
        replace(object_name(i.object_id), 'RN', 'DN'),
        replace(object_name(i.object_id), 'RN', 'BN')
    FROM sys.indexes i
    INNER JOIN sys.partitions p
        ON i.object_id = p.object_id AND i.index_id = p.index_id
    INNER JOIN s_qos_definition sqd
        ON sqd.qos_def_id = cast(replace(object_name(i.object_id), 'RN_QOS_DATA_', '') AS int)
    WHERE object_name(i.object_id) like 'RN_QOS_DATA_%'
        AND i.index_id <= 1
    GROUP BY sqd.name, i.object_id, i.[name], sqd.qos_def_id

     

    To see the results, run just the SELECT portion, ordered by row count:
    SELECT sqd.name, object_name(i.object_id), sum(p.rows) as rowCnt, sqd.qos_def_id,
        replace(object_name(i.object_id), 'RN', 'HN'),
        replace(object_name(i.object_id), 'RN', 'DN'),
        replace(object_name(i.object_id), 'RN', 'BN')
    FROM sys.indexes i
    INNER JOIN sys.partitions p
        ON i.object_id = p.object_id AND i.index_id = p.index_id
    INNER JOIN s_qos_definition sqd
        ON sqd.qos_def_id = cast(replace(object_name(i.object_id), 'RN_QOS_DATA_', '') AS int)
    WHERE object_name(i.object_id) like 'RN_QOS_DATA_%'
        AND i.index_id <= 1
    GROUP BY sqd.name, i.object_id, i.[name], sqd.qos_def_id
    ORDER BY rowCnt DESC

     

    Please note that maintenance is not recommended for large tables (over 10 GB). To delete data past a certain age, you can use the query below:

     

    Query to purge data based on the age of the data:
    -- Purge of data by date (see the sampletime cutoff below)
    -- Here is the script to check WHAT you'll be deleting:
    declare table_id_cursor cursor for select table_id from s_qos_data
    declare @sql varchar(1000)
    declare @r_table varchar(64)
    declare @h_table varchar(64)
    declare @table_id int
    OPEN table_id_cursor
    FETCH NEXT FROM table_id_cursor into @table_id
    WHILE @@FETCH_STATUS = 0
    BEGIN
    set @r_table = (select r_table from S_QOS_DATA where table_id = @table_id)
    set @h_table = (select h_table from S_QOS_DATA where table_id = @table_id)
    set @sql = 'select COUNT(*) from ' + @r_table + ' where table_id = ' + CAST(@table_id AS varchar(10)) + ' and sampletime < ''2016-05-30 00:00:00'''
    EXECUTE(@sql)
    set @sql = 'select COUNT(*) from ' + @h_table + ' where table_id = ' + CAST(@table_id AS varchar(10)) + ' and sampletime < ''2016-05-30 00:00:00'''
    EXECUTE(@sql)
    FETCH NEXT FROM table_id_cursor into @table_id
    END
    CLOSE table_id_cursor
    DEALLOCATE table_id_cursor
    -- Here is the script to actually DELETE the data (BACK UP YOUR DB FIRST!!!)
    -- Note the delete statements are limited to 1000 rows per batch so the operation can complete. This value can be adjusted.
    -- You can use the TOP clause to limit the number of rows deleted in a DELETE statement.
    -- When a TOP (n) clause is used with DELETE and no WHERE clause, the delete operates on an arbitrary selection of n rows.
    -- Use a SELECT statement with the same WHERE clause to verify what will be deleted.
    declare table_id_cursor cursor for select table_id from s_qos_data
    declare @sql varchar(1000)
    declare @r_table varchar(64)
    declare @h_table varchar(64)
    declare @table_id int
    OPEN table_id_cursor
    FETCH NEXT FROM table_id_cursor into @table_id
    WHILE @@FETCH_STATUS = 0
    BEGIN
    set @r_table = (select r_table from S_QOS_DATA where table_id = @table_id)
    set @h_table = (select h_table from S_QOS_DATA where table_id = @table_id)
    set @sql = 'delete top (1000) from ' + @r_table + ' where table_id = ' + CAST(@table_id AS varchar(10)) + ' and sampletime < ''2016-05-30 00:00:00'''
    EXECUTE(@sql)
    set @sql = 'delete top (1000) from ' + @h_table + ' where table_id = ' + CAST(@table_id AS varchar(10)) + ' and sampletime < ''2016-05-30 00:00:00'''
    EXECUTE(@sql)
    FETCH NEXT FROM table_id_cursor into @table_id

    END
    CLOSE table_id_cursor
    DEALLOCATE table_id_cursor
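    For MySQL, one hedged sketch of the same idea is to generate one DELETE per QoS series from s_qos_data, then review and run the generated statements from your client. The cutoff date and the epoch-style sampletime are assumptions to adjust for your schema.

    ```sql
    -- Emit one batched DELETE per QoS series; inspect the output before executing it
    SELECT CONCAT('DELETE FROM ', r_table,
                  ' WHERE table_id = ', table_id,
                  ' AND sampletime < UNIX_TIMESTAMP(''2016-05-30'') LIMIT 10000;') AS stmt
    FROM s_qos_data;
    ```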

     

    Please note that any direct changes to the DB are unsupported, and I would recommend you open a support case to investigate.

     

    Hope this helps. 

     

    Thanks



  • 4.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 18, 2017 10:56 AM

    Thanks for the reply. I am using HeidiSQL as David mentioned above; however, the select statements are throwing syntax errors. Should I be able to just paste these statements into a SQL statement window and execute them?



  • 5.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 18, 2017 11:28 AM

    Sorry, I forgot to mention these commands are for MSSQL. They will need to be adapted for other databases. But yes, they should be pretty straightforward.
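    For MySQL specifically, a rough equivalent of the "biggest tables" query can be sketched from information_schema. Note that table_rows is an estimate for InnoDB, so treat the numbers as approximate.

    ```sql
    -- Approximate row counts and on-disk size per raw QoS table
    SELECT table_name,
           table_rows,
           ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
    FROM information_schema.tables
    WHERE table_schema = DATABASE()
      AND table_name LIKE 'RN_QOS_DATA_%'
    ORDER BY table_rows DESC;
    ```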



  • 6.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 19, 2017 07:55 AM

    OK, I'm going to submit a ticket. So to confirm: my understanding is that now that I have set a proper data retention period for QOS, any new metrics that pass it will be removed during the maintenance process, but anything that was already out there is basically forgotten about?



  • 7.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 19, 2017 10:52 AM

    These queries are very helpful! Now, another question: based on the tables I got, for example:

    QOS_INTERFACE_BYTESIN RN_QOS_DATA_1062 284355835 1062 HN_QOS_DATA_1062 DN_QOS_DATA_1062 BN_QOS_DATA_1062

     

    How can I tell where, or from what, this ton of data is coming (origin and robot)?

     

    Thank you.

     

    Clecimar
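    One hedged way to break that down (assuming s_qos_data carries source, origin, and robot columns for each series, as in a typical UIM schema) is to group the RN table's rows by their series metadata:

    ```sql
    -- Which sources/robots account for the rows in RN_QOS_DATA_1062
    SELECT q.source, q.origin, q.robot, COUNT(*) AS row_count
    FROM RN_QOS_DATA_1062 r
    JOIN s_qos_data q ON q.table_id = r.table_id
    GROUP BY q.source, q.origin, q.robot
    ORDER BY row_count DESC;
    ```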



  • 8.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 18, 2017 01:14 PM

    Check out the post linked below for the Lua script named "removeOldQosData". This script can be used to help clean up QoS data by age, ahead of the retention period or for debugging purposes. I run this script, plus versions I have adapted to query probe-specific objects. The output from the script can then be run from DB management tools.

     

    A few other items: you can also define retention individually per QoS if you know certain content is not required for long-term retention. And if you are running the DB2 or exchange_monitor probes with certain checkpoints enabled, both of those probes are known to produce high volumes of CI objects.

     

    Deleting Servers from Nimsoft 



  • 9.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 22, 2017 08:10 AM

    Can someone please post some MySQL statements that I can use to delete QOS data older than X days? It would be much appreciated, as CA said the DB is ours to support. It would be nice to identify the probes causing the issue, but for now I just need to get this under control. Thanks for any assistance.



  • 10.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 22, 2017 10:17 AM

    Take a look at the previously linked post for the "removeOldQosData" script. You need to update the DB source in the code comments to change from "sqlserver" to "mysql". The script produces DB statements for all of the RN tables involved, which then need to be executed from a DB management console. I would suggest reviewing this in a sandbox or development environment first to determine whether it achieves the desired results. As mentioned in a prior reply, I have gone as far as extending the referenced script to work against specific probes. Also, after running the script and before executing any of the produced "delete" commands, you can perform a SELECT against those tables to see which data would be deleted.

     

    For probe-specific runs, I modified line 17 to the following; on each run, replace the probe name:

    sSql = "select distinct r_table from s_qos_data where probe = 'replace_with_desired_probe_name'"

     



  • 11.  Re: Large amounts of QOS data - Size and # of Rows are out of control

    Posted May 22, 2017 10:35 AM

    Also, go under Quality of Service in data_engine on the primary hub to review QoS object names and groups. You can query the s_qos_definition table to correlate them to your RN tables.
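    Since the numeric suffix of each RN table is the qos_def_id (which the earlier MSSQL queries rely on), a quick lookup sketch for a given table, e.g. RN_QOS_DATA_1062, is the following (column names assumed from a typical UIM schema):

    ```sql
    -- Map an RN table back to its QoS definition
    SELECT qos_def_id, name, description
    FROM s_qos_definition
    WHERE qos_def_id = 1062;
    ```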