DX Unified Infrastructure Management

  • 1.  QoS Delete is breaking the data_engine processing

    Posted Jan 23, 2015 11:04 PM

    Has anyone else noticed this? I'm still on UIM v7.5, running the latest hub version.

     

    Whenever anyone does a massive QoS delete, whether through the UMP's SLM tool, the Win32 SLM tool, or a direct SQL query, the deletes kill the database and break the data_engine. My data_engine stops processing data from that point on.

     

    Has anyone else come across this issue? I then try to stop and start the data_engine, which doesn't fix the issue. The data_engine runs for about a second and then stops processing messages in its queue. It gets through about 50k-100k messages but stops while there is a backlog of about 2M+ in the data_engine queue.

     

     

    When it gets into this state I have to do a full stop/start on my primary hub.



  • 2.  Re: QoS Delete is breaking the data_engine processing

    Posted Jan 23, 2015 11:12 PM

    Hi,

     

    This is due to the delete requiring a table- or page-level lock. data_engine is not smart enough to work around this. You might partly work around it by having several threads doing the inserts, but even those might all end up blocked, depending on what kind of QoS they are inserting.

     

    Let's say you're deleting data from rn_qos_data_0001, removing a server's data for a year or so. The delete might take a long time to process, and during that time no thread can insert into the table. data_engine handles this badly and it leads to a restart loop. I also remember that you had a thread about your DB being slow, so for you this might be a particularly nasty situation if deletes take a long time.
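
    To illustrate the point about one long-running delete holding its lock: a common general mitigation (not something data_engine or the SLM tools are known to do) is to delete in small batches, each committed on its own, so that inserts get a chance to run in between. The sketch below is only an illustration under stated assumptions. It uses an in-memory SQLite table as a stand-in for rn_qos_data_0001; the column names, the batch size, and the helper name `delete_in_batches` are hypothetical, and SQLite's locking differs from SQL Server's, so it shows the shape of the loop rather than real lock behavior.

```python
import sqlite3


def delete_in_batches(conn, table_id, cutoff, batch_size=1000):
    """Delete rows for one table_id older than cutoff, committing after each batch.

    Hypothetical helper: each batch is its own short transaction, so any lock
    is released between batches instead of being held for the whole delete.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM rn_qos_data_0001 WHERE rowid IN ("
            " SELECT rowid FROM rn_qos_data_0001"
            " WHERE table_id = ? AND sampletime < ? LIMIT ?)",
            (table_id, cutoff, batch_size),
        )
        conn.commit()  # end the short transaction before starting the next batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Stand-in data: two QoS series (table_id 0 and 1) with 5000 samples each.
# The schema here is assumed for the example, not taken from a real UIM install.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rn_qos_data_0001 (table_id INTEGER, sampletime INTEGER, samplevalue REAL)"
)
conn.executemany(
    "INSERT INTO rn_qos_data_0001 VALUES (?, ?, ?)",
    [(tid, t, 1.0) for tid in range(2) for t in range(5000)],
)
conn.commit()

# Remove the first 4000 samples of series 1, 500 rows at a time.
deleted = delete_in_batches(conn, table_id=1, cutoff=4000, batch_size=500)
remaining = conn.execute("SELECT COUNT(*) FROM rn_qos_data_0001").fetchone()[0]
```

    On SQL Server the equivalent loop would typically use a bounded delete per iteration; whether that helps in a given environment depends on indexes and lock escalation, so treat it as something to test rather than a guaranteed fix.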

     

    One way to work around this is to use database partitioning.

     

    Also, about UMP SLM: at least in UMP 6.6 this was a truly horrible tool. Quite often I ran into a situation where it somehow hung and data_engine couldn't insert data into the DB until WASP was restarted. I don't know if this has been improved in subsequent versions.

     

    -jon



  • 3.  Re: QoS Delete is breaking the data_engine processing

    Posted Jan 24, 2015 12:38 AM

    Thanks Jon. I think from now on we are just not going to delete anything. The delete takes an absurd amount of time, even for just one box with at most 180 days of data, which is our retention period. Just today we ran straight SQL query deletes for one box and they were taking more than 45 minutes. The data_engine turned yellow and would not start up or process any more messages after a stop/start. Eventually it did come back, but out of the blue; it wasn't anything I did myself.

    Thanks.



  • 4.  Re: QoS Delete is breaking the data_engine processing

    Posted Jan 24, 2015 07:28 AM

    Yep, have the same experience.

     

    The recommendation is to deactivate data_engine, do the delete, then activate data_engine again. It's also suggested that, if you are using UMP, you bounce that too.
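
    The deactivate / delete / activate order can be sketched as a small wrapper that always reactivates the probe, even if the delete fails partway. This is a minimal Python sketch, not a UIM API: `probe_paused` and the `deactivate` / `activate` callables are hypothetical placeholders for however you stop and start data_engine in your environment.

```python
from contextlib import contextmanager


@contextmanager
def probe_paused(deactivate, activate):
    """Run the body with the probe deactivated, then always reactivate it.

    `deactivate` and `activate` are caller-supplied callables (hypothetical);
    this sketch does not know how to talk to a hub itself.
    """
    deactivate()
    try:
        yield
    finally:
        activate()  # runs even if the delete raised an exception


# Usage with stand-in callables that just record the order of operations.
events = []
with probe_paused(lambda: events.append("deactivate data_engine"),
                  lambda: events.append("activate data_engine")):
    events.append("run QoS delete")
```

    The `finally` clause is the point of the sketch: a failed delete should not leave data_engine deactivated.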

     

    Database partitioning will help the process of deleting the oldest data and it gets you more parallelism options in the SQL query optimizer.

     

    On the other hand, it's not well documented, and if you have issues, support will not be much help.

     

    Also, since partitioning is done by day, keeping 180 days of data means each partitioned table goes from having one partition to 180, which makes some of the functions that display table usage run slower.

     

    Also, at least on MS SQL Server, only 1000 partitions are allowed, so you can only keep up to 985 days of data (partitioning creates partitions for 14 days into the future, plus today).
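
    The arithmetic behind the 985-day figure can be written out as a short Python sketch. The function name `max_retention_days` is hypothetical; the only inputs are the numbers given above (one partition per day, 14 future days, plus today).

```python
def max_retention_days(partition_limit, future_days=14):
    """Days of history that fit when each day uses one partition.

    The limit has to cover the retained days, today's partition, and the
    partitions created ahead of time for `future_days`.
    """
    return partition_limit - future_days - 1


days_at_1000 = max_retention_days(1000)    # 1000 - 14 - 1 = 985
days_at_15000 = max_retention_days(15000)  # same formula with a higher limit
```

    The same formula applies to any other per-table partition limit your database version allows.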

     

    -Garin



  • 5.  Re: QoS Delete is breaking the data_engine processing

    Posted Jan 24, 2015 09:14 AM

    I believe 64-bit SQL Server supports 15k partitions per table/index, starting with SQL Server 2012.

     

    -jon