DX Unified Infrastructure Management

  • 1.  data_engine spike

    Posted Dec 03, 2008 01:53 AM
    We have an intermittent problem where the data_engine queue (QoS data) on the primary hub spikes and continues to grow (up to 1 GB).

    I've checked the usual places (hub queues), restarted data_engine, report_server, etc., and can find nothing concrete as to why this is happening or where the blockage is.

    The QoS data is moving to the backend SQL server, just not fast enough, it appears. Has anyone seen this before? Can anyone suggest where to investigate, or how to find out whether a particular set of robots is spamming the server?
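
    For the curious, the kind of per-robot check I'm after would look something like this (a rough sketch, assuming pyodbc and data_engine's usual S_QOS_DATA catalog table; the table and column names are from memory, so treat them as assumptions to verify against your schema):

        import pyodbc

        # Connection details are placeholders -- point this at the SLM
        # database that data_engine writes to.
        cn = pyodbc.connect("DRIVER={SQL Server};SERVER=yoursqlserver;"
                            "DATABASE=NimsoftSLM;Trusted_Connection=yes")

        # Count QoS definitions per source as a rough proxy for which
        # robots are producing the most data.
        for source, defs in cn.execute("SELECT source, COUNT(*) AS defs "
                                       "FROM S_QOS_DATA "
                                       "GROUP BY source "
                                       "ORDER BY defs DESC"):
            print(source, defs)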

    Cheers


  • 2.  data_engine spike

    Posted Dec 03, 2008 03:46 AM
    I have seen this in my environment. In my case it was a volume issue. Once my interface_traffic QoS tables reached 160,000,000+ rows, I started seeing this. Several factors contributed, but the net result was a lot of read traffic interfering with updates.
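
    To check where you stand, here is a quick sketch that pulls row counts for the raw QoS tables (assuming pyodbc and data_engine's usual RN_QOS_DATA_* table naming; adjust the LIKE pattern if your schema differs):

        import pyodbc

        cn = pyodbc.connect("DRIVER={SQL Server};SERVER=yoursqlserver;"
                            "DATABASE=NimsoftSLM;Trusted_Connection=yes")

        # sys.partitions (SQL Server 2005+) gives a fast approximate row
        # count without scanning the tables themselves. index_id 0 or 1
        # means heap or clustered index, i.e. the base table rows.
        sql = ("SELECT o.name, SUM(p.rows) AS row_count "
               "FROM sys.objects o "
               "JOIN sys.partitions p ON p.object_id = o.object_id "
               "WHERE o.name LIKE 'RN_QOS_DATA%' "
               "AND p.index_id IN (0, 1) "
               "GROUP BY o.name ORDER BY row_count DESC")
        for name, row_count in cn.execute(sql):
            print(name, row_count)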

    1) The group server was generating reads against these 6 tables to update the dynamic group totals pages. I changed the update interval out to 900, which reduced this somewhat.

    2) Dynamic reports created a lot of read traffic on these 6 tables. With 2700+ interfaces there wasn't anything I could do about this. My interval was already set to 8 hours between runs for last week and last month, and 4 hours for last day.


    My resolution was to reduce the number of rows in these 6 tables by changing setup parameters on the interface_traffic probe:

    1) Polling interval changed from 300 to 525.
    2) Samples to base the average on changed from 6 to 4.

    This will reduce the number of rows in those tables by a little less than half; if I remember my calculations, it should result in 90,000,000 +- rows per table for 90 days of data (the arithmetic is sketched below the list).

    3) Reduced history to 1 day and set the raw data retention to 95 days.
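
    The back-of-the-envelope arithmetic, in case anyone wants to rerun it with their own numbers (plain Python; note this uses the polling-interval change alone, which is enough to reproduce my figures):

        # Rows written scale with the sample rate, so the reduction factor
        # from the interval change alone is old_interval / new_interval.
        old_interval = 300            # seconds between polls, before
        new_interval = 525            # seconds between polls, after
        factor = old_interval / new_interval
        print(round(factor, 3))       # 0.571 -> a cut of a bit less than half

        rows_before = 160_000_000     # roughly where my tables were
        print(int(rows_before * factor))   # ~91,400,000, the 90M +- figure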

    I am about a month into this change and the tables are around 115,000,000 rows. My problem went away at around 140,000,000. That threshold will depend largely on the performance of your SQL server and the disks the database is installed on.

    Places to look to see if your problem is similar to mine: open Perfmon on your SQL server and check the following counters (a scripted way to capture them is sketched after the list).

    CPU %
    (on the drive that holds your data file:)
    Avg. Disk Queue Length - mine was averaging 1.5 or less
    Disk Bytes/sec
    Disk Reads/sec
    Disk Writes/sec
    Disk Transfers/sec
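
    If you'd rather log those counters to a file than sit and watch the Perfmon GUI, something like this captures them (a sketch; the PhysicalDisk instance name is a placeholder, match it to the drive that actually holds your data file):

        import subprocess

        # Counter paths assume an English-locale Windows install; "1 D:" is
        # a placeholder instance -- list yours with: typeperf -q PhysicalDisk
        counters = [
            r"\Processor(_Total)\% Processor Time",
            r"\PhysicalDisk(1 D:)\Avg. Disk Queue Length",
            r"\PhysicalDisk(1 D:)\Disk Bytes/sec",
            r"\PhysicalDisk(1 D:)\Disk Reads/sec",
            r"\PhysicalDisk(1 D:)\Disk Writes/sec",
            r"\PhysicalDisk(1 D:)\Disk Transfers/sec",
        ]

        # 15-second samples, 240 of them = one hour of data written to CSV.
        subprocess.run(["typeperf", *counters,
                        "-si", "15", "-sc", "240",
                        "-f", "CSV", "-o", "counters.csv"])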

    If your problem is similar to mine, you will see a high number of reads/sec and high transfers/sec, while writes/sec will be minimal.


    You will also see in the report engine that it takes a long time to generate reports, and that the bulk inserts relating to these 6 large tables take a long time. (This is how I initially identified which tables were involved in the slowdown: I was seeing bulk inserts take 20 to 30 seconds per table for 4 of the 6 tables.)
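
    You can also confirm the slow statements from the SQL Server side rather than the report engine logs. A sketch, assuming SQL Server 2005+ and pyodbc ("insert bulk" is the statement form the bulk-copy API shows up as in the plan cache, as far as I know):

        import pyodbc

        cn = pyodbc.connect("DRIVER={SQL Server};SERVER=yoursqlserver;"
                            "DATABASE=NimsoftSLM;Trusted_Connection=yes")

        # Average elapsed time per cached "insert bulk" statement, worst
        # first. total_elapsed_time is in microseconds, hence /1000.0 -> ms.
        sql = ("SELECT TOP 10 "
               "qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_ms, "
               "qs.execution_count, st.text "
               "FROM sys.dm_exec_query_stats qs "
               "CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st "
               "WHERE st.text LIKE 'insert bulk%' "
               "ORDER BY avg_ms DESC")
        for avg_ms, count, text in cn.execute(sql):
            print(round(avg_ms), "ms x", count, ":", text[:80])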

    My SQL server is idling the whole time (10 to 15% utilization), and my disks (high-speed SAN disks) are capable of higher transfer rates than I was experiencing. It seems it is simply a matter of the time it takes SQL Server to process each request, and the total number of requests. I did not do any additional SQL tuning other than normal maintenance; I may have been able to tweak some of the settings to reduce the response time of these queries. Multi-threading the data_engine would have helped in my case, I believe, but that is not possible with the current version of the data_engine.

    Hope this helps a little.