DX Unified Infrastructure Management

  • 1.  Alarm History - nothing returned

    Posted Sep 19, 2008 05:21 AM

    When I run an alarm history query (right-click, History) and choose any time period (hour, today, etc.), the query spins and then returns an empty results screen.  The NAS log shows: dbGetTransactions exec error(7): temporarily out of resources, although resources on the server are not tapped out.  The server that the NAS runs on is dedicated to NimBUS core functions: reporting, NAS, Web, etc.  I do not have any robots running on this server.  I've also tweaked the NAS configuration to try to fix the issue: admin interval: daily, compress after 7 days, keep hist 31 days, trans summary 30 days, 100 duplicate messages, transaction logging enabled.

    I've updated to NAS 3.15 and it's worse now.  Prior to 3.15, I could run history requests and would get a response after 3 to 5 requests, now I get nothing.


    Has anyone had a similar issue?  And if so, were you able to resolve it?



  • 2.  Alarm History - nothing returned

    Posted Sep 19, 2008 10:15 PM
    Quote: (swright@raleys.com)

    When I run an alarm history query (right-click, History) and choose any time period (hour, today, etc.), the query spins and then returns an empty results screen.  The NAS log shows: dbGetTransactions exec error(7): temporarily out of resources, although resources on the server are not tapped out.  The server that the NAS runs on is dedicated to NimBUS core functions: reporting, NAS, Web, etc.  I do not have any robots running on this server.  I've also tweaked the NAS configuration to try to fix the issue: admin interval: daily, compress after 7 days, keep hist 31 days, trans summary 30 days, 100 duplicate messages, transaction logging enabled.

    I've updated to NAS 3.15 and it's worse now.  Prior to 3.15, I could run history requests and would get a response after 3 to 5 requests, now I get nothing.


    Has anyone had a similar issue?  And if so, were you able to resolve it?


    I have seen problems like this when there has been high disk I/O, typically when the SLM database is driven on the same server, with lots of reports for the report_engine to chew on, etc.  The 'out of resources' message is actually a timeout on the SQL request to the database, telling the caller to retry.  A simple test is to run the command-line utility sqlite3.exe (http://www.sqlite.org/sqlite-3_6_2.zip) on transactionlog.db and time the request; if it is slow there as well, that indicates something outside the NAS is causing the I/O issues.  The size of transactionlog.db may also cause slower responses.

    Assuming you've installed the sqlite3.exe in C:\temp, and you are currently in the NAS directory:

    \temp\sqlite3 transactionlog.db "SELECT COUNT(*) FROM NAS_TRANSACTION_SUMMARY"
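
For anyone who would rather script the timing than eyeball it, here is a minimal Python sketch of the same check using the standard sqlite3 module. The function name time_row_count is made up for illustration; the two table names are the ones mentioned in this thread. It only measures how long the COUNT(*) takes and makes no claim about how the NAS itself issues its queries.

```python
import sqlite3
import time


def time_row_count(db_path, table):
    """Return (row_count, elapsed_seconds) for SELECT COUNT(*) on one table."""
    # Table names cannot be bound as SQL parameters, so only the two
    # table names mentioned in this thread are accepted.
    if table not in ("NAS_TRANSACTION_SUMMARY", "NAS_TRANSACTION_LOG"):
        raise ValueError("unexpected table name: " + table)
    conn = sqlite3.connect(db_path)
    try:
        start = time.perf_counter()
        (count,) = conn.execute("SELECT COUNT(*) FROM " + table).fetchone()
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return count, elapsed
```

Run from the NAS directory, something like time_row_count("transactionlog.db", "NAS_TRANSACTION_SUMMARY") gives the row count and the elapsed seconds; a long elapsed time on an otherwise idle NAS would point at disk I/O rather than the NAS.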





  • 3.  Alarm History - nothing returned

    Posted Sep 20, 2008 04:17 AM
    Assuming 153,785 is too large and is the reason for my timeout.  How do I reduce that number?
    Here are my settings:
    Admin interval: every hour  ( I had same issue when it was daily )
    Compress trans: 7 days
    Keep trans history: 30 days
    Keep trans summary 30 days.
    Log alarm updates every 100 duplicate messages.



  • 4.  Alarm History - nothing returned

    Posted Sep 22, 2008 08:32 PM
    Quote: (swright@raleys.com)
    Assuming 153,785 is too large and is the reason for my timeout.  How do I reduce that number?
    Here are my settings:
    Admin interval: every hour  ( I had same issue when it was daily )
    Compress trans: 7 days
    Keep trans history: 30 days
    Keep trans summary 30 days.
    Log alarm updates every 100 duplicate messages.


    The query I gave you ran against the smallest table (NAS_TRANSACTION_SUMMARY).  The table that in your case contains a lot more than your 150k rows (which is actually nothing) is the NAS_TRANSACTION_LOG table.  This table contains all the alarm events between the Open and Close, so if you have a lot of suppressions they will be recorded here.  My settings are:

    Compress trans: 2 days
    Keep trans history: 7 days
    Keep trans summary 30 days.
    Log alarm updates every 1000 duplicate messages.

    The summary table was introduced in NAS 3.xx and contains the vital information about the alarm.  Prior to this version, the alarm transaction was "compiled" by reading all the necessary transaction logs.

    The NAS performs housekeeping every "admin interval", and compresses the database(s) at 00:30 every night.


  • 5.  Alarm History - nothing returned

    Posted Sep 24, 2008 04:25 AM
    Thanks Carstein, I changed the settings to 21 days trans, 31 days summary and 1000 duplicate messages.  The history reports are very fast and dependable now.  I haven't had one timeout since making the changes. 


  • 6.  Alarm History - nothing returned

    Posted Oct 01, 2008 07:16 AM
    Carstein,

    So I guess receiving a count of 744,581 rows is a lot then...  :-)

    I also have the default settings configured in our NAS, so I might have to see if we can afford to keep less history.  We experience this same problem, although I do not think it has been as bad for us.  Do you have any recommendations on database size (in MB or number of records) for the transactionlog.db?  On disk, ours currently takes up 767 MB.

    Can you also explain what exactly happens every "administration interval" and what it means to "compress transactions" after X days?  I think I understand what it means to keep the history and the summary for the configured number of days.

    Thanks,
    Keith


  • 7.  Alarm History - nothing returned

    Posted Oct 02, 2008 04:26 AM

    Well, I have tested the SQLite database with 5 million rows, and it responded quickly when extracting a subset of the data.  Things get bad when a complete selection is queried and the NAS needs to allocate all of it in memory prior to sending.  Anyway, the UIs will choke to death :-)

    I have made changes to the memory allocation routines used for the database queries, and they should return much faster now (3.16).  The general consensus in the SQLite community is that db files > 1 GB are something one should avoid.  I'll run some tests with the latest (and greatest) to see where the threshold is.

    The work done at each "administration interval" is:
      On the NAS_TRANSACTION_LOG table:
       Deletes rows with time older than 'transaction history days'
       Deletes rows of type 2 and 16 (suppress) with time older than 'compress transaction ..'

      On the NAS_TRANSACTION_SUMMARY table:
       Deletes rows with time older than 'transaction summary days'

    In addition to this, a database reorganization is run every night at 00:30.
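
The three deletes above can be sketched in Python roughly as follows. This is an illustration only, not the NAS code: the column names time (taken here to be Unix epoch seconds) and type are assumptions based on the wording of the post, since the real schema is not shown in this thread, and the function name housekeeping is made up.

```python
import sqlite3
import time

DAY = 86400  # seconds per day


def housekeeping(conn, history_days, compress_days, summary_days, now=None):
    """Apply the three deletes described in the post; return rows removed per step."""
    # ASSUMPTION: both tables have a `time` column holding epoch seconds,
    # and NAS_TRANSACTION_LOG has a numeric `type` column.
    now = time.time() if now is None else now
    cur = conn.cursor()
    removed = {}

    # 1. Log rows older than 'transaction history days'.
    cur.execute("DELETE FROM NAS_TRANSACTION_LOG WHERE time < ?",
                (now - history_days * DAY,))
    removed["log_history"] = cur.rowcount

    # 2. Suppress rows (types 2 and 16) older than 'compress transaction' days.
    cur.execute("DELETE FROM NAS_TRANSACTION_LOG "
                "WHERE type IN (2, 16) AND time < ?",
                (now - compress_days * DAY,))
    removed["log_suppress"] = cur.rowcount

    # 3. Summary rows older than 'transaction summary days'.
    cur.execute("DELETE FROM NAS_TRANSACTION_SUMMARY WHERE time < ?",
                (now - summary_days * DAY,))
    removed["summary"] = cur.rowcount

    conn.commit()
    return removed
```

With the settings quoted earlier in the thread (compress 2 days, history 7 days, summary 30 days), a suppress row that is 3 days old would be removed in step 2 while a New or Close row of the same age would stay until it is 7 days old.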



  • 8.  Alarm History - nothing returned

    Posted Oct 02, 2008 02:04 PM
    Carstein,

    Thank you again; this is very good information.

    When you say that compression involves deleting rows of type 2 and 16 (suppress), what types does that leave then?  Just making sure I know what I can and cannot access in the alarm history at any point in time.

    Does the reorganization of the database simply mean the data is moving around on disk, but the actual database contents remain the same from a query perspective?  I am currently guessing it might be like reorganizations in MS SQL Server, which I only understand at a rather basic level.

    Thanks,
    Keith


  • 9.  Alarm History - nothing returned

    Posted Oct 02, 2008 08:22 PM
    This leaves the New, Close, Ack and Assign types in the NAS_TRANSACTION_LOG table.

    Reorganization also means a physical truncation of the database file.  In effect, a copy of the database is made containing only the rows that are not marked for deletion.

    NAS performs the VACUUM (http://www.sqlite.org/lang_vacuum.html) primitive during its nightly housekeeping.
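
To see the effect of VACUUM on file size, a small Python sketch along these lines can be run against a copy of the database. The function name vacuum_and_report is made up for illustration, and this is not how the NAS invokes it internally. VACUUM rebuilds the file, so it needs free disk space and will fail if another process holds a lock on the database.

```python
import os
import sqlite3


def vacuum_and_report(db_path):
    """Run VACUUM on db_path and return (bytes_before, bytes_after)."""
    before = os.path.getsize(db_path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

Deleting rows alone does not shrink the file, because SQLite keeps the freed pages for reuse; the size only drops once VACUUM has rewritten the file without them.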