Automic Workload Automation

  • 1.  Random WP Crashed

    Posted Sep 06, 2018 11:05 AM

    Good Morning,

    We upgraded our DEV backend DB from Oracle 11g to 12c approximately 3 weeks ago.  There have been no issues and we are aiming to do the same upgrade in PROD this coming weekend.

    I did some further digging to confirm everything has been functioning normally and noticed that 2 WPs had crashed at different times in the last 10 days.  They came back online within seconds, so the system did what it was supposed to do, but it seemed odd and random.

    Looking at the crash, everything before it was operating as usual.  This is what's in the logs just before the dmp file was created and the WP cycled itself:

    20180901/125526.675 - U00029108 UCUDB: SQL_ERROR    Database handles  DB-HENV: 163ccf10  DB-HDBC: 16417848

    20180901/125526.675 - U00003591 UCUDB - DB error info: OPC: 'OCIStmtExecute' Return code: 'ERROR'

    20180901/125526.675 - U00003592 UCUDB - Status: '' Native error: '3135' Msg: 'ORA-03135: connection lost contact

    Process ID: 31680

    Session ID: 214 Serial number: 22404'

    20180901/125526.675 - U00003536 UCUDB: FATAL DATA BASE ERROR: Re-connection will be attempted in 10 seconds.

    20180901/125526.675 - U00003537 UCUDB - RECONNECT: DB call 'OCITransRollback': Return code: '-1'.

    20180901/125526.675 - U00003590 UCUDB - DB error: 'OCITransRollback', 'ERROR   ', '', 'ORA-03114: not connected to ORACLE'

    20180901/125526.675 - U00003592 UCUDB - Status: '' Native error: '3114' Msg: 'ORA-03114: not connected to ORACLE'

    20180901/125526.737 - U00003538 UCUDB: Re-connection to database successful. Processing will continue.

    20180901/125526.753 - U00000006 DEADLOCK
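
    For anyone triaging similar excerpts, here is a minimal Python sketch that pulls the timestamp, message number and ORA codes out of log lines shaped like the ones above.  The line format is assumed from this excerpt only, and the function name summarize is made up for illustration; it is not an Automic or Oracle tool.

```python
import re

# Line shape assumed from the excerpt above:
# "YYYYMMDD/HHMMSS.mmm - U<8 digits> <text>"
LINE_RE = re.compile(r"^(\d{8}/\d{6}\.\d{3}) - (U\d{8}) (.*)$")
ORA_RE = re.compile(r"ORA-\d{5}")


def summarize(log_text):
    """Return (timestamp, message number, ORA codes) for each line that names an ORA error."""
    hits = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # continuation lines such as "Process ID: ..." are skipped
        timestamp, msg_no, text = match.groups()
        codes = ORA_RE.findall(text)
        if codes:
            hits.append((timestamp, msg_no, codes))
    return hits


# Sample lines copied from the log excerpt in this post.
sample = """\
20180901/125526.675 - U00003592 UCUDB - Status: '' Native error: '3135' Msg: 'ORA-03135: connection lost contact
Process ID: 31680
Session ID: 214 Serial number: 22404'
20180901/125526.675 - U00003590 UCUDB - DB error: 'OCITransRollback', 'ERROR   ', '', 'ORA-03114: not connected to ORACLE'
20180901/125526.737 - U00003538 UCUDB: Re-connection to database successful. Processing will continue.
"""

for timestamp, msg_no, codes in summarize(sample):
    print(timestamp, msg_no, ", ".join(codes))
```

    Running this over the WP logs from each crash gives exact timestamps to hand to the DBAs and network team for cross-referencing.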

     

    I asked our DBAs to cross-reference the logs on the DB server for any errors on our DEV DB at those times, and there are none.  It is of course DEV, so it could have been a temporary communication issue, but I'm wondering if anyone else has come across this when moving from 11g to 12c, or if those already on 12c have seen it.

     

    Our current version of Automic is 11.2.6FH1 and we are running on Windows Server 2012 R2 with an Oracle backend.

     

    Thanks!!

    Tina



  • 2.  Re: Random WP Crashed

    Posted Sep 10, 2018 03:43 AM

    Hi Tina,

     

    Is the Oracle DB in the same network?  Could there be a firewall in between that is causing the disconnect, or work on VM switch configurations if the Automation Engine is running in a virtual machine?  Often it's not the DB itself; the issue is more likely caused by one of the network resources.
    Also, please check whether the Oracle client on the Automation Engine server was updated too, and whether all recommended Oracle 12c settings were checked (if the existing Oracle instance was upgraded) or applied (if the DB was migrated to a new server/instance to get onto the new Oracle version).

    Best,

    Franz  



  • 3.  Re: Random WP Crashed

    Posted Sep 11, 2018 10:12 AM

    Hi Franz,

     

    Yes, all in the same network, and all clean logs across the board.  Unfortunately, we did have a crash in PROD this morning, but again just 1 WP and as we're not a huge shop, the WPs took on the other loads successfully.

    We did install the Oracle 12c client on our Windows server, and the settings are at the front of the environment path.  The Oracle server is still the same and only our instance was upgraded (I'd even say the app seems to perform better on 12c).  Interestingly enough, our DBA has just found that there is a DB setting called resource_limit which was set to true, and he noticed that it was false in DEV.  A quick search suggests that this might be the cause, as per this article.
    Oracle Parameters 
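
    As a quick way to confirm which client install actually wins on the path, here is a small Python sketch that returns the first PATH entry containing a given file.  It assumes a Windows client where the OCI library is named oci.dll; the helper name first_dir_with is hypothetical.

```python
import os


def first_dir_with(filename, path_value):
    """Return the first PATH entry that contains `filename`, or None if none does."""
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # PATH entries are sometimes quoted
        if entry and os.path.isfile(os.path.join(entry, filename)):
            return entry
    return None


if __name__ == "__main__":
    # Assumption: oci.dll is the Oracle client library loaded on Windows.
    print(first_dir_with("oci.dll", os.environ.get("PATH", "")))
```

    If the directory printed belongs to the old 11g home rather than the 12c client, the engine may still be loading the older library.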

    I'm not sure if this will fully solve the issue, so for now we will have to monitor and see how it goes.  If anyone else has come across this before, please let me know.

    Thanks!!



  • 4.  Re: Random WP Crashed

    Posted Sep 12, 2018 07:57 AM

    Hi

     

    We upgraded our DBs from 11g to 12c some weeks ago (DEV+TEST systems) and observed no issues.

     

    The link to the Oracle parameters is a good hint for DBAs and Automic admins.  We checked the parameters as well after our DB upgrade.

     

    AE: 11.2.3+HF2 / Linux / ORA12C

     

    Regarding your WP crashes - what does the DMP file say?

     

    ORA Client is a hot hint :-)

     

    We have a similar issue with crashing WPs, with a different ORA error message, that occurs mostly after a DB reorg (index rebuild, ...).  That one is fixed with SP6.

     

    cheers, Wolfgang



  • 5.  Re: Random WP Crashed

    Posted Sep 12, 2018 11:21 AM

    Hello,

    Our DBAs told me the settings are all in place as per the document, which is great.  When the Oracle 12c client was installed, sqlnet.ora and tnsnames.ora were copied over from the previous Oracle home directory.  I've been doing some digging myself because, other than the actual DB upgrade, that would be the only changed item.  I might be missing something, but where is the documentation for the actual Oracle client settings?

    When I look at the dump files, I can see that sometimes a specific job is in the list, but it runs very frequently and isn't always causing the dump.  This process also hasn't changed so it's odd to me that upgrading the backend DB would now cause this.

    In our systems, when we've done our weekly maintenance, we haven't run into any issues, so we're happy about that.

    If there's anything else I'm missing, please let me know.

    Thanks for all of the help!!



  • 6.  Re: Random WP Crashed
    Best Answer

    Posted Oct 12, 2018 10:02 AM

    Good Morning,

    I had a similar question, but want to make sure our final workaround is posted here as well.  The setting we had to apply in sqlnet.ora in our Oracle client was SQLNET.SEND_TIMEOUT=60000.  The same setting also had to be applied to the same file on the server side, so we had to coordinate this update with our DBAs.

    No restarts were required, so that was very helpful.
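
    To double-check that the parameter is present on both sides, a rough Python sketch like the one below can read a sqlnet.ora and report the value.  It assumes the usual lookup order (TNS_ADMIN if set, otherwise ORACLE_HOME/network/admin), handles only simple single-line NAME=VALUE entries, and uses made-up helper names and sample content; on Windows the Oracle home may come from the registry instead, so treat the directory lookup as a starting point only.

```python
import os


def sqlnet_dir(env):
    """Likely sqlnet.ora directory: TNS_ADMIN if set, else ORACLE_HOME/network/admin."""
    if env.get("TNS_ADMIN"):
        return env["TNS_ADMIN"]
    if env.get("ORACLE_HOME"):
        return os.path.join(env["ORACLE_HOME"], "network", "admin")
    return None  # neither variable set; the location has to be found another way


def read_sqlnet_params(text):
    """Parse single-line NAME=VALUE entries; '#' starts a comment."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            params[name.strip().upper()] = value.strip()
    return params


# Hypothetical file content; only the SEND_TIMEOUT value is taken from this post.
sample = """\
# copied from the previous Oracle home
SQLNET.AUTHENTICATION_SERVICES = (NTS)
SQLNET.SEND_TIMEOUT=60000
"""

params = read_sqlnet_params(sample)
print(params.get("SQLNET.SEND_TIMEOUT", "not set"))
```

    Running the same check against a copy of the server-side file makes it easy to confirm both ends carry the same value.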

    Thanks for the guidance!