- Article Type: General
- Product: Aleph
- Product Version: 20
We have had three instances (two on production and one on pre-prod) where the ue_19 logs (in xxx40/scratch) show:
Lost the oracle connection
Oracle error: update_cursor z700
ORA-03114: not connected to ORACLE
* No ORACLE connection - process is terminated... *
The same messages also appear in the ue_01_a logs in xxx30/scratch. The first occurrence in production was on Sept. 29. Our DBA sent me this:
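A check for these messages can be scripted so that a failed ue process is spotted from its log rather than only from the process list. The following is a minimal Python sketch, not an Aleph utility. The function name `lost_connection_lines` is hypothetical, and the signature list is an assumption. It is built from the log excerpt above plus ORA-03135, the message that the KB articles associate with network drops.

```python
# Hypothetical helper: flag lines in a ue_19 / ue_01 log that indicate the
# Oracle connection was lost. Signatures are copied from the log excerpt;
# ORA-03135 is added because the KB articles associate it with network drops.
LOST_CONNECTION_SIGNATURES = (
    "Lost the oracle connection",
    "ORA-03114",  # not connected to ORACLE
    "ORA-03135",  # connection lost contact
    "No ORACLE connection - process is terminated",
)


def lost_connection_lines(log_text: str) -> list[str]:
    """Return the stripped log lines that contain any lost-connection signature."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(sig in line for sig in LOST_CONNECTION_SIGNATURES)
    ]


if __name__ == "__main__":
    # Sample text reproduces the ue_19 log excerpt quoted in this article.
    sample = (
        "Lost the oracle connection\n"
        "Oracle error: update_cursor z700\n"
        "ORA-03114: not connected to ORACLE\n"
        "* No ORACLE connection - process is terminated... *\n"
    )
    for hit in lost_connection_lines(sample):
        print(hit)
```

Plain substring matching is used instead of regular expressions because the signatures are fixed strings. The `update_cursor` line is deliberately not matched, since the table named there varies with the operation that failed.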
The Oracle diagnostics tell me what the top wait events and top SQL statements were during that time frame. They don’t tell me what ran or what died at an exact time, and I did not see any report or batch errors in /exlibris/aleph/prod occurring at exactly 10:00 (although a severe I/O bottleneck was building at that time). The top wait events between 9:00 am and 11:00 am that day were “log file sync” and “User I/O”. The top SQL statements were those used in indexing Bib records (SQL for Z00 and Z00R) and indexing patrons (SQL for Z353 and Z111).
It happened on pre-prod on Nov. 14. Our DBA noted:
When I look at Lily’s Oracle logfile, I see that there are many Oracle connection errors beginning just after midnight Friday (early Saturday morning, 12:29 am):
ORA-609 : opiodr aborting process unknown ospid
This error is caused by a “lost” connection between Tulip and Lily. We had the same problem on 9/29/2010 and had to reboot; I also see errors on 10/5/2010, but don’t recall anyone reporting a problem that day. The DBA forums I searched point to this being caused by a network or firewall issue… I can’t find any other reason.
Production had the same problem on 11/17, which we resolved by doing a reboot.
We are alerted to the problem when our regular checks, which verify that all ue processes are running, report that some ue_19 processes have stopped. We restart them and monitor, often having to restart the same processes repeatedly.
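A process check of this kind reduces to a set difference between the ue processes expected and those seen running. The following is a minimal Python sketch. It assumes the check produces a text listing in which each running process appears on one line together with its library code, for example a filtered `ps` listing. The function name `missing_ue_processes` and the listing layout are hypothetical.

```python
def missing_ue_processes(
    listing: str, expected: set[tuple[str, str]]
) -> set[tuple[str, str]]:
    """Return the (library, process) pairs in `expected` not found in `listing`.

    Assumption: a running process shows up as a line containing both the
    library code and the ue process name as whitespace-separated tokens.
    Comparison is case-insensitive.
    """
    running = set()
    for line in listing.splitlines():
        tokens = {tok.lower() for tok in line.split()}
        for library, process in expected:
            if library.lower() in tokens and process.lower() in tokens:
                running.add((library, process))
    return expected - running


if __name__ == "__main__":
    # Hypothetical listing in which ue_19 for xxx40 has died.
    listing = "xxx30 ue_01 running\n"
    expected = {("xxx30", "ue_01"), ("xxx40", "ue_19")}
    for library, process in sorted(missing_ue_processes(listing, expected)):
        print(f"{process} is not running in {library}")
```

Recording the time at which each process is first found missing makes it easier to match failures against the Oracle alert log.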
What is causing this? Is it a Linux problem, a hardware/firmware problem, the fact that Aleph and Oracle are on separate servers, or something else?
Please examine KBs 16384-18898 and 16384-33297. These KBs, and other Support Incidents not published as KBs, indicate that the message "ORA-03135: connection lost contact" is associated with transient network problems.
You have a two-task environment, with a separate database server.
I believe that when you restart the ue_19 process and, after a while, get the error message again, it is because the transient network problem that caused the first failure has recurred.
[From site:] We have not yet been able to determine the cause; it has not happened again.
- Article last edited: 10/8/2013