Oracle comes down; not sure why
- Article Type: General
- Product: Aleph
- Product Version: 17.01
Description:
About an hour ago staff notified me that they couldn’t connect with the GUI client. I checked and found that almost all of the servers and almost all of the processes were down. I looked at the z0102 log and saw this error:
Oracle error: fetch z01
ORA-03113: end-of-file on communication channel
Oracle error: fetch z02
ORA-03114: not connected to ORACLE
***************************************************
* No ORACLE connection - process is terminated... *
***************************************************
We took Apache, Oracle and Aleph down and restarted them all. So far everything is running ok, but it isn’t clear what happened. We’ve never had this problem before. Our Oracle team is looking into it, but I was wondering whether you have seen this problem before. We are on version 17.
Processes such as the ue-01 in the ABC30 and the ue-11 and batch queue in ABC50 never went down.
Our Oracle team reviewed the Oracle alert log and trace files. It appears that the database was running fine until it was shut down at 2:42 pm, which is when we restarted everything. They did not see any trace files written at the time Aleph became unresponsive.
So we don’t know what happened.
When I do a ‘tail’ on the other logs I don’t see any errors. It looks like they were just all of a sudden disconnected. A few examples:
Tail on the www_server:
Header: Referer <http://bison.buffalo.edu:8991/F?func=find-b&find_code=WRD&request=government+special+education>
Header: Accept-Language <en-us>
Header: User-Agent <Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)>
Header: Host <bison.buffalo.edu:8991>
Header: Connection <Keep-Alive>
WWW-F : FULL-SET-SET
2007-04-11 13:51:58 70 [000] [vrb] server_main: OUT 0.2160 27627
2007-04-11 13:51:58 89 [000] [log] read 0 data from socket
2007-04-11 13:51:58 89 [004] [log] read 0 data from socket
2007-04-11 13:51:58 89 [003] [log] read 0 data from socket
Tail on the pc_server:
pc_server_write_log.c: Value too large for defined data type
SERVICE : C0152
MODULE : Common Services
DESCRIPTION: Expand Item Information
ACTION :
PROGRAM : pc_com_c0152
2007-04-11 13:51:59 00 [003] [log] Read 5222 bytes
Resolution:
We have seen cases (see KB 8192-3623, for example) where the ue_01_xxxx processes became disconnected from Oracle, but in those cases, the problem was limited to those processes.
In this case, since the symptom was apparent in various modules, it seems that various processes had become disconnected from Oracle.
Check the Oracle bdump alert logs to see whether Oracle came down and whether the alert log contains messages indicating problems. (You can use util o/3/1 to view the alert log. The one exception is when the ORA_HOST is on a different server; in that case util o/3/1 doesn't work.)
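As a rough illustration of the alert log check above, the following Python sketch filters alert log lines for ORA- error codes and for instance shutdown or startup messages. The function name find_alert_events is hypothetical, and the message wording in the pattern is an assumption about typical alert log text; adjust it to what your own alert log actually contains.

```python
import re

# Assumption: the alert log reports errors as "ORA-nnnnn" and marks restarts
# with lines such as "Shutting down instance" / "Starting ORACLE instance".
EVENT_PATTERN = re.compile(
    r"ORA-\d{5}|Shutting down instance|Starting ORACLE instance"
)


def find_alert_events(lines):
    """Return (line_number, text) pairs for lines that match EVENT_PATTERN."""
    events = []
    for number, line in enumerate(lines, start=1):
        if EVENT_PATTERN.search(line):
            events.append((number, line.rstrip()))
    return events
```

To use it, open the alert log (its location depends on your installation) and pass the file object to find_alert_events; the returned line numbers show where to look in the full log for the surrounding timestamps.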
If this doesn't help, check the xxx_server log files and run_e... logs for messages which might indicate a problem.
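One way to see which processes lost their Oracle connection is to count the disconnect messages per log file. The sketch below is a minimal example under stated assumptions: the function name count_disconnects is hypothetical, the log directory is whatever directory holds your server and run_e logs, and the markers are simply the messages quoted in the Description above.

```python
from collections import Counter
from pathlib import Path

# Markers copied from the z0102 log excerpt in this article.
DISCONNECT_MARKERS = (
    "ORA-03113",
    "ORA-03114",
    "No ORACLE connection",
)


def count_disconnects(log_dir, pattern="*"):
    """Map each log file name to its number of lines containing a marker."""
    counts = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with path.open(errors="replace") as handle:
            for line in handle:
                if any(marker in line for marker in DISCONNECT_MARKERS):
                    counts[path.name] += 1
    return dict(counts)
```

Files that do not appear in the result contain none of the markers, which matches the situation described above where some logs showed no errors at all.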
In regard to "pc_server_write_log.c: Value too large for defined data type", please implement the suggestions in KB 6589, such as not writing the pc_ser_6nnn file (see KB 3966). We have not previously found any connection between this message and the servers not functioning, but eliminating it may help.
Since it seems that you have already done all of the preceding, we will need to wait and see whether the problem recurs. If it does, we will need to investigate it together in more depth.
- Article last edited: 10/8/2013