ExLibris
  • Ex Libris Knowledge Center

    Oracle comes down; not sure why

    • Article Type: General
    • Product: Aleph
    • Product Version: 17.01

    Description:
    About an hour ago, staff notified me that they couldn’t connect to the GUI client. I checked and found that almost all of the servers and processes were down. The z0102 log showed this error:

    Oracle error: fetch z01
    ORA-03113: end-of-file on communication channel

    Oracle error: fetch z02
    ORA-03114: not connected to ORACLE

    ***************************************************
    * No ORACLE connection - process is terminated... *
    ***************************************************

    We took Apache, Oracle, and Aleph down and restarted them all. So far everything is running OK, but it isn’t clear what happened. We’ve never had this problem before. Our Oracle team is looking into it, but I was wondering whether you have seen this problem before. We are on version 17.

    Processes such as the ue-01 in the ABC30 and the ue-11 and batch queue in ABC50 never went down.

    Our Oracle team reviewed the Oracle alert log and trace files. It appears that the database was running fine until it was shut down at 2:42 pm, which is when we restarted everything. They did not see any trace files written at the time Aleph became unresponsive.

    So we don’t know what happened.

    When I do a ‘tail’ on the other logs I don’t see any errors. It looks as though they were all suddenly disconnected. A few examples:

    Tail on the www_server:

    Header: Referer <http://bison.buffalo.edu:8991/F?func=find-b&find_code=WRD&request=government+special+education>

    Header: Accept-Language <en-us>
    Header: User-Agent <Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)>
    Header: Host <bison.buffalo.edu:8991>
    Header: Connection <Keep-Alive>

    WWW-F : FULL-SET-SET

    2007-04-11 13:51:58 70 [000] [vrb] server_main: OUT 0.2160 27627
    2007-04-11 13:51:58 89 [000] [log] read 0 data from socket
    2007-04-11 13:51:58 89 [004] [log] read 0 data from socket
    2007-04-11 13:51:58 89 [003] [log] read 0 data from socket

    Tail on the pc_server:

    pc_server_write_log.c: Value too large for defined data type

    SERVICE : C0152
    MODULE : Common Services
    DESCRIPTION: Expand Item Information
    ACTION :
    PROGRAM : pc_com_c0152

    2007-04-11 13:51:59 00 [003] [log] Read 5222 bytes

    Resolution:
    We have seen cases (see KB 8192-3623, for example) where the ue_01_xxxx processes became disconnected from Oracle, but in those cases, the problem was limited to those processes.

    In this case, since the symptom was apparent in various modules, it seems that various processes had become disconnected from Oracle.

    Check the Oracle bdump alert logs to see whether Oracle came down and whether the alert log contains messages indicating problems. (You can use util o/3/1 to view the alert log. The one exception is when the ORA_HOST is on a different server; in that case util o/3/1 doesn't work.)
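
    If you have a copy of the alert log as a text file, a small script can narrow the review to the period around the outage. The following is a minimal sketch, not an Ex Libris or Oracle tool. It assumes the classic text alert log layout, in which a timestamp such as "Wed Apr 11 14:42:03 2007" sits on its own line ahead of the messages for that moment, and an English locale for the day and month names. The function names, the list of event phrases, and the sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: a bare timestamp line, then the messages logged at that time.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
ORA_CODE = re.compile(r"\bORA-\d{5}\b")
# Phrases assumed to mark an instance stop or start; adjust to your own log.
EVENT_PHRASES = ("Shutting down instance", "Starting ORACLE instance")


def parse_timestamp(line):
    """Return a datetime if the line is a bare timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return None


def scan_alert_log(lines, start, end):
    """Yield (timestamp, message) for notable entries between start and end."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        stamp = parse_timestamp(line)
        if stamp is not None:
            current = stamp
            continue
        # Skip messages with no known timestamp or outside the window.
        if current is None or not (start <= current <= end):
            continue
        if ORA_CODE.search(line) or any(p in line for p in EVENT_PHRASES):
            yield current, line.strip()


# Invented sample lines, for illustration only (not taken from a real log).
sample = """\
Wed Apr 11 13:40:10 2007
Thread 1 advanced to log sequence 812
Wed Apr 11 14:42:03 2007
Shutting down instance (immediate)
Wed Apr 11 14:45:30 2007
Starting ORACLE instance (normal)
""".splitlines()

hits = list(scan_alert_log(sample,
                           datetime(2007, 4, 11, 13, 45),
                           datetime(2007, 4, 11, 14, 50)))
for stamp, message in hits:
    print(stamp, message)
```

    To run it against a real file, pass an open file object instead of the sample list. If the window around the time staff lost their connection yields nothing, that is consistent with the database itself having stayed up.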

    If this doesn't help, check the xxx_server log files and run_e... logs for messages which might indicate a problem.
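
    When many log files need checking, it can help to summarize each one: the last timestamped line (roughly when activity stopped), any ORA- codes, and whether the "No ORACLE connection" banner appears. The sketch below assumes the timestamp prefix shown in the www_server and pc_server excerpts above. The function name and the returned dictionary keys are hypothetical, and the sample lines are copied from the excerpts in this article.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpts,
# e.g. "2007-04-11 13:51:58 70 [000] [vrb] ...".
STAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\b")
ORA_CODE = re.compile(r"\bORA-\d{5}\b")
BANNER = "No ORACLE connection - process is terminated"


def summarize_log(lines):
    """Return the last timestamp, distinct ORA- codes, and banner flag."""
    last_stamp = None
    ora_codes = []
    terminated = False
    for line in lines:
        match = STAMP.match(line)
        if match:
            last_stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for code in ORA_CODE.findall(line):
            if code not in ora_codes:  # keep first-seen order, no duplicates
                ora_codes.append(code)
        if BANNER in line:
            terminated = True
    return {"last_stamp": last_stamp,
            "ora_codes": ora_codes,
            "terminated": terminated}


# Lines quoted from the log excerpts in this article.
www_lines = [
    "2007-04-11 13:51:58 70 [000] [vrb] server_main: OUT 0.2160 27627",
    "2007-04-11 13:51:58 89 [000] [log] read 0 data from socket",
]
error_lines = [
    "Oracle error: fetch z01",
    "ORA-03113: end-of-file on communication channel",
    "Oracle error: fetch z02",
    "ORA-03114: not connected to ORACLE",
    "* No ORACLE connection - process is terminated... *",
]

www_summary = summarize_log(www_lines)
error_summary = summarize_log(error_lines)
print(www_summary)
print(error_summary)
```

    Comparing the last timestamps across the server logs gives an approximate time of the disconnect, which can then be matched against the Oracle alert log and any network or operating-system logs.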

    In regard to "pc_server_write_log.c: Value too large for defined data type", please implement the suggestions in KB 6589, such as not writing the pc_ser_6nnn file (see KB 3966). Although we have not previously found any connection between this message and the servers not functioning, eliminating it may help.

    Since it seems that you have done all of the preceding, we will just need to see whether it happens again. If it does, we will need to investigate it together in more depth.


    • Article last edited: 10/8/2013