ExLibris

    pc_server and www_server don't work; reboot doesn't help

     

    • Product: Aleph
    • Product Version: 20, 21, 22, 23
    • Relevant for Installation Type: Dedicated-Direct, Direct, Local, Total Care

     

    Description

        We are unable to access the pc_server with the GUI client. There is no error message; the client just displays a blank, generic screen. Nothing is written to the pc_server log.
        We are unable to access the www_server/OPAC. After hanging for a minute, it gives the message: "internal error - server connection terminated". The attempted access writes nothing to the www_server log, which instead contains entries like the following, written every 30 seconds or so: "check_timeout: network connection timeout".
        The z39_server creates processes which it fails to terminate; they gradually build up to thousands of "zombie" processes. The oclc_server continues to function OK.
        It's likely that the www_server and pc_server processes don't appear as they should under util w/1.
        Rebooting the server doesn't help.

    Resolution

    This problem can occur in an "NFS-mount" situation, where the $alephe_tab/server_info and $alephe_tab/server_info_childs files are located on an NFS (network file system) device. Due to a momentary communication outage, the server loses its connection with this NFS device and is no longer able to find the processes listed in the server_info and server_info_childs files: it cannot see them and cannot kill them.
    (The problem with the z39_server occurs because it uses the pc_server running on the port specified in the "hostname localhost" line in the $alephe_tab/z39_server/z39_server.conf file.)
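
    To check whether a site is exposed to this situation, it can help to look at the filesystem type behind $alephe_tab. The following is a minimal sketch, not part of Aleph: it assumes a Linux server with GNU coreutils "stat" and a POSIX shell, and the function names fs_type and is_nfs_type are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils `stat` (Linux). Other platforms
# need a different way to read the filesystem type.

# Print the filesystem type name for a path, e.g. "nfs" or "tmpfs".
fs_type() {
    stat -f -c %T "$1"
}

# Succeed if a filesystem type name looks like NFS ("nfs", "nfs4", ...).
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example use (assumes $alephe_tab is set in the environment):
#   if is_nfs_type "$(fs_type "$alephe_tab")"; then
#       echo "server_info files are on NFS"
#   fi
```

    If the check reports NFS, the server_info and server_info_childs files are candidates for the relocation described later in this article.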

    This problem can be fixed by running:
      cd $alephe_tab
      rm server_info
      rm server_info_childs
      touch server_info 
      touch server_info_childs

    Then run
      ps -ef |grep www_server
      ps -ef |grep pc_server

    to locate the old, obsolete processes, and stop them with the "kill" command.
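
    The "ps -ef | grep" output also lists the grep command itself, which is easy to kill by mistake. A small filter, sketched below, extracts only the relevant process IDs. The function name pids_for is hypothetical, and the sketch assumes the usual "ps -ef" layout with a header line and the PID in the second column.

```shell
#!/bin/sh
# Sketch only: reads `ps -ef` output on stdin and prints the PID column
# for lines that mention the given server name. Lines belonging to
# grep or awk helper commands are skipped.
pids_for() {
    awk -v name="$1" '
        NR > 1 && index($0, name) && $0 !~ /grep |awk / { print $2 }'
}

# Example use: review the list first, then kill each PID by hand.
#   ps -ef | pids_for www_server
#   ps -ef | pids_for pc_server
```

    The sketch deliberately prints the PIDs rather than killing them, so the list can be reviewed before anything is stopped.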

    The problem can be prevented by moving server_info and server_info_childs to a local (non-NFS) device (one site moved them to /dev/shm) and creating soft links in $alephe_tab that point to the new location.
    The information in these files is temporary, so the new files can and should be empty. The old processes, however, need to be terminated. Under normal circumstances, running $alephe_root/aleph_shutdown as the first step should do that, but run the following to confirm that they are gone:
      ps -ef |grep www_server
      ps -ef |grep pc_server  

    Then move the files to the local device. For example:
      cd $alephe_tab
      rm server_info
      rm server_info_childs
      ln -s /dev/shm/server_info  server_info
      ln -s /dev/shm/server_info_childs  server_info_childs
      cd /dev/shm
      touch server_info 
      touch server_info_childs

    (Then run $alephe_root/aleph_startup.)
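
    The relocation steps can also be wrapped in a small function so that both files are handled the same way. This is a sketch under assumptions, not an Ex Libris script: the function name relocate_server_info and its two arguments are hypothetical, and it should only be run while Aleph is shut down.

```shell
#!/bin/sh
# Sketch only: replace server_info and server_info_childs in a tab
# directory with soft links to new, empty files in a local directory.
#   $1 = directory that currently holds the files (e.g. $alephe_tab)
#   $2 = local, non-NFS directory (e.g. /dev/shm)
relocate_server_info() {
    rsi_tab_dir=$1
    rsi_local_dir=$2
    for rsi_file in server_info server_info_childs; do
        # Remove the old file (or an old link) from the tab directory.
        rm -f "$rsi_tab_dir/$rsi_file" || return 1
        # Create the new file empty; its contents are temporary anyway.
        : > "$rsi_local_dir/$rsi_file" || return 1
        # Point the tab directory entry at the local file.
        ln -s "$rsi_local_dir/$rsi_file" "$rsi_tab_dir/$rsi_file" || return 1
    done
}

# Example use, matching the commands above:
#   relocate_server_info "$alephe_tab" /dev/shm
```

    The function stops at the first failing step, so a missing or unwritable target directory does not leave the second file half-processed.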

    A related article is: "server_monitor (and util w/1) don't show anything running".

     


    • Article last edited: 30-Aug-2017