pc_server and www_server don't work; reboot doesn't help
- Product: Aleph
- Product Version: 20, 21, 22, 23
- Relevant for Installation Type: Dedicated-Direct, Direct, Local, Total Care
Description
We are unable to access the pc_server with the GUI client. There is no error message; it just displays a blank, generic screen. Nothing is written to the pc_server log.
We are unable to access the www_server/OPAC. After hanging for about a minute, it gives the message "internal error - server connection terminated". The access attempt itself writes nothing to the www_server log; however, the log does contain the following entry, written roughly every 30 seconds: "check_timeout: network connection timeout".
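To confirm this symptom from a shell, the www_server log can be searched for the timeout message. The following is a minimal sketch, not an Ex Libris tool: the function names are hypothetical, and the log file path is an assumption that must be supplied, since this article does not state where the www_server log is written.

```shell
#!/bin/sh
# Sketch: summarize "check_timeout" entries in a www_server log.
# The log path is passed in; its location is site-specific (assumption).

TIMEOUT_MSG='check_timeout: network connection timeout'

count_timeouts() {
    log=$1
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that 0.
    grep -c -F -- "$TIMEOUT_MSG" "$log" || true
}

last_timeout() {
    # Print the most recent matching line (empty if there is none).
    grep -F -- "$TIMEOUT_MSG" "$1" | tail -n 1
}

# Example (hypothetical path):
#   count_timeouts /path/to/www_server.log
#   last_timeout  /path/to/www_server.log
```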
The z39_server creates processes that it fails to terminate; these gradually build up to thousands of "zombie" processes. The oclc_server continues to function normally.
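The zombie build-up can be tracked by counting "<defunct>" processes whose parent is the z39_server. This is a hypothetical sketch that reads ps -ef output on standard input; it assumes the usual ps -ef column order (UID, PID, PPID, ...) and that zombies are marked "<defunct>", as on Linux and Solaris. Matching on the parent PID rather than on the name is deliberate, because some platforms show a zombie only as "<defunct>" without its program name.

```shell
#!/bin/sh
# Sketch: count zombie ("<defunct>") processes whose parent is a z39_server.
# Reads "ps -ef" output on stdin; assumes columns UID PID PPID ...
# The name is passed via the environment so this awk process never matches it.

count_z39_zombies() {
    NAME=${1:-z39_server} awk '
        NR == 1 { next }                       # skip the ps header line
        {
            ppid[NR] = $3
            defunct[NR] = /<defunct>/           # 1 if marked as a zombie
            # A live process whose command line contains NAME is a candidate parent.
            if (!defunct[NR] && index($0, ENVIRON["NAME"]))
                parent[$2] = 1
        }
        END {
            n = 0
            for (i = 2; i <= NR; i++)
                if (defunct[i] && (ppid[i] in parent)) n++
            print n
        }'
}

# Example:
#   ps -ef | count_z39_zombies
```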
The www_server and pc_server processes are also likely not to appear under util w/1, where they should be listed.
Rebooting the server doesn't help.
Resolution
This problem can occur in an "NFS-mount" situation, where the $alephe_tab/server_info and $alephe_tab/server_info_childs files are located on an NFS (network file system) device. Due to a momentary communication outage, the server loses its connection to this NFS device and is no longer able to find the processes listed in the server_info and server_info_childs files: it cannot see them and cannot kill them.
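To check whether a server is exposed to this problem, the file system holding the two files can be inspected. This sketch assumes GNU coreutils stat (Linux); the function name is hypothetical. It follows soft links, so files already moved to a local device are reported by the type of their new location. A non-zero return status means at least one file is on NFS.

```shell
#!/bin/sh
# Sketch: report the file-system type holding server_info and
# server_info_childs, flagging NFS. Assumes GNU coreutils "stat" (Linux).

check_server_info_fs() {
    tab=${1:-$alephe_tab}
    status=0
    for f in server_info server_info_childs; do
        path="$tab/$f"
        if [ ! -e "$path" ]; then
            echo "$f: missing"
            continue
        fi
        # -L follows soft links, so a link into /dev/shm reports the target's FS.
        fstype=$(stat -L -f -c %T "$path")
        case $fstype in
            nfs*) echo "$f: $fstype (NFS: exposed to this problem)"; status=1 ;;
            *)    echo "$f: $fstype" ;;
        esac
    done
    return $status
}

# Example:
#   check_server_info_fs "$alephe_tab"
```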
(The problem with the z39_server occurs because it uses the pc_server running on the port specified in the "hostname localhost" line in the $alephe_tab/z39_server/z39_server.conf file.)
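To see which pc_server the z39_server depends on, the relevant line can be printed from the configuration file. This hypothetical helper only prints the matching line(s) with line numbers; it makes no assumption about how the port is written on that line.

```shell
#!/bin/sh
# Sketch: print the "hostname localhost" line(s) of z39_server.conf,
# i.e. the pc_server the z39_server connects to. It does not parse the port.

show_z39_pc_target() {
    conf=${1:-$alephe_tab/z39_server/z39_server.conf}
    grep -n 'hostname localhost' "$conf"
}

# Example:
#   show_z39_pc_target
#   ps -ef | grep pc_server    # compare with the pc_server processes running
```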
This problem can be fixed as follows:
cd $alephe_tab
rm server_info
rm server_info_childs
touch server_info
touch server_info_childs
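The five commands above can also be run as a small function. This is a sketch with a hypothetical name, and it adds one safeguard that goes beyond the article: if the files have already been replaced by soft links (the prevention step described in this article), it empties the link targets instead of running rm and touch, which would otherwise replace the links with regular files in $alephe_tab.

```shell
#!/bin/sh
# Sketch: empty server_info and server_info_childs.
# Regular files are removed and recreated empty, as in the manual steps.
# A soft link is kept and its target truncated instead, so that
# "rm" + "touch" does not silently undo a move to a local device.

reset_server_info() {
    tab=${1:-$alephe_tab}
    for f in server_info server_info_childs; do
        path="$tab/$f"
        if [ -L "$path" ]; then
            : > "$path"            # truncates (or creates) the link target
        else
            rm -f "$path"
            touch "$path"
        fi
        # Verify the result exists and is empty.
        if [ -s "$path" ] || [ ! -e "$path" ]; then
            echo "reset failed: $path" >&2
            return 1
        fi
    done
}

# Example:
#   reset_server_info "$alephe_tab"
```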
Then run
ps -ef |grep www_server
ps -ef |grep pc_server
to locate the old, obsolete processes, and stop them with the "kill" command.
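The process search and the kill step can be combined as follows. In this sketch (hypothetical function name), the PIDs are taken from column 2 of ps -ef output, grep lines are skipped, and the search name is passed to awk through the environment so that the awk process does not match itself. Review the PID list before running kill.

```shell
#!/bin/sh
# Sketch: list PIDs of processes whose command line contains a name,
# from "ps -ef" output on stdin (PID is column 2). grep lines are skipped;
# the name goes through the environment so awk never lists itself.

server_pids() {
    NAME=$1 awk 'NR > 1 && index($0, ENVIRON["NAME"]) && $0 !~ /grep/ { print $2 }'
}

# Example: review the list, then stop the old processes.
#   pids=$(ps -ef | server_pids www_server)
#   echo "$pids"
#   [ -n "$pids" ] && kill $pids
```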
The problem can be prevented by moving the server_info and server_info_childs files to a local (non-NFS) device (one site moved them to /dev/shm) and creating soft links in $alephe_tab to this new location.
The information in these files is temporary, so the new files can and should be empty. The old processes, however, need to be terminated. Under normal circumstances, running $alephe_root/aleph_shutdown as the first step should do that, but run the following to confirm that they are gone:
ps -ef |grep www_server
ps -ef |grep pc_server
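The confirmation step can be scripted as a check that succeeds only when no www_server or pc_server process remains, for example between aleph_shutdown and aleph_startup. This is a hypothetical sketch that reads ps -ef output on standard input and prints any surviving processes.

```shell
#!/bin/sh
# Sketch: exit 0 only if no www_server or pc_server process remains,
# judged from "ps -ef" output on stdin. Survivors are printed.
# The pattern goes through the environment so awk never matches itself.

servers_stopped() {
    PAT='(www|pc)_server' awk '
        NR > 1 && $0 ~ ENVIRON["PAT"] && $0 !~ /grep/ { print; found = 1 }
        END { exit found }'
}

# Example:
#   "$alephe_root/aleph_shutdown"
#   if ps -ef | servers_stopped; then echo "all stopped"; else echo "still running (see above)"; fi
```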
Then move the files to the local device, for example:
cd $alephe_tab
rm server_info
rm server_info_childs
ln -s /dev/shm/server_info server_info
ln -s /dev/shm/server_info_childs server_info_childs
cd /dev/shm
touch server_info
touch server_info_childs
(Then run $alephe_root/aleph_startup.)
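The example above can be wrapped in a re-runnable function. This sketch (hypothetical name; Linux readlink assumed) should only be run after aleph_shutdown. Note that /dev/shm is normally a tmpfs that is emptied at reboot, so after a reboot the soft links may point to files that no longer exist; whether Aleph recreates them itself is not covered by this article, and re-running the function before aleph_startup recreates them empty.

```shell
#!/bin/sh
# Sketch: move server_info / server_info_childs to a local directory
# (default /dev/shm, as in the example) and link them from $alephe_tab.
# Safe to re-run; run only while the Aleph servers are stopped.

relocate_server_info() {
    tab=${1:-$alephe_tab}
    localdir=${2:-/dev/shm}
    for f in server_info server_info_childs; do
        : > "$localdir/$f"                   # create or empty the local file
        ln -sf "$localdir/$f" "$tab/$f"      # -f replaces an old file or link
        # Confirm the link points to the intended local file.
        [ "$(readlink "$tab/$f")" = "$localdir/$f" ] || return 1
    done
}

# Example:
#   relocate_server_info "$alephe_tab" /dev/shm
```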
A related article is "server_monitor (and util w/1) don't show anything running".
- Article last edited: 30-Aug-2017