System slow, high cpu; pc_server; z39

Article Type: General
Product: Aleph
Product Version: 20, 21, 22, 23

Problem Symptoms:
A. Intermittent episodes of very slow response, with 0% CPU idle time for extended periods (up to several hours unless we intervene). "top" shows numerous pc_server processes, each using 90+% CPU. Load averages go well above 50 during these times. Disk I/O does not seem to be an issue.

B. Z39.50 requests seem to be involved. Restarting pc_server usually fixes the problem, at least for a while. Restarting z39_server also fixes it.

C. OCLC WorldCat Discovery has problems getting availability information. It asks for availability for multiple (10) titles at a time and often times out before the results can be displayed on its "brief screen". OCLC sees the message: "z39 clients max session".
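
The condition in symptom A can be checked from the command line instead of watching "top". The following is a minimal sketch, not an Ex Libris tool: the function name busy_pc_servers is hypothetical, and it assumes a POSIX "ps" and "awk" and that the process name contains "pc_server" as it does in the "top" output described above.

```sh
# Count pc_server processes at or above a CPU threshold (default 90).
# Reads "pcpu command" lines on stdin, for example from: ps -eo pcpu,comm
# The header line is ignored because its second field is not pc_server.
busy_pc_servers() {
    threshold="${1:-90}"
    awk -v t="$threshold" \
        '$2 ~ /pc_server/ && $1 + 0 >= t { n++ } END { print n + 0 }'
}

# Usage on the Aleph server:
#   ps -eo pcpu,comm | busy_pc_servers 90
#   uptime    # shows the load averages
```

A count that stays high while the load average climbs matches the pattern described in A.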

Cause:
No separate pc_server is running for Z39.50, and the z39_server parameters have not been optimized.

Resolution:
A. Start a separate pc_server for Z39.50. $alephe_tab/z39_server/z39_server.conf contains a hostname parameter, such as "hostname localhost:6992", which determines which pc_server z39_server uses. It is common to start a separate pc_server for Z39.50 in larger installations.
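
To confirm which pc_server z39_server currently points at, the hostname line can be read from the configuration file. This is a minimal sketch: the function name z39_pc_server_target is hypothetical, and it assumes the parameter is written as "hostname host:port" separated by whitespace, as in the example above.

```sh
# Print the host:port that follows the "hostname" keyword.
# Reads the content of z39_server.conf on stdin and stops at the first match.
z39_pc_server_target() {
    awk '$1 == "hostname" { print $2; exit }'
}

# Usage on the Aleph server:
#   z39_pc_server_target < "$alephe_tab/z39_server/z39_server.conf"
```

If the port printed is the same one the GUI clients use, Z39.50 traffic is sharing the general pc_server.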

B. Improve the performance of z39.50 as described in the ./Ex Libris Documentation/Aleph/Technical Documentation/Z39.50 SRU (Cross-Version)/z39/z39_server document:
To improve performance, z39_server stores the records returned by pc_server in the internal cache. In ./aleph/proc/z39_server script two environment variables are defined:

1. z39_server_cache_size - defines the size of the cache (for example, z39_server_cache_size 100). If z39_server_cache_size is 0 or undefined, the records returned by pc_server are not cached.

2. z39_server_present_size - defines how many records z39_server requests from pc_server in each interaction. (for example z39_server_present_size 10). The default is 1.
<end quote from doc>
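
As an illustration only, the two variables might be set in the ./aleph/proc/z39_server script as shown below. The csh "setenv" syntax is an assumption, based on Aleph scripts generally being csh scripts; check it against the existing lines in your own copy of the script. The values 100 and 10 are the examples from the documentation quoted above, not tuned recommendations.

```csh
# Assumed csh syntax; values are the documentation's examples.
setenv z39_server_cache_size 100    # cache records returned by pc_server
setenv z39_server_present_size 10   # records requested from pc_server per interaction
```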

C. In the case of OCLC WorldCat Discovery, the problem appeared to lie in the ipfilter on the port redirect from 210 to 9991: session state was being lost. (Perhaps that disrupted the tracking of the maximum of 30 sockets?)
Once we got OCLC to point directly to port 210, the problem went away. If a bottleneck recurs, we will ask OCLC to increase their max_sockets, as the operating/license on our side should be able to handle up to 250.

Article last edited: 20-Oct-2016