- Article Type: General
- Product: Aleph
- Product Version: 20
Since our production upgrade from v19 to v20 last week, we have received widespread reports of significant response-time problems. Staff using the GUI report many "failed to read reply" errors and general system slowness. Web OPAC users report slow search response, sometimes with blank results screens that correspond to broken pipe errors in the www/apache logs.
We're seeing similar problems on the back end of ALEPH (e.g., standard batch jobs are typically taking 2-3 times longer to run on v20 than on v19).
Our campus IT group has been monitoring the servers hosting ALEPH and Oracle and reports that CPU and memory metrics appear to be okay. We moved to a two-task environment with the upgrade to v20, and we have checked and re-checked the two-task settings to make sure they are correct.
We have checked that all of our indexes are present and valid.
Are there any Aleph or Oracle settings/parameters *specific to the two-task environment* that we should look at as possible contributors to the problem?
Ex Libris did the following:
1. Found that the ora_connect_mode between Aleph and Oracle was set to multi-threaded server (MTS) mode ("LISTENER_MTS"), changed it to "LISTENER", and restarted Aleph. MTS mode is not recommended and is known to be problematic.
2. Installed rep_changes 2575 and 3099. As described in KB 16384-28961, rep_change 2575 fixes a number of "Failed to read reply" problems.
After these two changes, response time returned to normal.
We believe that the improvement was due primarily to disabling MTS.
Note: We also recommended reallocating CPUs so that each of the two servers (Aleph and Oracle) has two CPUs, but that change has not yet been implemented and is not part of the solution.
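The ora_connect_mode check from step 1 can be sketched as a small script. This is a minimal illustration, not an Ex Libris tool: it assumes the setting appears as a csh-style `setenv ora_connect_mode VALUE` line in a startup file, and both that syntax and the file you point it at are assumptions to verify on your own installation. The function names are hypothetical.

```python
import re
import sys

# Assumption: the setting is written as a csh-style line, for example
#   setenv ora_connect_mode LISTENER_MTS
# Lines that start with '#' do not match and are therefore ignored.
SETTING_RE = re.compile(r'^\s*setenv\s+ora_connect_mode\s+"?([A-Za-z_]+)"?')


def find_connect_modes(lines):
    """Return (line_number, value) for each active ora_connect_mode line."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = SETTING_RE.match(line)
        if match:
            found.append((number, match.group(1)))
    return found


def uses_mts(lines):
    """Return True if the last active setting selects LISTENER_MTS."""
    modes = find_connect_modes(lines)
    return bool(modes) and modes[-1][1] == "LISTENER_MTS"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_connect_mode.py <path-to-startup-file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        content = handle.readlines()
    for number, value in find_connect_modes(content):
        print(f"line {number}: ora_connect_mode = {value}")
    if uses_mts(content):
        print("MTS mode is selected; consider changing it to LISTENER.")
```

Only the last active setting is treated as effective, on the assumption that a later `setenv` overrides an earlier one. Changing the value still requires restarting Aleph, as in step 1.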
- Article last edited: 10/8/2013