Each day between 1:15 and 2:30: very slow OPAC; timeouts; "base/library not available"


Article Type: General
Product: Aleph
Product Version: 19.01

Description:
Each day between 1:15 and 2:30, searches in the production OPAC slow down severely. Some transactions reach the 1-minute timeout limit (we see “timeout” error messages in the www_server log); others simply respond very slowly (30 to 45 seconds). Sporadic "base/library is not accessible" messages also appear, and the default bases are displayed. The pc_server is also affected.

Monitoring the system using prstat and mpstat showed normal operations.

This is July, which is *not* a high-use period for the system.

Toggling down all JOBD, QUEUE-BATCH, and UE jobs and processes (with util w/5) during the problem interval did not help. Shutting down the pc_servers did not help either.

We then stopped and restarted Oracle and Aleph during the problem period, but the severe slowdown continued after the restart.

We also occasionally experience lesser, shorter slowdowns at other times of the day.

Resolution:
[From site:] We believe the problem lies with the Disaster Recovery process (Falconstor) in writing blocks to the ATA drives.

The problem is that before a change is made to the data that Falconstor watches, Falconstor grabs the unchanged data and writes it to a "recovery space". That space has been on ATA drives, which are slower than the RAID 1/0 or RAID 5 arrays we use elsewhere. Testing and work with Falconstor show that moving the DR (disaster recovery) storage off the ATA drives onto faster drives will resolve the issue.
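The copy-before-write behavior described above explains why a slow recovery space hurts the OPAC: the first change to any protected block costs an extra read plus a write to the recovery space before the real write completes. The minimal Python model below assumes a simple copy-on-first-write scheme and uses made-up latencies; the class name, field names, and numbers are hypothetical, and this is not Falconstor's actual implementation.

```python
"""Simplified model of copy-on-first-write (COFW) snapshot protection.

Assumption: illustrative only, NOT Falconstor's implementation. The original
contents of a block are preserved in a recovery area the first time the
block is overwritten; later writes to that block skip the copy.
"""
from dataclasses import dataclass, field


@dataclass
class CofwVolume:
    # Hypothetical per-operation latencies in milliseconds.
    primary_write_ms: float
    primary_read_ms: float
    recovery_write_ms: float
    primary: dict = field(default_factory=dict)   # block number -> current data
    recovery: dict = field(default_factory=dict)  # block number -> preserved data

    def write(self, block: int, data: bytes) -> float:
        """Write one block and return the modeled latency of that write."""
        latency = 0.0
        if block not in self.recovery:
            # First change since protection began: read the old contents and
            # copy them to the recovery space before overwriting.
            latency += self.primary_read_ms
            self.recovery[block] = self.primary.get(block, b"")
            latency += self.recovery_write_ms
        self.primary[block] = data
        latency += self.primary_write_ms
        return latency


def total_latency(recovery_write_ms: float, blocks: list[int]) -> float:
    """Sum the modeled latency of writing each block in order."""
    vol = CofwVolume(primary_write_ms=1.0, primary_read_ms=1.0,
                     recovery_write_ms=recovery_write_ms)
    return sum(vol.write(b, b"new") for b in blocks)


workload = [1, 2, 3, 1, 2, 4]          # blocks 1 and 2 are written twice
slow = total_latency(10.0, workload)   # hypothetical slow recovery disk
fast = total_latency(2.0, workload)    # hypothetical faster recovery disk
print(slow, fast)                      # prints: 50.0 18.0
```

With these illustrative numbers the same six writes cost 50 ms against the slow recovery disk and 18 ms against the faster one; repeat writes to an already-preserved block pay only the primary write, so the penalty is concentrated on the first change to each block.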

Other possibly relevant information:

We made the following changes: /ora01 is now in its own RAID 1/0 group of twelve 15K RPM Fibre Channel drives, and /ora02 is in its own RAID 1/0 group of six 15K RPM Fibre Channel drives. Any contention for spindles between the two file systems, if there was any before, should therefore be eliminated.
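For a rough sense of what each dedicated group can sustain, a common rule of thumb is that a RAID 1/0 group delivers about N times one drive's random read IOPS, and half that for random writes, because every write lands on both disks of a mirror pair. The sketch below applies that rule to the two groups; the 180 IOPS-per-drive figure is an assumed typical value for 15K RPM drives, not a measurement from this site, and the estimate ignores controller and array caching.

```python
# Back-of-envelope random-I/O capacity for the two RAID 1/0 groups.
# Assumption: ~180 random IOPS per 15K RPM Fibre Channel drive (a rule-of-thumb
# figure, not measured on this site's hardware).
PER_DRIVE_IOPS = 180
RAID10_WRITE_PENALTY = 2  # each logical write goes to both disks of a mirror pair


def raid10_iops(drives: int, per_drive: int = PER_DRIVE_IOPS) -> tuple[int, int]:
    """Return (random read IOPS, random write IOPS) for a RAID 1/0 group."""
    raw = drives * per_drive
    return raw, raw // RAID10_WRITE_PENALTY


groups = {"/ora01": 12, "/ora02": 6}
for fs, n in groups.items():
    reads, writes = raid10_iops(n)
    print(f"{fs}: {n} drives, ~{reads} read IOPS, ~{writes} write IOPS")
```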

Below is the mapping of the ssd device instances to their c#t#d#s# device names (we are using MPIO):

ssd6  -> c5t2101000D770CCAC0d3  Path a of /ora01
ssd13 -> c7t2101000D770CBFD4d3  Path b of /ora01

ssd5  -> c5t2101000D770CCAC0d4  Path a of /ora02
ssd12 -> c7t2101000D770CBFD4d4  Path b of /ora02
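To make the naming convention concrete, the sketch below splits each device name into its controller, target, and disk (LUN) fields and groups the entries by file system. It shows that each file system is one LUN (d3 for /ora01, d4 for /ora02) reached through two controllers (c5 and c7), one per path. The regular expression is an assumed reading of the Solaris c#t#d#s# convention (the long hex target is presumably a Fibre Channel target port WWN), and `parse_device` is a hypothetical helper, not a vendor tool.

```python
import re
from collections import defaultdict

# Solaris-style logical device name: c<controller>t<target>d<disk/LUN>[s<slice>].
# Assumption: the target is a number or an uppercase hex WWN.
DEVICE_RE = re.compile(
    r"^c(?P<ctrl>\d+)t(?P<target>[0-9A-F]+)d(?P<lun>\d+)(?:s(?P<slice>\d+))?$"
)

# The mapping from the article: instance, device name, path, file system.
MAPPING = """\
ssd6  c5t2101000D770CCAC0d3  a  /ora01
ssd13 c7t2101000D770CBFD4d3  b  /ora01
ssd5  c5t2101000D770CCAC0d4  a  /ora02
ssd12 c7t2101000D770CBFD4d4  b  /ora02
"""


def parse_device(name: str) -> tuple[int, str, int]:
    """Return (controller, target, lun) for a c#t#d#[s#] device name."""
    m = DEVICE_RE.match(name)
    if m is None:
        raise ValueError(f"not a c#t#d#[s#] name: {name}")
    return int(m["ctrl"]), m["target"], int(m["lun"])


# Group every path entry under the file system it serves.
paths = defaultdict(list)
for line in MAPPING.splitlines():
    if not line.strip():
        continue
    inst, dev, path, fs = line.split()
    ctrl, target, lun = parse_device(dev)
    paths[fs].append((path, inst, ctrl, target, lun))

for fs, entries in sorted(paths.items()):
    luns = sorted({e[4] for e in entries})
    ctrls = sorted({e[2] for e in entries})
    print(f"{fs}: LUN {luns} via controllers {ctrls} ({len(entries)} paths)")
```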

Article last edited: 10/8/2013