High CPU utilization running compress_seq_file
- Article Type: General
- Product: Aleph
- Product Version: 18.01
Description:
For the last several days, our high-CPU alert has been tripped on our Aleph front-end box.
Current snapshot:
load averages: 4.74, 4.77, 4.84; up 22+22:43:31
10:05:45
199 processes: 192 sleeping, 3 running, 4 on cpu
CPU states: 0.5% idle, 24.1% user, 75.4% kernel, 0.0% iowait, 0.0% swap
Memory: 9216M phys mem, 2661M free mem, 16G total swap, 16G free swap
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
3140 aleph 1 0 0 2464K 1496K run 125:36 22.72% compress_seq_fi
3135 aleph 1 0 0 2464K 1496K run 125:37 22.49% compress_seq_fi
25425 aleph 1 0 0 2464K 1504K cpu/1 172:54 22.23% compress_seq_fi
25413 aleph 1 0 0 2464K 1480K cpu/2 172:27 22.01% compress_seq_fi
The problem appears to start around 4 am and continues throughout the day.
Resolution:
I see that one of the jobs running at this time is the abc50 p_arc_01:
* W1 03:30:00 N csh -f /exlibris/aleph/a18_1/aleph/proc/p_arc_01 ABC50,ALL,N,04,I
p_arc_01 calls p_arc_01_c, which calls aleph_load:
$aleph_proc/aleph_load $arc_user $table_name FIX NONDIRECT
$aleph_proc/aleph_load $arc_user $table_name FIX DIRECT
aleph_load calls p_file_11:
$aleph_proc/p_file_11 ${LIB},T,${TABLE},$in_fl,$out_fl,$schema_name
which executes compress_seq_file:
$aleph_exe/compress_seq_file $length_fl $in_fl $out_fl
We see four compress_seq_file processes in the snapshot, each using about 22% of the CPU. p_arc_01 is started with 4 processes, and each of these generates a compress_seq_file process.
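As a rough check of the arithmetic, the four compress_seq_file rows can be totalled from a saved copy of the snapshot. This is only a minimal sketch and not part of Aleph: the function name sum_compress_cpu is hypothetical, and it assumes the last column is the (truncated) command name and the second-to-last column is the CPU percentage, as in the listing above.

```shell
# Hypothetical helper: total the CPU column for compress_seq_file rows.
# Assumes the last field is the (truncated) command name and the
# second-to-last field is the CPU percentage, as in the snapshot above.
sum_compress_cpu() {
    awk '$NF ~ /^compress_seq_fi/ {
             count++
             total += $(NF-1) + 0    # "22.72%" is read as the number 22.72
         }
         END { printf "count=%d total_cpu=%.2f%%\n", count, total }' "$@"
}

# The four process rows copied from the snapshot.
sum_compress_cpu <<'EOF'
3140 aleph 1 0 0 2464K 1496K run 125:36 22.72% compress_seq_fi
3135 aleph 1 0 0 2464K 1496K run 125:37 22.49% compress_seq_fi
25425 aleph 1 0 0 2464K 1504K cpu/1 172:54 22.23% compress_seq_fi
25413 aleph 1 0 0 2464K 1480K cpu/2 172:27 22.01% compress_seq_fi
EOF
# prints: count=4 total_cpu=89.45%
```

Together the four processes account for roughly 89% of the CPU, which fits the near-zero idle figure in the snapshot.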
The p_arc_01 process is a background process run at night to transfer data from the Aleph Oracle tables to the Aleph Reporting Center database. These processes *should* be lower priority than other processes (such as OPAC, Circulation, etc.). If they are not, I suggest that you make them so.
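The NICE column in the snapshot shows 0 for all four compress_seq_file processes, i.e. the default priority. One way to lower the priority of processes that are already running is the standard renice command. The sketch below only prints the commands so they can be reviewed first; the helper name build_renice_cmds and the increment of 10 are illustrative assumptions, not Aleph settings.

```shell
# Hypothetical helper: print (not run) one renice command per PID.
# Usage: build_renice_cmds INCREMENT PID...
build_renice_cmds() {
    increment=$1
    shift
    for pid in "$@"; do
        printf 'renice -n %s -p %s\n' "$increment" "$pid"
    done
}

# PIDs taken from the snapshot; 10 is an arbitrary example increment.
build_renice_cmds 10 3140 3135 25425 25413
```

After review, the printed commands can be run as the aleph user or as root. Note that renice affects only processes that already exist; a compress_seq_file process started later inherits the nice value of its parent, so this is a stopgap rather than a permanent setting.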
I see that you are running the job in p_run_mode I (Incremental), which writes out only changed records. That shouldn't take as long, but evidently it does.
I believe that p_arc_01 is a lower-priority process. Perhaps the CPU it is using is "leftover" CPU, unused by other processes. What difference do you see in GUI and OPAC response time when this job is running vs. when it is not?
It looks as though it's at least 3/4 done. If it is impacting the GUI and OPAC, then I would suggest that you try:
(1) Having the job start at midnight rather than 4 am;
(2) *Increasing* the number of processes (so that the job will complete faster, during off-hours)
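The two suggestions could be applied to the scheduled entry roughly as follows. This is a sketch under stated assumptions: that the second field of the entry is the start time, and that the fourth job parameter (04) is the number of processes, which matches the four compress_seq_file processes seen in the snapshot but is not confirmed here. The helper name reschedule_entry and the value 08 are hypothetical examples.

```shell
# Hypothetical helper: rewrite a job entry read from standard input.
# Assumptions: 03:30:00 is the start-time field, and ",04," before the
# final run mode "I" is the number of processes. 08 is only an example.
reschedule_entry() {
    sed -e 's/ 03:30:00 / 00:00:00 /' \
        -e 's/,04,I$/,08,I/'
}

printf '%s\n' 'W1 03:30:00 N csh -f /exlibris/aleph/a18_1/aleph/proc/p_arc_01 ABC50,ALL,N,04,I' |
    reschedule_entry
# prints: W1 00:00:00 N csh -f /exlibris/aleph/a18_1/aleph/proc/p_arc_01 ABC50,ALL,N,08,I
```

Raising the process count trades a higher peak load for a shorter run, so it is worth confirming that the job then finishes before daytime GUI and OPAC use begins.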
- Article last edited: 10/8/2013