Multiple p_cir_nn jobs (for different sublibraries) execute simultaneously
- Article Type: General
- Product: Aleph
- Product Version: 20, 21, 22, 23
Description:
Starting on Sunday, 10/25, all six of our abc50 p_cir_12 jobs (each for a different sublibrary) have produced output containing data from other sublibraries. For example, AA requested books appeared in the BB output: the first two books are XE; the others are BB.
Resolution:
This problem occurred because two lib_batch processes were running in the abc50 library (as seen in util c/1). The first lib_batch process started the p_cir_12 job for sublibrary AA; the second started p_cir_12 for sublibrary BB. The two jobs were overwriting each other's work files in the abc50 $data_scratch.
There were multiple lib_batch processes because util x/3 had deleted the que_batch_lock file. (que_batch_lock prevents a second lib_batch process from being started.)
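The role of the lock file can be illustrated with a minimal sketch. This is not Aleph's actual start_library_batch logic; the function name start_guarded and its messages are hypothetical. It only shows the general pattern: a start is refused while the lock file exists, so deleting the lock file by hand lets a second start succeed.

```shell
#!/bin/sh
# Hypothetical sketch of a lock-file guard; NOT the real Aleph procedure.
# Usage: start_guarded /path/to/que_batch_lock
# Returns 0 if the lock was created (start allowed), 1 if it already existed.
start_guarded() {
    lock_file=$1
    # With noclobber (set -C) the redirection fails when the file already
    # exists, so the existence check and the creation happen in one step.
    if ( set -C; echo "$$" > "$lock_file" ) 2>/dev/null; then
        echo "lock created: batch queue may start"
        return 0
    else
        echo "lock present: not starting a second batch queue"
        return 1
    fi
}
```

Calling start_guarded twice on the same path refuses the second call. If the lock file is removed in between, the second call succeeds, which mirrors how a second lib_batch process came to be started here.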
We see this in the abc50 $data_files:
-rw-r--r-- 1 aleph aleph 1322748 Oct 26 13:26 que_batch.old
-rw-r--r-- 1 aleph aleph 9 Oct 26 13:27 que_batch_lock
So the que_batch.old file and the que_batch_lock file are generated at essentially the same time (here, one minute apart).
The $aleph_proc/org_que proc has this line:
mv $data_files/que_batch $data_files/que_batch.old
and org_que is called by start_library_batch:
source $aleph_proc/org_que
We see that the util_c_02 procedure ("Start Library Batch Queue") executes start_library_batch, but that the unlock_library proc *also* calls start_library_batch.
The chronology:
Aug. 18 abc50 lib_batch started (with que_batch_lock file dated Aug. 18)
Oct. 22 abc50 que_batch_lock deleted (by util x/3)
Oct. 23 abc50_util_a_13_b.56001 does "unlock_library" which (since it finds no que_batch_lock):
(a) writes que_batch to que_batch.old and creates a que_batch_lock file
(b) starts a 2nd abc50 lib_batch process
Oct. 24 00:46 the first "file locked" error occurs in abc50: abc50_p_cir_10.56029.court_bb
Oct. 26 you kill both lib_batch processes and restart que_batch (lib_batch)
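Before restarting the queue as in the last step above, it may help to confirm how many lib_batch processes exist for the library. The sketch below assumes that each such process appears in the ps listing on a single line containing both "lib_batch" and the library code; that format is an assumption and should be checked on your server. The function name count_lib_batch is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count ps-style lines on stdin that mention both
# "lib_batch" and the given library code (case-insensitive for the code).
# Assumes one line per process; verify against your own ps output.
count_lib_batch() {
    library=$1
    # grep -c prints the number of matching lines (0 when none match);
    # "|| true" keeps a zero count from being treated as a failure.
    grep 'lib_batch' | grep -i -c -- "$library" || true
}
```

A possible invocation is `ps -ef | count_lib_batch abc50`. A result greater than 1 would indicate the situation described in this article; 0 would indicate that no lib_batch process is left before the restart.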
We recommend not running util x/3 at all: not much is written to the libraries' $data_files directories and you can individually clean up what is written there.
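One way to do such an individual cleanup is sketched below, under stated assumptions: it only lists candidate files (it deletes nothing), the 30-day default age is arbitrary, and the function name list_cleanup_candidates is hypothetical. The point of the sketch is that anything named que_batch* (including que_batch_lock) is excluded from the list.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory that are older
# than N days, skipping que_batch, que_batch.old and que_batch_lock.
# It prints candidates only; review the list before removing anything.
list_cleanup_candidates() {
    dir=$1
    days=${2:-30}   # assumed default age threshold, in days
    find "$dir" -type f -mtime +"$days" ! -name 'que_batch*' -print
}
```

For example, `list_cleanup_candidates "$data_files" 60` would print files in the library's $data_files that are more than 60 days old, leaving the queue files out of consideration.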
See also the article: "'File locked' error; job being started before preceding job has finished".
- Article last edited: 10/8/2013