ue_01 dies with ue_01_*.gnt execution error when started automatically between p_manage_nn batch jobs

Article Type: General
Product: Aleph
Product Version: 20, 21, 22, 23

Description:
ue_01 consistently fails if it starts up after one p_manage job unlocks the library but before the next p_manage job, which does not lock the library, begins.

From XXX30 $data_scratch -- Sep 29 04:08 run_e_01.23609, Sep 29 04:08 run_e_01_word.23609
...
START UE-01 A 00:41:32
...
Execution error : file '/exlibris/aleph/a19_2/aleph/exe/ue_01_word_index.gnt'
error code: 115, pc=0, call=1, seg=0
115 Unexpected signal (Signal 10)

Execution error : file '/exlibris/aleph/a19_2/aleph/exe/ue_01_a.gnt'
error code: 115, pc=0, call=1, seg=0
115 Unexpected signal (Signal 10)

This execution error for ue_01 shows up in some of the run_e_01 logs almost every day in 2 libraries. It is not the same library or institution every morning. It seems to occur between two p_manage jobs run from the job_list: one that locks and then unlocks the library, followed by one that does not lock the library.

1) between p_manage_07 and p_manage_12 for course reserves
2) between p_manage_19 and p_manage_13 for holdings

If ue_01 starts before the non-locking job begins, part or all of ue_01 fails with the execution error. If ue_01 starts after the non-locking job begins, there are no errors and ue_01 runs normally.
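To see which mornings and which ue_01 components are affected, the run_e_01 logs can be scanned for the error block shown above. The following is a minimal Python sketch, assuming the three-line layout of the log excerpt in this article. The names `GntError` and `find_gnt_errors` are illustrative and are not part of Aleph.

```python
import re
from typing import List, NamedTuple


class GntError(NamedTuple):
    program: str     # path of the .gnt file named in the log
    error_code: int  # numeric code from the "error code:" line
    detail: str      # the line after it, e.g. "115 Unexpected signal (Signal 10)"


# Matches the three-line block as it appears in the excerpt above.
# Assumption: other Aleph versions print the block in the same layout.
_ERROR_RE = re.compile(
    r"Execution error : file '(?P<program>[^']+\.gnt)'[ \t]*\n"
    r"error code: (?P<code>\d+)[^\n]*\n"
    r"(?P<detail>[^\n]*)"
)


def find_gnt_errors(log_text: str) -> List[GntError]:
    """Return every .gnt execution error block found in the log text."""
    return [
        GntError(m["program"], int(m["code"]), m["detail"].strip())
        for m in _ERROR_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for err in find_gnt_errors(handle.read()):
                print(path, err.program, err.error_code, err.detail)
```

Run against the run_e_01* files in a library's $data_scratch, this prints one line per failed component, which makes it easier to line the failures up with the job_list schedule.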

Resolution:
I think it is better for p_manage_12 to lock the library; doing so certainly can do no harm.

Add four lines after the source and start_p_proc lines at the beginning of $aleph_proc/p_manage_12:

# p_manage_12
source $aleph_proc/def_local_env
start_p_proc

lock_library b
if ($lock_lib_exc_st == not_locked) then
    abort_exit
endif

And then insert an unlock line near the end:

#
ex_p_manage_12:
bl_end
unlock_library      # <--- insert this line
rm_f_symbol
exit
#
ex_p_manage_12_fail:
bl_end
rm_f_symbol
exit

Save a copy of the changed p_manage_12 proc as p_manage_12.save, in the unlikely event that some rep_change overwrites p_manage_12.

In the case of p_manage_19 / p_manage_13 for holdings, I suggest reversing the order, so that p_manage_13 runs first and p_manage_19 runs second.

[From Karen Schneider, FCLA:]

In addition to the suggestions you made in the SI, I tried several other ways to get the ue_01s to stop failing.

I added the changes to p_manage_12 to lock and unlock the library, but ue_01 still failed.

As for changing the order in which p_manage_19 and p_manage_13 are run: p_manage_19 can introduce some records that we want p_manage_13 to delete, so we needed to keep running p_manage_19 first, followed by p_manage_13.

I created a script that called the jobs with some sleep time between the last library-locking job and the non-locking job, to give ue_01 time to start completely before the non-locking job begins. There were still some ue_01 failures every day.

Since I couldn't prevent the ue_01s from dying, I wrote a script to stop and restart them some time after the p_manage jobs finish. It stops the ue_01s, sleeps for 20-30 seconds, and then starts them again. It ran each morning this week, and the ue_01s function normally after the restart.
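The script itself is not reproduced in this article. A minimal Python sketch of the same stop / sleep / start sequence might look like the following. The stop and start commands are passed in as placeholders, because the article does not say which commands the script called; `restart_ue_01` and its parameters are illustrative names only.

```python
import subprocess
import time
from typing import Callable, Sequence


def restart_ue_01(
    stop_cmd: Sequence[str],
    start_cmd: Sequence[str],
    pause_seconds: float = 30.0,
    run: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop ue_01, wait, then start it again.

    stop_cmd and start_cmd are whatever commands the site already uses
    to stop and start ue_01 for a library (assumption: they are site
    specific, so no particular Aleph utility is named here).
    """
    run(list(stop_cmd), check=True)   # raises CalledProcessError if the stop fails
    sleep(pause_seconds)              # the article reports a pause of 20-30 seconds
    run(list(start_cmd), check=True)  # raises CalledProcessError if the start fails
```

A call such as `restart_ue_01(["/path/to/stop_wrapper", "XXX30"], ["/path/to/start_wrapper", "XXX30"], pause_seconds=25)` could then be scheduled for a time after the p_manage jobs normally finish. Both wrapper paths are hypothetical. The `run` and `sleep` parameters exist only so the sequence can be exercised without touching a live system.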

Article last edited: 10/8/2013