
    ue_01 dies with ue_01_*.gnt execution error when started automatically between p_manage_nn batch jobs

    • Article Type: General
    • Product: Aleph
    • Product Version: 20, 21, 22, 23

    Description:
    ue_01 consistently fails if it starts (after one p_manage job has unlocked the library) before the next p_manage job, which does not lock the library, has begun.

    From  XXX30 $data_scratch -- Sep 29 04:08 run_e_01.23609, Sep 29 04:08 run_e_01_word.23609
    ...
    START UE-01 A 00:41:32
    ...
    Execution error : file '/exlibris/aleph/a19_2/aleph/exe/ue_01_word_index.gnt'
    error code: 115, pc=0, call=1, seg=0
    115 Unexpected signal (Signal 10)

    Execution error : file '/exlibris/aleph/a19_2/aleph/exe/ue_01_a.gnt'
    error code: 115, pc=0, call=1, seg=0
    115 Unexpected signal (Signal 10)

    This ue_01 execution error appears in some of the run_e_01 logs almost every day, in two libraries. It is not the same library or institution every morning. It seems to occur between two p_manage jobs run from the job_list: one that locks and then unlocks the library, followed by one that does not lock the library:

    1) between p_manage_07 and p_manage_12 for course reserves
    2) and between p_manage_19 and p_manage_13 for holdings

    If ue_01 starts before the non-locking job begins, then part or all of ue_01 fails with the execution error. If ue_01 starts after the non-locking job has begun, there are no errors and ue_01 stays up and running.
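
    The failure signature shown in the log excerpt above can be searched for mechanically. The following is a minimal Python sketch, not part of Aleph: the function names find_execution_errors and scan_scratch are hypothetical, and the run_e_01* file pattern is an assumption based on the log file names quoted above.

```python
import re
from pathlib import Path

# Matches lines such as:
#   Execution error : file '/exlibris/aleph/a19_2/aleph/exe/ue_01_a.gnt'
EXEC_ERROR = re.compile(r"Execution error\s*:\s*file\s+'([^']+\.gnt)'")


def find_execution_errors(log_text):
    """Return the .gnt paths named in 'Execution error' lines of one log."""
    return EXEC_ERROR.findall(log_text)


def scan_scratch(scratch_dir, pattern="run_e_01*"):
    """Map each matching log file name to the failed .gnt paths it reports.

    scratch_dir is the directory the library's $data_scratch points to;
    the default pattern is an assumption taken from the file names above.
    """
    failures = {}
    for log in sorted(Path(scratch_dir).glob(pattern)):
        if not log.is_file():
            continue
        hits = find_execution_errors(log.read_text(errors="replace"))
        if hits:
            failures[log.name] = hits
    return failures
```

    Calling scan_scratch with the path of a library's $data_scratch directory returns a dictionary that is empty when no run_e_01 log in that directory reports an execution error.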

    Resolution:
    I think it is better for p_manage_12 to lock the library; doing so can certainly do no harm.

    Add four lines after the source and start_p_proc lines at the beginning of $aleph_proc/p_manage_12:

    # p_manage_12
    source $aleph_proc/def_local_env
    start_p_proc

    lock_library b
    if ($lock_lib_exc_st == not_locked) then
    abort_exit
    endif


    And then insert an unlock line near the end:

    #
    ex_p_manage_12:
    bl_end
    unlock_library    # <--- insert this line
    rm_f_symbol
    exit
    #
    ex_p_manage_12_fail:
    bl_end
    rm_f_symbol
    exit

    Save a copy of the modified p_manage_12 proc as p_manage_12.save, in the unlikely event that some rep_change overwrites p_manage_12.

    In the case of p_manage_19 / p_manage_13 for holdings, I suggest that the order be reversed, so that p_manage_13 is run first and p_manage_19 is run second.

    [From Karen Schneider, FCLA:]

    In addition to the suggestions you made in the SI, I tried several other ways to get the ue_01s to stop failing.

    I added the changes to p_manage_12 to lock and unlock the library, but ue_01 still fails.

    As for changing the order in which p_manage_19 and p_manage_13 are run: p_manage_19 can introduce some records that we want p_manage_13 to delete, so we needed to keep running p_manage_19 first, followed by p_manage_13.

    I created a script that ran the jobs with some sleep time between the last library-locking job and the non-locking job, to give ue_01 time to start completely before the non-locking job began. There were still some ue_01 failures every day.

    Since I couldn't prevent the ue_01s from dying, I wrote a script to stop and restart them some time after the p_manage jobs finish. It stops the ue_01s, sleeps for 20-30 seconds, and then starts them again. It ran each morning this week, and the ue_01s functioned normally after each restart.
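
    The stop, sleep, and start sequence described above can be sketched as follows. This is an illustration in Python, not the script that was actually used: restart_ue01 is a hypothetical name, and stop_cmd and start_cmd are placeholders for whatever commands a site uses to stop and start ue_01, since the article does not name them.

```python
import subprocess
import time


def restart_ue01(stop_cmd, start_cmd, pause_seconds=30,
                 run=subprocess.run, sleep=time.sleep):
    """Stop ue_01, wait, then start it again.

    stop_cmd and start_cmd are argument lists for the site's own stop and
    start commands. They are placeholders (assumptions), not Aleph command
    names. run and sleep can be replaced for a dry run.
    """
    run(stop_cmd, check=True)    # raises CalledProcessError if the stop fails
    sleep(pause_seconds)         # the article reports pausing 20-30 seconds
    run(start_cmd, check=True)   # raises CalledProcessError if the start fails
```

    Passing check=True makes a failed stop abort the sequence before the start is attempted, so a scheduler running the script sees a non-zero exit instead of a silent partial restart. The script would be scheduled to run some time after the p_manage jobs finish.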


    • Article last edited: 10/8/2013