
    "File locked" error: job being started before preceding job has finished

    • Article Type: General
    • Product: Aleph
    • Product Version: 16.02

    Description:
    Certain p_cir_51 and p_cir_10 runs which should produce output do not. Some of the $alephe_scratch logs have this error at the end:

    I/O error : file 'TP1'
    error code: 9/065 (ANS74), pc=0, call=1, seg=0
    65 File locked
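
    To see which scratch logs ended with this error, the log directory can be searched for the message text. The following is a minimal sketch, not an Aleph utility: the function name logs_with_file_locked is hypothetical, and it assumes the logs are plain files directly under the directory passed in (for example $alephe_scratch).

```shell
# logs_with_file_locked is a hypothetical helper name, not part of Aleph.
# Usage: logs_with_file_locked DIR
# Prints the name of every file directly under DIR that contains "File locked".
logs_with_file_locked() {
    # -l prints only the names of matching files;
    # -s suppresses error messages for unreadable or missing entries.
    # "|| :" keeps the exit status at 0 when no file matches.
    grep -ls 'File locked' "$1"/* || :
}

# Example: logs_with_file_locked "$alephe_scratch"
```

    Comparing the timestamps of the files it lists with those of the neighbouring logs for the same library shows whether the jobs overlapped.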

    Looking at the timestamps of the files in $alephe_scratch, we also see that in certain cases a job is started before the preceding job for the same library (ABC50) has completed, even though the job_list entries have "Y" in column 4, which indicates that they should be queued.

    For instance, we see this:

    abc50_p_cir_10.09209.dllaw_circ
    Sat Oct 7 00:01:10 2006
    42890 END READING AT 00:01:23

    abc50_p_cir_10.09210.dlsxt_circ
    Sat Oct 7 00:01:14 2006

    I/O error : file 'TP1'
    error code: 9/065 (ANS74), pc=0, call=1, seg=0
    65 File locked

    abc50_p_cir_10.09211.dlwkk_circ
    Sat Oct 7 00:01:17 2006

    I/O error : file 'TP1'
    error code: 9/065 (ANS74), pc=0, call=1, seg=0
    65 File locked

    We see that abc50_p_cir_10.09209.dllaw_circ doesn't complete until 00:01:23, but abc50_p_cir_10.09210.dlsxt_circ starts at 00:01:14 and abc50_p_cir_10.09211.dlwkk_circ at 00:01:17; that is, they start before abc50_p_cir_10.09209.dllaw_circ has completed.

    Resolution:
    This problem appears to have been caused by multiple lib_batch processes running because Aleph was started as "root" rather than as "aleph".

    I saw this in util c/1:

    root 19228 1 0 05:10:29 ? 1:26 /exlibris/aleph/a16_1/aleph/exe/rts32 ue_11_a ABC50.a16_1
    root 17433 1 0 05:04:51 ? 7:45 /exlibris/aleph/a16_1/aleph/exe/rts32 ue_06_a ABC50.a16_1
    root 15807 1 0 05:03:27 ? 0:00 /exlibris/aleph/a16_1/aleph/exe/lib_batch ABC50


    The following command lists the Aleph processes on your system that are owned by root (because Aleph was started as root) but should not be:

    ps -ef | grep root | grep aleph/exe
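
    Because "grep root" matches the string anywhere on the line, it can also pick up processes that merely mention "root" in their arguments. A stricter variant, sketched here under the assumption that the owner is the first column of the "ps -ef" output (as in the listing above), compares that column exactly. The function name root_aleph_procs is hypothetical.

```shell
# root_aleph_procs is a hypothetical helper name, not part of Aleph.
# It reads `ps -ef` output on stdin and keeps only the lines whose
# first column (the owner) is exactly "root" and whose command path
# contains "aleph/exe/".
root_aleph_procs() {
    awk '$1 == "root" && /aleph\/exe\//'
}

# Usage: ps -ef | root_aleph_procs
```

    No output means that no root-owned Aleph executables were found.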


    I suggest that you:

    (1) run $alephe_root/aleph_shutdown;

    (2) rerun the above command to verify that these root-owned Aleph processes have been stopped;

    (3) change the ownership of the que_batch_lock file in each library's $data_files (from root to aleph);

    (4) run $alephe_root/aleph_startup .
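
    For step (3), it can help to list the lock files that are still owned by root before changing anything. This is a sketch only: the function name list_locks_owned_by is hypothetical, and the directories to search are an assumption that must be replaced with each library's actual $data_files location.

```shell
# list_locks_owned_by is a hypothetical helper, not an Aleph utility.
# Usage: list_locks_owned_by OWNER DIR...
# Prints every file named que_batch_lock under the given directories
# that is owned by OWNER.
list_locks_owned_by() {
    owner=$1
    shift
    find "$@" -type f -name que_batch_lock -user "$owner" -print
}

# Example (the path is a placeholder, not a real Aleph location):
#   list_locks_owned_by root /path/to/abc50/data_files
# After reviewing the list, change the owner of each file as root:
#   chown aleph /path/to/abc50/data_files/que_batch_lock
```

    Listing first and changing ownership by hand keeps the sketch from modifying files it was not meant to touch.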


    If the que_batch_lock file were somehow deleted, that could also cause this problem.

    If the problem recurs after restarting as aleph, check whether more than one lib_batch process is running:

    ps -ef | grep a16_1 | grep lib_batch | grep xxx50

    (where "a16_1" is your specific Aleph instance and "xxx50" is your ADM library)
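
    The same check can be turned into a count. In the process listing quoted earlier the library code appears in upper case (ABC50), so a case-sensitive grep for a lower-case code might miss it; the sketch below therefore compares the library code without regard to case. The function name count_lib_batch is hypothetical.

```shell
# count_lib_batch is a hypothetical helper name, not part of Aleph.
# Usage: ps -ef | count_lib_batch INSTANCE LIBRARY
# Prints how many lines refer to exe/lib_batch for the given instance
# and library. The library code is matched case-insensitively.
count_lib_batch() {
    awk -v inst="$1" -v lib="$2" '
        index($0, inst) && /exe\/lib_batch/ && index(toupper($0), toupper(lib)) { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}

# Example: ps -ef | count_lib_batch a16_1 xxx50
```

    A result greater than 1 indicates that more than one lib_batch process is running for that library.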

    Another cause of multiple lib_batch processes could be the deletion of the $aleph_files/que_batch_lock file. If que_batch_lock is deleted, the system lets you start a second que_batch (lib_batch). When que_batch_lock is present, the system does *not* permit a second lib_batch to be started; instead you get the message "lib_batch is already running".

    If none of the above applies, see also KB 3923.

    Additional Information

    file locked, faq


    • Article last edited: 2/12/2014