
    "File locked" error; job being started before preceding job has finished

    • Article Type: General
    • Product: Aleph
    • Product Version: 20, 21, 22, 23

    Description:
    Certain p_cir_51 and p_cir_10 runs which should produce output do not. Some of the $alephe_scratch logs have this error at the end:

    I/O error : file 'TP1'
    error code: 9/065 (ANS74), pc=0, call=1, seg=0
    65 File locked


    Also, the timestamps of the files in $alephe_scratch show that, although the job_list entries have "Y" in column 4 (indicating that they should be queued), in certain cases a job is started before the preceding job for the same library (ABC50) has completed.

    For instance, we see this:
    abc50_p_cir_10.09209.dllaw_circ
    Sat Oct 7 00:01:10 2006
    42890 END READING AT 00:01:23

    abc50_p_cir_10.09210.dlsxt_circ
    Sat Oct 7 00:01:14 2006

    I/O error : file 'TP1'
    error code: 9/065 (ANS74), pc=0, call=1, seg=0
    65 File locked

    abc50_p_cir_10.09211.dlwkk_circ
    Sat Oct 7 00:01:17 2006

    I/O error : file 'TP1'
    error code: 9/065 (ANS74), pc=0, call=1, seg=0
    65 File locked


    We see that abc50_p_cir_10.09209.dllaw_circ doesn't complete until 00:01:23, but abc50_p_cir_10.09210.dlsxt_circ starts at 00:01:14 and abc50_p_cir_10.09211.dlwkk_circ at 00:01:17; that is, they start before abc50_p_cir_10.09209.dllaw_circ has completed.
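To find which runs were affected, the logs that contain the error can be listed. The following is a minimal sketch, assuming the job logs sit directly in the directory you pass in (for example $alephe_scratch). The function name locked_logs is hypothetical and not part of Aleph.

```shell
# Hypothetical helper (not an Aleph utility): print the names of the
# files in a directory that contain the "File locked" error text.
# Usage: locked_logs DIR
locked_logs() {
    dir=$1
    # -l prints only the names of matching files; errors for
    # subdirectories or unreadable files are discarded.
    grep -l "File locked" "$dir"/* 2>/dev/null
}
```

For example, `locked_logs "$alephe_scratch"` lists the affected logs, whose timestamps can then be compared with those of the preceding job.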

    Resolution:

    There are two situations:

    1.  Where the que_batch (lib_batch) process was started as root

    This problem appears to be caused by multiple lib_batch processes running because Aleph was started as "root" (rather than as the aleph user).

    We saw this in util c/1:
    root 19228 1 0 05:10:29 ? 1:26 /exlibris/aleph/a16_1/aleph/exe/rts32 ue_11_a ABC50.a16_1
    root 17433 1 0 05:04:51 ? 7:45 /exlibris/aleph/a16_1/aleph/exe/rts32 ue_06_a ABC50.a16_1
    root 15807 1 0 05:03:27 ? 0:00 /exlibris/aleph/a16_1/aleph/exe/lib_batch ABC50



    The following command shows all the processes on your system which are owned by root but should not be (because Aleph was started as root):
    ps -ef | grep root | grep aleph/exe
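Note that `grep root` matches the string "root" anywhere on the line, not only in the owner column. A stricter filter could be sketched as follows, assuming the standard `ps -ef` layout in which the first column is the process owner. The function name root_aleph_procs is hypothetical.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and keep only the
# lines whose owner column is exactly "root" and whose command line
# contains "aleph/exe".
root_aleph_procs() {
    awk '$1 == "root" && /aleph\/exe/'
}
```

Usage: `ps -ef | root_aleph_procs`. Empty output means no Aleph processes are owned by root.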

    We suggest that you:

    (1) run $alephe_root/aleph_shutdown;

    (2) rerun the above command to verify that these root/aleph processes have been stopped;

    (3) change the ownership of the que_batch_lock file in each library's $data_files (from root to aleph);

    (4) run $alephe_root/aleph_startup .
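Step (3) can be prepared by first listing the lock files that still have the wrong owner. This is a minimal sketch, assuming you pass in the directory (or directories) that $data_files points to for each library. The function name lock_files_owned_by is hypothetical, and the sketch only lists files; it does not change anything.

```shell
# Hypothetical helper: list que_batch_lock files owned by a given user.
# Usage: lock_files_owned_by OWNER DIR [DIR...]
lock_files_owned_by() {
    owner=$1
    shift
    # Refuse to run without at least one directory to search.
    [ "$#" -ge 1 ] || return 1
    find "$@" -name que_batch_lock -user "$owner" -print
}
```

For example, `lock_files_owned_by root "$data_files"` prints the lock files still owned by root in that library. Each listed file can then be handed to `chown aleph`, which needs sufficient privileges to change ownership.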



    2.  Where the $aleph_files/que_batch_lock file was deleted and the system has allowed a second que_batch (lib_batch) to be started.  (When que_batch_lock is present, the system does *not* permit a second lib_batch to be started; you get the message "lib_batch is already running".)

    Check in util c/1 to see if there is more than one lib_batch process running for the same Aleph instance.  (There should *not* be.)

    Also check with "ps -ef |grep":

    ps -ef | grep a22_1 | grep lib_batch | grep XXX50
    <where "a22_1" is this specific (v22) aleph instance and "XXX50" is your adm library>

    For example:

    aleph@aleph-bib(a22_2) XXX50> ps -ef | grep a22_2 | grep lib_batch | grep XXX50
    aleph     8497     1  0  2017 ?        00:12:33 /exlibris/aleph/a22_2/aleph/exe/lib_batch XXX50
    aleph    16498     1  0 Aug11 ?        00:00:00 /exlibris/aleph/a22_2/aleph/exe/lib_batch XXX50

    If so, kill both of them:

    aleph@aleph-bib(a22_2) XXX50> kill -9 8497 16498

    Then run util c/2 to restart the que_batch (lib_batch) process.
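The duplicate check above can also be expressed as a count. This is a minimal sketch, assuming the `ps -ef` layout shown in the example, where the command line ends with the lib_batch path followed by the library name. The function name count_lib_batch is hypothetical.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print how many
# lib_batch processes are running for the given library.
# Usage: ps -ef | count_lib_batch XXX50
count_lib_batch() {
    lib=$1
    # Match lines whose last field is the library name and whose
    # second-to-last field is a path ending in /lib_batch.
    awk -v lib="$lib" '
        $NF == lib && $(NF-1) ~ /\/lib_batch$/ { n++ }
        END { print n + 0 }
    '
}
```

A result greater than 1 indicates the duplicate situation described above. Matching on the last two fields also keeps the `grep` processes themselves out of the count.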

     


    • Article last edited: 13-Aug-2018