ExLibris
  • Ex Libris Knowledge Center

    Batch queue won't shut down; lib_batch_log at 2 GB; "lib_batch already running"

    • Article Type: General
    • Product: Aleph
    • Product Version: 20

    Description:
    Since a v20 instance was installed on the same server as our existing v17 instance, we have been having the following batch queue problems:
    (1) aleph_shutdown does not complete. The log shows that it hangs while trying to stop the batch queue for the abc30 library.
    (2) UTIL-C-1 for abc30 shows that some job is always running, over and over.
    (3) After killing this job and stopping the queue, when we try to start the abc30 batch queue (UTIL-C-2), there is no error message, but the queue does not actually start either.
    (4) If I scroll up from the screen where I ran UTIL-C-2, I find a message (which apparently flashed by very quickly):
    /exlibris/aleph/u18_1/abc30/files/lib_batch_log
    Filesize limit exceeded
    "ls -lrt" in the ./abc30/files/ directory shows that lib_batch_log has reached 2 GB (2147483647 bytes).
    (5) When I connected to v17 and tried to start the abc50 batch queue (UTIL-C-2), I got the message "lib_batch already running for ABC50" -- even though UTIL-C-1 did *not* show any lib_batch running.
    >> ps -ef | grep lib_batch | grep ABC50
    showed that the only lib_batch ABC50 process running was the v20 process.

    Thinking it might be confused, I killed this v20 process.

    When I then did UTIL-C-2 for the v17 abc50, the (v17) lib_batch process started.

    Now when I do
    >> ps -ef | grep lib_batch | grep ABC50

    I see the v17 lib_batch process but (of course) no v20.

    When I do UTIL-C-2 in v20 to try to start the lib_batch it says "lib_batch already running for ABC50" -- even though the only ABC50 lib_batch running is the v17.
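
A note on the size reported in item (4): 2147483647 bytes is 2^31 - 1, the largest size a file can reach when it is written with signed 32-bit file offsets, which would be consistent with the "Filesize limit exceeded" message. The following is a minimal sketch for checking whether a log has hit that ceiling. The function name at_size_limit is hypothetical and is not part of Aleph.

```shell
#!/bin/sh
# Largest size reachable with a signed 32-bit file offset: 2^31 - 1 bytes.
LIMIT=2147483647

# at_size_limit FILE (hypothetical helper)
# Returns 0 if FILE is at or above LIMIT, 1 if it is smaller,
# and 2 if FILE is not a regular file.
at_size_limit() {
    [ -f "$1" ] || return 2
    # Arithmetic expansion strips the padding some wc versions print.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -ge "$LIMIT" ]
}
```

For example, `at_size_limit ./abc30/files/lib_batch_log && echo "log is at the 2 GB limit"` would flag the file described above.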

    Resolution:
    The problem was that the v20 scratch and files directories were symlinks to the v17 instance:
    lrwxrwxrwx 1 aleph exlibris 27 Jun 15 13:52 scratch -> /wrkspc/u17_1/abc50/scratch/
    lrwxrwxrwx 1 aleph exlibris 25 Jun 15 13:52 files -> /wrkspc/u17_1/abc50/files/

    As a result, the two instances shared the same scratch and files directories, so each one interfered with the other.
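
A quick way to spot this kind of link is to test each directory directly rather than reading through `ls -l` output. This is only a sketch: report_symlink is a hypothetical helper name, and `readlink` is assumed to be available on the server.

```shell
#!/bin/sh
# report_symlink DIR (hypothetical helper)
# Prints "DIR -> target" and returns 0 if DIR is a symlink; returns 1 otherwise.
report_symlink() {
    # Drop a trailing slash so the test looks at the link itself,
    # not at the directory it points to.
    path=${1%/}
    if [ -L "$path" ]; then
        printf '%s -> %s\n' "$path" "$(readlink "$path")"
        return 0
    fi
    return 1
}
```

Run from the library's root directory, `for d in scratch files; do report_symlink "$d"; done` would print one line for each directory that is a link and nothing for real directories.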

    For instance, when the v20 lib_batch was started, it wrote a que_batch_lock file to the /wrkspc/u17_1/abc50/files/ directory. The v17 UTIL-C-2 would then find this que_batch_lock and conclude that lib_batch was already running.
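
The lock-file collision can be confirmed by checking whether the files directories of the two instances resolve to the same physical directory. A minimal sketch, assuming a POSIX shell; same_dir is a hypothetical helper name, and the paths passed to it are whatever the two instances use on your server.

```shell
#!/bin/sh
# same_dir A B (hypothetical helper)
# Returns 0 if A and B resolve to the same physical directory,
# 1 if they differ, and 2 if either one cannot be entered.
same_dir() {
    # pwd -P prints the physical path with all symlinks resolved.
    a=$(cd "$1" 2>/dev/null && pwd -P) || return 2
    b=$(cd "$2" 2>/dev/null && pwd -P) || return 2
    [ "$a" = "$b" ]
}
```

If `same_dir` succeeds for the v17 and v20 files directories, a que_batch_lock written by either instance is visible to the other, which matches the "lib_batch already running" behavior described above.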

    The solution was to remove these symlinks.
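
One way to carry this out is sketched below. It rests on assumptions the article does not state: that the batch queue is stopped first, that the v20 library should get its own empty directory in place of each link, and that ownership and permissions are adjusted afterwards as needed. The helper name unlink_dir is hypothetical.

```shell
#!/bin/sh
# unlink_dir LINK (hypothetical helper)
# Replaces the symlink LINK with an empty real directory.
# Refuses to act unless LINK is a symlink; the link's target is left untouched.
unlink_dir() {
    # Drop a trailing slash so rm removes the link, not the target's contents.
    link=${1%/}
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink" >&2
        return 1
    fi
    # rm on a symlink removes only the link itself.
    rm "$link" && mkdir "$link"
}
```

Run from the v20 library's root directory, `unlink_dir scratch && unlink_dir files` would leave the v17 directories and their contents as they were.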


    • Article last edited: 10/8/2013