- Article Type: General
- Product: Aleph
- Product Version: 18.01
The production OPAC was not running on Saturday morning (April 26) when our libraries opened.
We saw that the PC server, while running, was extremely sluggish, and that several processes (various services) had each been running for more than 500 minutes. We tried restarting the WWW and PC servers. That didn't help, so we ran aleph_shutdown and then aleph_startup. aleph_startup ran extremely slowly, so we eventually rebooted the server hardware. (The hardware had not been rebooted in over 12 months.)
Aleph started up quickly following the reboot and has been running smoothly since. Just prior to the reboot, server monitors showed that the load on the application server was as high as we've ever seen it, while there was virtually no load on the database server.
Investigation this week showed that six uh_01_a processes were running early Saturday, when Aleph was so sluggish. From the services documentation, uh_01 appears to be a housekeeping routine, but we don't really understand what it does or why several copies of it would have been running at the same time.
Does uh_01 have anything to do with the ARC ETLs we run several nights a week? Or is it just a coincidence that we saw uh_01_a start shortly after the Friday night ETL? Does uh_01 routinely run after certain other services, and we've just never noticed it? Or is it something we should deliberately be running on a regular basis?
As noted in KB 3993: "The ./util/uh_01_a.cbl program generates the 'setenv base_demo_lib'." KB 16384-3714 also touches on its function.
uh_01_a is called by only one $aleph_proc proc, def_lib_env:
def_lib_env, in turn, is executed only by the $data_root/prof_library proc:
prof_library is, of course, referenced by a large number of routines.
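The investigation above traced uh_01_a upward: it is called from def_lib_env, which in turn is run from prof_library. The following is a minimal Python sketch of that kind of reverse-reference search. It assumes the procs are plain text files and treats any whole-word mention of a name as a call, so comments will produce false positives. The helper names and the directory arguments to load_scripts (for example, the expanded $aleph_proc and $data_root paths) are illustrative and are not part of Aleph.

```python
import re
from collections import deque
from pathlib import Path


def find_callers(scripts: dict[str, str], target: str) -> list[str]:
    """Return names of scripts whose text mentions `target` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b")
    return sorted(
        name for name, text in scripts.items()
        if name != target and pattern.search(text)
    )


def trace_callers(scripts: dict[str, str], target: str) -> dict[str, list[str]]:
    """Walk upward from `target`, mapping each reached name to its direct callers."""
    graph: dict[str, list[str]] = {}
    queue = deque([target])
    while queue:
        name = queue.popleft()
        if name in graph:
            continue  # already expanded; avoids loops between mutually sourcing procs
        callers = find_callers(scripts, name)
        graph[name] = callers
        queue.extend(c for c in callers if c not in graph)
    return graph


def load_scripts(*dirs: str) -> dict[str, str]:
    """Read every regular file in the given directories, keyed by file name (hypothetical helper)."""
    scripts: dict[str, str] = {}
    for d in dirs:
        for path in Path(d).iterdir():
            if path.is_file():
                scripts[path.name] = path.read_text(errors="replace")
    return scripts
```

Because prof_library is referenced by so many routines, the upper levels of the resulting graph grow quickly. The sketch is most useful for confirming the first one or two hops.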
It seems that uh_01_a is executed as part of prof_library to get the base_demo_lib for the library. It is a very simple function which should take only a second or two. I don't know why it would be hanging like this.
Since this is the first time this has happened at *any* site that we know of, I suggest that we monitor the situation, wait to see whether it happens again, and note what the occurrences have in common (ETL, no ETL, etc.).
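One way to do that monitoring is a small check, run periodically (for example from cron), that counts uh_01_a processes and their elapsed time. The sketch below makes a few assumptions. It reads the process table with `ps -eo pid,etime,args`. It expects the uh_01_a command line to contain the program name. Its 60-second threshold is an arbitrary choice, well above the second or two the routine should normally need. The function names are hypothetical.

```python
import re
import subprocess


def etime_to_seconds(etime: str) -> int:
    """Convert a ps ETIME value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_processes(ps_output: str, name: str) -> list[tuple[int, int, str]]:
    """Return (pid, elapsed_seconds, command) for rows whose command mentions `name`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the ps header row
        fields = line.split(None, 2)  # pid, etime, full command line
        if len(fields) < 3 or not pattern.search(fields[2]):
            continue
        matches.append((int(fields[0]), etime_to_seconds(fields[1]), fields[2]))
    return matches


def check(ps_output: str, name: str = "uh_01_a", max_seconds: int = 60) -> list[str]:
    """Warn when `name` has several concurrent copies or copies running too long."""
    procs = find_processes(ps_output, name)
    warnings = []
    if len(procs) > 1:
        warnings.append(f"{len(procs)} concurrent {name} processes")
    for pid, elapsed, _ in procs:
        if elapsed > max_seconds:
            warnings.append(f"{name} pid {pid} running {elapsed}s")
    return warnings


def snapshot() -> str:
    """Capture the current process table using POSIX-style ps output columns."""
    return subprocess.run(
        ["ps", "-eo", "pid,etime,args"], capture_output=True, text=True, check=True
    ).stdout
```

Calling `check(snapshot())` returns an empty list when at most one short-lived uh_01_a is present. Any warnings could be logged alongside a note of whether an ARC ETL ran that night, which would support the comparison of circumstances suggested above.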
- Article last edited: 10/8/2013