pc_server doesn't restart (via p_sys_01); que_batch_lock shows "success" rather than "lockfile"
- Product: Aleph
- Product Version: 20, 21, 22
- Relevant for Installation Type: All
Description:
Each night the www_server and pc_server are taken down (via the "server_monitor -tks" job_list entries) prior to running clear_vir01 and then brought back up with p_sys_01.
Last night both were taken down, but only the p_sys_01 run for the www_server occurred; p_sys_01 for the pc_server did not run -- and the pc_server is still down (as of 9:00 AM). Staff who try to connect with the GUI are getting: "Password not verified on connectable host."
Resolution:
Do this:
1. Run util w/1/4 to confirm that the pc_server is not running.
2. Run util w/3/3 to start it.
3. If no error message appears, run util w/1/4 again to confirm that it is now running.
4. Check the pc_server_nnnn.log file in $LOGDIR for error messages and/or successful activity.
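The log check in step 4 can be sketched as a small shell helper. This is only an illustration: the function names are hypothetical, and it assumes that the logs match pc_server_*.log in the directory you pass in and that problems show up as lines containing the word "error".

```shell
# Sketch only: print error-like lines from the newest pc_server log.
# Assumption: logs are named pc_server_<something>.log in the given directory.

latest_pc_log() {
    # $1: log directory (for example "$LOGDIR")
    # ls -t sorts newest first; head keeps only the newest file.
    ls -t "$1"/pc_server_*.log 2>/dev/null | head -n 1
}

pc_log_errors() {
    # $1: log directory
    # Prints matching lines prefixed with their line numbers.
    # Returns 2 if no log exists, 1 if the log has no matching lines.
    log=$(latest_pc_log "$1")
    if [ -z "$log" ]; then
        echo "no pc_server log found in $1" >&2
        return 2
    fi
    grep -in 'error' "$log"
}
```

To use it, call pc_log_errors "$LOGDIR" after starting the server. No output (return code 1) means that no line in the newest log contains "error"; it does not by itself prove that the server is healthy.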
If util w/3/3 did start the pc_server and it is continuing to run successfully, the question is: why wasn't it started by p_sys_01 last night like it normally would be?
First, look in $alephe_scratch for p_sys_01 jobs. You will normally see two in the middle of each night, one for WWW and one for PC. If the WWW p_sys_01 ran successfully but the PC p_sys_01 is missing or failed, that confirms the diagnosis.
If it failed, examine the error messages.
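As a rough aid for the $alephe_scratch check, the sketch below lists the p_sys_01 job files modified within the last 24 hours and counts them. The function names are hypothetical, and it assumes that the job log file names contain the string "p_sys_01"; adjust the pattern if your site names them differently.

```shell
# Sketch only: find last night's p_sys_01 job logs.
# Assumption: job log file names contain "p_sys_01".

recent_p_sys_01_logs() {
    # $1: scratch directory (for example "$alephe_scratch")
    # -mtime -1 selects files modified less than 24 hours ago.
    find "$1" -type f -name '*p_sys_01*' -mtime -1 | sort
}

count_recent_p_sys_01() {
    # $1: scratch directory
    # Prints the number of recent p_sys_01 logs, without padding.
    recent_p_sys_01_logs "$1" | wc -l | tr -d ' '
}
```

A count below 2 from count_recent_p_sys_01 "$alephe_scratch" suggests that one of the two nightly runs (WWW or PC) did not happen. Open the listed files to see which server each run was for and whether it reported errors.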
If the PC p_sys_01 didn't run at all, do:
> dlib vir01
> util c/1
If there's no lib_batch process running, then do "util c/2" to start it.
If the lib_batch process *is* running, then do "util c/7". If a p_sys_01 process is seen there, then the question is: why didn't it run earlier when it was supposed to?
This would seem to indicate a problem in the batch queue (lib_batch) processing.
Do util c/3 to stop the batch queue.
If it stops successfully, then do util c/2 to restart it.
If it doesn't stop, then kill it as described in Article 000033361 ("util c/3 doesn't stop batch queue").
If the above doesn't help, then perform the steps described in Article 000045870 ("Services-submitted jobs don't run; batch queue doesn't work").
[Note: in one case like this, the ./vir01/files/que_batch_lock file showed a value of: "success". It *should* show "lockfile". Restarting the batch queue changed it to the expected value of "lockfile".]
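The lock-file observation in the note can be turned into a quick check. This is a sketch under assumptions: the function name is hypothetical, and it assumes that the file holds a single word (possibly followed by a newline), as the note implies.

```shell
# Sketch only: report whether que_batch_lock holds the expected value.
# Assumption: the file contains one word, e.g. "lockfile" or "success".

check_que_batch_lock() {
    # $1: path to the que_batch_lock file
    # Prints "ok", "unexpected: <value>", or "missing".
    if [ ! -f "$1" ]; then
        echo "missing"
        return 2
    fi
    # Strip whitespace so a trailing newline does not affect the comparison.
    content=$(tr -d '[:space:]' < "$1")
    if [ "$content" = "lockfile" ]; then
        echo "ok"
    else
        echo "unexpected: $content"
        return 1
    fi
}
```

For example, check_que_batch_lock ./vir01/files/que_batch_lock prints "unexpected: success" in the situation described in the note, and "ok" once the batch queue has been restarted and the file again contains "lockfile".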