aleph_startup, aleph_start and aleph_start.private questions
- Article Type: General
- Product: Aleph
- Product Version: 20
Description:
Here are the processes that are active on PROD this morning. I restarted them manually.
aleph@aleph-bib(a20_1) ABC01> ps -ef | grep 'ue_'
aleph 28780 1 0 May31 ? 00:32:35 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_z0102_index ABC01.a20_1
aleph 28781 1 0 May31 ? 01:39:21 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_word_parallel ABC01.a20_1
aleph 29099 1 0 May31 ? 00:25:44 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_21_a ABC01.a20_1
aleph 29727 1 0 May31 ? 00:00:42 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_z0102_index ABC10.a20_1
aleph 29728 1 0 May31 ? 00:00:30 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_word_parallel ABC10.a20_1
aleph 30511 1 0 May31 ? 00:00:06 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_z0102_index ABC30.a20_1
aleph 30512 1 0 May31 ? 00:00:24 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_word_parallel ABC30.a20_1
aleph 30574 1 0 May31 ? 00:00:17 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_21_a ABC30.a20_1
The following processes are missing:
ue_01_a ABC01.a20_1
ue_08_a ABC01.a20_1 C
Below is what util/c/1 currently shows, followed by what it should show.
*** util_c_01 - check ABC01 batch queue ***
16047 ? S 0:00 /exlibris/aleph/a20_1/aleph/exe/lib_batch ABC01
28780 ? S 32:35 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_z0102_index ABC01.a20_1
28781 ? S 99:21 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_01_word_parallel ABC01.a20_1
29099 ? Sl 25:45 /exlibris/aleph/a20_1/aleph/exe/rts32 ue_21_a ABC01.a20_1
Enter to continue
It should look like this:
lib_batch ABC01
ue_01_a ABC01.a20_1
ue_01_z0102_index ABC01.a20_1
ue_01_word_parallel ABC01.a20_1
ue_08_a ABC01.a20_1 C
ue_21_a ABC01.a20_1
What could make an indexing process stop? Are there watchdogs or other processes that check whether all the indexing daemons are up? Does aleph_startup run again automatically if daemons die?
Resolution:
Here is what I see at the end of the abc01 $data_scratch/run_e_01.24204 file:
...
...
HANDLING DOC NO. - ABC01.002035564 2012-05-28 18:17:40
HANDLING DOC NO. - ABC01.000310617 2012-05-28 19:46:14
Oracle error: update_cursor1 z07
ORA-03135: connection lost contact
Process ID: 1341
Session ID: 340 Se
Oracle error: update_cursor1 z07
ORA-03114: not connected to ORACLE
***************************************************
* No ORACLE connection - process is terminated... *
***************************************************
The timestamp on the run_e_01.24204 file is "May 29 02:32". Thus, it seems that Oracle went down sometime between 19:46 on May 28 and 02:32 on May 29, and this ue_01 process was not stopped before Oracle went down, as it should have been.
The run_e_08.24204 file shows that the same thing happened with ue_08.
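To see whether other daemons ended the same way, the remaining log files can be searched for the same termination banner. The following is a minimal sketch, not an Aleph utility: the function name find_oracle_terminated is hypothetical, and it assumes the daemon logs follow the run_e_* naming seen above and sit in the library's $data_scratch directory.

```shell
# Hypothetical helper (not part of Aleph): list the run_e_* log files in a
# directory that contain the "No ORACLE connection" termination banner.
# Assumption: daemon logs are named run_e_* as in the examples above.
# Usage: find_oracle_terminated "$data_scratch"
find_oracle_terminated() {
    dir=$1
    # -l prints only the names of matching files; -F matches the text literally.
    grep -l -F 'No ORACLE connection - process is terminated' "$dir"/run_e_* 2>/dev/null
}
```

Any file name this prints belongs to a daemon that stopped because it lost its Oracle connection; an empty result means no log in that directory contains the banner.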
If this was a scheduled shutdown of Oracle, aleph_shutdown should have been run prior to the Oracle shutdown. Please change your procedures to make sure this happens.
If it was an unscheduled shutdown, aleph_shutdown followed by aleph_startup should be run after Oracle is brought back up.
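After aleph_startup has been run, the process listing can be compared against the list of daemons expected for the library. The following is a minimal sketch under stated assumptions, not an Aleph watchdog: the function name missing_daemons is hypothetical, and the expected daemon strings have to be supplied by the caller (for example, the util/c/1 list given in the Description).

```shell
# Hypothetical helper (not part of Aleph): read a process listing on standard
# input and print each expected daemon string that does not appear in it.
# Usage:
#   ps -ef | missing_daemons "ue_01_a ABC01.a20_1" "ue_08_a ABC01.a20_1"
missing_daemons() {
    listing=$(cat)
    for expected in "$@"; do
        # -F matches literally, -q suppresses output; "--" ends option parsing.
        if ! printf '%s\n' "$listing" | grep -q -F -- "$expected"; then
            printf '%s\n' "$expected"
        fi
    done
}
```

The match is a plain substring test, so "ue_01_a ABC01.a20_1" is not satisfied by a running ue_01_z0102_index process. No output means every expected daemon was found in the listing.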
- Article last edited: 10/8/2013