"Badly formed number" error in Linux, running multi-process batch job *MASTER*
- Article Type: General
- Product: Aleph
- Product Version: 18.01
Description:
Case A:
(V18) p_manage_nn indexing job (or other job) fails with a "Badly formed number" message.
We see this in the $alephe_scratch log:
No match.
No match.
@: Badly formed number.
Case B:
In v20, the p_manage_01 job ran to completion, including successful building of the z9n Oracle indexes, but certain z98 records were found to be missing. Investigation showed the following in the middle of the p_manage_01 log:
end - handling manage_01_4.9
-rw-r--r-- 1 aleph exlibris 56514552 Jan 4 16:59 manage_01_6.1
@: Badly formed number.
file_name /exlibris/aleph/a20_1/tmp/ABC30_check_index_20324.lst
<etc.>
Resolution:
The problem has occurred at Linux sites in versions 18, 19, and 20.
The occurrence of this error in the main process (Case A) was corrected by v20 rep_change 2511 ("Batch jobs containing parameter 'Processes to create' did not work on Linux machines when the chosen number of processes was 8 or 9.").
Sites were still encountering the error in indexing sub-scripts (Case B), however. That problem has been corrected by v20 rep_change 3302 (to be included in the February 2011 Service Pack): "Rebuild Word Index (manage-01): the service did not work properly on Linux machines when the value of the parameter 'Processes to create' is 8 or 9.... This is a continuation to rpc #2511, since p_manage_01 uses also the number of processes in sub-scripts."
**If you are running Linux and are on version 18 or 19, or on version 20 and not ready to implement a v20 Service Pack containing rep_changes 2511 and 3302, run the job with either 7 or 10 processes.**
[Old Answer:]
This had to do with the “number of processes” parameter.
I found some examples of problems with how csh scripts handle numbers like 08 and 09. csh treats a number with a leading zero as octal, and 08 and 09 are not valid octal numbers, because octal digits run only from 0 to 7 (See: http://www.phwinfo.com/forum/archive/index.php/t-212039.html ). The script would need to convert 09 into 9 and then pass that as a parameter into the command. Right now, when you enter "9" for processes in the p_manage_102 services window in the GUI, it actually converts the number into "09", which causes the error. This is visible in the log header:
setenv p_no_process_x "09"
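The octal issue described above can be illustrated with a small sketch. It is written in Python purely for demonstration; the Aleph job scripts themselves are csh, and the function names parse_like_csh and normalize are hypothetical, not part of Aleph. The sketch assumes that a leading zero triggers octal parsing, as the forum thread suggests, and shows why "08" and "09" are rejected while "07" and "10" are accepted, and how stripping the leading zero avoids the problem.

```python
def parse_like_csh(text: str) -> int:
    """Mimic a C-style numeric parse in which a leading 0 means octal.

    This is an illustration of the assumed behavior, not csh's actual code.
    """
    if len(text) > 1 and text.startswith("0"):
        # int(..., 8) raises ValueError for digits 8 and 9.
        return int(text, 8)
    return int(text, 10)


def normalize(text: str) -> str:
    """Strip leading zeros so the value is read as plain decimal."""
    return text.lstrip("0") or "0"


for value in ("07", "08", "09", "10"):
    try:
        print(value, "->", parse_like_csh(value))
    except ValueError:
        fixed = parse_like_csh(normalize(value))
        print(value, "-> badly formed number; after stripping zeros:", fixed)
```

In this model only the zero-padded values 08 and 09 fail, which is consistent with the sites' experience that 7 or 10 processes work.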
When we ran into this problem originally (with p_manage_32 back in August of last year), I thought that it had to do with a conflict between processes which were hanging. But now that I found these troubleshooting clues online, I see that it is most likely caused by a processes parameter value of 08 or 09.
The upshot: we should use 7 or 10 processes…
[Note from Jerry S:] Four other sites saw this problem. Reducing the number of processes from 8 to 7 eliminated the problem.
- Article last edited: 10/8/2013