marc_to_docx needs to insert $$6 at beginning, when breaking long 880 field

Article Type: General
Product: Aleph
Product Version: 18.01

Description:
This is a follow-up to SI 16384-114332. That SI concerned fix_doc_880 creating malformed fields when a long 880 $$6505... was broken into multiple fields by p_file_02 (marc_to_docx.cbl).

The "solution" in v20 rep_change 2206 was the following:

"If there is a long 880 field (longer than 2000 characters), it is split into several 880 fields. Only the first one contains the $$6 subfield. This caused a problem when running fix_doc_880 which assumes $$6 at the beginning of the field.
Solution: This has been corrected. Now the split parts that do not contain the $$6 are ignored. "

The result of this is that the data in the additional 880 pieces is dropped from the record.

Of course, this is not acceptable. If the problem isn't (or can't be) addressed in fix_doc_880, then we need to correct marc_to_docx so that, when it breaks the long 880 into pieces, it gives each piece the required $$6 subfield.

I sent the changed fix_doc_880.cbl to the customer. They analyzed it as follows:

It’s better, in that it doesn’t generate the “^$$” tags anymore, but still not working correctly. I think a real solution would have to involve changes to the long-field split program. Here’s what I think is happening:

When record is converted from MARC to docx (aleph sequential), the long fields are split, with the subfield 9 added.
When a long 880 is split, the added fields (with the subfield 9) don’t have the linking subfield 6 from the original field.
fix_doc_880 converts the first part of a long 880 to the tag from subfield 6. Subsequent 880’s are ignored, since they don’t contain any linking information.

I’ve attached a zip file [attached to Step 2] with files containing the output from converting a record to aleph sequential. It’s a Russian record with a long 505 field, with Cyrillic (I think) in the linked 880 field. Files are:

long_880.marc: original marc record, downloaded from oclc
long_880.pf01: output from p_file_01
long_880.pf02: output from p_file_02
long_880.pm22: output from p_manage_22 (convert utf decomposed to precomposed)
long_880.pm25: output from p_manage_25, applying fix_doc_880

You see that in long_880.pf02, the 505 field has been split into 3 fields and the associated 880 has been split into 5. In long_880.pm25, there’s the first part of the Latin 505, followed by the first part of the linked Cyrillic 505, followed by the other two parts of the Latin 505. The additional parts of the Cyrillic 880 are gone.

<end customer analysis>
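
The behavior the customer describes can be modeled with a short Python sketch. This is not the fix_doc_880 code; the function name, the (tag, content) representation, and the sample field contents are assumptions made for illustration only. It shows just the one point at issue: an 880 that begins with $$6 is retagged from its linkage, while an 880 that begins with anything else (for example $$9) is discarded.

```python
def model_fix_doc_880(fields):
    """Simplified, hypothetical model of the behavior described above.

    fields is a list of (tag, content) pairs. An 880 whose content
    begins with $$6 takes the tag named in that subfield; any other
    880 is dropped. Non-880 fields pass through unchanged.
    """
    result = []
    for tag, content in fields:
        if tag != "880":
            result.append((tag, content))
        elif content.startswith("$$6"):
            # The three characters after "$$6" are the linked tag, e.g. "505".
            result.append((content[3:6], content))
        # An 880 without a leading $$6 carries no linkage and is lost here.
    return result


# Illustrative input: the field contents are invented for this example.
sample = [
    ("505", "$$6880-05$$aLatin part 1"),
    ("880", "$$6505-05$$aCyrillic part 1"),
    ("880", "$$9Cyrillic part 2"),
    ("505", "$$9Latin part 2"),
]
for tag, content in model_fix_doc_880(sample):
    print(tag, content)
```

In this model the "Cyrillic part 2" piece never reaches the output, which matches the data loss seen in long_880.pm25. The model does not attempt to reproduce the field ordering of the real program.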

If you compare long_880.pm22 (the file before fix_doc_880 has been run on it; attached to Step 23) with long_880.pm25, you will see that long_880.pm22 contains a

000000001 88000 L $$6505-05

followed by three

000000001 88000 L $$9

fields. There are two possible approaches to this problem:

1. Change fix_doc_880 so that it handles the 880 $$9 fields (without dropping them); or

2. Change the marc_to_docx program so that, when it splits the long 880 into multiple fields, each of the additional fields begins with a $$6 and is therefore recognized by the fix_doc_880 program.

It seems (in SI 16384-114332) that the first approach has been rejected. Therefore, we need to use approach #2 to fix this problem.
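
As an illustration of approach #2, here is a minimal Python sketch of a splitter that repeats the linkage subfield on every piece. It is not the marc_to_docx.cbl logic: the function name, its parameters, and the choice to place $$9 directly after the repeated $$6 on continuation pieces are all assumptions. For simplicity it cuts at fixed character positions and does not try to respect subfield boundaries.

```python
def split_linked_field(text, link, limit=2000):
    """Hypothetical splitter: every piece starts with the $$6 linkage.

    text  -- field content without the leading $$6 subfield
    link  -- linkage value such as "505-05"
    limit -- maximum length of one piece, prefix included
    """
    first_prefix = "$$6" + link
    # Assumed layout for continuation pieces: $$6 first, then $$9.
    cont_prefix = first_prefix + "$$9"
    if limit <= len(cont_prefix):
        raise ValueError("limit leaves no room for field data")

    pieces = []
    prefix = first_prefix
    pos = 0
    while pos < len(text):
        room = limit - len(prefix)
        pieces.append(prefix + text[pos:pos + room])
        pos += room
        prefix = cont_prefix
    return pieces


# Small limit so the split is easy to see; Aleph's limit is 2000.
for piece in split_linked_field("$$a" + "x" * 40, "505-05", limit=20):
    print(piece)
```

Because each piece begins with $$6505-05, a program that looks only at the start of the field for its linkage would no longer discard the continuation pieces.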

Resolution:
This is Ex Libris' response:

This is not a bug. Currently Aleph does not support this functionality for long fields. The limit of 2000 characters remains. The split of the fields and the addition of subfield $9 were done in order not to lose information, but this does not mean that Aleph programs will treat these fields as one. In general terms, the user should decide whether they want a manual correction, such as deleting text, etc.

Article last edited: 10/8/2013