p_manage_36 not matching on authority records with caret ("^") in 010 field

Article Type: General
Product: Aleph
Product Version: 18.01

Description:
We are loading a large batch of new and updated authority records using manage 36. The number of matched records is lower than expected, so I have been doing some testing on our test server. I have discovered that the missing matches are caused by an extra ^ in the index entry for the LCC.

example:

01 z11_index \
02 z11_rec_key \
03 ind_code .......LCC
03 filing_text ....sh 00000029^
03 sequence_1 .....000949472
02 z11_doc_number ...000949472
02 z11_alpha ........L
02 z11_text .........$$ash 00000029^

When I removed the ^ via the cataloguing module, the record finally matched against the incoming record.

01 z11_index \
02 z11_rec_key \
03 ind_code .......LCC
03 filing_text ....sh 00000029
03 sequence_1 .....000949472
02 z11_doc_number ...000949472
02 z11_alpha ........L
02 z11_text .........$$ash 00000029

Questions:
Is there any reason why this ^ needs to be there?

Is there a way to get manage 36 to ignore this final ^ if it is present?

If I have to fix and reindex all of these, I assume I would edit section 21 in ABC10's tab_filing by adding N compress ^

If I have to reindex, any ideas how I can use ret-01 to retrieve the records with these extra ^s?

Resolution:
> Is there any reason why this ^ needs to be there?

<<js MARC21 says that the LCCN 010 field is a 12-byte field:

prior to 2001: 3-byte prefix, 2-byte year, 6-byte serial#, 1-byte "supplement number" [which is normally blank]
2001 and later: 2-byte prefix, 4-byte year, 6-byte serial#

ALEPH doesn't allow consecutive blank spaces in doc record fields and it doesn't allow a space at the end of the field. Thus, the caret is used to preserve the proper format in the pre-2001 records.

The name authority records look like this:

$$an^^00092964^

and the subject authority records, like this:

$$ash^84230557^
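
As an illustration only, the layout described above can be sketched in Python. The function name lccn_to_aleph is hypothetical and this is not ALEPH's own routine; it assumes the incoming LCCN keeps its internal blanks, pads it to 12 bytes, and turns each blank into a caret, which reproduces the two examples shown.

```python
def lccn_to_aleph(lccn: str) -> str:
    """Pad an LCCN to the 12-byte MARC21 width and show blanks as carets.

    Hypothetical helper for illustration; not part of ALEPH.
    """
    if len(lccn) > 12:
        raise ValueError("an LCCN is at most 12 bytes long")
    # ljust() restores a missing trailing blank (the pre-2001 "supplement
    # number" position); each blank is then made visible as a caret.
    return lccn.ljust(12).replace(" ", "^")


print(lccn_to_aleph("n  00092964"))  # name authority, pre-2001 layout
print(lccn_to_aleph("sh 84230557"))  # subject authority, pre-2001 layout
```

A 2001-and-later number already fills all 12 bytes (2-byte prefix, 4-byte year, 6-byte serial), so under this sketch it gains no trailing caret.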

The distributed usm10 records have the carets in the z11_text, as shown above, but the Z11-FILING-TEXT component of the z11_rec_key has the carets and the "n" and "sh" prefixes stripped out: "00092964" and "84230557". (The "non_numeric" routine in tab_filing entry 21 strips out any non-numeric characters.)
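
To show why a numeric-only filing text makes the caret irrelevant for matching, here is a minimal Python sketch of that kind of normalization. The function name non_numeric_filing is made up for this example; it only imitates the effect described above and is not the ALEPH filing routine.

```python
def non_numeric_filing(text: str) -> str:
    """Keep only the ASCII digits of a field value (illustrative only)."""
    return "".join(ch for ch in text if ch in "0123456789")


# With and without the trailing caret, the filing text comes out the same.
print(non_numeric_filing("sh 00000029^"))
print(non_numeric_filing("sh 00000029"))
print(non_numeric_filing("n^^00092964^"))
```

Under this assumption, the two index entries from the Description would file identically, so the incoming record would match either form.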

It seems to me that one of two things is happening:

(1) the input file is not in the proper MARC21 format and doesn't have the extra spaces; or

(2) the input file has the spaces but the fix_doc_marc21_spaces routine is not being executed for this input file to insert caret(s) in order to preserve the spaces. (Note: the fix_doc_marc21_spaces is hardcoded to operate on the fixed fields and the following variable fields: 010, 260, 310, 321, 362, 515, 525, 533, 76*, 77*, and 78*.)

As you can see from the preceding, it's my feeling that rather than eliminating the ^'s from the existing records you should make sure that the incoming records have them.

Since I think that almost all of the authority records have a caret at the end of the 010, if you make a change to tab_filing, you should re-run p_manage_05 rather than resending records for indexing with p_ret_01/p_manage_40/ue_01.

> Is there a way to get manage 36 to ignore this final ^ if it is present?

<<js I don't know of a way to get it to ignore it.

[From site:]
We wanted to avoid having to reindex our 5 million+ authority records. So we came up with this workaround:

1) We commented out M-36 in tab_fix, in order to keep the end ^.

!* Used by manage-36 - checking a file of new records against the db for dups
! This runs carets into spaces.
! Comment out to see if the carets match up. Apr 24
! M-36 fix_doc_space_char 010##

2) We ran a fix on the incoming records that changed the middle ^ after sh to a space.

So the incoming 010s and the index entries then had the same format: sh 00000123^

And the matching worked!
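
The fix applied to the incoming records is not shown in this article. A rough Python sketch of the same transformation, with the hypothetical function name normalize_incoming_010, could look like this; it assumes the 010 $$a value is already available as a plain string.

```python
def normalize_incoming_010(value: str) -> str:
    """Turn interior carets into spaces but keep one trailing caret.

    Illustrative sketch only; not the fix routine the site actually ran.
    """
    trailing = "^" if value.endswith("^") else ""
    body = value[:-1] if trailing else value
    return body.replace("^", " ") + trailing


print(normalize_incoming_010("sh^00000123^"))
```

For a name authority value such as n^^00092964^ this sketch would produce two consecutive spaces, which ALEPH does not allow in doc record fields, so the name authority case would need separate testing.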

We hope the same technique will work for the name authorities...

We will probably rework the index as you originally suggested so we do not have to jump through these hoops once we start loading authority updates systematically.

Article last edited: 10/8/2013