- Article Type: General
- Product: Aleph
- Product Version: 18.01
We're testing the load of some Arabic recon records that we outsourced to a vendor (OCLC). About a dozen records are generating character conversion errors, saying that the character is not found in marc8_ext_ara_to_unicode. The characters are hex 31, 53, 5E, and 6D. Checking the Library of Congress page referenced by the header of marc8_ext_ara_to_unicode, I see those characters listed as valid. However, they don't occur in our version of the table.
Each of the Extended Arabic characters can be represented in two ways in MARC8: once in the G0 range (x'21' - x'7E') and once in the G1 range (x'A1' - x'FE'). However, our table has only the G1 range versions. I believe I can simply add the table lines for the G0 range at the end of the table and the conversion should be okay in both directions. The G1 range will still be the preferred range when converting back to MARC8.
Do you think this should be OK?
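As an illustration of the G0/G1 relationship described above, here is a minimal Python sketch. It assumes only the standard MARC8 convention that the G1 form of a character is its G0 byte with the high bit set (G0 + x'80'). The function names are hypothetical and are not part of Aleph.

```python
# Sketch: map between the G0 and G1 forms of a MARC8 byte.
# Assumption: G1 byte = G0 byte + 0x80 (high bit set).
G1_OFFSET = 0x80


def g0_to_g1(byte: int) -> int:
    """Return the G1 equivalent of a G0 byte (x'21' - x'7E')."""
    if not 0x21 <= byte <= 0x7E:
        raise ValueError(f"not a G0 graphic byte: {byte:#04x}")
    return byte + G1_OFFSET


def g1_to_g0(byte: int) -> int:
    """Return the G0 equivalent of a G1 byte (x'A1' - x'FE')."""
    if not 0xA1 <= byte <= 0xFE:
        raise ValueError(f"not a G1 graphic byte: {byte:#04x}")
    return byte - G1_OFFSET


# The four bytes reported as missing from the table.
failing = [0x31, 0x53, 0x5E, 0x6D]
g1_equivalents = [g0_to_g1(b) for b in failing]
print([f"{b:02X}" for b in g1_equivalents])  # ['B1', 'D3', 'DE', 'ED']
```

If the table already has lines for the G1 bytes printed here, the missing G0 lines would carry the same Unicode values.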
The marc8_ext_ara_to_unicode table header has the same paragraph that all the marc8_..._to_unicode headers have:
! When converting from UTF to MARC8 the programs will take the first
! occurrence in the second column that matches the Unicode
! input character. It is possible to rearrange the tables so that the
! line with the value desired for export precedes other lines for the
! same Unicode character. marc8 values don't have to be sorted.
So, yes, you can add the G0 lines at the *end* of the table and the conversion should be okay in both directions. The G1 range will still be the preferred range when converting back to MARC8.
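The first-occurrence rule quoted from the table header can be modelled with a short Python sketch. This is not Aleph's code and does not use the real table file format: the table is represented as a list of (MARC8 byte, Unicode value) pairs, and "UNI_A" and "UNI_B" are placeholder values, not actual code points.

```python
# Sketch of the first-match lookup described in the table header.
# Assumption: the table is an ordered list of (marc8_byte, unicode_value)
# pairs; "UNI_A" / "UNI_B" are placeholders, not real code points.

# Existing G1 lines (hypothetical entries).
table = [(0xB1, "UNI_A"), (0xD3, "UNI_B")]

# Append the matching G0 lines at the *end* of the table.
table += [(marc8 - 0x80, uni) for marc8, uni in list(table)]


def marc8_to_unicode(table, byte):
    """Import direction: find the line whose MARC8 byte matches."""
    for marc8, uni in table:
        if marc8 == byte:
            return uni
    raise KeyError(f"byte {byte:#04x} not found in table")


def unicode_to_marc8(table, uni_value):
    """Export direction: the first line matching the Unicode value wins."""
    for marc8, uni in table:
        if uni == uni_value:
            return marc8
    raise KeyError(f"{uni_value!r} not found in table")


# Both forms now convert on import, and export still yields the G1 byte.
print(marc8_to_unicode(table, 0x31), marc8_to_unicode(table, 0xB1))
print(f"{unicode_to_marc8(table, 'UNI_A'):02X}")
```

Because the G0 lines come after the G1 lines, the export lookup reaches the G1 line first. Inserting the G0 lines above the G1 lines would reverse that preference.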
- Article last edited: 10/8/2013