    Diacritics wrong in OCLC records; Error: character X".." not defined

    • Article Type: General
    • Product: Aleph
    • Product Version: 16.02

    Description:
    Records imported from OCLC have the U+FFFD character instead of the appropriate diacritic.

    The oclc_server log has these messages:
    Error: character X"cc" is not defined in marc8_lat_to_unicode.
    Error: character X"81" is not defined in marc8_lat_to_unicode.
    Error: character X"bb" is not defined in marc8_lat_to_unicode.
    Error: character X"a0" is not defined in marc8_lat_to_unicode.
    Error: character X"bb" is not defined in marc8_lat_to_unicode.
    Error: character X"a0" is not defined in marc8_lat_to_unicode.
    Error: character X"9b" is not defined in marc8_lat_to_unicode.
    Error: character X"91" is not defined in marc8_lat_to_unicode.

    The converted set of bib records contains results like this: "Sh?kan T?y?". In both the Web OPAC and GUI Cataloging, each "?" appears as a black diamond with a white question mark inside.

    Resolution:
    These errors indicate that you are trying to convert data to UTF-8 (Unicode) that is already in UTF-8. ("cc81" is the UTF-8 encoding of U+0301, the combining acute accent.)
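The diagnosis above can be checked with a short sketch. It assumes only the byte values copied from the log messages in this article; the helper name `utf8_role` is made up for illustration and is not part of ALEPH.

```python
import unicodedata

# Byte values reported by the oclc_server log in this article.
logged = [0xCC, 0x81, 0xBB, 0xA0, 0xBB, 0xA0, 0x9B, 0x91]


def utf8_role(byte: int) -> str:
    """Classify a single byte by the role it can play in a UTF-8 sequence."""
    if byte < 0x80:
        return "ascii"
    if byte & 0xC0 == 0x80:      # bit pattern 10xxxxxx
        return "continuation"
    if 0xC2 <= byte <= 0xF4:     # valid first byte of a multi-byte sequence
        return "lead"
    return "invalid"


roles = [utf8_role(b) for b in logged]

# The pair cc 81 decodes to one combining character.
accent = bytes([0xCC, 0x81]).decode("utf-8")
accent_name = unicodedata.name(accent)

print(roles)
print(f"U+{ord(accent):04X} {accent_name}")
```

Every logged value is either a UTF-8 lead byte or a continuation byte, which is consistent with the input already being UTF-8 rather than MARC-8.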

    OCLC Connexion has a "Format" option which can be either "MARC-8" or "Unicode". When the Format is Unicode, OCLC sets the LDR byte 09 (Character coding scheme) to "a":

    00761ccm a2200229K 45 ...

    and blank when it is MARC-8:

    00761ccm  2200229K 45 ...
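As a rough illustration of reading this byte, the sketch below inspects position 09 of a leader string. The function name `leader_coding_scheme` is hypothetical, and the sample strings are only the leading characters of the leaders shown above, since position 09 is all that is examined.

```python
def leader_coding_scheme(leader: str) -> str:
    """Return the character coding scheme indicated by leader position 09."""
    if len(leader) < 10:
        raise ValueError("need at least the first 10 characters of the leader")
    flag = leader[9]  # positions are counted from 0
    if flag == "a":
        return "Unicode"
    if flag == " ":
        return "MARC-8"
    return "unknown"


unicode_leader = "00761ccm a2200229K"   # byte 08 blank, byte 09 "a"
marc8_leader = "00761ccm  2200229K"     # bytes 08 and 09 both blank

print(leader_coding_scheme(unicode_leader))
print(leader_coding_scheme(marc8_leader))
```

Note that a display which collapses runs of blanks shifts every later position, so the leader should be checked against the raw record rather than a rendered copy.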

    Note: ALEPH versions 16 and lower do not use LDR byte 09 (though its value can be useful in diagnosing this problem). The OCLC character conversion is controlled by the OCLC_TO_UTF line in $alephe_unicode/tab_character_conversion_line. As delivered, the procedure associated with OCLC_TO_UTF is line_marc8_2_line_utf. This procedure is executed regardless of how the incoming record is encoded. Thus, in version 16 and lower, you need to set the OCLC Connexion Format to MARC-8.

    ALEPH versions 17.01 and up do consult LDR byte 09 (v17 rep_change 678 / v18 rep_ver 11425). When it is "a", the OCLC_UTF_TO_UTF line in tab_character_conversion_line is used. Note: "line_no_translate" is a value that may be used in column 4, "Procedure to run", although it is not listed in the table header.

    As described in KB 8192-4007, the line_no_translate program no longer exists in version 18 and up. Commenting out (or deleting) the OCLC_UTF_TO_UTF line gives the same result as "line_no_translate", so that is what you should do.
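The version-dependent behavior described in this Resolution can be summarized as a small decision function. This is only a sketch of the logic as stated in the article, not ALEPH code: the function name `conversion_line` is hypothetical, the version is simplified to a major version number, and the fallback to OCLC_TO_UTF for version 17 and up when byte 09 is not "a" is an assumption the article implies but does not state.

```python
def conversion_line(aleph_major_version: int, leader_byte_09: str) -> str:
    """Name the tab_character_conversion_line entry expected to apply."""
    # Version 17 and up consult LDR byte 09; "a" means the record is Unicode.
    if aleph_major_version >= 17 and leader_byte_09 == "a":
        return "OCLC_UTF_TO_UTF"
    # Version 16 and lower ignore byte 09 and always use OCLC_TO_UTF.
    # Assumption: version 17 and up fall back to it for any other value.
    return "OCLC_TO_UTF"


for version, flag in [(16, "a"), (17, "a"), (18, " ")]:
    print(version, repr(flag), conversion_line(version, flag))
```

The first case, version 16 with a Unicode record, is the one that produces the errors in the Description, because the MARC-8 conversion is applied to data that is already UTF-8.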


    • Article last edited: 10/8/2013