What character encoding is preferred for bulk importing records into Voyager?
- Product: Voyager
Question
Is MARC21 UTF-8 or MARC21 MARC-8 character encoding preferred for bulk importing records into Voyager?
Answer
Voyager stores records in MARC21 UTF-8, and will convert MARC21 MARC-8 records to MARC21 UTF-8 when they are imported and MARC21 MARC-8 is selected as the Expected Character Set.
Voyager can handle either encoding, provided that encoding is selected as the Expected Character Set in the Bulk Import rule; Voyager converts the character encoding as necessary [1]. UTF-8 may be simpler because no conversion is involved, but either can be selected as long as the selected character set matches that of the records in the import file and the encoding is consistent (that is, all records in the file are encoded the same way).
You can determine the character coding scheme by checking byte 9 of the MARC Leader (Leader/09, counting from 0): a blank in byte 9 indicates the record is MARC-8; an "a" means the record is UCS/Unicode.
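The Leader check can be scripted to confirm that an import file is encoded consistently before choosing the Expected Character Set. The following is a minimal sketch, not a Voyager utility: it assumes the file is in standard MARC transmission format (ISO 2709), where each record begins with a 24-byte Leader whose first five bytes give the record length. The function names and the file name import.mrc are hypothetical.

```python
from collections import Counter

# Leader/09 values as described above; other values are reported as-is.
LABELS = {" ": "MARC-8", "a": "UCS/Unicode"}


def coding_schemes(data: bytes) -> Counter:
    """Count the Leader/09 value of every record in a MARC byte string."""
    counts = Counter()
    offset = 0
    while offset + 24 <= len(data):
        leader = data[offset:offset + 24]
        # Leader/00-04 holds the total record length as ASCII digits.
        length = int(leader[0:5])
        if length < 24:
            raise ValueError(f"implausible record length at offset {offset}")
        counts[chr(leader[9])] += 1
        offset += length
    return counts


def summarize(counts: Counter) -> str:
    """Describe whether all records share one coding scheme."""
    if not counts:
        return "no records found"
    names = sorted(LABELS.get(flag, f"unknown ({flag!r})") for flag in counts)
    if len(names) == 1:
        return f"consistent: {names[0]}"
    return "mixed: " + ", ".join(names)


# Hypothetical usage:
#     with open("import.mrc", "rb") as f:
#         print(summarize(coding_schemes(f.read())))
```

A "mixed" result suggests the file should be split or re-exported before import, since a Bulk Import rule has a single Expected Character Set.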
Additional Information
Only bibliographic data are stored as Unicode (MARC21 UTF-8) in Voyager, and the rest is Latin1. See the Voyager Technical User's Guide (various sections) for further details.
[1] Characters that can't be converted will produce an "Invalid MARC21 Character" error. See: What does "Invalid MARC21 Character" error mean in cataloging client?
Article last edited: 09-May-2020