Non-breaking space character prevents proper Word indexing
- Article Type: General
- Product: Aleph
- Product Version: 20, 21, 22, 23
Problem Symptoms:
For one record, util f/1/28 (Display Word Indexing for a Single Record) shows "Macroéconomie /" indexed both without the slash (as "macroeconomie" and "/") and with it (as "macroeconomie/"):
24510 $$aMacroéconomie /$$cOlivier Blanchard, Daniel Cohen ; avec la collaboration de Cyril Nouveau.
001110018 0364 0001 macroeconomie
001110018 0364 0002 macroeconomie
001110018 0365 0001 /
001110018 0365 0002 /
001110018 0366 0001 macroeconomie/
001110018 0366 0002 macroeconomie/
For another record, it shows "Macroéconomie /" indexed *with* the slash only:
24510 $$aMacroéconomie /$$cOlivier Blanchard, Daniel Cohen, David Johnson; avec la collab. de Cyril Nouveau.
001220403 0270 0001 macroeconomie/
001220403 0270 0002 macroeconomie/
For this second record, the word "Macroéconomie" is *not* retrievable in OPAC or GUI Search.
Cause:
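In the second record, the character between the "e" and the "/" is a non-breaking space (U+00A0), not an ordinary blank. Because the non-breaking space is not changed to a blank, "Macroéconomie" and "/" are indexed together as the single word "macroeconomie/".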
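A non-breaking space looks identical to a blank on screen, so it helps to check the field value character by character. The Python sketch below is only an illustration: the two 245 $$a strings are typed in by hand (it assumes you have already copied the field text out of Aleph), and find_nbsp and report are hypothetical helper names.

```python
import unicodedata

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE (UTF-8 bytes C2 A0)


def find_nbsp(text: str) -> list[int]:
    """Return the offset of every non-breaking space in text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


def report(label: str, text: str) -> None:
    # Show each hit with its neighbours and Unicode name, since the
    # character looks exactly like an ordinary blank on screen.
    hits = find_nbsp(text)
    if not hits:
        print(label, "no non-breaking spaces")
    for i in hits:
        print(label, i, repr(text[max(i - 1, 0): i + 2]), unicodedata.name(text[i]))


# Hypothetical 245 $$a values, typed in by hand for illustration.
report("first record:", "Macro\u00e9conomie /")
report("second record:", "Macro\u00e9conomie\u00a0/")
```

Only the second string reports a NO-BREAK SPACE, at the position directly after the final "e".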
Resolution:
Add "U+00A0" to the to_blank characters in the xxx01 tab_word_breaking, for example:
03 # to_blank !@#$%^()_={}[]:";<>,.?|\U+00A0
This changes the non-breaking space (hex A0) to a blank, so that the word preceding it is indexed properly.
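To see why the fix works, the to_blank step can be modelled in a few lines of Python. This is a simplified sketch, not Aleph's implementation. The accent stripping and lowercasing only approximate Aleph's filing. Treating the backslash in the example line as a literal to_blank character is an assumption. The names index_words, TO_BLANK_OLD and TO_BLANK_NEW are hypothetical.

```python
import unicodedata

# Hypothetical to_blank sets: the characters from the example 03 line,
# without and with U+00A0 (backslash treated as a literal character).
TO_BLANK_OLD = set('!@#$%^()_={}[]:";<>,.?|\\')
TO_BLANK_NEW = TO_BLANK_OLD | {"\u00a0"}


def index_words(text: str, to_blank: set[str]) -> list[str]:
    # Rough stand-in for filing: NFD splits "é" into "e" + combining
    # accent, which is dropped. NFD (unlike NFKD) leaves U+00A0 alone.
    decomposed = unicodedata.normalize("NFD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    # to_blank step: every listed character becomes an ordinary blank.
    blanked = "".join(" " if ch in to_blank else ch for ch in folded)
    # Break on the ASCII blank only; str.split() with no argument would
    # also break on U+00A0 and hide the problem.
    return [w for w in blanked.split(" ") if w]


title = "Macro\u00e9conomie\u00a0/"
print(index_words(title, TO_BLANK_OLD))  # ['macroeconomie\xa0/']
print(index_words(title, TO_BLANK_NEW))  # ['macroeconomie', '/']
```

In this model the unconverted non-breaking space stays inside the word, whereas util f/1/28 displays that word as "macroeconomie/". Either way, "macroeconomie" is not indexed on its own until U+00A0 is in the to_blank list.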
Additional Information
The tab_word_breaking header describes the notation allowed in column 4:
! For some of the procedures, characters to be considered are defined
! in column 4 (e.g. in to_blank and compress).
! The character can be keyboard input, or can be in unicode notation,
! by entering U+<hexa value> (e.g. U+002E)
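The header allows column 4 to mix keyboard characters with U+<hexa value> entries. The Python sketch below shows one way such a string could be expanded into a set of characters. It is a hypothetical reading of the header, not Aleph's parser. It assumes every "U+" entry has exactly four hex digits, as in U+002E and U+00A0, and that all other characters are taken literally. parse_column4 is an illustrative name.

```python
import re

# Hypothetical reading of the header: "U+" plus four hex digits is one
# character; anything else is a literal keyboard character.
_CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4})")


def parse_column4(spec: str) -> set[str]:
    chars: set[str] = set()
    pos = 0
    while pos < len(spec):
        m = _CODEPOINT.match(spec, pos)
        if m:
            # Unicode notation, e.g. U+002E -> "."
            chars.add(chr(int(m.group(1), 16)))
            pos = m.end()
        else:
            # Keyboard input: take the character as-is.
            chars.add(spec[pos])
            pos += 1
    return chars


print(sorted(parse_column4("U+002E,")))  # [',', '.']
```

Under this reading, the to_blank string from the resolution above yields the non-breaking space as one of its characters.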
- Article last edited: 7-Mar-2018