
    Non-breaking space character prevents proper Word indexing

    • Article Type: General
    • Product: Aleph
    • Product Version: 20, 21, 22, 23

    Problem Symptoms:
    For one record, util f/1/28 (Display Word Indexing for a Single Record) shows "Macroéconomie /" indexed both with and without the slash:
    24510 $$aMacroéconomie /$$cOlivier Blanchard, Daniel Cohen ; avec la collaboration de Cyril Nouveau.
    001110018 0364 0001 macroeconomie
    001110018 0364 0002 macroeconomie
    001110018 0365 0001 /
    001110018 0365 0002 /
    001110018 0366 0001 macroeconomie/
    001110018 0366 0002 macroeconomie/

    For another, it shows "Macroéconomie /" indexed *with* the slash only:
    24510 $$aMacroéconomie /$$cOlivier Blanchard, Daniel Cohen, David Johnson; avec la collab. de Cyril Nouveau.
    001220403 0270 0001 macroeconomie/
    001220403 0270 0002 macroeconomie/

    For this second record, the word "Macroéconomie" is *not* retrievable in OPAC or GUI Search.

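    Cause:
    The second record contains a non-breaking space character (U+00A0) between the "e" and the "/". Because this character is not changed to a blank, the word and the slash are indexed together as a single word.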
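
Because a non-breaking space looks identical to an ordinary space on screen, it can help to inspect the code points of the field text directly. The following is a minimal sketch in Python, not part of Aleph; the function name `find_nbsp` and the two sample strings are assumptions made for illustration.

```python
NBSP = "\u00a0"  # non-breaking space

def find_nbsp(text):
    """Return the 0-based offsets of every U+00A0 character in text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]

# Hypothetical subfield values: visually identical, different code points.
good = "Macro\u00e9conomie /"       # ordinary space (U+0020) before the slash
bad = "Macro\u00e9conomie\u00a0/"   # non-breaking space (U+00A0) before the slash

for label, value in (("good", good), ("bad", bad)):
    tail = [f"U+{ord(ch):04X}" for ch in value[-3:]]
    print(label, find_nbsp(value), tail)
```

An empty list means the text contains no non-breaking space; any offsets returned point at the characters that to_blank would need to cover.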

    Resolution:
    Add "U+00A0" to the to_blank line of the xxx01 tab_word_breaking table, for example:
    03 # to_blank             !@#$%^()_={}[]:";<>,.?|\U+00A0
    The non-breaking space (hex A0) is then changed to a blank, and the word preceding it is indexed properly.

    Additional Information

    The tab_word_breaking header says this: 

    ! For some of the procedures, characters to be considered are defined
    ! in column 4 (e.g. in to_blank and compress).
    ! The character can be keyboard input, or can be in unicode notation,
    ! by entering U+<hexa value> (e.g. U+002E)
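
The effect of listing U+00A0 in column 4 can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not Aleph's actual word-breaking code: the helpers `parse_char_list`, `to_blank` and `words` are hypothetical, and the way U+<hexa value> tokens are expanded here is a guess based on the header text above.

```python
import re

def parse_char_list(column4):
    """Expand U+<hex> tokens into characters; keep other characters as typed.
    Assumption: tokens are exactly four hex digits, as in U+002E."""
    expanded = re.sub(r"U\+([0-9A-Fa-f]{4})",
                      lambda m: chr(int(m.group(1), 16)), column4)
    return set(expanded)

def to_blank(text, chars):
    """Replace every listed character with an ordinary blank."""
    return "".join(" " if ch in chars else ch for ch in text)

def words(text):
    """Split on ordinary blanks only. Python's bare split() would also
    split on U+00A0, which would hide the problem being shown."""
    return [w for w in text.split(" ") if w]

before = parse_char_list('!@#$%^()_={}[]:";<>,.?|\\')
after = parse_char_list('!@#$%^()_={}[]:";<>,.?|\\U+00A0')

title = "Macro\u00e9conomie\u00a0/"  # non-breaking space before the slash
print(words(to_blank(title, before)))  # one word: title and slash stay glued
print(words(to_blank(title, after)))   # two words: the title word and "/"
```

With the original character list the title and the slash stay in one word, which mirrors the "macroeconomie/" entry seen for the second record. Once U+00A0 is in the list, the title word is separated from the slash.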


    • Article last edited: 7-Mar-2018