    Non-breaking space character prevents proper Word indexing

    • Article Type: General
    • Product: Aleph
    • Product Version: 20

    Problem Symptoms:
    For one record, util f/1/28 (Display Word Indexing for a Single Record) shows "Macroéconomie /" indexed both with and without the slash:

    24510 $$aMacroéconomie /$$cOlivier Blanchard, Daniel Cohen ; avec la collaboration de Cyril Nouveau.
    001110018 0364 0001 macroeconomie
    001110018 0364 0002 macroeconomie
    001110018 0365 0001 /
    001110018 0365 0002 /
    001110018 0366 0001 macroeconomie/
    001110018 0366 0002 macroeconomie/

    For another, it shows "Macroéconomie /" indexed *with* the slash only:

    24510 $$aMacroéconomie /$$cOlivier Blanchard, Daniel Cohen, David Johnson; avec la collab. de Cyril Nouveau.
    001220403 0270 0001 macroeconomie/
    001220403 0270 0002 macroeconomie/

    For this second record, the word "Macroéconomie" is *not* retrievable in OPAC or GUI Search.

    The cause is a non-breaking space character (U+00A0) between the "e" and the "/" in the second record. Because this character is not changed to a blank, the word and the slash are indexed as a single word.
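
    Because a non-breaking space looks identical to an ordinary blank on screen, it can help to check the field text programmatically. The following is a minimal Python sketch, not an Aleph utility: the function names and the two sample strings are hypothetical, modelled on the two records above.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, the character described above


def find_nbsp(text):
    """Return the 0-based positions of every U+00A0 in text."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


def show_invisible(text):
    """Return text with each U+00A0 replaced by a visible <U+00A0> tag."""
    return text.replace(NBSP, "<U+00A0>")


# Hypothetical subfield values modelled on the two records above.
good = "Macro\u00e9conomie /"        # ordinary blank (U+0020) before the slash
bad = "Macro\u00e9conomie\u00a0/"    # non-breaking space before the slash

print(find_nbsp(good))
print(find_nbsp(bad))
print(show_invisible(bad))
```

    An empty list means the text contains no non-breaking space. A non-empty list gives the position of each one.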

    To resolve this, add "U+00A0" to the to_blank line of the xxx01 tab_word_breaking table, such as this:

    03 # to_blank !@#$%^()_={}[]:";<>,.?|\U+00A0

    so that the non-breaking space (hex A0) is changed to a blank and the word preceding it is indexed properly.
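
    The effect of the change can be illustrated with a simplified Python sketch. This is not Aleph's word-breaking code: it only replaces the listed characters with blanks and splits on blanks. The function name is hypothetical, and the "before" list is an assumption (the line above without U+00A0). The input is given already lowercased and without accents, since that normalization is a separate step.

```python
# Characters taken from the to_blank line above. The "before" list is an
# assumption: the same line without the U+00A0 entry.
TO_BLANK_BEFORE = '!@#$%^()_={}[]:";<>,.?|\\'
TO_BLANK_AFTER = TO_BLANK_BEFORE + "\u00a0"


def split_words(text, to_blank):
    """Replace every character in to_blank with a blank, then split on blanks.

    The split is on U+0020 only. Python's argument-less str.split() would
    also treat U+00A0 as whitespace and hide the problem.
    """
    table = str.maketrans({ch: " " for ch in to_blank})
    return [word for word in text.translate(table).split(" ") if word]


# Hypothetical field text with a non-breaking space before the slash.
field = "macroeconomie\u00a0/"

print(split_words(field, TO_BLANK_BEFORE))
print(split_words(field, TO_BLANK_AFTER))
```

    With the "before" list the text stays a single word that still contains the non-breaking space. With U+00A0 added, "macroeconomie" and "/" come out as separate words.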

    Additional Information

    The as-delivered tab_filing header explains that characters can be specified in "U+" notation, but the tab_word_breaking header does not. A request has been made that this information be added.

    • Article last edited: 5/12/2014