German diacritics: searching 'ue' for 'u' with umlaut
- Article Type: General
- Product: Aleph
- Product Version: 15, 16, 18, 20, 21, 22, 23
Description:
We are seeing inconsistent search results for words that contain diacritics, particularly German diacritics. We need the system configured so that a user gets the same results whether searching with a diacritic, without a diacritic, or with a substitute for a diacritic (such as 'ue' for u with umlaut).
Resolution:
For Word searching:
The relevant table is the one specified in the WORD-FIX line of $alephe_unicode/tab_character_conversion_line, that is, $alephe_unicode/unicode_to_word_gen.
The header of unicode_to_word_gen says:
! Another example, in order to set an umlauted "u" as "ue",
! set the equivalency of u-umlaut (00FC) to "u" + "e"
! (0075 + 0065).
So, to index u-umlaut as "ue", change this line:
00FC 0075 #LATIN SMALL LETTER U WITH DIAERESIS
to this:
00FC 0075 0065 #LATIN SMALL LETTER U WITH DIAERESIS
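The effect of this change can be illustrated with a small Python sketch. It is not Aleph's code: the parser, the function names, and the step that lowercases the input before applying the table are assumptions made only for illustration. It reads lines in the "SOURCE TARGET [TARGET ...] #comment" format shown above and builds a word key from them.

```python
# Hypothetical sketch, NOT Aleph's implementation: models how a
# unicode_to_word_gen-style line maps one code point to a sequence of code points.

def parse_table(text: str) -> dict[str, str]:
    """Parse 'SRC DST1 [DST2 ...] #comment' lines into a char -> string map."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data or data.startswith("!"):
            continue  # skip blank lines and '!' header comments
        src, *dst = data.split()
        mapping[chr(int(src, 16))] = "".join(chr(int(cp, 16)) for cp in dst)
    return mapping


def word_key(word: str, mapping: dict[str, str]) -> str:
    """Lowercase the word (an assumption of this sketch), then expand mapped characters."""
    return "".join(mapping.get(ch, ch) for ch in word.lower())


# The original line and the changed line from this article.
OLD = parse_table("00FC 0075 #LATIN SMALL LETTER U WITH DIAERESIS")
NEW = parse_table("00FC 0075 0065 #LATIN SMALL LETTER U WITH DIAERESIS")

for name, table in (("old", OLD), ("new", NEW)):
    # "M\u00fcller" is "Müller" written with the precomposed u-umlaut (U+00FC).
    print(name, [word_key(w, table) for w in ("M\u00fcller", "Mueller", "Muller")])
# old ['muller', 'mueller', 'muller']
# new ['mueller', 'mueller', 'muller']
```

In this sketch, the changed line makes "Müller" and "Mueller" produce the same key, while the plain "Muller" spelling no longer matches them.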
Then:
a. Stop and restart ue_01.
b. Restart the www_server and pc_server.
c. Resend a record containing the umlaut to the server (with GUI Cataloging or util f/13).
d. Check whether searching now behaves as expected.
For Browse:
When the relevant $data_tab/tab_filing routine contains this entry:
.. char_conv FILING-KEY-01
the $alephe_unicode/unicode_to_filing_01 file can be used to normalize a single character to a combination of characters.
$alephe_unicode/unicode_to_filing_01 contains this line:
00FC 0055 #LATIN SMALL LETTER U WITH DIAERESIS
Change this line to:
00FC 0055 0045 #LATIN SMALL LETTER U WITH DIAERESIS *
and then perform steps a-d as described for Word searching above.
* Unlike the Word table, which normalizes to lowercase (0075 0065), this table normalizes to uppercase, which requires "0055 0045".
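The point about uppercase can be illustrated with a minimal Python sketch. It is not Aleph's filing code: the function names are hypothetical, and the assumption that characters without a table entry are simply uppercased with str.upper() is made only for the example. Only the single u-umlaut line from this article is loaded.

```python
# Hypothetical sketch, NOT Aleph's filing routine: models a filing key built
# from a unicode_to_filing_01-style line. Unmapped characters are uppercased
# with str.upper(), which is an assumption of this example.

def parse_line(line: str) -> tuple[str, str]:
    """Turn 'SRC DST1 [DST2 ...] #comment' into (source_char, replacement)."""
    src, *dst = line.split("#", 1)[0].split()
    return chr(int(src, 16)), "".join(chr(int(cp, 16)) for cp in dst)


def filing_key(text: str, mapping: dict[str, str]) -> str:
    # A mapped character is replaced by its expansion exactly as written in the
    # table; every other character is uppercased.
    return "".join(mapping.get(ch, ch.upper()) for ch in text)


UPPER_TABLE = dict([parse_line("00FC 0055 0045 #LATIN SMALL LETTER U WITH DIAERESIS")])
LOWER_TABLE = dict([parse_line("00FC 0075 0065 #LATIN SMALL LETTER U WITH DIAERESIS")])

print(filing_key("M\u00fcller", UPPER_TABLE), filing_key("Mueller", UPPER_TABLE))
# MUELLER MUELLER
print(filing_key("M\u00fcller", LOWER_TABLE))
# MueLLER  (a lowercase expansion no longer matches the uppercase key "MUELLER")
```

Because the surrounding characters end up in uppercase in this model, only the "0055 0045" expansion makes "Müller" and "Mueller" file together.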