Ex Libris Knowledge Center

    Search Configurations for Different Languages

    When using the Alma repository search (or when searching users, purchase requests, and fulfillment requests), you can search for special characters and characters with diacritics. Search language configuration (set by Ex Libris) is available in Alma for the languages below. For most of these languages, Alma uses the standard implementation for handling special characters.
    • Alma's handling of special characters is relevant for searching in the institution zone only.
    • Only one language for special characters search can be defined.
    Normalization for the languages listed below is specific to text fields, such as title and author. Normalization is not done for numeric fields or fields that contain a normalized value, such as a call number.

    German Characters

    When your system is configured for German as the default searching language, German language characters are treated by the system as follows:
    German character or character combination | Stored in the Alma database as
    ß | ss
    ä, Ä | ae
    ö, Ö | oe
    ü, Ü | ue
    ae | ae
    oe | oe
    ue (when not following a vowel or q) | ue
    When your system is configured for German as the default searching language, the special German characters are stored as shown in the table above. You can search using either the umlaut and Eszett characters or their two-letter equivalents (shown in the second column), and both forms are treated equally. For example, a search for Müller returns results for both Müller and Mueller, but not for Muller.
    For systems whose default language is not German, a search for Müller will return search results for Müller and Muller but not Mueller.
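The search equivalences above can be sketched as two folding functions: one for a German-configured system and one for any other searching language. This is a minimal sketch that assumes matching compares lowercased keys. `german_key` and `default_key` are hypothetical names, not Alma functions. The conditional `ue` row is not modeled, because the table stores `ue` unchanged.

```python
import unicodedata

# Hypothetical sketch of the German table: umlauts and Eszett become digraphs.
GERMAN_FOLD = {"ß": "ss", "ä": "ae", "ö": "oe", "ü": "ue"}


def german_key(term: str) -> str:
    """Key when German is the searching language (input is composed to NFC first)."""
    term = unicodedata.normalize("NFC", term).lower()
    return "".join(GERMAN_FOLD.get(ch, ch) for ch in term)


def default_key(term: str) -> str:
    """Assumed key for other searching languages: diacritics are simply dropped."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With these keys, Müller and Mueller share the German key `mueller`, while Müller and Muller share the non-German key `muller`. This matches the two examples above.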
    When your institution (and Network Zone, if you are working with one) is configured for German as the default searching language, this standard German language special character search capability is available in the Institution Zone, Network Zone, and Community Zone.
    For institutions that have German configured as the default searching language, the repository search results, user search results, and fulfillment request search results are sorted using the DIN 5007-1/2, section 6.1.1.4.1/2 standard. In addition to consideration for the special German language characters, hyphens are ignored when search results are sorted.
    When sorting bibliographic and authority headings content, Alma removes dashes for sorting purposes in institutions that have the searching language parameter set to German.
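The dash-ignoring step described above can be sketched as a sort key that drops dash punctuation before comparing. This illustrates only that one step and does not implement DIN 5007. `heading_sort_key` is a hypothetical name, and treating every Unicode dash (category Pd) as a dash is an assumption.

```python
import unicodedata


def heading_sort_key(heading: str) -> str:
    """Hypothetical sort key: remove dash punctuation (category Pd), then casefold."""
    without_dashes = "".join(ch for ch in heading if unicodedata.category(ch) != "Pd")
    return without_dashes.casefold()
```

With this key, sorting ["Ost-West", "Ostsee"] yields ["Ostsee", "Ost-West"], because the keys compared are ostsee and ostwest. A plain code-point sort would put Ost-West first, since "-" precedes "s".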
    When you use the Alma Browse Bibliographic Headings feature, the same sorting standard (DIN 5007-1/2, section 6.1.1.4.1/2) is used to sort the bibliographic headings. See Browsing Bibliographic Headings for more information.

    Spanish and Catalan Characters

    When your system is configured for Spanish as the default searching language (set by Ex Libris), special Spanish language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sort. Standard English characters are not substituted for the special Spanish characters. The following table describes how Alma handles special Spanish characters:
    Letter | Search | Sort
    Ñ/ñ | Searching for Ñ/ñ does not retrieve results for N/n, and vice versa. | Sorted after n.
    Ç/ç | Searching for Ç/ç does not retrieve results for C/c, and vice versa. | Sorted after c.
    L·L/l·l | Searched for as if it were the digraph ll. | Sorted as ll.
    Diacritics are sorted in the following order:
    • Without diacritics
    • Acute
    • Grave
    • Dieresis
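These sorting rules can be expressed as a two-level sort key. Letters are compared first, with ç and ñ treated as letters of their own and l·l read as ll. Accents only break ties, in the order listed above. This is a sketch under those assumptions, not Alma's collation. `spanish_sort_key` is a hypothetical name, and search matching is not modeled.

```python
import unicodedata

# Primary order: ç follows c and ñ follows n, per the table above.
ALPHABET = "abcçdefghijklmnñopqrstuvwxyz"
PRIMARY = {ch: i for i, ch in enumerate(ALPHABET)}
# Secondary (tie-breaking) weight: no accent < acute < grave < dieresis.
ACCENT_RANK = {"": 0, "\u0301": 1, "\u0300": 2, "\u0308": 3}


def spanish_sort_key(word: str):
    """Hypothetical two-level key: letter ranks first, accent ranks second."""
    word = unicodedata.normalize("NFC", word).lower().replace("l\u00b7l", "ll")
    primary, secondary = [], []
    for ch in word:
        if ch in ("ç", "ñ"):
            base, marks = ch, ""  # independent letters, not accented c / n
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base, marks = decomposed[0], decomposed[1:]
        # Characters outside the alphabet sort after z, by code point.
        primary.append(PRIMARY.get(base, len(ALPHABET) + ord(base)))
        secondary.append(ACCENT_RANK.get(marks, len(ACCENT_RANK)))
    return primary, secondary
```

For example, nube sorts before ñu, which sorts before oso. The words pera, péra, pèra, and përa come out in exactly that order.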

    Scandinavian Characters (Swedish, Norwegian, Danish)

    Alma normalizes the interchangeable Scandinavian characters æ Æ ä Ä ö Ö ø Ø and folded variants (aa, ao, ae, oe and oo) by transforming them to æ Æ å Å ø Ø.

    Special characters cataloged in non-Scandinavian languages (such as French letters with accents) are normalized during indexing. This means that a search for a term that includes these special characters behaves as if the search was done without them. (The opposite does not apply: a search for a term without these special characters is not treated as if it were done with them.) The table below lists the rules for all characters with diacritics for institutions that use Scandinavian character searching:

    Language | Uppercase | Lowercase | Folded variant | Character (upper/lower case) is searchable with
    Swedish | Å | å | Aa/aa | Å/å/Aa/aa
    Swedish | Ä | ä | Ae/ae | Ä/Ae/ä/ae
    Swedish | Ö | ö | Oe/oe | Ø/Ö/Oe/ø/ö/oe
    Swedish | Æ | æ | Ae/ae | Æ/æ/Ae/ae
    Swedish | Ø | ø | Oe/oe | Ø/Ö/Oe/ø/ö/oe
    Swedish | Other accents (e.g. È) | Other accents (e.g. è) | Base characters (e.g. E/e) | Base characters (e.g. E/e)

    For example, Ö is searchable with Oe, but Oe is not searchable with Ö (Ö and Oe are not equivalent). This means that a search for the term Edgar Allan Poe returns results for Edgar Allan Pö, but not the reverse: a search for the term Pötry does not return results for Poetry.
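One way to get this one-directional behavior is to expand each stored term into all of its searchable spellings at indexing time, while the query is used as typed. The sketch below assumes that model, since the source does not describe Alma's internals. It covers only the Swedish rows of the table. `index_forms` and `matches` are hypothetical names.

```python
import unicodedata

# Hypothetical model: each special character in a stored term is indexed under
# every form in its "searchable with" cell (Swedish rows of the table only).
SWEDISH_VARIANTS = {
    "å": ["å", "aa"],
    "ä": ["ä", "ae"],
    "æ": ["æ", "ae"],
    "ö": ["ö", "ø", "oe"],
    "ø": ["ø", "ö", "oe"],
}


def index_forms(stored_term: str) -> set:
    """All indexed spellings of a stored term (grows quickly; fine for a sketch)."""
    forms = [""]
    for ch in unicodedata.normalize("NFC", stored_term).lower():
        options = SWEDISH_VARIANTS.get(ch, [ch])
        forms = [prefix + option for prefix in forms for option in options]
    return set(forms)


def matches(query: str, stored_term: str) -> bool:
    """The query is not expanded, which makes matching one-directional."""
    return unicodedata.normalize("NFC", query).lower() in index_forms(stored_term)
```

Here, `matches("Edgar Allan Poe", "Edgar Allan Pö")` is true. `matches("Pötry", "Poetry")` is false, because the stored term Poetry contains no special character to expand.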

    Norwegian/Danish | Å | å | Aa/aa | Å/Aa/å/aa

    Exception: Å/å is equivalent to Aa/aa, and they are searchable interchangeably. A search for Aalborg returns results for Ålborg, and a search for Ålborg returns results for Aalborg as well.

    Norwegian/Danish | Æ | æ | Ae/ae | Æ/Ä/Ae/æ/ae/ä
    Norwegian/Danish | Ø | ø | Oe/oe | Ø/Ö/Oe/ø/ö/oe
    Norwegian/Danish | Ö | ö | Oe/oe | Ø/Ö/Oe/ø/ö/oe
    Norwegian/Danish | Ä | ä | Ae/ae | Æ/Ä/Ae/æ/ae/ä
    Norwegian/Danish | Other accents (e.g. È) | Other accents (e.g. è) | Base characters (e.g. E/e) | Base characters (e.g. E/e)

    For example:

    • A search for båd returns results for båd/baad, but not for bad
    • A search for haan returns results for haan/hån, but not for han
    • A search for søn returns results for søn/soen, but not for son or soon
    • A search for baer returns results for bær/baer, but not vice versa (a search for bær does not return baer)
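The Å/Aa exception is symmetric, so it can be sketched by normalizing the query and the stored term in the same way. This minimal sketch models only that exception and assumes whole-term comparison. The one-directional Æ/Ae behavior in the last example would need a different rule. `aa_normalize` and `matches` are hypothetical names.

```python
import unicodedata


def aa_normalize(term: str) -> str:
    """Hypothetical: rewrite the folded variant aa as å, on both query and index side."""
    return unicodedata.normalize("NFC", term).lower().replace("aa", "å")


def matches(query: str, stored_term: str) -> bool:
    # Both sides are normalized the same way, so matching works in both directions.
    return aa_normalize(query) == aa_normalize(stored_term)
```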
    For staff search and for Browse Bibliographic Headings/F3, the Norwegian and Danish special characters are sorted differently from the Swedish special characters. The sort order for each language is as follows:
    Norwegian/Danish sorting:
    • a/A-z/Z (with Ü/ü sorted as Y/y)
    • æ/Æ ; ä/Ä
    • ø/Ø ; ö/Ö
    • å/Å ; aa/Aa
    Swedish sorting:
    • a/A-z/Z (with æ/Æ sorted as ae/Ae; Ü/ü sorted as Y/y)
    • å/Å
    • ä/Ä
    • ö/Ö ; ø/Ø
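The two orderings above can be sketched as per-language sort keys. Each key first substitutes the letters that share a position, then ranks characters by their place in the extended alphabet. This is a sketch under those assumptions, not Alma's collation. `scandinavian_sort_key` and the `"no_da"`/`"sv"` codes are hypothetical.

```python
import unicodedata

LATIN = "abcdefghijklmnopqrstuvwxyz"
# (alphabet order, substitutions applied before ranking), taken from the lists above.
RULES = {
    "no_da": (LATIN + "æøå", {"ä": "æ", "ö": "ø", "aa": "å", "ü": "y"}),
    "sv": (LATIN + "åäö", {"æ": "ae", "ø": "ö", "ü": "y"}),
}


def scandinavian_sort_key(word: str, lang: str) -> list:
    """Hypothetical sort key for Norwegian/Danish ("no_da") or Swedish ("sv")."""
    alphabet, substitutions = RULES[lang]
    w = unicodedata.normalize("NFC", word).lower()
    for source, target in substitutions.items():
        w = w.replace(source, target)
    rank = {ch: i for i, ch in enumerate(alphabet)}
    # Characters outside the alphabet sort after it, by code point.
    return [rank.get(ch, len(alphabet) + ord(ch)) for ch in w]
```

With these keys, the Norwegian/Danish order is zebra, æble, øl, ål, and the Swedish order is zebra, ål, äpple, öl. Aalborg and Ålborg share a Norwegian/Danish key.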
     
    Normalization for Norwegian and Danish is handled in the manner described in the Scandinavian Normalization Filter. For Swedish, Alma normalizes the Scandinavian characters in the same manner.

    Icelandic Characters

    When your system is configured for Icelandic as the default searching language (set by Ex Libris), Icelandic language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Icelandic characters: a does not return á, and vice versa (for example, a search for “sál” does not return “sal”). They are not considered the same characters.

    The following characters are converted:

    • The character Ø/ø is converted to Ö/ö. It is sorted after Ó/ó.
    • The character Å/å is converted to AA/aa.
    • All other special characters with accents and umlauts, such as ä, ë, ü, û, and è, are converted to their base letters (a, e, u, and so on).
    The results list is sorted based on the Icelandic alphabetical order (see the Icelandic Characters table below). This feature also applies to staff search for users, purchase and fulfillment requests, and deposits.
    Icelandic Characters
    Uppercase Lowercase Diacritic (or letter name)
    A a  
    Á á acute
    B b  
    C c  
    D d  
    Ð ð eth
    E e  
    É é acute
    F f  
    G g  
    H h  
    I i  
    Í í acute
    J j  
    K k  
    L l  
    M m  
    N n  
    O o  
    Ó ó acute
    P p  
    Q q  
    R r  
    S s  
    T t  
    U u  
    Ú ú acute
    V v  
    W w  
    X x  
    Y y  
    Ý ý acute
    Z z  
    Þ þ thorn
    Æ æ ae
    Ö ö diaeresis
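The conversions and the alphabetical order above can be sketched as a folding function plus a sort key. This minimal sketch assumes NFC-normalized input and takes the letter order from the table. `icelandic_fold` and `icelandic_sort_key` are hypothetical names.

```python
import unicodedata

# Alphabet order from the Icelandic Characters table.
ICELANDIC = "aábcdðeéfghiíjklmnoópqrstuúvwxyýzþæö"
RANK = {ch: i for i, ch in enumerate(ICELANDIC)}


def icelandic_fold(word: str) -> str:
    """Apply the conversions listed above; Icelandic letters themselves are kept."""
    out = []
    for ch in unicodedata.normalize("NFC", word).lower():
        if ch in RANK:
            out.append(ch)  # independent letter: á never folds to a
        elif ch == "ø":
            out.append("ö")
        elif ch == "å":
            out.append("aa")
        else:
            # Any other accented letter (ä, ë, ü, û, è, ...) loses its accent.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def icelandic_sort_key(word: str) -> list:
    # Characters outside the table sort after ö, by code point.
    return [RANK.get(ch, len(ICELANDIC) + ord(ch)) for ch in icelandic_fold(word)]
```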

    CJK Languages

    Chinese and Korean Characters

    Alma performs hiragana-to-katakana transliteration and traditional-to-simplified Chinese transliteration, and splits words into bigrams and unigrams. See the ICU Transform Filter for more information.
    Alma also performs Hanja-to-Hangul transliteration. Korean results are sorted using Korean-specific rules.
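Two of these steps are simple enough to sketch directly. Hiragana maps onto katakana by a fixed code-point offset, and a run of CJK characters can be indexed as unigrams plus overlapping bigrams. Traditional-to-simplified and Hanja-to-Hangul conversion need large mapping tables (the ICU transforms mentioned above) and are not shown. The function names are hypothetical, and the sketch does not split mixed CJK and non-CJK text into runs.

```python
def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana U+3041..U+3096 onto katakana U+30A1..U+30F6 (offset 0x60)."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch for ch in text
    )


def cjk_grams(run: str) -> list:
    """Unigrams plus overlapping bigrams for one run of CJK characters."""
    unigrams = list(run)
    bigrams = [run[i:i + 2] for i in range(len(run) - 1)]
    return unigrams + bigrams
```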

    Japanese Characters

    For institutions that have the Japanese searching setup for Repository Search, Browse Bib Headings, and Browse Auth Headings, Alma performs the following:
    • Punctuation removal
    • Normalization between Hiragana and Katakana
    • Iterated character normalization
    • Normalization of variant Kanji characters
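The source does not define "iterated character normalization." A common reading is that iteration marks are expanded into the character they repeat (for example, 時々 becomes 時時). The sketch below assumes that reading. `expand_iteration_marks` is a hypothetical name, and voiced iteration marks (ゞ, ヾ) are left untouched.

```python
# Hypothetical reading: expand iteration marks into the character they repeat.
ITERATION_MARKS = {"々", "ゝ", "ヽ"}  # kanji, hiragana, and katakana iteration marks


def expand_iteration_marks(text: str) -> str:
    out = []
    for ch in text:
        if ch in ITERATION_MARKS and out:
            out.append(out[-1])  # repeat the preceding character
        else:
            out.append(ch)  # a mark with nothing before it is kept as-is
    return "".join(out)
```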

    CJK Punctuation Handling

    For institutions that have the Chinese, Hong Kong, Japanese, or Korean searching setup, all punctuation marks are removed during indexing when they appear within CJK text. However, the punctuation remains when you are searching. This helps to ensure that the best results are retrieved. Note that the display of CJK content continues to show the punctuation.
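The indexing behavior can be sketched as a function that builds the indexed copy of a field without punctuation, while the stored value used for display is left unchanged. This sketch assumes that the Unicode punctuation categories (P*) approximate the removed marks. It strips punctuation everywhere rather than only within CJK text. `index_text` is a hypothetical name.

```python
import unicodedata


def index_text(text: str) -> str:
    """Indexed copy: drop every character in a Unicode punctuation category (P*)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


record_title = "「東京」の歴史。"
display_title = record_title  # display keeps the punctuation
indexed_title = index_text(record_title)  # "東京の歴史"
```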
    See the known search issue related to punctuation that is described in the note in the Using Advanced Search section.

    Hong Kong TSVCC

    Alma implements the Hong Kong Innovative Users Group (HKIUG) TSVCC (Traditional, Simplified, and Variant Chinese Characters) standard Version 1.0, released on 18 July 2006. In addition to handling traditional and simplified Chinese characters, Alma also handles variant Chinese characters when doing the following:
    • Searching metadata records
    • Browsing bibliographic headings
      This includes searching for TSVCC characters entered as a value in Browse Bibliographic Headings and properly sorting headings that appear for browsing. When the same title occurs in different Chinese forms (including variant Chinese characters), all titles that are equivalent are sorted together in the headings list for browsing.
    • Searching for Chinese user names
    TSVCC Chinese character handling is available for institutions that have the Alma searching language parameter set for Hong Kong. Contact Ex Libris Support if you need to have this institution parameter enabled.
    For the complete HKIUG TSVCC table (UNICODE version), see http://hkiug-archive.lib.hku.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html.
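This kind of equivalence-based sorting can be sketched by mapping each character to a representative form, then sorting or grouping headings by the mapped key. The mapping below is a tiny illustrative subset, not the HKIUG TSVCC table. `tsvcc_key` and `group_headings` are hypothetical names.

```python
# Illustrative subset only, NOT the HKIUG TSVCC table: a few traditional or
# variant characters mapped to one representative form.
EQUIVALENT = {"臺": "台", "學": "学", "國": "国", "峯": "峰"}


def tsvcc_key(text: str) -> str:
    return "".join(EQUIVALENT.get(ch, ch) for ch in text)


def group_headings(headings):
    """Sort by equivalence key so that equivalent forms end up together."""
    groups = {}
    for heading in sorted(headings, key=tsvcc_key):
        groups.setdefault(tsvcc_key(heading), []).append(heading)
    return groups
```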

    Polish Characters

    When your system is configured for Polish as the default searching language (set by Ex Libris), Polish language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Polish characters. For example, C does not return Ć (and vice versa) in search results. They are not considered the same characters.
    The results list is sorted based on the Polish alphabetical order (see the Polish Characters table below). For example, być comes after bycie. This feature also applies to staff search for users, purchase and fulfillment requests, and deposits.
    Polish Characters
    Uppercase Lowercase Diacritics
    A a  
    Ą ą ogonek
    B b  
    C c  
    Ć ć acute
    D d  
    E e  
    Ę ę ogonek
    F f  
    G g  
    H h  
    I i  
    J j  
    K k  
    L l  
    Ł ł stroke
    M m  
    N n  
    Ń ń acute
    O o  
    Ó ó acute
    P p  
    Q q  
    R r  
    S s  
    Ś ś acute
    T t  
    U u  
    V v  
    W w  
    X x  
    Y y  
    Z z  
    Ź ź acute
    Ż ż dot
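Sorting by the Polish alphabet can be sketched by ranking each character by its position in the table above. This minimal sketch assumes NFC-normalized input and places characters outside the table after ż, by code point. `polish_sort_key` is a hypothetical name.

```python
import unicodedata

# Alphabet order from the Polish Characters table.
POLISH = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {ch: i for i, ch in enumerate(POLISH)}


def polish_sort_key(word: str) -> list:
    w = unicodedata.normalize("NFC", word).lower()
    return [RANK.get(ch, len(POLISH) + ord(ch)) for ch in w]
```

For example, sorting być and bycie with this key gives bycie, być, as in the example above.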

    Czech Characters

    When your system is configured for Czech as the default searching language (set by Ex Libris), Czech language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sort. Standard English characters are not substituted for the special Czech characters. For example, C does not return Č (and vice versa) in search results. They are not considered the same characters.
    The results list is sorted based on the Czech alphabetical order (see the Czech Characters table below). This means that words starting with the digraph ch (chemie) are sorted between H and I. This feature also applies to staff search for users, purchase and fulfillment requests, and deposits.
    Czech Characters
    Uppercase Lowercase Diacritics
    A a  
    Á á acute
    B b
    C c  
    Č č caron
    D d  
    Ď ď caron
    E e  
    É é acute
    Ě ě caron
    F f  
    G g  
    H h  
    Ch ch  
    I i  
    Í í acute
    J j  
    K k  
    L l  
    M m  
    N n  
    Ň ň caron
    O o  
    Ó ó acute
    P p  
    Q q  
    R r  
    Ř ř caron
    S s  
    Š š caron
    T t  
    Ť ť caron
    U u  
    Ú ú acute
    Ů ů ring
    V v  
    W w  
    X x  
    Y y  
    Ý ý acute
    Z z  
    Ž ž caron
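Because ch counts as one letter placed after h, a Czech sort key has to recognize the digraph before ranking. This minimal sketch follows the table above and assumes NFC-normalized input. `czech_sort_key` is a hypothetical name. It treats every c followed by h as the digraph, even across morpheme boundaries.

```python
import unicodedata

# Letter order from the Czech Characters table; "ch" is one letter after h.
CZECH_ALPHABET = [
    "a", "á", "b", "c", "č", "d", "ď", "e", "é", "ě", "f", "g", "h", "ch",
    "i", "í", "j", "k", "l", "m", "n", "ň", "o", "ó", "p", "q", "r", "ř",
    "s", "š", "t", "ť", "u", "ú", "ů", "v", "w", "x", "y", "ý", "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(CZECH_ALPHABET)}


def czech_sort_key(word: str) -> list:
    w = unicodedata.normalize("NFC", word).lower()
    key, i = [], 0
    while i < len(w):
        if w.startswith("ch", i):  # the digraph takes precedence over c + h
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the table sort after ž, by code point.
            key.append(RANK.get(w[i], len(CZECH_ALPHABET) + ord(w[i])))
            i += 1
    return key
```

With this key, hrad, chemie, and ihned sort in that order, which places chemie between H and I.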

    Lithuanian Characters

    When your system is configured for Lithuanian as the default searching language (set by Ex Libris), Lithuanian language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Lithuanian characters. This pertains to all the following Lithuanian characters: ą č ę ė į š ų ū ž Ą Č Ę Ė Į Š Ų Ū Ž.

    Lithuanian characters can be searched and found using the corresponding Latin letters in queries: 

    • Aa => Ąą
    • Cc => Čč
    • Ee => ĘĖęė
    • Ii => Įį
    • Ss => Šš
    • Uu => ŲŪųū
    • Zz => Žž
    • The same rule applies to all nonstandard Latin-based letters: German, Polish, Latvian, etc. 

    For example, Š is indexed as both Š and S (and their lowercase forms), so the text “Šarūnas” can be found with any of the following search queries: Šarūnas, Sarūnas, sarunas, saruNAS.
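The statement that Š is indexed as both Š and S suggests an index-time expansion. In that model, each Lithuanian letter is also indexed under its base Latin letter, while the query is used as typed. The sketch below assumes that model and covers only the Lithuanian letters listed above. `index_forms` and `matches` are hypothetical names.

```python
import unicodedata

# Hypothetical: each Lithuanian letter is also indexed under its base Latin letter.
LT_FOLD = {"ą": "a", "č": "c", "ę": "e", "ė": "e", "į": "i",
           "š": "s", "ų": "u", "ū": "u", "ž": "z"}


def index_forms(stored_term: str) -> set:
    """Every indexed spelling: each Lithuanian letter kept or replaced by its base."""
    forms = [""]
    for ch in unicodedata.normalize("NFC", stored_term).lower():
        options = [ch, LT_FOLD[ch]] if ch in LT_FOLD else [ch]
        forms = [prefix + option for prefix in forms for option in options]
    return set(forms)


def matches(query: str, stored_term: str) -> bool:
    return unicodedata.normalize("NFC", query).lower() in index_forms(stored_term)
```

Because each letter is expanded independently, partially folded queries such as Sarūnas also match.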

    Letters that are not part of the official Lithuanian alphabet (Q/W/X) are sorted in their usual Latin alphabet positions. For example:

    • Q is sorted between P and R
    • W is sorted between V and Z

    The sorting order of letters is as follows:

    • Lithuanian letters (both the basic Latin letters and the special Lithuanian letters above), together with other Latin-based letters that are not basic Latin (for example, Polish, German, Scandinavian)
    • Cyrillic and all non-Latin based alphabets 
    • Chinese is at the end 
    The results list is sorted based on the Lithuanian alphabetical order (see the Lithuanian Characters table below). This sorting also applies to staff search for users, purchase and fulfillment requests, and deposits.
    Lithuanian Characters
    Uppercase Lowercase
    A a
    Ą ą
    B b
    C c
    Č č
    D d
    E e
    Ę ę
    Ė ė
    F f
    G g
    H h
    I i
    Į į
    Y y
    J j
    K k
    L l
    Ł ł
    M m
    N n
    O o
    P p
    R r
    S s
    Š š
    T t
    U u
    Ų ų
    Ū ū
    V v
    Z z
    Ż ż
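The ordering rules above can be sketched as a key that first compares a script bucket (Latin-based, then Cyrillic and other scripts, then Chinese) and then the letter's rank. This sketch rests on several assumptions. X is placed after W. Ž (from the character list at the start of this section) follows Z. Other Latin-based letters such as Ł and Ż fall back to code-point order after the Lithuanian letters, rather than the positions shown in the table. `lithuanian_sort_key` is a hypothetical name.

```python
import unicodedata

# Official Lithuanian letters in table order (Y follows Į), with Q and W placed as
# described above. The position of X is an assumption.
LT_ORDER = ["a", "ą", "b", "c", "č", "d", "e", "ę", "ė", "f", "g", "h", "i", "į",
            "y", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "š", "t", "u",
            "ų", "ū", "v", "w", "x", "z", "ž"]
RANK = {ch: i for i, ch in enumerate(LT_ORDER)}


def script_bucket(ch: str) -> int:
    """0 = Latin-based (and non-letters), 1 = Cyrillic and other scripts, 2 = Chinese."""
    if ch in RANK or not ch.isalpha():
        return 0
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return 0
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return 2
    return 1


def lithuanian_sort_key(word: str) -> list:
    w = unicodedata.normalize("NFC", word).lower()
    # Characters outside LT_ORDER fall back to code-point order within their bucket.
    return [(script_bucket(ch), RANK.get(ch, len(LT_ORDER) + ord(ch))) for ch in w]
```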

    Lithuanian quotation marks

    • Lithuanian quotation marks are interpreted in the same way as Latin quotation marks. For indexing, search, and sorting, the terms "Great Britain" and „Great Britain“ are interpreted the same.
    • In a search query, only Latin quotation marks (") indicate an exact phrase search; Lithuanian quotation marks do not. To search for an exact phrase, use the regular quotation mark.
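The two rules can be sketched separately. One function normalizes Lithuanian quotation marks for indexing and sorting. The other detects an exact-phrase query only when the query is wrapped in regular quotation marks. Both functions are hypothetical illustrations, not Alma's query parser.

```python
# Lithuanian opening (U+201E) and closing (U+201C) quotation marks.
LT_QUOTES = {"\u201e": '"', "\u201c": '"'}


def normalize_quotes(text: str) -> str:
    """For indexing and sorting: treat Lithuanian quotes like regular ones."""
    return text.translate(str.maketrans(LT_QUOTES))


def is_exact_phrase(query: str) -> bool:
    """Only regular double quotes around the whole query request an exact phrase."""
    q = query.strip()
    return len(q) >= 2 and q[0] == '"' and q[-1] == '"'
```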

    Russian letters transliteration

    Alma supports Russian letter transliteration, so that results include both Lithuanian and Russian phrases. For example, when the query text is kaunas, Alma finds both kaunas (Latin) and каунас (Cyrillic).
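One way to get this behavior is to run the query both as typed and as a Cyrillic transliteration. The letter table below is hypothetical and deliberately partial, just large enough for the kaunas example. A real transliteration scheme needs more rules (for example, digraphs such as sh or ch). `query_variants` is a hypothetical name.

```python
# Hypothetical, deliberately partial Latin-to-Cyrillic letter table.
LATIN = "abdegiklmnoprstuvz"
CYRILLIC = ("\u0430\u0431\u0434\u0435\u0433\u0438\u043a\u043b\u043c"
            "\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0432\u0437")  # абдегиклмнопрстувз
LAT_TO_CYR = dict(zip(LATIN, CYRILLIC))


def query_variants(query: str) -> set:
    """Search with the query as typed and with its Cyrillic transliteration."""
    q = query.lower()
    return {q, "".join(LAT_TO_CYR.get(ch, ch) for ch in q)}
```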

    Arabic and Persian Characters

    Similar Arabic/Persian characters are treated as the same characters for repository search, browse bibliographic headings/F3, and sorting. For example, ڤ returns ف (and vice versa) in search results. 
    The following character groups are treated as the same character and are interchangeable:
    ا – أ – إ – آ
    ى – ي - ئ
    ه - ة - ۀ
    و - ؤ
    ك – گ – ک
    ف - ڤ
    ز - ژ
    ب - پ
    ج - چ
    ق - ڨ
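Treating these groups as interchangeable can be sketched by mapping every member of a group to one representative character before indexing and searching. This sketch uses the first character of each group as the representative, which is an arbitrary choice. `arabic_key` is a hypothetical name.

```python
# The character groups listed above, written as escapes to avoid bidi confusion.
GROUPS = [
    "\u0627\u0623\u0625\u0622",  # ا أ إ آ
    "\u0649\u064A\u0626",        # ى ي ئ
    "\u0647\u0629\u06C0",        # ه ة ۀ
    "\u0648\u0624",              # و ؤ
    "\u0643\u06AF\u06A9",        # ك گ ک
    "\u0641\u06A4",              # ف ڤ
    "\u0632\u0698",              # ز ژ
    "\u0628\u067E",              # ب پ
    "\u062C\u0686",              # ج چ
    "\u0642\u06A8",              # ق ڨ
]
# Map every group member to the group's first character.
CANONICAL = {ch: group[0] for group in GROUPS for ch in group}


def arabic_key(text: str) -> str:
    return "".join(CANONICAL.get(ch, ch) for ch in text)
```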