Skip to main content
ExLibris
  • Subscribe by RSS
  • Ex Libris Knowledge Center

    Linguistic Features for Primo VE

    This section describes the various linguistic features that Primo VE supports.

    Language Detection

    In order to offer language-based services, Primo VE must first detect the language of the indexed text and the query. Currently, Primo VE can detect the following languages:

    • Latin-based: English, Spanish, Italian, German, French, and Danish.

    • Asian: Chinese, Japanese, and Korean. If the character is Chinese and the locale of Primo VE is Japanese or Korean, Primo VE uses the locale of the selected language.

    • Other languages that have a specific character range: Hebrew, Arabic, and so forth.

    Language detection is based on comparing the words of the record and the query with a dictionary. If fifty percent or more of the words match, the language is identified.

    Stop Words

    Stop words are included in phrase searches and omitted from keyword searches. For example, if a user searches for the adventures of huckleberry finn, Primo VE performs the following searches:

    the adventures of huckleberry finn

    or

    adventures huckleberry finn

    Primo VE uses stop word lists during indexing and searching.

    For a list of supported stop words, see the following files per language: Danish, English, German, Spanish, French, Italian, and Chinese.

    Author Names

    Primo VE treats words with O' apostrophe as a stop word in many Latin languages and indexes them as two separate words. This happens also for authors such as O'Leary, which is indexed as o and leary. As a result, a search for Oleary will not retrieve the same number of results as O'Leary. When users search for names that typically include apostrophes but do not include the apostrophe, Primo VE will also search for the name as if the users had included the apostrophe. For example, if the user's query is Oleary, Primo VE will change the query to search for oleary or o leary.

    Stemming

    Stemming is a process that reduces inflected (or sometimes derived) words to their stem, base, or root form. When stemming is activated, the stemmed form of the search term is added to the query with a very low boost to improve the search results.

    • Stemming works independently of the smart search and ranking mechanism that boosts adjacent query words. This allows the system to boost stemmed versions of search terms when they are not adjacent to other search terms.

    • Primo VE does not unstem terms with the exception of pluralizations. If stemming is activated, Primo VE will include the plural form of the term and give its results lower ranking. For example, a search for wild flower expands to wild AND (flower OR flowers^0.5).

    The add_keywords_stemming_to_query parameter on the Discovery Customer Settings page (Configuration Menu > Discovery > Other > Customer Settings) allows you to activate/deactivate the use of stemming.

    Synonyms

    Primo VE adds the following types of synonyms to a search query:

    • Numbers – when a search contains a digit, Primo VE adds the spelled out number to the search query. For example, Primo VE adds the word ninth to a search query for 9th.

    • US or British spelling – when a search contains a word spelled according to US or British spelling, Primo VE adds the corresponding synonym to the search query. For example, Primo VE adds the word colour to a search query for color.

    • Commonly misspelled words – for commonly misspelled words, Primo VE adds the word spelled correctly to the search query.

    • Hyphenated search terms – Local repository searches that include a search term with a hyphen will return additional results by including the term's compound word in the search. For example, searches for the term chat-room, will also include results for chat room and chatroom. Previously, the system only added results for chat room. The supported terms are based on the same terms used for CDI results (see Supported Compound Words)

    In addition to the synonym, Primo VE includes the original search term in the query. For example, if the query is fifth dimension, Primo VE searches for (fifth OR 5th) AND dimension.

    Primo VE applies a different set of Synonyms lists based on the language recognition.

    The following parameter on the Discovery Customer Settings page (Configuration Menu > Discovery > Other > Customer Settings) allows you to disable the use of synonyms:

    • disable_synonyms – When set to true, this parameter disables the use of synonyms in search queries. By default, this parameter is set to false.

    For a list of supported synonyms, refer to the following files per language: German, English, French, Hebrew, and Chinese.

    Did You Mean

    Did You Mean (DYM) suggestions improve search queries by correcting typographical errors and common misspellings in search terms to return expected search results to users. DYM suggestions are provided when the original query returns less than the threshold of 15 search results, which is not configurable.

    In the following example, the search term leukemia is missing a single character and returns no results. Users can select the suggestion that appears below the search box if they want to see results for that suggestion.

    PVE_DYM_Example.png

    Did You Mean Example

    How does DYM work?

    DYM is invoked when the original search query returns less than 15 results. If invoked, the DYM algorithm performs the following:

    1. For each search term in the original query:

      1. The following sources are checked for a match:

        • DYM index – This index is created by applying the Levenshtein distance, which is the distance between two words using a minimum number of single-character edits (such as insertions, deletions, or substitutions) to the regular titles index. For DYM, the index limits edits to a single character.

          For example, if the word leukemia is indexed in the regular title index, the following terms could return a suggestion for leukemia:

          • lekemia - The letter u is missing.

          • leekemia - The letter u has been replaced with the second e.

          • aleukemia - The letter a has been added to the beginning of the term.

        • Dictionary – The dictionary contains commonly misspelled words from which to check.

      2. For each match found, a candidate query is created by replacing the term in the original query with its match.

    2. Each candidate query is tested, and the highest-ranking candidate that returns enough results is used for the suggestion.

    Configuration Options

    This functionality is enabled out of the box, and it is not configurable. If you want to hide this functionality, add the following stanza to your CSS:

    #mainResults > div.margin-bottom-medium > md-card {
        visibility:hidden;

    Normalization of Special Characters

    Based on the configured indexing language, Primo VE normalizes special characters and characters with diacritics in the search index. Primo VE supports the following indexing languages: German (de), Icelandic (is), Lithuanian (it), Norwegian/Danish (no), Swedish (sv), Spanish (es), Polish (po), Korean (ko), Chinese (zh), and Japanese (ja).

    Specifically for Hong Kong, your library can decide to use either Chinese or TSVCC for the character conversion.

    If you want to change your indexing language to one of the supported languages, please open a Salesforce support ticket. Note that this will require your data to be re-indexed. After re-indexing is complete, searches will use the language-specific character conversions described below, regardless of the selected UI language.

    Arabic and Farsi

    Character Conversion
    0627 (آ) 0622 (ا)
    0623 (إ) 0622 (ا)
    0625 (أ) 0622 (ا)
    0649 (ئ) 0626 (ى)
    064A (ي) 0626 (ى)
    0647 (ۀ) 06C0 (ه)
    0629 (ة) 06C0 (ه)
    0642 (ڨ) 06A8 (ق)
    0648 (ؤ) 0624 (و)
    062C (چ) 0686 (ج)
    0628 (پ) 067E (ب)
    0643 (گ) 06AF (ك)
    0632 (ژ) 0698 (ز)
    0641 (ڤ) 06A4 (ف)

    Danish

    Character Conversion
    00E4 (ä) 0061 0065 (ae)
    00C4 (Ä) 0061 0065 (ae)
    00E5 (å) 0061 0061 (aa)
    00C5 (Å) 0061 0061 (aa)
    00D8 (Ø) 006F 0065 (oe)
    00F8 (ø) 006F 0065 (oe)
    00D6 (Ö) 006F 0065 (oe)
    00F6 (ö) 006F 0065 (oe)
    00E6 (æ) 0061 0065 (ae)
    00E6 (Æ) 0061 0065 (ae)
    • For search, the conversion is not bi-directional. For example, searches for the term Edgar Allan Poe will return results for Edgar Allan Pö, but searches for the term Edgar Allan Pö will not return results for Edgar Allan Poe.

    • For sort and browse, Primo VE uses the following order for the Danish alphabet:

      • a/A-z/Z (with Ü/ü sorted as Y/y)

      • æ/Æ ; ä/Ä

      • ø/Ø ; ö/Ö

      • å/Å ; aa/Aa

    German

    Character Conversion
    00DC (Ü) 0075 0065 (ue)
    00FC (ü) 0075 0065 (ue)
    00C4 (Ä) 0061 0065 (ae)
    00E4 (ä) 0061 0065 (ae)
    00D6 (Ö) 006F 0065 (oe)
    00F6 (ö) 006F 0065 (oe)

    Norwegian

    Character Conversion
    00E4 (ä) 0061 0065 (ae)
    00C4 (Ä) 0061 0065 (ae)
    00E5 (å) 0061 0061 (aa)
    00C5 (Å) 0061 0061 (aa)
    00D8 (Ø) 006F 0065 (oe)
    00F8 (ø) 006F 0065 (oe)
    00D6 (Ö) 006F 0065 (oe)
    00F6 (ö) 006F 0065 (oe)
    00E6 (æ) 0061 0065 (ae)
    00E6 (Æ) 0061 0065 (ae)
    • For search, the conversion is not bi-directional except for the special characters Å and å. For example, searches for the term Edgar Allan Poe will return results for Edgar Allan Pö, but searches for the term Edgar Allan Pö will not return results for Edgar Allan Poe.

    • For sort and browse, Primo VE uses the following order for the Norwegian alphabet:

      • a/A-z/Z (with Ü/ü sorted as Y/y)

      • æ/Æ ; ä/Ä

      • ø/Ø ; ö/Ö

      • å/Å ; aa/Aa

    Icelandic

    In Icelandic, the accents for some characters not only signify different pronunciation, but also signify a different meaning.

    Character Conversions

    The following table lists the characters that require special conversion. All other special characters with accents and umlauts, such as ä, ë, ü, û, è, are converted to their “default” value (a, e, u, and so forth). 

    Character Conversion
    00C5 (Å) 0041 0041 (AA)
    00E5 (å) 0061 0061 (aa)
    00D8 (Ø) 00D6 (Ö)
    00F8 (ø) 00F6 (ö)

    Sort Order

    The following table lists the sort order of characters in Icelandic. The characters highlighted in yellow are not converted and are sorted as indicated, which means that searches for these special Icelandic letters should return matches for that special letter. The 'regular' character should not be included. Examples:

    • When searching for “sál”, the word “sal" should not be returned.

    • When searching for “skola” , the word “skóla" should not be returned.

    Capital Character Small Character
    A (0041) a (0061)
    Á (00C1) (00E1)
    B (0042) b (0062)
    C (0043) c (0063)
    D (0044) d (0064)
    Ð (00D0) ð (00F0)
    E (0045) e (0065)
    É (00C9) é (00E9)
    F (0046) f (0066)
    G (0047) g (0067)
    H (0048) h (0068)
    I (0049) i (0069)
    Í (00CD) í (00ED)
    J (004A) j (006A)
    K (004B) k (006B)
    L (004C) l (006C)
    M (004D) m (006D)
    N (004E) n (006E)
    O (004F) o (006F)
    Ó (00D3) ó (00F3)
    Ø (00D8) ø (00F8)
    P (0050) p (0070)
    Q (0051) q (0071)
    R (0052) r (0072)
    S (0053) s (0073)
    T (0054) t (0074)
    U (0055) u (0075)
    Ú (00DA) ú (00FA)
    V (0056) v (0076)
    W (0057) w (0077)
    X (0058) x (0078)
    Y (0059) y (0079)
    Ý (00DD) ý (00FD)
    Z (005A) z (007A)
    Þ (00DE) þ (00FE)
    Æ (00C6) æ (00E6)
    Ö (00D6) ö (00F6)
    • The search results are sorted using the Icelandic alphabetical sort order above.

    • The filing values for browsing will not normalize the diacritics of the highlighted special characters so that the browsing will correspond to the Icelandic  alphabetical sort order.

    • Data that contains characters or diacritics that are not listed above will behave according to the default English language settings.

    Lithuanian

    The Lithuanian language contains 18 special letters that are indexed as themselves : ą, č, ę, ė, į, š, ų, ū, ž, Ą, Č, Ę, Ė, Į, Š, Ų, Ū, and Ž.

    Polish

    Character Conversion
    0104 (Ą) 0061 0061 (aa)
    0105 (ą) 0061 0061 (aa)
    0106 (Ć) 0063 0063 (cc)
    0107 (ć) 0063 0063 (cc)
    0118 (Ę) 0065 0065 (ee)
    0119 (ę) 0065 0065 (ee)
    0141 (Ł) 006C 006C (ll)
    0142 (ł) 006C 006C (ll)
    0143 (Ń) 006E 006E (nn)
    0144 (ń) 006E 006E (nn)
    00D3 (Ó) 006F 006F (oo)
    00F3 (ó) 006F 006F (oo)
    015A (Ś) 0073 0073 (ss)
    015B (ś) 0073 0073 (ss)
    0179 (Ź) 007A 007A (zz)
    017A (ź) 007A 007A (zz)
    017B (Ż) 007A 0065 (ze)
    017C (ż) 007A 0065 (ze)

    Spanish

    Character Conversion
    00D1 (Ñ) 00F1 (ñ)
    00F1 (ñ) 00F1 (ñ)
    00C7 (Ç) 00E7 (ç)
    00E7 (ç) 00E7 (ç)
    0140 (ŀ) 0140 (ŀ)
    013F (Ŀ) 0140 (ŀ)

    Swedish

    Character Conversion
    00E4 (ä) 0061 0065 (ae)
    00C4 (Ä) 0061 0065 (ae)
    00E5 (å) 0061 0061 (aa)
    00C5 (Å) 0061 0061 (aa)
    00D8 (Ø) 006F 0065 (oe)
    00F8 (ø) 006F 0065 (oe)
    00D6 (Ö) 006F 0065 (oe)
    00F6 (ö) 006F 0065 (oe)
    00E6 (æ) 0061 0065 (ae)
    00E6 (Æ) 0061 0065 (ae)
    • For search, the conversion is not bi-directional. For example, searches for the term Edgar Allan Poe will return results for Edgar Allan Pö, but searches for the term Edgar Allan Pö will not return results for Edgar Allan Poe.

    • For sort and browse, Primo VE uses the following order for the Swedish alphabet:

      • a/A-z/Z (with æ/Æ sorted as ae/Ae; Ü/ü is sorted as Y/y)

      • å/Å

      • ä/Ä

      • ö/Ö ; ø/Ø

    • Was this article helpful?