    Linguistic Features for Primo VE

    This section describes the various linguistic features that Primo supports.

    Language Detection

    In order to offer language-based services, Primo must first detect the language of the indexed text and the query. Currently, Primo can detect the following languages:

    • Latin-based: English, Spanish, Italian, German, French, and Danish.

    • Asian: Chinese, Japanese, and Korean. If a character is Chinese and the Primo locale is Japanese or Korean, Primo uses the language of the selected locale.

    • Other languages that have a specific character range: Hebrew, Arabic, and so forth.

    Language detection is based on comparing the words of the record and the query with a dictionary. If fifty percent or more of the words match, the language is identified.
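
    The dictionary comparison described above can be sketched as follows. This is a minimal illustration rather than Primo's implementation: the word lists, the detect_language function, and the choice to return the best-scoring language are all assumptions made for the example.

```python
# Hypothetical mini-dictionaries; real word lists would be far larger.
DICTIONARIES = {
    "en": {"the", "of", "and", "adventures", "in", "a"},
    "de": {"der", "die", "das", "und", "abenteuer", "von"},
}


def detect_language(text, dictionaries=DICTIONARIES, threshold=0.5):
    """Return the language whose dictionary matches the largest share of
    words in `text`, provided that share is at least `threshold`."""
    words = text.lower().split()
    if not words:
        return None
    best_lang, best_ratio = None, 0.0
    for lang, vocab in dictionaries.items():
        ratio = sum(1 for w in words if w in vocab) / len(words)
        # Assumption: a language qualifies at 50 percent or more matches.
        if ratio >= threshold and ratio > best_ratio:
            best_lang, best_ratio = lang, ratio
    return best_lang


print(detect_language("the adventures of huckleberry finn"))
```

    In this sketch, three of the five words match the English list, so the 50 percent threshold is met; a text with no dictionary matches yields None.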

    Stop Words

    Stop words are included in phrase searches and omitted from keyword searches. For example, if a user searches for the adventures of huckleberry finn, Primo performs the following searches:

    the adventures of huckleberry finn

    or

    adventures huckleberry finn

    Primo uses stop word lists during indexing and searching.
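
    As a rough sketch of the two searches above, the following example builds both forms from one query. The stop word list and the build_searches function are hypothetical and only illustrate the idea.

```python
# Hypothetical stop word list for illustration only.
STOP_WORDS = {"the", "of", "a", "an", "and"}


def build_searches(query, stop_words=STOP_WORDS):
    """Return (phrase_search, keyword_search) for a query.

    The phrase search keeps stop words; the keyword search omits them.
    """
    words = query.lower().split()
    phrase = " ".join(words)
    keywords = " ".join(w for w in words if w not in stop_words)
    return phrase, keywords


phrase, keywords = build_searches("the adventures of huckleberry finn")
print(phrase)    # the adventures of huckleberry finn
print(keywords)  # adventures huckleberry finn
```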

    Author Names

    In many Latin languages, Primo treats the O' prefix as a stop word and indexes words that contain it as two separate words. This also applies to author names such as O'Leary, which is indexed as o and leary. As a result, a search for Oleary would not retrieve the same number of results as a search for O'Leary. To compensate, when users search for a name that typically includes an apostrophe but omit the apostrophe, Primo also searches for the name as if the apostrophe had been included. For example, if the user's query is Oleary, Primo changes the query to search for oleary or o leary.
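
    The indexing and query behavior for such names can be illustrated with a short sketch. The name lookup table and both function names are assumptions; the text does not say how Primo recognizes names that typically include an apostrophe.

```python
import re

# Hypothetical lookup of names that are usually written with an apostrophe.
APOSTROPHE_NAMES = {
    "oleary": "o leary",
    "obrien": "o brien",
    "oconnor": "o connor",
}


def index_tokens(name):
    """Split a name on non-letters, so O'Leary is indexed as o and leary."""
    return re.findall(r"[a-z]+", name.lower())


def expand_name(term, names=APOSTROPHE_NAMES):
    """Add the split form to the query when the term is a known name."""
    key = term.lower()
    split_form = names.get(key)
    return f"{key} OR {split_form}" if split_form else key


print(index_tokens("O'Leary"))  # ['o', 'leary']
print(expand_name("Oleary"))    # oleary OR o leary
```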

    Stemming

    Stemming is a process that reduces inflected (or sometimes derived) words to their stem, base, or root form. When stemming is activated, the stemmed form of the search term is added to the query with a very low boost to improve the search results.

    • Stemming works independently of the smart search and ranking mechanism that boosts adjacent query words. This allows the system to boost stemmed versions of search terms when they are not adjacent to other search terms.

    • With the exception of plurals, Primo VE does not unstem terms. If stemming is activated, Primo VE includes the plural form of the term and ranks its results lower. For example, a search for wild flower expands to wild AND (flower OR flowers^0.5).
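
    The plural expansion in the example above can be sketched as follows. The plural lookup table and the expand_with_plurals function are hypothetical; the text does not describe how Primo VE decides which terms receive a plural form, and the 0.5 boost is taken from the example.

```python
# Hypothetical singular-to-plural lookup; not Primo VE's actual stemmer.
PLURALS = {"flower": "flowers", "book": "books"}


def expand_with_plurals(query, plurals=PLURALS, boost=0.5):
    """AND the query terms together, adding a lower-boosted plural form
    for every term that has one in the lookup table."""
    parts = []
    for term in query.lower().split():
        plural = plurals.get(term)
        if plural:
            parts.append(f"({term} OR {plural}^{boost})")
        else:
            parts.append(term)
    return " AND ".join(parts)


print(expand_with_plurals("wild flower"))  # wild AND (flower OR flowers^0.5)
```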

    The maximum_results_for_stemming parameter on the Discovery Customer Settings page (Configuration Menu > Discovery > Other > Customer Settings) is intended to let you activate or deactivate stemming. Currently, however, stemming is always active and cannot be deactivated. It is recommended that you set this parameter to 0 to prevent performance issues.

    Synonyms

    Primo adds the following types of synonyms to a search query:

    • Numbers – when a search contains a digit, Primo adds the spelled-out number to the search query. For example, Primo adds the word ninth to a search query for 9th.

    • US or British spelling – when a search contains a word spelled according to US or British spelling, Primo adds the corresponding synonym to the search query. For example, Primo adds the word colour to a search query for color.

    • Commonly misspelled words – for commonly misspelled words, Primo adds the word spelled correctly to the search query.

    In addition to the synonym, Primo includes the original search term in the query. For example, if the query is fifth dimension, Primo searches for (fifth OR 5th) AND dimension.
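
    A minimal sketch of this expansion is shown below. The synonym table and the expand_synonyms function are assumptions for illustration; the disable_synonyms argument merely mirrors the idea of switching the expansion off and is not the product setting itself.

```python
# Hypothetical synonym table covering the examples in this section.
SYNONYMS = {
    "fifth": "5th",
    "5th": "fifth",
    "9th": "ninth",
    "color": "colour",
    "colour": "color",
}


def expand_synonyms(query, synonyms=SYNONYMS, disable_synonyms=False):
    """Keep each original term and OR it with its synonym, if any."""
    parts = []
    for term in query.lower().split():
        synonym = None if disable_synonyms else synonyms.get(term)
        parts.append(f"({term} OR {synonym})" if synonym else term)
    return " AND ".join(parts)


print(expand_synonyms("fifth dimension"))  # (fifth OR 5th) AND dimension
print(expand_synonyms("fifth dimension", disable_synonyms=True))  # fifth AND dimension
```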

    Primo applies a different set of synonym lists depending on the detected language.

    The following parameter on the Discovery Customer Settings page (Configuration Menu > Discovery > Other > Customer Settings) allows you to disable the use of synonyms:

    • disable_synonyms – When set to true, this parameter disables the use of synonyms in search queries. By default, this parameter is set to false.

    Did You Mean

    Did You Mean (DYM) suggestions improve search queries by correcting typographical errors and common misspellings in search terms so that users get the results they expect. DYM suggestions are provided when the original query returns fewer than 15 search results. This threshold is not configurable.

    In the following example, the search term leukemia is missing a single character and returns no results. Users can select the suggestion that appears below the search box if they want to see results for that suggestion.

    Figure: Did You Mean Example

    How does DYM work?

    DYM is invoked when the original search query returns fewer than 15 results. If invoked, the DYM algorithm performs the following steps:

    1. For each search term in the original query:

      1. The following sources are checked for a match:

        • DYM index – This index is built from the regular titles index using the Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one word into another. For DYM, the index limits edits to a single character.

          For example, if the word leukemia is indexed in the regular title index, the following terms could return a suggestion for leukemia:

          • lekemia - The letter u is missing.

          • leekemia - The letter u has been replaced with a second e.

          • aleukemia - The letter a has been added to the beginning of the term.

        • Dictionary – A list of commonly misspelled words that is checked for a match.

      2. For each match found, a candidate query is created by replacing the term in the original query with its match.

    2. Each candidate query is tested, and the highest-ranking candidate that returns enough results is used for the suggestion.
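
    The steps above can be sketched as follows. This is an illustration under stated assumptions, not the DYM implementation: the function names are hypothetical, the title terms are scanned directly instead of using a prebuilt index, the result count stands in for candidate ranking, and "enough results" is assumed to mean the same threshold of 15.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one single-character
    insertion, deletion, or substitution (Levenshtein distance <= 1)."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equally long) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution
    return a[i:] == b[i + 1:]          # one inserted character in `b`


def suggest(query, title_terms, dictionary, count_results, threshold=15):
    """Return a suggested query, or None if DYM is not invoked or
    no candidate returns enough results."""
    if count_results(query) >= threshold:
        return None  # DYM is only invoked below the threshold
    terms = query.lower().split()
    candidates = []
    for pos, term in enumerate(terms):
        # Source 1: title terms within one edit of the search term.
        matches = {t for t in title_terms
                   if t != term and within_one_edit(term, t)}
        # Source 2: dictionary of commonly misspelled words.
        if term in dictionary:
            matches.add(dictionary[term])
        for match in matches:
            candidates.append(" ".join(terms[:pos] + [match] + terms[pos + 1:]))
    # Assumption: candidates are ranked by how many results they return.
    viable = [c for c in candidates if count_results(c) >= threshold]
    return max(viable, key=count_results, default=None)


# Usage with a stubbed result counter (hypothetical numbers).
counts = {"lekemia treatment": 0, "leukemia treatment": 120}
print(suggest("lekemia treatment", {"leukemia", "treatment"}, {},
              lambda q: counts.get(q, 0)))  # leukemia treatment
```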

    Configuration Options

    This functionality is enabled out of the box, and it is not configurable.

    Normalization of Special Characters

    Based on the configured indexing language, Primo VE normalizes special characters and characters with diacritics in the search index. Primo VE supports the following indexing languages: German (de), Norwegian (no), Danish (da), Swedish (sv), Spanish (es), Polish (po), Korean (ko), Chinese (zh), and Japanese (ja).

    Specifically for Hong Kong, your library can decide to use either Chinese or TSVCC for the character conversion.

    If you want to change your indexing language to one of the supported languages, please open a Salesforce support ticket. Note that this will require your data to be re-indexed. After re-indexing is complete, searches will use the language-specific character conversions described below, regardless of the selected UI language.

    German

    Character     Conversion
    00DC (Ü)      0075 0065 (ue)
    00FC (ü)      0075 0065 (ue)
    00C4 (Ä)      0061 0065 (ae)
    00E4 (ä)      0061 0065 (ae)
    00D6 (Ö)      006F 0065 (oe)
    00F6 (ö)      006F 0065 (oe)

    Norwegian

    Character     Conversion
    00E4 (ä)      0061 0065 (ae)
    00C4 (Ä)      0061 0065 (ae)
    00E5 (å)      0061 0061 (aa)
    00C5 (Å)      0061 0061 (aa)
    00D8 (Ø)      006F 0065 (oe)
    00F8 (ø)      006F 0065 (oe)
    00D6 (Ö)      006F 0065 (oe)
    00F6 (ö)      006F 0065 (oe)

    Danish

    Character     Conversion
    00E4 (ä)      0061 0065 (ae)
    00C4 (Ä)      0061 0065 (ae)
    00E5 (å)      0061 0061 (aa)
    00C5 (Å)      0061 0061 (aa)
    00D8 (Ø)      006F 0065 (oe)
    00F8 (ø)      006F 0065 (oe)
    00D6 (Ö)      006F 0065 (oe)
    00F6 (ö)      006F 0065 (oe)

    Swedish

    Character     Conversion
    00E4 (ä)      0061 0065 (ae)
    00C4 (Ä)      0061 0065 (ae)
    00E5 (å)      0061 0061 (aa)
    00C5 (Å)      0061 0061 (aa)
    00D8 (Ø)      006F 0065 (oe)
    00F8 (ø)      006F 0065 (oe)
    00D6 (Ö)      006F 0065 (oe)
    00F6 (ö)      006F 0065 (oe)

    Spanish

    Character     Conversion
    00D1 (Ñ)      00F1 (ñ)
    00F1 (ñ)      00F1 (ñ)
    00C7 (Ç)      00E7 (ç)
    00E7 (ç)      00E7 (ç)
    0140 (ŀ)      0140 (ŀ)
    013F (Ŀ)      0140 (ŀ)

    Polish

    Character     Conversion
    0104 (Ą)      0061 0061 (aa)
    0105 (ą)      0061 0061 (aa)
    0106 (Ć)      0063 0063 (cc)
    0107 (ć)      0063 0063 (cc)
    0118 (Ę)      0065 0065 (ee)
    0119 (ę)      0065 0065 (ee)
    0141 (Ł)      006C 006C (ll)
    0142 (ł)      006C 006C (ll)
    0143 (Ń)      006E 006E (nn)
    0144 (ń)      006E 006E (nn)
    00D3 (Ó)      006F 006F (oo)
    00F3 (ó)      006F 006F (oo)
    015A (Ś)      0073 0073 (ss)
    015B (ś)      0073 0073 (ss)
    0179 (Ź)      007A 007A (zz)
    017A (ź)      007A 007A (zz)
    017B (Ż)      007A 0065 (ze)
    017C (ż)      007A 0065 (ze)
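
    To illustrate how such conversions apply to a search term, the following sketch encodes the German and Polish conversions listed above as lookup tables. The CONVERSIONS dictionary and the normalize function are hypothetical names; the language keys are the codes given in this section, and the sketch covers only these character conversions, not any other normalization Primo VE may perform.

```python
# Character conversions copied from the German and Polish lists in this section.
CONVERSIONS = {
    "de": {"Ü": "ue", "ü": "ue", "Ä": "ae", "ä": "ae", "Ö": "oe", "ö": "oe"},
    "po": {
        "Ą": "aa", "ą": "aa", "Ć": "cc", "ć": "cc", "Ę": "ee", "ę": "ee",
        "Ł": "ll", "ł": "ll", "Ń": "nn", "ń": "nn", "Ó": "oo", "ó": "oo",
        "Ś": "ss", "ś": "ss", "Ź": "zz", "ź": "zz", "Ż": "ze", "ż": "ze",
    },
}


def normalize(text, lang):
    """Apply the character conversions for the configured indexing language.

    Languages without an entry in CONVERSIONS leave the text unchanged.
    """
    table = str.maketrans(CONVERSIONS.get(lang, {}))
    return text.translate(table)


print(normalize("Müller", "de"))  # Mueller
print(normalize("Łódź", "po"))    # lloodzz
```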
