This section describes the various linguistic features that Primo supports.
In order to offer language-based services, Primo must first detect the language of the indexed text and the query. Currently, Primo can detect the following languages:
Latin script: English, Spanish, Italian, German, French, and Danish.
Asian: Chinese, Japanese, and Korean. If the text contains Chinese characters and the Primo locale is Japanese or Korean, Primo treats the text as the language of that locale.
Other languages that have a specific character range: Hebrew, Arabic, and so forth.
Language detection is based on comparing the words of the record and the query with a dictionary. If fifty percent or more of the words match, the language is identified.
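The dictionary comparison described above can be sketched roughly as follows. This is a minimal illustration, not Primo's implementation: the function name, the toy dictionaries, and the rule of picking the language with the largest share of matching words are all assumptions. Only the fifty percent threshold comes from the text.

```python
def detect_language(text, dictionaries, threshold=0.5):
    """Return the language whose dictionary matches the largest share of
    words, provided that share reaches the threshold; otherwise None."""
    words = text.lower().split()
    if not words:
        return None
    best_language, best_share = None, 0.0
    for language, vocabulary in dictionaries.items():
        # Share of the words that appear in this language's dictionary.
        share = sum(word in vocabulary for word in words) / len(words)
        if share > best_share:
            best_language, best_share = language, share
    return best_language if best_share >= threshold else None


# Hypothetical toy dictionaries, for illustration only.
SAMPLE_DICTIONARIES = {
    "en": {"the", "of", "adventures", "wild", "flower"},
    "es": {"el", "la", "de", "aventuras", "flor"},
}

print(detect_language("the adventures of tom", SAMPLE_DICTIONARIES))
```

In this sketch, three of the four words match the English dictionary, so the share is above the threshold and English is returned.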
Primo uses stop word lists during indexing and searching.
Stop words are included in phrase searches and omitted from keyword searches. For example, if a user searches for the adventures of huckleberry finn, the phrase search keeps the stop words the and of, while the keyword search uses only adventures, huckleberry, and finn.
Primo treats the O' prefix as a stop word in many Latin languages and indexes words that contain it as two separate words. This also applies to author names such as O'Leary, which is indexed as o and leary. As a result, a search for Oleary would not retrieve the same number of results as a search for O'Leary. To compensate, when users search for a name that typically includes an apostrophe but omit the apostrophe, Primo also searches for the name as if the apostrophe had been included. For example, if the user's query is Oleary, Primo changes the query to search for oleary or o leary.
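The apostrophe handling can be pictured with the short sketch below. The source does not say how Primo recognizes names that typically include an apostrophe, so the lookup table and both function names here are hypothetical.

```python
# Hypothetical lookup of names usually written with an apostrophe,
# mapped to the two-word form under which they are indexed.
KNOWN_APOSTROPHE_NAMES = {"oleary": "o leary", "obrien": "o brien"}


def index_tokens(name):
    """Split a name at the apostrophe, as in o and leary for O'Leary."""
    return name.lower().split("'")


def expand_query(term):
    """Return the alternatives to search for a single query term."""
    key = term.lower()
    alternatives = [key]
    if key in KNOWN_APOSTROPHE_NAMES:
        # Also search as if the user had typed the apostrophe.
        alternatives.append(KNOWN_APOSTROPHE_NAMES[key])
    return alternatives


print(index_tokens("O'Leary"))
print(expand_query("Oleary"))
```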
Stemming is a process that reduces inflected (or sometimes derived) words to their stem, base, or root form. Primo uses stemming when a search returns fewer than 25 results (the default threshold). In that case, Primo stems the search terms using the Kstem stemmer.
Primo identifies the language of the query and applies the relevant stemming logic, which can differ by language. If Primo cannot identify the language of the query terms, it uses the user's interface language to determine which language logic to apply.
Primo does not unstem terms, with the exception of pluralizations. If the result set is smaller than the threshold, Primo also searches for the plural forms of terms and ranks their results lower. For example, a search for wild flower expands to wild AND (flower OR flowers^0.5).
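As an illustration of the plural expansion in the example above, the following sketch rebuilds the expanded query string. It assumes a hypothetical plural lookup table and leaves terms without an entry unchanged. How Primo actually decides which terms to pluralize is not described here.

```python
# Hypothetical plural lookup; terms without an entry are left unexpanded.
PLURALS = {"flower": "flowers"}


def expand_with_plurals(query, plurals=PLURALS, boost=0.5):
    """Join terms with AND, adding a lower-ranked plural where one is known."""
    clauses = []
    for term in query.lower().split():
        plural = plurals.get(term)
        if plural:
            # The ^boost suffix ranks matches on the plural form lower.
            clauses.append(f"({term} OR {plural}^{boost})")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


print(expand_with_plurals("wild flower"))  # wild AND (flower OR flowers^0.5)
```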
The following parameter on the Discovery Customer Settings page (Configuration Menu > Discovery > Other > Customer Settings) allows you to disable or limit the use of stemming:
maximum_results_for_stemming – This parameter sets the result threshold for stemming: when a search returns fewer results than this value, the system uses stemming to return more results. If this parameter is set to 0, stemming is not used to return results. By default, this parameter is set to 25 results.
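The effect of this parameter can be summarized in a few lines. This is a sketch of the decision only, and the function name is an assumption. The parameter name, the default of 25, and the meaning of 0 come from the description above.

```python
DEFAULT_MAXIMUM_RESULTS_FOR_STEMMING = 25


def should_stem(result_count,
                maximum_results_for_stemming=DEFAULT_MAXIMUM_RESULTS_FOR_STEMMING):
    """Decide whether stemming is applied to a search."""
    if maximum_results_for_stemming == 0:
        # A value of 0 disables stemming altogether.
        return False
    return result_count < maximum_results_for_stemming


print(should_stem(10))     # True: fewer than 25 results
print(should_stem(25))     # False: the threshold was reached
print(should_stem(10, 0))  # False: stemming is disabled
```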
Primo adds the following types of synonyms to a search query:
Numbers – when a search contains a digit, Primo adds the spelled out number to the search query. For example, Primo adds the word ninth to a search query for 9th.
US or British spelling – when a search contains a word spelled according to US or British spelling, Primo adds the corresponding synonym to the search query. For example, Primo adds the word colour to a search query for color.
Commonly misspelled words – for commonly misspelled words, Primo adds the word spelled correctly to the search query.
In addition to the synonym, Primo includes the original search term in the query. For example, if the query is fifth dimension, Primo searches for (fifth OR 5th) AND dimension.
Primo applies a different set of synonym lists depending on the detected language.
The following parameter on the Discovery Customer Settings page (Configuration Menu > Discovery > Other > Customer Settings) allows you to disable the use of synonyms:
disable_synonyms – When set to true, this parameter disables the use of synonyms in search queries. By default, this parameter is set to false.
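The synonym expansion and the disable_synonyms switch can be sketched as follows. The synonym table is a hypothetical stand-in for Primo's language-specific lists, and the function name is an assumption.

```python
# Hypothetical synonym table standing in for Primo's synonym lists.
SYNONYMS = {
    "fifth": ["5th"],
    "9th": ["ninth"],
    "color": ["colour"],
}


def expand_with_synonyms(query, synonyms=SYNONYMS, disable_synonyms=False):
    """Keep each original term and OR it with any synonyms found."""
    clauses = []
    for term in query.lower().split():
        alternatives = [] if disable_synonyms else synonyms.get(term, [])
        if alternatives:
            clauses.append("(" + " OR ".join([term, *alternatives]) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


print(expand_with_synonyms("fifth dimension"))  # (fifth OR 5th) AND dimension
print(expand_with_synonyms("fifth dimension", disable_synonyms=True))
```

With disable_synonyms set to true, the second call returns the original terms joined with AND and adds no synonyms.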
Did You Mean
Did You Mean (DYM) suggestions improve search queries by correcting typographical errors and common misspellings in search terms so that users get the results they expect. DYM suggestions are provided when the original query returns fewer than 15 search results. This threshold is not configurable.
For example, if the search term leukemia is entered with a single character missing and returns no results, a suggestion appears below the search box. Users can select the suggestion to see its results.
How does DYM work?
DYM is invoked when the original search query returns fewer than 15 results. If invoked, the DYM algorithm performs the following:
For each search term in the original query:
The following sources are checked for a match:
DYM index – This index is created by applying the Levenshtein distance to the regular titles index. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one word into the other. For DYM, the index limits edits to a single character.
For example, if the word leukemia is indexed in the regular title index, the following terms could return a suggestion for leukemia:
lekemia - The letter u is missing.
leekemia - The letter u has been replaced with a second e.
aleukemia - The letter a has been added to the beginning of the term.
Dictionary – The dictionary contains commonly misspelled words, against which each search term is checked.
For each match found, a candidate query is created by replacing the term in the original query with its match.
Each candidate query is tested, and the highest-ranking candidate that returns enough results is used for the suggestion.
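The steps above can be sketched as follows. This is an illustrative approximation under several assumptions, not the DYM implementation. The count_results callback, the misspellings mapping, the min_results value for "enough results", and ranking candidates by result count are all hypothetical. Only the 15-result trigger and the single-character edit limit come from the text.

```python
def within_one_edit(a, b):
    """True if the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution
    return a[i:] == b[i + 1:]          # one insertion or deletion


def suggest(query, title_index, misspellings, count_results,
            threshold=15, min_results=1):
    """Return a suggested query, or None if DYM is not invoked or finds nothing."""
    if count_results(query) >= threshold:
        return None  # DYM is invoked only below the threshold
    terms = query.lower().split()
    candidates = []
    for position, term in enumerate(terms):
        # Source 1: indexed title words one edit away from the term.
        matches = {w for w in title_index if w != term and within_one_edit(term, w)}
        # Source 2: dictionary of commonly misspelled words.
        if term in misspellings:
            matches.add(misspellings[term])
        for match in sorted(matches):
            candidates.append(" ".join(terms[:position] + [match] + terms[position + 1:]))
    viable = [c for c in candidates if count_results(c) >= min_results]
    # Assumption: rank candidates by the number of results they return.
    return max(viable, key=count_results) if viable else None


# Toy data for illustration only.
counts = {"leukemia therapy": 40}
print(suggest("lekemia therapy", {"leukemia", "therapy"}, {},
              lambda q: counts.get(q, 0)))
```

With the toy data, the misspelled term lekemia is one edit away from the indexed word leukemia, so the only viable candidate query is leukemia therapy.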
This functionality is not configurable, but you must first execute the Build ranking structures job (Admin Menu > Manage Jobs and Sets > Monitor Jobs > Scheduled) to enable it. Ex Libris recommends that you run this job weekly to keep the index current.
Currently, there is no way to disable DYM in Primo VE after it has been enabled.