Summon: Relevance Ranking
- Product: Summon
How does the Summon service determine the order of search results?
Relevance ranking in Summon is performed by a continuously tuned, proprietary algorithm built on the following building blocks: Dynamic Rank, Static Rank, and Known Item Searching. Summon provides true relevance ranking across all content in all languages.
Dynamic Rank
This represents how well the user's query matches the record. Dynamic Rank factors include:
- Field weighting – when a query term or phrase matches in a field of a record, a score is generated according to the importance of the field. For example, Title, Subtitle and SubjectTerms are the highest weighted fields. The Author and Abstract fields are weighted lower than these, but higher than other metadata fields. The FullText field is weighted the lowest.
- Term weighting – matches on rare terms (words) are weighted higher than matches on common terms. For example, if a given query is yoruba books, the less common term "yoruba" has a higher influence than the common term "books". Users can use the "^" operator to modify the weight of a term. For example, the query yoruba books^2 puts twice as much emphasis as the default weight on the term "books", and the query yoruba books^0.5 reduces the weight of the term "books" to half of the default weight.
- Term frequency and field length – the number of times a matching term is repeated within a field is also considered. For example, if a given query is nanobiotechnology, an abstract that contains five occurrences of the term would score higher than an abstract of the same length that contains the term only once. Similarly, the length of the field where a match occurs is considered in determining the weight of the match.
- Verbatim match boost – a given query term could match a term in a record via native language search features, such as stemming, lemmatization, character normalization, etc. (non-verbatim matches). Such non-verbatim matches are weighted less than verbatim matches where the query term is exactly the same as the indexed term. For further details see the Verbatim Match Boost section of Native Language Search.
- Phrase and proximity match boost – if a given query contains multiple terms and double quotes are not used, matches on the exact phrases (phrase match) and close phrase matches (proximity matches) are given a boost in the score. For example, if a given query is american history (without double quotes), the exact phrase match "American history" scores higher than the non-exact phrase match (proximity match) "American automobile history", and "American automobile history" scores higher than a match on "American" and "history" appearing in different fields.
- Exact title or title+subtitle match boost – the exact title match boost feature boosts scores for cases where a given query matches the title or title+subtitle. This helps known item searches consisting of a title or title+subtitle.
- Known item search boost – in addition to the exact title match boost feature above, the known item search boost feature emphasizes matches where a given query contains a combination of common elements of known item searches, such as title, subtitle, author, publication title, and so on. For example, the query an inconvenient truth global warming al gore (without double quotes) boosts matches on the books titled "An Inconvenient Truth: The Planetary Emergency of Global Warming and What We Can Do About It" and "An Inconvenient Truth: The Crisis of Global Warming" authored by Al Gore.
- Full text proximity boost – if all query terms appear within a 200-word proximity in the full text field, a score boost is applied.
These factors work together to generate a final Dynamic Rank "score" for the match between the query and each record.
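To make these factors concrete, the following is a minimal, illustrative sketch of how such dynamic factors might combine. Every field weight, formula, and boost value below is an invented assumption; Summon's actual algorithm is proprietary and continuously tuned.

```python
# Illustrative sketch only: field weights, the IDF formula, and boost values
# are hypothetical assumptions, not Summon's actual (proprietary) parameters.
import math
import re

# Hypothetical field weights, ordered as described in the list above.
FIELD_WEIGHTS = {
    "Title": 10.0, "Subtitle": 10.0, "SubjectTerms": 10.0,
    "Author": 5.0, "Abstract": 5.0,
    "FullText": 1.0,
}

def parse_query(query):
    """Parse per-term '^' boosts, e.g. 'yoruba books^2' -> {'yoruba': 1.0, 'books': 2.0}."""
    boosts = {}
    for token in query.lower().split():
        m = re.fullmatch(r"(\w+)\^([\d.]+)", token)
        if m:
            boosts[m.group(1)] = float(m.group(2))
        else:
            boosts[token] = 1.0
    return boosts

def idf(term, doc_freq, total_docs):
    """Term weighting: matches on rare terms contribute more than common terms."""
    return math.log(total_docs / (1 + doc_freq.get(term, 0)))

def dynamic_rank(query, record, doc_freq, total_docs):
    boosts = parse_query(query)
    score = 0.0
    for term, boost in boosts.items():
        for field, text in record.items():
            words = text.lower().split()
            tf = words.count(term)           # term frequency within the field
            if tf == 0:
                continue
            # Repeated matches are rewarded; longer fields are normalized down.
            score += (boost * FIELD_WEIGHTS.get(field, 1.0)
                      * idf(term, doc_freq, total_docs)
                      * tf / math.sqrt(len(words)))
    # Phrase match boost: query terms appearing as an exact phrase in the
    # title earn a flat (hypothetical) multiplier.
    if " ".join(boosts) in record.get("Title", "").lower():
        score *= 1.5
    return score

# Example: the rare term "yoruba" dominates, and "^2" doubles the weight of "books".
doc_freq = {"yoruba": 10, "books": 100_000}
record = {"Title": "Yoruba Books",
          "Abstract": "A short survey of Yoruba literature and books"}
print(dynamic_rank("yoruba books^2", record, doc_freq, total_docs=1_000_000))
```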
Static Rank
This represents the value of each item, and does not pertain to the user's query terms. Static Rank factors include:
- Content type – items are weighted according to their content types. For example, journal articles are weighted higher than magazine articles or newspaper articles; books are weighted higher than book reviews; journals are weighted higher than conference proceedings, and so on.
- Publication date – recent items are weighted higher than older items. Summon uses carefully designed mathematical functions specific to each content type to maximize the effectiveness of this factor. For example, the penalty for having an old publication date is higher for journal articles than for books.
- Scholarly/Peer Review – articles from "scholarly" or "peer reviewed" journals are boosted.
- Highlight local collections – items in the institution's catalog or institutional repositories are boosted.
- Citation counts – publications with high citation counts are rewarded with a score boost.
- Anonymous author – items with anonymous authors are demoted. Anonymous items may include editor's notes, letters to the editor, obituaries, and other non-primary articles in journals.
Each record's Static Rank score is determined as a combination of scores calculated from these factors, using carefully designed mathematical functions. For example, a journal article published 5 years ago with 100 citations would probably have a higher Static Rank score than a journal article published 6 months ago with 0 citations. In this case, the benefit of the high citation counts of the first record outweighs the benefit of the recency of the second record.
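As an illustration of how such static factors might combine, here is a minimal sketch. The decay rates, weights, and boost values are invented assumptions chosen only so that the worked example above behaves as described; the real functions are proprietary.

```python
# Illustrative sketch only: all weights, decay rates, and boosts below are
# hypothetical assumptions, not Summon's actual (proprietary) functions.
import math

CONTENT_TYPE_WEIGHTS = {"journal_article": 1.0, "book": 0.9, "newspaper_article": 0.5}

# Recency decay per content type: journal articles "age" faster than books,
# as described above.
DECAY_RATES = {"journal_article": 0.10, "book": 0.03}

def static_rank(content_type, age_years, citations, scholarly, in_local_collection):
    score = CONTENT_TYPE_WEIGHTS.get(content_type, 0.7)
    score *= math.exp(-DECAY_RATES.get(content_type, 0.05) * age_years)
    score *= 1 + 0.3 * math.log1p(citations)    # diminishing returns on citations
    if scholarly:
        score *= 1.2                             # scholarly/peer-review boost
    if in_local_collection:
        score *= 1.1                             # local collection boost
    return score

# The worked example above: with these illustrative parameters, an older,
# highly cited article outranks a brand-new, uncited one.
old = static_rank("journal_article", age_years=5, citations=100,
                  scholarly=True, in_local_collection=False)
new = static_rank("journal_article", age_years=0.5, citations=0,
                  scholarly=True, in_local_collection=False)
print(old > new)  # True
```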
The scores from Dynamic Rank and Static Rank are then combined to determine the relevance score of each record for the given query. In addition, the preferred language feature boosts records in the language that matches the user's UI language choice. This feature is useful for cases where an institution's holdings include multilingual content.
The ranking of a search result set is determined by the final relevance scores of the records in the result set.
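The combination step itself is not published. As a purely hypothetical sketch, the final score could be the product of the two component scores and a preferred-language multiplier (the function name and the 1.1 boost are assumptions):

```python
# Hypothetical combination step; the real combination function is proprietary.
def relevance(dynamic_score, static_score, record_language, ui_language):
    # Preferred language boost: reward records matching the user's UI language.
    language_boost = 1.1 if record_language == ui_language else 1.0
    return dynamic_score * static_score * language_boost

# Result sets are sorted by this final score, highest first.
print(relevance(80.7, 1.74, record_language="en", ui_language="en"))
```

Multiplying the components (rather than adding them) is one way to ensure that top results score well on both Dynamic Rank and Static Rank, as discussed in the FAQ below; whether Summon combines them this way is an assumption.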
Summon's relevance ranking algorithm is tuned to provide the best search experience for both known item searching and other types of searching (e.g., subject searching, exploratory searching, topical searching, existence searching, unknown item searching). Additionally, aspects of Summon relevance assist a user community that ranges from the novice researcher to the professional researcher and all user types in between. For example, short and general topical queries (for example linguistics, global warming) tend to return more books, eBooks, references and journals among the top results, and long and specific topical queries (for example linguistics universal grammar, global warming Kyoto protocol) tend to return more journal articles among the top results.
Summon overlays this foundation with a regimen of judgments to ensure that relevance as a whole remains strong as individual pieces of the system are improved. The relevance ranking system in Summon is shared by all customers, and is not customizable for individual institutions.
Known Item Searching
Known item searching presents unique challenges that are not encountered in topical searching. For example, users may inadvertently misspell words in the title or author names of an article, or they may paste a portion of a citation into Summon’s search box, including extra terms such as Volume and Issue. In such scenarios, Summon must still prioritize and return the known items among its top search results.
Summon incorporates features to address such challenges and provide an exceptional user experience for known item searching. Some of these features are described below.
- Identifier Detection – Summon's identifier detection mechanism detects specific identifiers like DOI, ISBN and ISSN within a search query, and boosts results containing matches in these identifier fields. For example, when a user searches for 10.1037/xlm0000452, Summon recognizes it as a DOI, and elevates the corresponding article with that DOI in the search results.
- Author Name Detection – Summon's author name detection mechanism employs a database of author names and associated information to identify potential author names in a search query, and elevates articles written by those authors. This helps refine the search relevance for known item searches, including those involving "title + author" queries.
- Citation Search Detection – Summon uses an open source AI tool called Grobid to parse the search query and identify different metadata elements in the citation, including author, title, date, volume and issue. Employing these parsed elements, Summon searches for the cited items, aligning them with relevant fields while allowing for optional matching of certain elements.
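As an illustration of the identifier detection idea, the following sketch recognizes DOI-, ISSN-, and ISBN-like tokens with simplified regular expressions. The patterns and the priority order are assumptions for illustration; Summon's actual detection logic is not published.

```python
# Simplified identifier patterns; real DOI/ISSN/ISBN validation is stricter.
import re

IDENTIFIER_PATTERNS = {
    "DOI": re.compile(r"\b10\.\d{4,9}/\S+"),
    "ISSN": re.compile(r"\b\d{4}-\d{3}[\dXx]\b"),
    "ISBN": re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b"),
}

def detect_identifier(query):
    """Return the first identifier-like token found, checked in priority
    order so a DOI's digits are not misread as an ISBN."""
    for id_type in ("DOI", "ISSN", "ISBN"):
        m = IDENTIFIER_PATTERNS[id_type].search(query)
        if m:
            return id_type, m.group()
    return None

# Example from the text: the query is recognized as a DOI, so the record
# whose DOI field matches it can be boosted to the top of the results.
print(detect_identifier("10.1037/xlm0000452"))  # ('DOI', '10.1037/xlm0000452')
```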
FAQs (Frequently Asked Questions)
- Does Summon's relevance ranking algorithm use publication dates so that recent publications are ranked higher?
Yes, but only if all other factors contributing to the relevance ranking are equal, and in Summon's case the other factors are almost never equal. For example, citation counts may differ, the "scholarly" attribute may differ, the holdings status may differ, and how well the query matches the record (Dynamic Rank) may differ. Summon's relevance ranking algorithm takes all of these factors into account when deciding how to rank the result set.
- How does Summon's relevance algorithm treat subject terms matching?
The subject terms field is one of the highest weighted fields in Summon's relevance ranking algorithm. When considering field weighting, it is important to note that known item searching and other types of searching (subject searching, topical searching, etc.) have different characteristics. For example, a known item search typically contains all or part of the title (and subtitle), so title matching is very important for known item searching. On the other hand, an exploratory subject or topical search benefits from matching either in the title (and subtitle) field or in the subject terms field; items matching the query in the title field and in the subject terms field tend to have different characteristics, and both are valuable. Summon's relevance ranking algorithm is designed to support all types of searches.
- How can Summon's relevance ranking algorithm handle all of those factors? Isn't it better to use a smaller number of factors, so that they don't get out of control?
Summon's relevance algorithm uses various approaches to ensure that all the factors are well balanced – namely, that each of them can effectively contribute to the ranking, and that no single factor overwhelms the others. One of these approaches is separating the calculations for the Dynamic Rank factors from the Static Rank factors. This ensures that top results have high scores from both Dynamic Rank (i.e., the query matches the record well) and Static Rank (i.e., the record has a high value), not just one of them.
- I understand that the explanations on this page are for English queries. How does Summon handle relevance ranking for (insert your language name)?
Everything explained on this page applies to non-English language searches as well. In addition, the language-specific native language search features, such as stemming/lemmatization, character normalization, spelling normalization, etc., expand the search, and the verbatim match boost feature ensures that the relevance ranking takes those expansions into account. The verbatim match boost feature uses a very detailed system for penalizing various mismatches between the query and the indexed string. For example, the non-verbatim match between the two character variations of the word Yokohama in Japanese – 横浜 and 横濱 – is penalized less than the non-verbatim match between two scripts – 横濱 (in Kanji) and よこはま (in Hiragana). These penalties are defined for each process in each language, and are applied to each term.
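As a purely hypothetical sketch of such a penalty system (the process names and penalty values below are invented; the real tables are not published), per-language penalties could be looked up by matching process and applied to each term's score:

```python
# Hypothetical penalty table: a milder penalty for a character variant
# (e.g. 横浜 vs 横濱) than for a cross-script match (e.g. 横濱 vs よこはま),
# mirroring the Japanese example above. All values are invented.
MATCH_PENALTIES = {
    ("ja", "character_variant"): 0.95,
    ("ja", "cross_script"): 0.80,
    ("en", "stemming"): 0.90,
}

def verbatim_adjusted_score(base_score, language, match_process=None):
    """A verbatim match (match_process=None) keeps its full score; a match
    produced by a normalization process incurs that process's penalty."""
    if match_process is None:
        return base_score
    return base_score * MATCH_PENALTIES.get((language, match_process), 0.85)

print(verbatim_adjusted_score(10.0, "ja"))                       # 10.0 (verbatim)
print(verbatim_adjusted_score(10.0, "ja", "character_variant"))  # 9.5
print(verbatim_adjusted_score(10.0, "ja", "cross_script"))       # 8.0
```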
- How does Summon's developer team use user feedback on relevance ranking?
When a user issues a search on Summon and clicks on any of the results, the user is already providing feedback to Summon's development team. The user query and click-through logs are the basis of the relevance metrics used by Summon's developer team, such as query/session abandonment rates, mean reciprocal rank (MRR), and discounted cumulative gain (DCG), and they provide valuable information. However, nothing is more useful than user-reported relevance issues. As long as a reported search case is reproducible, the team can analyze and pinpoint the cause of a specific issue. All reported problematic search cases are analyzed and recorded in the team's relevance issue database, and they are used in designing and evaluating future versions of Summon's relevance ranking algorithm.
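MRR and DCG are standard information retrieval metrics, and the sketch below shows how they are typically computed. How Summon derives relevance labels from its logs is not published, so treating a click as a binary relevance signal here is an assumption.

```python
import math

def mean_reciprocal_rank(first_click_ranks):
    """MRR over a set of queries: the average of 1/rank of the first
    clicked result (ranks are 1-based)."""
    return sum(1.0 / r for r in first_click_ranks) / len(first_click_ranks)

def dcg(relevances):
    """Discounted cumulative gain for one ranked result list: gains lower
    in the list are discounted logarithmically."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

# Three queries whose first click landed at ranks 1, 3, and 2:
print(mean_reciprocal_rank([1, 3, 2]))  # (1 + 1/3 + 1/2) / 3 ≈ 0.611
# One result list with clicks (treated as relevance 1) at positions 1 and 3:
print(dcg([1, 0, 1, 0]))                # 1/log2(2) + 1/log2(4) = 1.5
```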
- Why do I receive an increased number of results when I append more words to a known item search query?
Summon's known item search feature adjusts known item searches (such as "title + author" searches and citation searches) to prevent search failures caused by misspellings or variations in formatting. As a result, users may observe an expansion in search results upon adding additional search terms to known item queries.
Providing Feedback
We appreciate feedback that helps us continue to tune the relevance algorithm. The best way to provide feedback is to send it to summon.relevance.feedback@proquest.com. When submitting feedback, please use the following template:
- Email subject line: Relevance feedback from <your institution name>
- In the body of the email please include:
  - Your name and email address
  - Query strings and other information, such as refinement and facet settings
  - URLs of the problematic search cases
  - Screenshots (helpful additional information)
  - Explanation of the issue
All reported search cases will be analyzed and added to our relevance issue database, and will be considered in our ongoing and future relevance improvement efforts. Please note that, in general, you will not receive a response to messages sent to the above Summon relevance email address. If you require a response, please report your issue via the Ex Libris Support Portal.