Summon: Match and Merge in CDI - How Records Are Combined to Appear in the Search Results as One Enriched Record
- Product: Summon
Introduction
The content in the Central Discovery Index (CDI) comes from a variety of sources. This includes publishers, content aggregators, open-access repositories, institutional repositories, and more. In total we ingest content from over 2,000 data sources. Because of this, we may have more than one record for a given citation. Displaying all of these separately can confuse users and make it harder for them to find what they are looking for in the results.
With the match and merge process we combine all records for a citation together as a single search result. At the same time, we pull all the possible metadata to drive discovery for a better user experience. The new combined record is called the logical record. The original records that contribute to the logical record are called the physical records.
Our metadata librarians work directly with publishers to secure the best possible metadata for each item, map the records to the index schema, then merge metadata elements to create superior records. We provide full-text indexing for better discovery, and we index and preserve subject terms from the source materials. In addition, we provide value-added features such as disciplines, journal authority information, authoritative Ulrich's peer-review data, and Web of Science citation counts, all to create a single, rich index record.
Product, Development, Content, and Support teams work together to periodically evaluate and adjust the rules and filters to address any issues. Our content and support librarians can, at times, make edits to correct metadata errors, or work with a provider to address systemic issues.
Content in the Central Discovery Index is updated on a schedule appropriate to the content (daily, weekly, bi-weekly, monthly, or quarterly), depending on the frequency of updates made available by the provider. Note that when you first activate a resource or change its subscription details in Alma, SFX, Client Center, or Intota, it takes about 48 hours for the index to recognize the change.
Content Types Using the Match and Merge Process
Records of many, but not all, content types are matched and merged. For example, audio and video material is not merged. We match and merge records of the following content types:
- Book
- Book review
- Conference proceeding
- Dissertation
- eBook
- eJournal
- Journal
- Journal article
- Magazine
- Magazine article
- Newspaper
- Newspaper article
- Reference
- Trade publication article
Conditions for Matching Records to Merge Them
All three of the following conditions must be met before we merge two records:
- ID: At least one of the following unique identification numbers must match:
  - ISBN to ISBN
  - EISBN to EISBN
  - ISSN to ISSN
  - EISSN to EISSN
  - DOI to DOI
  - OCLC to OCLC
  - LCCN to LCCN
- Publication date year: The years must match within +/- one year, and each record must have only one publication year.
  Match and merge logic for ebooks prevents separate records from merging into a single result if the difference in their publication dates is greater than three years. For example, if you set a facet of 2014 - Present, you should not see book citations from the 1990s, though you might see some citations from 2012 or 2013. Allowing a three-year difference accommodates the slight variations in publication dates that different publishers give to ebooks, which minimizes duplicate results.
- Title: There must be a legitimate title match. We use fuzzy logic for title matching to overcome normal title variations, such as the symbol & versus the word and.
Exceptions (records are not merged in these cases):
- Government-document records are not merged with book or e-book records.
- The URIs are different.
- The languages are different.
- The Source type is an Institutional Repository.
- Overmatch: Some titles are so common that they are excluded from merging (for example, Poem).
The following content providers have opted out of having their records merged in the index:
- Artstor
- CABI Direct content
- CAIRN International Journals
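The three matching conditions above can be sketched in code. This is only an illustration and not the actual CDI implementation: the `Record` class, the field names, the title normalization, and the 0.9 similarity threshold are all assumptions made for the example, and the exceptions and provider opt-outs are not modeled. The ebook tolerance of three years is one reading of the publication date rule described above.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Identifier types listed in the article; the key names are hypothetical.
ID_FIELDS = ("isbn", "eisbn", "issn", "eissn", "doi", "oclc", "lccn")


@dataclass
class Record:
    """Hypothetical physical record with a single publication year."""
    title: str
    year: int
    content_type: str = "Book"
    ids: dict = field(default_factory=dict)  # e.g. {"isbn": {"9780000000001"}}


def ids_match(a, b):
    # Same-type comparison only: ISBN to ISBN, DOI to DOI, and so on.
    return any(a.ids.get(k, set()) & b.ids.get(k, set()) for k in ID_FIELDS)


def years_match(a, b):
    # Assumption: +/- 1 year in general, up to 3 years when an ebook is involved.
    tolerance = 3 if "eBook" in (a.content_type, b.content_type) else 1
    return abs(a.year - b.year) <= tolerance


def normalize_title(title):
    # Treat "&" and "and" as equivalent, drop punctuation, collapse spaces.
    text = title.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())


def titles_match(a, b, threshold=0.9):
    # Stand-in for fuzzy title matching; the threshold is an assumed value.
    ratio = SequenceMatcher(
        None, normalize_title(a.title), normalize_title(b.title)
    ).ratio()
    return ratio >= threshold


def is_merge_candidate(a, b):
    # All three conditions must hold.
    return ids_match(a, b) and years_match(a, b) and titles_match(a, b)
```

For example, `Record("Science & Society", 2014, "Book", {"isbn": {"9780000000001"}})` and `Record("Science and Society", 2015, "Book", {"isbn": {"9780000000001"}})` would be merge candidates under this sketch, because the identifier matches, the years differ by one, and the normalized titles are identical.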
Transitive Merge
A transitive merge occurs when records A and C, which would not otherwise match each other, each match record B. Metadata in record B unites them, so all three records are merged together.
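One common way to compute such groups is a union-find (disjoint-set) structure over the pairwise matches. The sketch below is an illustration of that general technique, not a description of how CDI implements transitive merging; the function name `merge_groups` and the record identifiers are hypothetical.

```python
def merge_groups(record_ids, matched_pairs):
    """Group records so that any chain of pairwise matches ends up together."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())
```

With records A, B, C, and D, where only the pairs (A, B) and (B, C) match, `merge_groups(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])` returns `[["A", "B", "C"], ["D"]]`: A and C land in the same group through B even though they never matched each other directly.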
The following image shows a visual representation of key elements in the merge process:
Match and Merge FAQ
- Do my local records participate in match and merge?
  Local records uploaded by Summon users will match and merge by default. Summon clients can completely opt out of match and merge. See this article for more information.
- Are print books sitting on the library shelf merged with their electronic version?
  Yes. The match and merge functions are intended to match individual catalog records for the same item and merge them together.
- Does match and merge give preference to certain providers?
  No. Match and merge is vendor-neutral.
- Will identical values from a specific field coming from the various physical records comprising the logical records be merged (deduped) for display, search, and so forth?
  Yes. For example, identical subject headings will be deduplicated so that you see only one instance of the heading, not multiple instances.
- How can I submit suggestions for improvement to the match and merge process?
  Please submit your ideas to the Idea Exchange. For more information, see Summon: Ideas Exchange Website for Enhancement Requests.
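To illustrate the deduplication of identical field values mentioned in the FAQ above, here is a minimal sketch of combining physical records into one logical record. The dictionary layout, the field names, and the function `build_logical_record` are assumptions for the example. It deduplicates only exactly identical values and says nothing about how CDI actually stores or normalizes metadata.

```python
def build_logical_record(physical_records):
    """Combine field values from physical records, keeping each value once."""
    logical = {}
    for record in physical_records:
        for field_name, values in record.items():
            merged = logical.setdefault(field_name, [])
            for value in values:
                # Keep first-seen order; skip values that are already present.
                if value not in merged:
                    merged.append(value)
    return logical


# Hypothetical physical records from two providers.
physical = [
    {"subject": ["Biology", "Genetics"]},
    {"subject": ["Genetics", "Evolution"], "publisher": ["Example Press"]},
]
logical = build_logical_record(physical)
```

Here the subject heading Genetics appears in both physical records but only once in `logical["subject"]`.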