    Ex Libris Knowledge Center

    Match and Merge in CDI


    Introduction

    Content in the Central Discovery Index (CDI) comes from a variety of sources, including publishers, content aggregators, open-access repositories, institutional repositories, and more. In total, we ingest content from over 2,000 data sources. Because content arrives from so many sources, we may have more than one record for a given citation. Displaying all of these records separately can confuse users and make it harder for them to find what they are looking for in the results.

    With the match and merge process, we combine all records for a citation into a single search result while pulling together all available metadata to drive discovery and improve the user experience. The new combined record is called the logical record; the original records that contribute to it are called the physical records.
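    The relationship between a logical record and its physical records can be pictured as a simple container. The Python sketch below is a minimal illustration under stated assumptions, not the CDI data model: the `PhysicalRecord` and `LogicalRecord` classes and their fields are hypothetical, and only subject terms are pooled. It also shows identical field values being deduplicated so that each subject heading appears once, as described in the FAQ.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalRecord:
    """One record as received from a single source (hypothetical fields)."""
    source: str
    title: str
    subjects: list[str] = field(default_factory=list)


@dataclass
class LogicalRecord:
    """The single search result that combines several physical records."""
    physical: list[PhysicalRecord]

    @property
    def subjects(self) -> list[str]:
        # Pool subject terms from every physical record and keep only the
        # first occurrence of each identical value (order-preserving dedup).
        return list(dict.fromkeys(s for rec in self.physical for s in rec.subjects))


a = PhysicalRecord("Publisher feed", "Deep Learning", ["Neural networks", "Machine learning"])
b = PhysicalRecord("Aggregator", "Deep Learning", ["Machine learning", "Artificial intelligence"])
merged = LogicalRecord([a, b])
print(merged.subjects)  # ['Neural networks', 'Machine learning', 'Artificial intelligence']
```

    The sources and subject terms are made up; a real logical record would pool many more fields than subjects.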

    Our metadata librarians work directly with publishers to secure the best possible metadata for each item, map the records to the index schema, and then merge metadata elements to create superior records. We provide full-text indexing for better discovery, and we index and preserve the subject terms supplied with the source materials. In addition, we provide value-add features such as disciplines, journal authority information, authoritative Ulrich's peer-review data, and Web of Science citation counts, all to create a single, rich index record. We map the content to many different content and resource types, and we review and add types as content diversifies.

    Product, Development, Content, and Support teams work together to periodically evaluate and adjust the match and merge rules and filters to address any issues. In addition, our content and support librarians sometimes edit records to correct metadata errors, or work with a provider to address systemic issues.

    Content in the Central Discovery Index is updated on a schedule appropriate to the content (daily, weekly, bi-weekly, monthly, or quarterly), depending on how often the provider makes updates available. Note that when you first activate a resource subscription or change its details in Alma, SFX, Client Center, or Intota, it takes about 48 hours for the index to recognize the change.

    Content Types Using the Match and Merge Process

    Records of many, but not all, resource types are matched and merged; for example, audio and video material is not merged. We match and merge records of the following resource types:

    • Article

    • Book

    • Book review

    • Conference proceeding

    • Dissertation

    • Journal

    • Newspaper

    • Newspaper article

    • Reference entry

    Conditions for Matching Records to Merge Them

    All three of the following conditions must be met before we merge two records:

    1. ID: At least one of the following unique identification numbers must match:

      ISBN to ISBN
      EISBN to EISBN
      ISSN to ISSN
      EISSN to EISSN
      DOI to DOI
      OCLC to OCLC
      LCCN to LCCN

    2. Publication date year: The publication years must be within one year of each other, and each record must have only one publication year.

      For ebooks, the tolerance is wider: match and merge does not combine records into a single result if their publication years differ by more than three years. For example, if you apply a publication date facet of 2014 to Present, you should not see book citations from the 1990s, though you might see some from 2012 or 2013. The three-year tolerance allows for the slight variations in publication dates that different publishers assign to the same ebook, which keeps duplicate results to a minimum.

    3. Title: The titles must genuinely match. We use fuzzy matching to allow for normal title variations, such as the symbol & versus the word "and".
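    To make the three conditions concrete, here is a minimal Python sketch of a pairwise check. It is an illustration under several assumptions, not the CDI implementation: the `Record` fields and lowercase type keys are hypothetical, the three-year ebook tolerance is assumed to apply only when both records are ebooks, and `difflib.SequenceMatcher` with a 0.9 threshold is a stand-in for CDI's own fuzzy title logic. The resource-type check reflects the list of merged types above.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Resource types from the list above (hypothetical lowercase keys).
MERGEABLE_TYPES = {
    "article", "book", "book_review", "conference_proceeding", "dissertation",
    "journal", "newspaper", "newspaper_article", "reference_entry",
}
# Identifiers are compared only against the same type (ISBN to ISBN, etc.).
ID_TYPES = ("isbn", "eisbn", "issn", "eissn", "doi", "oclc", "lccn")


@dataclass
class Record:
    resource_type: str
    title: str
    years: list[int]
    ids: dict[str, set[str]] = field(default_factory=dict)
    is_ebook: bool = False


def ids_match(a: Record, b: Record) -> bool:
    """Condition 1: at least one identifier of the same type is shared."""
    return any(a.ids.get(t, set()) & b.ids.get(t, set()) for t in ID_TYPES)


def years_match(a: Record, b: Record) -> bool:
    """Condition 2: one publication year each, within the allowed tolerance."""
    if len(a.years) != 1 or len(b.years) != 1:
        return False
    # Assumption: the wider ebook tolerance applies only if both are ebooks.
    tolerance = 3 if (a.is_ebook and b.is_ebook) else 1
    return abs(a.years[0] - b.years[0]) <= tolerance


def normalize_title(title: str) -> str:
    """Lowercase, treat '&' as 'and', drop punctuation, collapse spaces."""
    title = title.lower().replace("&", " and ")
    title = re.sub(r"[^\w\s]", " ", title)
    return " ".join(title.split())


def titles_match(a: Record, b: Record, threshold: float = 0.9) -> bool:
    """Condition 3: a stand-in fuzzy comparison of normalized titles."""
    ratio = SequenceMatcher(None, normalize_title(a.title), normalize_title(b.title)).ratio()
    return ratio >= threshold


def can_merge(a: Record, b: Record) -> bool:
    """All three conditions must hold, and both types must be mergeable."""
    if a.resource_type not in MERGEABLE_TYPES or b.resource_type not in MERGEABLE_TYPES:
        return False
    return ids_match(a, b) and years_match(a, b) and titles_match(a, b)


# Usage with made-up identifiers:
a = Record("book", "Pride & Prejudice", [2013], {"isbn": {"9780000000001"}}, is_ebook=True)
b = Record("book", "Pride and Prejudice", [2015], {"isbn": {"9780000000001"}}, is_ebook=True)
c = Record("book", "Pride and Prejudice", [2015], {"isbn": {"9780000000001"}})
print(can_merge(a, b))  # True
print(can_merge(a, c))  # False: c is not an ebook, so the one-year rule applies
```

    Here `a` and `b` merge because they share an ISBN, fall within the ebook tolerance, and have equivalent titles once "&" is normalized to "and". The exceptions listed below would be checked separately.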

    Exceptions: Even when all three conditions are met, records are not merged in the following cases:

    • Government-document records are not merged with book or e-book records.

    • The URIs are different.

    • The languages are different.

    • The source type is an institutional repository.

    • Overmatch: the title is on a list of titles so common that they are excluded from merging (for example, "Poem").

    The following content providers have opted out of having their records merged in the index:

    • Artstor

    • CABI Direct content

    • CAIRN International Journals
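    The exceptions and provider opt-outs act as vetoes applied on top of the three matching conditions. The Python sketch below is a hypothetical illustration, not CDI's logic: the `Candidate` fields are invented, the overmatch set contains only the one example the article names, and URIs are assumed to be compared only when both records carry one.

```python
from dataclasses import dataclass
from typing import Optional

# Providers listed in the article as having opted out of merging.
OPTED_OUT_PROVIDERS = {"Artstor", "CABI Direct", "CAIRN International Journals"}
# Hypothetical excerpt of the overmatch list; only "Poem" is named in the article.
OVERMATCH_TITLES = {"poem"}


@dataclass
class Candidate:
    title: str
    provider: str
    source_type: str
    language: str
    uri: Optional[str] = None
    is_government_document: bool = False
    is_book: bool = False  # True for both print books and e-books


def blocked_by_exception(a: Candidate, b: Candidate) -> bool:
    """Return True if any listed exception prevents merging a and b."""
    pair = (a, b)
    if any(r.provider in OPTED_OUT_PROVIDERS for r in pair):
        return True
    if any(r.source_type == "Institutional Repository" for r in pair):
        return True
    # Government documents never merge with book or e-book records.
    if (a.is_government_document and b.is_book) or (b.is_government_document and a.is_book):
        return True
    # Assumption: URIs are only compared when both records carry one.
    if a.uri and b.uri and a.uri != b.uri:
        return True
    if a.language != b.language:
        return True
    if any(r.title.strip().lower() in OVERMATCH_TITLES for r in pair):
        return True
    return False


x = Candidate("Poem", "Provider A", "Journal", "eng")
y = Candidate("Poem", "Provider B", "Journal", "eng")
p = Candidate("Climate models", "Provider A", "Journal", "eng")
q = Candidate("Climate models", "Provider B", "Journal", "eng")
r = Candidate("Climate models", "Provider B", "Journal", "fre")
print(blocked_by_exception(x, y))  # True: overmatch title
print(blocked_by_exception(p, q))  # False
print(blocked_by_exception(p, r))  # True: languages differ
```

    "Provider A" and "Provider B" are placeholder names. Treating each exception as an independent veto keeps the rules easy to add to or remove as they are periodically reviewed.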

    Transitive Merge

    A transitive merge occurs when records A and C would not match each other directly, but each of them matches record B. Because metadata in record B links A and C, all three records are merged into a single logical record.


    Transitive Merge of Records
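    Transitive merging amounts to finding connected groups of records, where each confirmed pairwise match links two records. A common way to compute such groups is a union-find (disjoint-set) structure. The Python sketch below uses it to illustrate the idea; it is not a description of how CDI implements transitive merges, and the record IDs and match pairs are hypothetical inputs, such as the output of a pairwise matching step.

```python
def transitive_merge(record_ids: list[str], matches: list[tuple[str, str]]) -> list[list[str]]:
    """Group records into logical records using union-find over matched pairs."""
    parent = {r: r for r in record_ids}

    def find(r: str) -> str:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # join the two groups

    groups: dict[str, list[str]] = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return list(groups.values())


# A matches B and C matches B, but A and C never match each other directly.
print(transitive_merge(["A", "B", "C", "D"], [("A", "B"), ("C", "B")]))
# [['A', 'B', 'C'], ['D']]
```

    Record D has no match, so it stays a logical record of one physical record.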

    The following image shows the key elements of the merge process:


    Visual Representation of the Merge Process

    Match and Merge FAQ

    1. Do my local records participate in match and merge?

      No. Records in your local Primo indexes are not matched and merged with records from the Central Discovery Index.

    2. Does match and merge give preference to certain providers?

      No. Match and merge is vendor-neutral.

    3. Are identical values in a given field, coming from the physical records that make up a logical record, merged (deduped) for display, search, and so forth?

      Yes. For example, identical subject headings will be deduped so that you see only one instance of the heading, not multiple instances.

    4. How can I submit suggestions for improvement to the match and merge process?

      Please submit your ideas to the Idea Exchange.
