
    Match and Merge in CDI

    Introduction

    Content in the Central Discovery Index (CDI) comes from a variety of sources, including publishers, content aggregators, open access repositories, and institutional repositories. In total we ingest content from over 2,000 sources, so we sometimes hold more than one record for a given citation. Displaying all of these records separately can confuse users and make it harder for them to find what they are looking for in the results.

    However, we also strive to present the user with the highest quality, most detailed, most extensive metadata available, as clearly as possible, while also providing the user with the best available method to access the content in question.  In order to leverage as much of the available metadata as possible for this purpose, while also minimizing duplication and ambiguity, we employ a process we call Match & Merge.

    Match & Merge is essentially a set of criteria that controls which records ("physical records") can and cannot be combined into composite records that we call "logical records". In a logical record, the available metadata from the participating physical records is synthesized so that the single record presented to the end user is more complete and robust than any of the individual physical records. The figure below shows an abstract representation of this process.

    CDI_VisualRepresentation.png

    Visual Representation of the Merge Process

    Match & Merge: rules

    Match & Merge generally relies on various kinds of identifiers: if two or more records share the same identifier, they can be merged, provided that they also satisfy other criteria, depending on the identifier.  Most content types are eligible for Match & Merge, although some are excluded in certain cases and some are excluded altogether (see "Match & Merge filters", below).

    Please note that a "fuzzy title match" is a comparison of two records' combined DocumentTitle and DocumentSubtitle fields, disregarding case, whitespace, punctuation, diacritics, and other special characters.
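    As an illustration only, the comparison described above could be sketched as follows. The function names and the exact normalization steps (Unicode NFKD decomposition, case folding, keeping only letters and digits) are assumptions made for this example, not the actual CDI implementation.

```python
import unicodedata


def normalize_title(title: str, subtitle: str = "") -> str:
    """Reduce a combined title and subtitle to a comparable form.

    Assumed normalization: decompose accented characters, fold case,
    and keep only letters and digits. Combining marks, whitespace, and
    punctuation are not alphanumeric, so they are dropped.
    """
    combined = f"{title} {subtitle}"
    decomposed = unicodedata.normalize("NFKD", combined)
    return "".join(ch for ch in decomposed.casefold() if ch.isalnum())


def fuzzy_title_match(title_a: str, subtitle_a: str,
                      title_b: str, subtitle_b: str) -> bool:
    """Return True when the two normalized title strings are identical."""
    return normalize_title(title_a, subtitle_a) == normalize_title(title_b, subtitle_b)
```

    Under these assumptions, "Café Society" with subtitle ": A History" matches "CAFE SOCIETY" with subtitle "a history", because both reduce to the same string of letters.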

    External identifiers:

    • DOI – also requires fuzzy title match; does not apply to Journal & eJournal
    • PMID – also requires fuzzy title match; does not apply to Journal & eJournal
    • ISBN/EISBN – also requires fuzzy title match; years of publication must be within one year of each other
    • ISSN/EISSN – PublicationPlace or publication year must also match; applies to Journal & eJournal only
    • LCCN – for Journal/eJournal, PublicationPlace or publication year must also match; for Book, eBook, Dissertation, and Government Document, fuzzy title match required, and year of publication must also match
    • OCLC – same additional criteria as for LCCN
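    The sketch below shows how three of the external identifier rules above (DOI, ISBN/EISBN, and ISSN/EISSN) might be expressed as pairwise checks. The Record class, its field names, and the handling of missing values are hypothetical. It also assumes that "does not apply to Journal & eJournal" means the rule is skipped when either record has one of those types, and it uses a simplified stand-in for the fuzzy title match.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

JOURNAL_TYPES = {"Journal", "eJournal"}


@dataclass
class Record:
    # Hypothetical record shape; field names are illustrative only.
    content_type: str
    title: str
    year: Optional[int] = None
    place: Optional[str] = None
    doi: Optional[str] = None
    isbn: Optional[str] = None   # stands for ISBN or EISBN
    issn: Optional[str] = None   # stands for ISSN or EISSN


def _norm(text: str) -> str:
    """Simplified fuzzy-title normalization (an assumption)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.casefold() if ch.isalnum())


def _same(a, b) -> bool:
    """Two values match only if both are present and equal (an assumption)."""
    return a is not None and a == b


def doi_rule(a: Record, b: Record) -> bool:
    # DOI: shared DOI plus fuzzy title match; not for Journal or eJournal.
    if a.content_type in JOURNAL_TYPES or b.content_type in JOURNAL_TYPES:
        return False
    return _same(a.doi, b.doi) and _norm(a.title) == _norm(b.title)


def isbn_rule(a: Record, b: Record) -> bool:
    # ISBN/EISBN: shared identifier, fuzzy title match, years within one year.
    if not _same(a.isbn, b.isbn) or _norm(a.title) != _norm(b.title):
        return False
    if a.year is None or b.year is None:
        return False
    return abs(a.year - b.year) <= 1


def issn_rule(a: Record, b: Record) -> bool:
    # ISSN/EISSN: Journal and eJournal only; place or year must also match.
    if a.content_type not in JOURNAL_TYPES or b.content_type not in JOURNAL_TYPES:
        return False
    return _same(a.issn, b.issn) and (_same(a.place, b.place) or _same(a.year, b.year))
```

    The PMID rule would follow the same shape as the DOI rule. The LCCN and OCLC rules would branch on content type in the way the list above describes.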

    Internal identifiers:

    Records with an ISSN, EISSN, ISBN, or EISBN are assigned an identifier, internal to Ex Libris, that corresponds to the relevant title. Match & Merge uses this title-level identifier differently for publication-level and article-level records.

    • Publication level: Applies to Newspaper, Magazine, Journal, eJournal, Book, and eBook.  Only requires the title-level identifiers to match.
    • Article level: Applies to Journal Article, Magazine Article, Newspaper Article, Trade Publication Article, Book Review, and Conference Proceeding.  In addition to the title-level identifier, the DocumentTitle, year of publication, Volume, Issue, and StartPage are all required to match.
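    One way to picture the article-level rule is as a composite key: records that produce the same key are candidates for the same logical record. The sketch below is hypothetical. The ArticleRecord fields are illustrative names, and it assumes the listed fields are compared for exact equality, which the rule above does not specify.

```python
from collections import defaultdict
from typing import NamedTuple


class ArticleRecord(NamedTuple):
    # Hypothetical record shape; field names are illustrative only.
    record_id: str
    title_id: str        # internal title-level identifier
    document_title: str
    year: int
    volume: str
    issue: str
    start_page: str


def article_key(rec: ArticleRecord) -> tuple:
    """All fields that must match at the article level."""
    return (rec.title_id, rec.document_title, rec.year,
            rec.volume, rec.issue, rec.start_page)


def group_articles(records: list) -> list:
    """Return lists of record IDs that share an article-level key."""
    groups = defaultdict(list)
    for rec in records:
        groups[article_key(rec)].append(rec.record_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

    A publication-level comparison would use only title_id as the key, since that rule requires nothing beyond the title-level identifier.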

    Other scenarios:

    • Reference records can merge based only on a fuzzy title match
    • Dissertation records can merge based only on a URI (direct link) match

    Match & Merge filters

    Filters are essentially the inverse of rules: they dictate under which circumstances records cannot be merged.  Note that filters supersede rules in cases where both can potentially be applied.

    Mismatched metadata: if any of the following data points are mismatched, the records are prevented from merging.

    • DOI
    • PMID
    • URI (if records come from the same set of content, from the same content provider)
    • Language (can be affected by language specifications in the record's source metadata, or the language of the record's metadata as detected by our system)

    Excluded Content Types:

    • Archival Material
    • Image
    • Microform
    • Music Recording
    • Patent
    • Report
    • Technical Report
    • Standard
    • Video Recording

    Other filters:

    • Records identified as being from an Institutional Repository will never be merged
    • Records having the content type Newspaper Article and a date of publication prior to January 1, 2000, will never be merged
    • Record exclusion flag: we can flag specific records to be excluded from Match & Merge; this is typically done at the express request of the content provider or uploading client
      • Note: the following content providers have asked that this record exclusion flag be applied to all their records:
        • Artstor
        • CABI Direct content
        • CAIRN International Journals
    • "Bad Titles": we maintain an internal list of especially short and generic titles that we don't want to merge at all because of the high probability of false positives
    • Overmatch: any record having a title that occurs on more than 4,000 records in the index will not be merged
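    Several of the filters above can be sketched as a check that runs before any rule is allowed to merge two records. Everything in this example is hypothetical: the Record fields, the function names, and the assumption that a "mismatch" means both records carry a value and the values differ. The URI filter and the internal "Bad Titles" list are left out because their details are not given here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

EXCLUDED_TYPES = {
    "Archival Material", "Image", "Microform", "Music Recording", "Patent",
    "Report", "Technical Report", "Standard", "Video Recording",
}
OVERMATCH_LIMIT = 4000
NEWSPAPER_CUTOFF = date(2000, 1, 1)


@dataclass
class Record:
    # Hypothetical record shape; field names are illustrative only.
    content_type: str
    title: str
    published: Optional[date] = None
    doi: Optional[str] = None
    pmid: Optional[str] = None
    language: Optional[str] = None
    institutional_repository: bool = False
    exclusion_flag: bool = False


def record_is_excluded(rec: Record, title_counts: Counter) -> bool:
    """Filters that depend on a single record."""
    if rec.content_type in EXCLUDED_TYPES:
        return True
    if rec.institutional_repository or rec.exclusion_flag:
        return True
    if (rec.content_type == "Newspaper Article"
            and rec.published is not None
            and rec.published < NEWSPAPER_CUTOFF):
        return True
    # Overmatch: the title occurs on more than 4,000 records in the index.
    return title_counts[rec.title] > OVERMATCH_LIMIT


def pair_is_blocked(a: Record, b: Record, title_counts: Counter) -> bool:
    """Filters take precedence: a blocked pair never merges."""
    if record_is_excluded(a, title_counts) or record_is_excluded(b, title_counts):
        return True
    for name in ("doi", "pmid", "language"):
        x, y = getattr(a, name), getattr(b, name)
        if x is not None and y is not None and x != y:
            return True
    return False
```

    Because filters supersede rules, a merge decision in this sketch would amount to "some rule matches and pair_is_blocked returns False".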

    Transitive Merge

    A "transitive merge" is a scenario where three or more records are merged, where at least two of the records would not be able to merge on their own.  For example, in the diagram below, Record A can merge with Record B, and Record B can merge with Record C, but Record A would not ordinarily be able to merge with Record C.  But because of the commonalities that both Records A and C share with Record B, all three records can be merged into the same logical record.

    CDI_TransitiveMerge.png

    Transitive Merge of Records
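
    The transitive behavior can be modeled as finding connected groups: if each pairwise merge is treated as a link between two records, every set of linked records becomes one logical record. The sketch below uses a small union-find structure to show this. It is an illustration under that assumption, not a description of how CDI is implemented, and it does not model how filters interact with transitive merges.

```python
def merge_groups(record_ids, mergeable_pairs):
    """Group records so that any chain of pairwise merges ends up together.

    record_ids: iterable of hashable record identifiers.
    mergeable_pairs: iterable of (id_a, id_b) pairs that can merge directly.
    """
    record_ids = list(record_ids)
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in mergeable_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), set()).add(rid)
    return list(groups.values())
```

    With records A, B, C, and D and the direct merges (A, B) and (B, C), the function returns one group containing A, B, and C and a second group containing only D, even though A and C never matched each other directly.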

    Match & Merge FAQ

    1. Do my local records participate in match and merge?

      Local records in the local Primo indexes will not match and merge with records from CDI.  Records uploaded to CDI by clients are eligible for merging, however.

    2. Does match and merge give preference to certain providers?

      No. Match and merge is vendor-neutral.

    3. Are identical values in the same field, coming from the different physical records that make up a logical record, merged (deduplicated) for display, search, and so forth?

      Yes. For example, identical subject headings will be deduped so that you see only one instance of the heading, not multiple instances.

    4. Can records with different Content Types merge?

      Yes, subject to some restrictions: records that meet the criteria of at least one of the rules, and are not blocked by any of the filters, can merge. Some rules apply only to certain content types, while other rules apply to all content types apart from special exclusions. Please see the "Match & Merge: rules" and "Match & Merge filters" sections above for more specifics.

    5. How can I submit suggestions for improvement to the match and merge process?

      Please submit your ideas to the Idea Exchange.
