    Author Matching Algorithm

    1. Data
    2. Features
      1. Name Features
      2. Embedded Text Features
      3. Semantic Features
      4. Additional Features
    3. Ranking
    4. Additional References

    Esploro has developed a sophisticated machine-learning algorithm to match the authors listed on assets to the institution's researchers. The algorithm was developed, and is continuously being improved, by a dedicated team of data scientists.

    All types of Smart Harvesting and Smart Expansion make use of the Author Matching (AM) algorithm. The process first tries to match on identifiers or email addresses; because these are often missing from the data, the AM algorithm is used in those cases.
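
    The order of operations described above can be sketched as a two-stage lookup. This is a minimal Python illustration, not Esploro's implementation: the `Person` record, the `match_author` function, and the `am_score` callback are hypothetical names, and identifiers (such as ORCID iDs) are assumed to be stored as plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Person:
    """Hypothetical minimal record for a researcher or an asset author."""
    name: str
    identifiers: set[str] = field(default_factory=set)  # e.g. ORCID iDs as strings
    emails: set[str] = field(default_factory=set)


def match_author(
    researcher: Person,
    author: Person,
    am_score: Callable[[Person, Person], float],
) -> tuple[str, Optional[float]]:
    """Try identifiers, then emails, and only then fall back to the AM scorer."""
    if researcher.identifiers & author.identifiers:
        return ("id", None)
    # Emails are compared case-insensitively (a simplifying assumption).
    if {e.lower() for e in researcher.emails} & {e.lower() for e in author.emails}:
        return ("email", None)
    # No shared identifier or email: defer to the feature-based AM algorithm.
    return ("am", am_score(researcher, author))
```

    Here `am_score` stands in for the full feature-based model described in the rest of this article; an identifier hit corresponds to the "Matched on ID" rank.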

    The following sections describe the data the algorithm uses, some of the key features it relies on, and the ranking it produces.

    Data

    The algorithm uses the following data from the Researcher record:

    • Name and name variants
    • Affiliations
    • Research topics
    • Area of expertise
    • Biographical information, e.g., education and honors
    • Metadata from the most recent assets that have already been associated with the researcher:
      • Title
      • Authors (to find co-authors)
      • Subjects
      • Abstract
      • Year
      • Journal title

    The algorithm uses the following data from the candidate assets:

    • Author names
    • Title
    • Subjects
    • Abstract
    • Year
    • Journal title
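
    To make the inputs concrete, the lists above can be modeled as two simple record types. This is a hypothetical Python sketch: the class and field names are illustrative, and the cut-off used for "most recent assets" (20 here) is an assumption, since the actual number is not documented.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Fields the article lists for both known and candidate assets."""
    title: str
    authors: list[str]
    subjects: list[str]
    abstract: str
    year: int
    journal_title: str


@dataclass
class ResearcherProfile:
    """Hypothetical view of the Researcher record fields used for matching."""
    names: list[str]            # name and name variants
    affiliations: list[str]
    research_topics: list[str]
    expertise: list[str]
    biography: str              # e.g., education and honors
    assets: list[AssetMetadata] = field(default_factory=list)

    def recent_assets(self, limit: int = 20) -> list[AssetMetadata]:
        """Most recent associated assets; the default limit is an assumption."""
        return sorted(self.assets, key=lambda a: a.year, reverse=True)[:limit]

    def known_coauthors(self) -> set[str]:
        """Co-author names from recent assets, excluding the researcher's own names."""
        own = {n.lower() for n in self.names}
        return {
            author
            for asset in self.recent_assets()
            for author in asset.authors
            if author.lower() not in own
        }
```

    Candidate assets carry the same six fields, so a single `AssetMetadata` type can serve both roles in this sketch.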

    Features

    The algorithm uses multiple "features". A feature is an individual measurable property or characteristic of the data. This section outlines the main features in use.

    Name Features

    The algorithm uses several features that match names considering name similarity, name variants, and name frequencies.
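
    A minimal sketch of such name features follows, using Python's standard `difflib` and `unicodedata` modules. The normalization steps, the use of `SequenceMatcher` as the similarity measure, and the log inverse-frequency weighting of surnames are illustrative choices rather than the documented ones, and the surname frequency table is hypothetical input.

```python
import math
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def name_similarity(author: str, variants: list[str]) -> float:
    """Best similarity (0..1) between an asset author and any researcher name variant."""
    target = normalize(author)
    return max(
        (SequenceMatcher(None, target, normalize(v)).ratio() for v in variants),
        default=0.0,
    )


def surname_rarity(surname: str, frequencies: dict[str, int], total: int) -> float:
    """Inverse-frequency weight: a rare surname is stronger evidence than a common one."""
    count = frequencies.get(normalize(surname), 1)  # unseen names treated as very rare
    return math.log(total / count)
```

    Taking the best score over all variants lets a researcher recorded as "Jane Smith" still match an asset that lists one of her stored variants exactly.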

    Embedded Text Features

    This group of features extracts concepts and subject entities from the texts available in both the researcher and the asset metadata, to help determine how close the author of the asset is to the researcher in terms of subject area. This is done by grouping words semantically and by creating and using text vectors.
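
    As a simplified stand-in for this group, the sketch below builds term-frequency vectors with a regular-expression tokenizer and compares them with cosine similarity. It does not perform the concept and subject-entity extraction described above; the tokenizer and the tiny stopword list are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "on", "to", "with"}


def text_vector(text: str) -> Counter:
    """Term-frequency vector over lowercase word tokens, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Researcher text might combine research topics and expertise; asset text an abstract.
researcher_text = "protein folding molecular dynamics simulation"
asset_text = "Molecular dynamics of protein folding pathways"
score = cosine(text_vector(researcher_text), text_vector(asset_text))
```

    With these inputs the two vectors share four of their five terms each, giving a similarity of about 0.8.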

    Semantic Features

    This group of features uses state-of-the-art Natural Language Processing (NLP) algorithms to determine how close the researcher's textual data is to that of the candidate asset. We train a neural network on over 100 million abstracts from our Central Discovery Index to create "word embeddings": vector representations of every word appearing in this corpus, arranged so that similar words are grouped together. These vector representations, in turn, allow us to calculate a "distance" between words, and between texts in general.
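
    The "distance" idea can be illustrated with a few hand-made vectors. In this hypothetical sketch, `EMBEDDINGS` is a toy lookup table rather than the output of a trained network, a text is represented by the average of its word vectors (one common way to embed a text, assumed here), and distance is one minus cosine similarity.

```python
import math
from typing import Optional

# Toy 3-dimensional vectors standing in for trained embeddings,
# which in practice have hundreds of dimensions.
EMBEDDINGS: dict[str, list[float]] = {
    "genome":     [0.90, 0.10, 0.00],
    "dna":        [0.85, 0.15, 0.05],
    "sequencing": [0.80, 0.20, 0.10],
    "poetry":     [0.05, 0.10, 0.95],
    "sonnet":     [0.10, 0.05, 0.90],
}


def embed_text(text: str) -> Optional[list[float]]:
    """Average the vectors of known words; None if no word has an embedding."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norm


profile = embed_text("genome sequencing")
near = embed_text("DNA sequencing")
far = embed_text("sonnet poetry")
```

    Here `near` lies much closer to `profile` than `far` does, even though "DNA sequencing" and "genome sequencing" share only one word.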

    Additional Features

    The algorithm also makes use of the following features:

    • Matching affiliations of author and researcher – this matching is based on the affiliation Esploro has for the researcher and any affiliation listed for the author in the asset record. In addition to specific affiliations, the countries of the affiliations are taken into account.
    • Co-author network – researchers tend to collaborate repeatedly with the same colleagues. Esploro builds a network of co-authors from the assets that have already been associated with the researcher.
    • Date matching – the algorithm matches the date of the asset with known dates for the researcher.
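
    Each of these signals can be reduced to a simple score. The functions below are a hypothetical Python sketch: the 1.0/0.5 affiliation weights, the exact-string comparison of already-normalized names, and the two-year slack around the researcher's known active years are all assumptions, not documented Esploro behavior.

```python
from typing import Optional


def affiliation_score(researcher_affs: set[str], researcher_countries: set[str],
                      author_affs: set[str], author_countries: set[str]) -> float:
    """1.0 for a shared institution, 0.5 for a shared country only, else 0.0 (hypothetical weights)."""
    if researcher_affs & author_affs:
        return 1.0
    if researcher_countries & author_countries:
        return 0.5
    return 0.0


def coauthor_overlap(known_coauthors: set[str], asset_authors: set[str]) -> int:
    """How many of the asset's authors already appear in the researcher's co-author network.

    Names are assumed to be normalized beforehand so exact comparison is meaningful.
    """
    return len(known_coauthors & asset_authors)


def date_plausible(asset_year: int, first_year: Optional[int], last_year: Optional[int],
                   slack: int = 2) -> bool:
    """True if the asset year falls within the researcher's known years, plus or minus a slack."""
    if first_year is not None and asset_year < first_year - slack:
        return False
    if last_year is not None and asset_year > last_year + slack:
        return False
    return True
```

    Missing dates are treated as "no evidence against" rather than as a mismatch, which keeps sparse researcher records from being penalized.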

    Ranking

    The algorithm takes researcher and asset data as input and runs the data through the features. At the end of this process, the pairing of the researcher with the specific author they were matched to in the asset is assigned a rank indicating the level of confidence in the match:

    • Matched on ID
    • Very strong match
    • Strong match
    • Uncertain match
    • No match
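
    One way to turn feature scores into these ranks is to combine them into a single confidence value and bucket it. The sketch assumes a logistic combination with hand-picked weights and thresholds; Esploro's model is learned and its parameters are not published, so every number below is hypothetical. Feature values are assumed to be similarities where higher means stronger evidence.

```python
import math
from enum import Enum


class Rank(Enum):
    MATCHED_ON_ID = "Matched on ID"
    VERY_STRONG = "Very strong match"
    STRONG = "Strong match"
    UNCERTAIN = "Uncertain match"
    NO_MATCH = "No match"


# Hypothetical weights and bias; unknown feature names raise KeyError.
WEIGHTS = {"name": 3.0, "text": 2.0, "semantic": 2.0,
           "affiliation": 1.5, "coauthors": 1.0, "date": 0.5}
BIAS = -5.0


def confidence(features: dict[str, float]) -> float:
    """Logistic combination of feature scores into a 0..1 confidence."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def assign_rank(features: dict[str, float], matched_on_id: bool = False) -> Rank:
    """Map a confidence value to a rank tier using hypothetical thresholds."""
    if matched_on_id:
        return Rank.MATCHED_ON_ID  # identifier matches skip scoring entirely
    p = confidence(features)
    if p >= 0.95:
        return Rank.VERY_STRONG
    if p >= 0.80:
        return Rank.STRONG
    if p >= 0.40:
        return Rank.UNCERTAIN
    return Rank.NO_MATCH
```

    For example, a full name match with strong text similarity and a shared institution lands in the "Strong match" tier, while a name match alone stays at "No match". An identifier match short-circuits scoring, mirroring the order described at the top of this article.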

    The rank is used to determine the Smart Harvesting approval workflows.

    Additional References

    • Smart Expansion - load assets known to belong to your researcher (run before Smart Harvesting)
    • Working with Smart Harvesting
    © Copyright 2025 Ex Libris Knowledge Center