ExLibris
  • Ex Libris Knowledge Center

    Smart Harvesting Author Matching and Approval Flows

    This page describes the various approval flows in Smart Harvesting. For a video on how to approve and review Smart Harvesting author matches, see here. For working with Smart Harvesting see here.

    Approval Flows

    Author Matching Approval 

    Author matching tasks are created for all asset authors and can be approved from Repository > Author Matching Approval Tasks List or from the Task List (see here). The active researcher is approved automatically. If the researcher cannot be found among the authors, the researcher is added with the status "Added by System". See also Author Matching Approval Task List.

    Asset Approval

    Assets are approved when the first author-researcher match is approved.

    Author Matching Algorithm

    Esploro has developed a sophisticated algorithm using Machine Learning methodologies to match authors. The algorithm was developed and is being continuously improved by a dedicated team of Data Scientists.

    All types of Smart Harvesting and Smart Expansion make use of the Author Matching (AM) algorithm. The process first tries to match on identifiers or email addresses. Because these are often not available in the data, the AM algorithm is then used.

    The following sections describe the data used by the algorithm, the key features it uses, and the ranking it assigns.

    Data

    The algorithm uses the following data from the Researcher record:

    • Name and name variants
    • Affiliations
    • Research topics
    • Area of expertise
    • Biographical info, e.g. education, honors
    • Metadata from the most recent assets that have already been associated with the researcher:
      • Title
      • Authors (to find co-authors)
      • Subjects
      • Abstract
      • Year
      • Journal title

    The algorithm uses the following data from the candidate assets:

    • Author names
    • Title
    • Subjects
    • Abstract
    • Year
    • Journal title

    Features

    The algorithm uses multiple "features". A feature is an individual measurable property or characteristic of the data. This section outlines the main features in use.

    Name Features

    The algorithm uses several features that match names considering name similarity, name variants, and name frequencies.
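
    As a rough illustration of what a name-similarity feature can look like, the sketch below compares an author name against a researcher's name variants. It is not the Esploro implementation: the functions normalize and name_similarity are hypothetical, the comparison uses Python's standard difflib rather than a trained model, and name frequencies are not modeled.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase the name, drop punctuation, and sort its tokens so that
    'Smith, John' and 'John Smith' normalize to the same string."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(tokens))


def name_similarity(author_name: str, researcher_names: list[str]) -> float:
    """Return the best similarity (0.0 to 1.0) between the asset's author
    name and any of the researcher's names or name variants."""
    target = normalize(author_name)
    return max(
        (SequenceMatcher(None, target, normalize(n)).ratio() for n in researcher_names),
        default=0.0,
    )


# Illustrative data only: a researcher with two name variants.
variants = ["J. Smith", "John Smith"]
score = name_similarity("Smith, John", variants)
```

    Taking the maximum over all variants means a single matching variant is enough to produce a high score, which is one plausible way to make use of name variants.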

    Embedded Text Features

    This group of features extracts concepts and subject entities from the texts available in both the researcher and the asset metadata elements, to help determine how close the author of the asset is to the researcher in terms of subject area. This is done by grouping words semantically and by creating and using text vectors.

    Semantic Features

    This group of features uses state-of-the-art Natural Language Processing algorithms to determine how close the textual data of the researcher is to that of the candidate asset. We train a Neural Network on over 100 million abstracts from our Central Discovery Index to create "word embeddings". These are vector representations of all words appearing in this corpus, arranged so that similar words are grouped together. These vector representations, in turn, allow us to calculate a "distance" between words and between texts in general.
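
    To make the idea of a "distance" between texts concrete, the following sketch averages word vectors into a text vector and compares two texts with cosine similarity. The three-dimensional EMBEDDINGS table is invented for illustration, and averaging plus cosine similarity is one common approach, assumed here rather than confirmed as the method Esploro uses.

```python
import math

# Toy 3-dimensional word embeddings, invented for illustration only.
# Real embeddings are learned from a large corpus and have many more dimensions.
EMBEDDINGS = {
    "protein": [0.9, 0.1, 0.0],
    "enzyme": [0.8, 0.2, 0.1],
    "galaxy": [0.0, 0.1, 0.9],
    "telescope": [0.1, 0.0, 0.8],
}


def text_vector(words):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1.0 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


researcher = text_vector(["protein", "enzyme"])
biology_asset = text_vector(["enzyme"])
astronomy_asset = text_vector(["galaxy", "telescope"])

close = cosine_similarity(researcher, biology_asset)
far = cosine_similarity(researcher, astronomy_asset)
```

    In this toy data, close is much higher than far, reflecting that the first asset shares the researcher's subject area and the second does not.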

    Additional Features

    The algorithm also makes use of the following features:

    • Matching affiliations of author and researcher – this matching is based on the affiliation Esploro has for the researcher and any affiliation for the author in the record.
    • Co-author network – researchers tend to collaborate with each other. Esploro has created a network of co-authors for the assets that have already been associated with the researcher.
    • Date matching – the algorithm matches the date of the asset with known dates for the researcher.
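
    A co-author network feature could, for example, measure how many of the asset's authors the researcher has already published with. The sketch below uses Jaccard overlap between two sets of names. The function coauthor_overlap and the choice of Jaccard overlap are assumptions made for illustration, not a description of the actual feature.

```python
def coauthor_overlap(asset_authors: set[str], known_coauthors: set[str]) -> float:
    """Jaccard overlap between the authors of a candidate asset and the
    co-authors found on assets already associated with the researcher."""
    if not asset_authors or not known_coauthors:
        return 0.0
    shared = asset_authors & known_coauthors
    return len(shared) / len(asset_authors | known_coauthors)


# Illustrative data only.
known = {"B. Chen", "C. Okafor", "D. Rossi"}
candidate = {"A. Novak", "B. Chen", "C. Okafor"}
overlap = coauthor_overlap(candidate, known)  # 2 shared names out of 4 distinct names
```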

    Ranking

    The algorithm takes researcher and asset data as input and runs the data through the features. At the end of this process, the match between the researcher and the specific author in the asset is assigned a rank indicating the level of confidence in the match:

    • Matched on ID
    • Very strong match
    • Strong match
    • Uncertain match
    • No match

    The rank is used to determine the Smart Harvesting approval workflows.

    Author Matching Approval Task List

    You can configure, based on the rank, whether the system approves an author-researcher match automatically or leaves it for manual approval. Esploro includes an "Author Matching Approval Task List" for approving or rejecting matches and for reviewing automatically approved matches. You can access these tasks by navigating to Repository > Author Matching Approval Task List.

    Pending Author Matching tasks are also displayed in the Task List, in the Smart harvesting – Author Matching section. The tasks are separated by rank: Very strong, Strong, Uncertain. Clicking an entry opens the Author Matching Approval task list filtered by status and rank.

    Author Matching Approval Statuses / Approval of Assets 

    The following statuses are possible:

    • Pending approval – this means that the match has not been approved.
    • Approved manually – this means that the match was manually approved.
    • Approved automatically – this means that the match was approved automatically by the system.  Whether or not the system automatically approves a match is configurable per rank.
    • Approved and reviewed – this means that the match was automatically approved and was later reviewed. This status supports a flow in which you bring assets in quickly and review the matches after the assets have been added.
    • Replaced – the match was incorrect and the original researcher was replaced.

    There is no separate process for approving assets. As soon as one match is approved, the asset is added to the repository and displayed in the Research Portal. The asset is displayed in a researcher's profile only after the specific match for that researcher is approved.
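
    The relationship between ranks, statuses, and asset visibility can be sketched as follows. This is an illustrative model only: the AUTO_APPROVE settings and all function names are hypothetical, and the sketch assumes that every "Approved ..." status counts as approved.

```python
# Hypothetical per-rank configuration: True means that matches of this rank
# are approved automatically. Real settings are institution-specific.
AUTO_APPROVE = {"Very strong": True, "Strong": False, "Uncertain": False}

# Assumption: all three "Approved ..." statuses count as an approved match.
APPROVED_STATUSES = {
    "Approved manually",
    "Approved automatically",
    "Approved and reviewed",
}


def initial_status(rank: str) -> str:
    """Status given to a new match, based on the per-rank configuration."""
    if AUTO_APPROVE.get(rank, False):
        return "Approved automatically"
    return "Pending approval"


def asset_in_repository(match_statuses: list[str]) -> bool:
    """The asset is added as soon as at least one of its matches is approved."""
    return any(status in APPROVED_STATUSES for status in match_statuses)


def shown_in_profile(match_status: str) -> bool:
    """The asset appears in a researcher's profile only once that
    researcher's own match is approved."""
    return match_status in APPROVED_STATUSES


# An asset with two matched researchers: one auto-approved, one still pending.
statuses = [initial_status("Very strong"), initial_status("Uncertain")]
```

    With these example settings, the asset is in the repository because of the first match, but it appears only in the first researcher's profile until the second match is approved.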

    Layout of the Task List

    The layout of the task list is like the assets results list except that every entry in the list represents an approval task – not an asset.

    The list has the following elements:

    • Facets on the left
    • Sort options above the list
    • Search box above the list. The general search box for searching any Esploro entity is at the very top, as on most Esploro pages.
    • List actions – above the list on the right
    • Approval tasks

    Facets

    The following facets are displayed:

    • Task status
    • Task Rank
    • Asset Category
    • Asset Type
    • Researcher Academic Unit – the affiliation of the researcher to whom the matches were made. All researcher affiliations are included.
    • Researcher name – the name of the researcher to whom the matches were made.

    The system defaults to filter by the "Pending approval" status.  You should remove this facet selection if you want to see additional statuses.

     

    Status marked as pending approval.

    Sort

    The default sort is by match rank in descending order and, within each rank, by researcher. The order can be switched to ascending.
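
    The sort order described above can be pictured with a small sketch. The numeric RANK_ORDER values, the task dictionaries, and the alphabetical ordering of researchers within a rank are assumptions made for illustration.

```python
# Assumed ordering of ranks from weakest to strongest.
RANK_ORDER = {"Uncertain": 1, "Strong": 2, "Very strong": 3}


def sort_tasks(tasks, descending=True):
    """Sort tasks by match rank, then by researcher name within each rank."""
    sign = -1 if descending else 1
    return sorted(tasks, key=lambda t: (sign * RANK_ORDER[t["rank"]], t["researcher"]))


# Illustrative tasks only.
tasks = [
    {"rank": "Strong", "researcher": "Lee"},
    {"rank": "Very strong", "researcher": "Zhou"},
    {"rank": "Very strong", "researcher": "Adams"},
    {"rank": "Uncertain", "researcher": "Brown"},
]
default_view = sort_tasks(tasks)
ascending_view = sort_tasks(tasks, descending=False)
```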

    Search

    The task list can be searched by the following:

    • Researcher – search by researcher name (including variants) and researcher identifiers.
    • Researcher name – search by researcher name (including variants).
    • Asset title – search by the title of the captured asset.
    • Asset publication title – search by the title of the publication, i.e., the journal or book in which the asset appeared.
    • Asset identifiers – search by asset identifiers.
    • Smart Harvesting Job ID – search by job ID (the job ID can be found in the Monitor Captures list – it is the job ID of the second job).

    List Actions

    The following actions are available:

    • Request researcher approval – request researcher approval for selected matches. You can select Send email to researchers to notify them. The status is updated to Pending researcher approval. See here for information on researcher approval in the profile. Both claimed and rejected outputs are displayed in the Tasks list for administrators (see here).
    • Approve selected matches – approve all selected matches.  Multi-selection of tasks can be done using the check-box next to every task or the "Select All" above the list.
    • Reject selected assets and all matches – reject the matches (keep in mind that there can be multiple matches for a single asset) AND the asset itself.  Multi-selection of tasks can be done using the check-box next to every task or the "Select All" above the list.
    • Expand – expand the Asset Details drawer for all tasks.
    • Export tasks to Excel
    • Customize list view

    Approval Tasks

    Every task represents a match between a researcher and one of the authors in the asset captured by Smart Harvesting.  The system tries to provide enough information to enable the operator to approve (or not) the match without needing to access the full asset form. Every task has the following elements:

    • The first row of the task includes a line number, a check box, and the title of the captured asset.
    • Three columns with additional information, from left to right:
      • Researcher information including:
        • Researcher name
          • There may be an exclamation mark before the researcher name, indicating that the researcher has been matched to the same asset more than once. This can happen if the researcher is both an author and a contributor. It can also happen due to an error.
            Message showing researcher has multiple matches on asset.
        • Researcher name variants
        • Researcher affiliations
      • Asset details including:
        • Asset ID or Provisional asset ID (if the asset has not been approved yet).
        • Asset type
        • Author name – this is the name of the author the researcher was matched to as it appears in the asset.
        • Author affiliation – this is the author affiliation from the record (if available).
        • Publication details
        • Dates
        • DOI. The DOI is linkable.
      • Information related to the Smart Harvesting job and author matching.
        • Status of author approval task (see list above).
        • Rank of match (Very Strong, Strong, Uncertain).
        • Job ID (of the second job).
        • Job date
        • An alert that indicates whether there are additional tasks for the same asset in the list. This can happen if the job matched several of the asset's co-authors with affiliated researchers. The alert links to the additional tasks.

          In this case you may prefer to edit the asset and check all author-researcher matches.
          Once the additional task has been approved, the message changes:
          Message showing that additional tasks have already been approved. 
    • A "drawer" with additional asset metadata – abstract and full list of authors.

    Approval Task Actions

    A task may have the following actions:

    • Edit Asset – Use this action to access the asset form with full editing options.
    • Approve – This action approves the researcher-author match.
    • Replace Researcher – Use this action to replace the match.

    It is not possible to reject only the author-researcher match while keeping the asset. To keep the asset, replace the researcher to which the author was matched with another affiliated or non-affiliated researcher.

    • Delete asset and reject match – This action displays if the asset was matched with a single affiliated researcher. Use this action to reject the match; the entire asset will be deleted. This action should be used if the match is incorrect and the asset is not relevant to your institution.
    • Delete asset and reject all matches – This action displays if the asset was matched with multiple affiliated researchers. Use this action if you decide that the asset is not relevant to your institution or was incorrectly matched with all researchers. If the asset is relevant, you can use the "Replace" function.
    • (New for September) Change asset type – This action changes the asset type.
    • Confirm reviewed – This action is displayed if the match was automatically approved. Use this action to confirm the match.
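
    The display conditions stated in the list above can be summarized in a short sketch. Only the actions whose conditions are described on this page are modeled. The function conditional_actions and its parameters are hypothetical and do not correspond to an Esploro API.

```python
def conditional_actions(status: str, affiliated_matches: int) -> list[str]:
    """Return the actions whose display depends on the task's status or on
    the number of affiliated researchers matched to the asset."""
    actions = []
    if affiliated_matches == 1:
        # A single affiliated researcher: rejecting the match deletes the asset.
        actions.append("Delete asset and reject match")
    elif affiliated_matches > 1:
        # Several affiliated researchers: all matches are rejected together.
        actions.append("Delete asset and reject all matches")
    if status == "Approved automatically":
        actions.append("Confirm reviewed")
    return actions


pending_single = conditional_actions("Pending approval", 1)
auto_multiple = conditional_actions("Approved automatically", 3)
```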