    Working with Smart Harvesting

    1. High Level Overview of the Flow when Working with Smart Harvesting
      1. Smart Harvesting Configuration
        1. General Details
        2. Asset Approval
      2. Author Matching Approval Configuration
        1. Select CDI Resource Types to Harvest
      3. Running Retrospective Smart Harvesting
      4. Creating a Set of Researchers
    2. Running Ongoing Smart Harvesting
    3. Monitoring Ongoing Smart Harvesting
    4. Monitoring the Running Jobs in the Monitor Captures Page
    5. Job Report
    6. Additional References

    This page describes how to work with Smart Harvesting and assumes that you have a basic understanding of what Smart Harvesting is. For a general overview of Smart Harvesting, see General Overview of Smart Harvesting Framework. For working with Smart Expansion, see Working with Smart Expansion.

    We recommend running Smart Expansion before running Smart Harvesting.

    Watch the Smart Harvesting in Action video (5:42).

    High Level Overview of the Flow when Working with Smart Harvesting

    You use Smart Expansion when you have a list of assets known to belong to researchers, like a list of citations. When you run Smart Harvesting, Esploro brings in assets that are potential matches for a set of researchers that you select.

    A retrospective Smart Harvesting job can be run for sets of up to 50 researchers. You can also run a retrospective Smart Harvesting job on a single researcher. Use this option when a researcher could not be processed in a set run because CDI returned too many records (over 12,000). In a single-researcher run, the limit is 250,000 records.

    The system first checks how many records are returned for the researcher. If there are more than 250,000, it notifies the operator that the job could not run. In this case, try shortening the period and/or adding research topics and other information to the researcher's record.

    The report for a single-researcher run is the same as the standard report. Note, however, that the job splits the run into years, and even months, each of which is a separate thread. As a result, the success/failure counts refer to the number of threads, not to the number of researchers that succeeded or failed.

    If changing the year and/or adding more information does not help, try running a Smart Expansion job covering most of the years in which the researcher has been publishing.
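    The limits described above can be summarized as a small decision helper. The following is a minimal Python sketch, not an Esploro API: the function name plan_retrospective_run, the researcher IDs, and the outcome labels are hypothetical, and it assumes you already know how many CDI records each researcher returns. Only the three numeric limits come from this page.

```python
# Hypothetical helper: only the three limits come from the documentation;
# the function, IDs, and outcome labels are illustrative, not an Esploro API.
MAX_SET_SIZE = 50                 # researchers per retrospective set
MAX_RECORDS_SET_RUN = 12_000      # CDI records per researcher in a set run
MAX_RECORDS_SINGLE_RUN = 250_000  # CDI records in a single-researcher run


def plan_retrospective_run(record_counts: dict[str, int]) -> dict[str, str]:
    """Map each researcher ID to the expected retrospective outcome."""
    if len(record_counts) > MAX_SET_SIZE:
        raise ValueError(f"a set can contain at most {MAX_SET_SIZE} researchers")
    plan = {}
    for researcher_id, count in record_counts.items():
        if count <= MAX_RECORDS_SET_RUN:
            plan[researcher_id] = "run in set"
        elif count <= MAX_RECORDS_SINGLE_RUN:
            # Skipped in a set run; retry with the single-researcher option.
            plan[researcher_id] = "single researcher run"
        else:
            # Refused even as a single run: shorten the period or enrich the record.
            plan[researcher_id] = "narrow period or enrich record"
    return plan
```

    For example, a researcher with 15,000 CDI records would be skipped in a set run but could still be processed with Run for Single Researcher.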

     

    There are separate profiles for Retrospective Smart Harvesting and Ongoing Smart Harvesting, so that you can configure retrospective and ongoing runs differently. The Smart Harvesting job itself is the same for both types.

    Smart Harvesting Configuration

    This section describes the configuration that applies to both retrospective and ongoing Smart Harvesting. You can configure each type differently, and you can change the configuration at any time. In the profiles, the configuration is divided into sections.

    General Details

    The Smart Harvesting flow attempts to enrich records harvested from CDI using the OpenAlex API. Several parameters control what is enriched. By default, all the parameters are active.

    • Enhance author affiliations via OpenAlex – add affiliations from OpenAlex. If selected, you can add either all affiliations (up to 5) or only the first.
    • Override existing affiliation with OpenAlex affiliation in the record – if selected, the affiliation from OpenAlex is preferred to any affiliations that may already be in the record (e.g., the affiliation from CDI).
    • Add Open Access status and link from Unpaywall via OpenAlex – add the Open Access status and a link to an Open Access version of the publication.
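    To illustrate how the three enrichment options interact, here is a hedged Python sketch that works on dictionaries shaped like public OpenAlex work records (authorships[].institutions[].display_name, and open_access with is_oa, oa_status, oa_url). The helper names, and the assumption that an existing CDI affiliation is kept unchanged when the override option is off, are illustrative only; this is not how Esploro implements the enrichment.

```python
# Sketch under assumptions: inputs follow the public OpenAlex "work" schema;
# the option names and merge rules are illustrative, not Esploro's implementation.
MAX_AFFILIATIONS = 5  # "add all affiliations (up to 5)"


def merge_affiliations(existing: list[str], authorship: dict,
                       all_affiliations: bool = True,
                       override_existing: bool = True) -> list[str]:
    """Choose one author's affiliations from the CDI record and OpenAlex."""
    names = [inst["display_name"]
             for inst in authorship.get("institutions", [])
             if inst.get("display_name")]
    names = names[:MAX_AFFILIATIONS] if all_affiliations else names[:1]
    if not names:
        return existing  # OpenAlex has nothing to add
    if override_existing or not existing:
        return names  # OpenAlex preferred, or the record had no affiliation
    return existing  # assumption: keep the CDI affiliation when not overriding


def open_access_fields(work: dict) -> dict:
    """Extract the OA status and link from an OpenAlex work, if it is OA."""
    oa = work.get("open_access") or {}
    if not oa.get("is_oa"):
        return {}
    return {"oa_status": oa.get("oa_status"), "oa_link": oa.get("oa_url")}
```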


    Asset Approval

    In the Asset Approval section, you define whether the asset should be approved automatically. Select one of the following options:

    • Never automatically approve the asset – The asset is never approved automatically. Operators must approve it before it is displayed in the public portal and profiles (as with a manual deposit).
    • Always approve the asset when the first author is approved – This option is selected by default.
    • Conditionally approve the asset when the first author match is approved – The asset is approved only if the specified conditions are met. The following conditions are possible:
      • If asset type is any of: (multiple asset types can be selected)
        • AND/OR
      • If asset has a DOI or PMID
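    The approval options above amount to a simple decision rule. This Python sketch assumes that, for the conditional option, both conditions (asset type and DOI/PMID) are configured and joined by the selected AND/OR operator; the enum and function names are hypothetical.

```python
from enum import Enum


class AssetApproval(Enum):
    NEVER = "Never automatically approve the asset"
    ALWAYS = "Always approve the asset when the first author is approved"
    CONDITIONAL = "Conditionally approve the asset"


def asset_is_approved(policy: AssetApproval, match_approved: bool,
                      asset_type: str, has_doi_or_pmid: bool,
                      allowed_types: frozenset[str] = frozenset(),
                      combine: str = "AND") -> bool:
    """Decide automatic asset approval (sketch; both conditions assumed set)."""
    if policy is AssetApproval.NEVER or not match_approved:
        return False  # operators approve manually, or no author match approved yet
    if policy is AssetApproval.ALWAYS:
        return True
    type_ok = asset_type in allowed_types
    if combine == "AND":
        return type_ok and has_doi_or_pmid
    if combine == "OR":
        return type_ok or has_doi_or_pmid
    raise ValueError("combine must be 'AND' or 'OR'")
```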


     

    Author Matching Approval Configuration

    In this section, you can determine for each rank (Matched on ID, Very Strong, etc.) whether the author-researcher match (for each author) should be approved automatically. Keep in mind that as soon as one author is approved (manually or automatically), the asset may be approved, depending on the option selected in the Asset Approval section. In the first runs, you may prefer not to approve any matches automatically, regardless of rank. The options for each rank are:

    • Automatic – Automatically approve matches with this rank (unless Skip automatic approvals in Smart Harvesting is selected in the researcher's profile; see Working with Researchers).
    • Administrator – Add matches with this rank to the Author Matching Approval Task List for manual handling by an administrator (see Author Matching Approval Task List).
    • Selected Researchers – Automatically send requests for match approval to all researchers who have Enable Automatic Request for Author Match Approval selected in their profiles (see Working with Researchers). Once the requests have been sent, the flow is the same as when an administrator requests researcher approval via the Author Matching Approval Task List (see Author Matching Approval Task List). Matches for researchers who do not have this option selected default to requiring approval by an administrator.

    The Selected Researchers option is most useful for ongoing Smart Harvesting, when relatively few outputs are found for each researcher. Nonetheless, it can be used for retrospective Smart Harvesting as well, if desired.

    It is possible to define automatic approval but suppress it for one or more asset types. This is useful if you feel that specific asset types have problematic metadata and you want to review them before they are automatically added to the repository. For example, you can configure the Smart Harvesting profile to automatically approve "Very Strong" author matches for all asset types except for book chapters and conference proceedings. Select the asset types for which you want to suppress automatic approval under Suppress automatic approval for asset types.
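    The per-rank routing, together with the researcher-level flags and the suppressed asset types, can be sketched as follows. This is an illustrative Python model, not Esploro code: the rank name "Strong", the Researcher fields, the default for unconfigured ranks, and the assumption that a skipped or suppressed automatic approval falls back to an administrator task are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    AUTOMATIC = "Automatic"
    ADMINISTRATOR = "Administrator"
    SELECTED_RESEARCHERS = "Selected Researchers"


@dataclass
class Researcher:
    # Hypothetical mirrors of the two researcher profile settings.
    skip_automatic_approvals: bool = False
    enable_automatic_request: bool = False


def route_match(rank: str, rank_config: dict[str, Handling], researcher: Researcher,
                asset_type: str, suppressed_types: frozenset[str] = frozenset()) -> str:
    """Return what happens to one author-researcher match (illustrative only)."""
    # Assumption: a rank with no configured handling goes to an administrator.
    handling = rank_config.get(rank, Handling.ADMINISTRATOR)
    if handling is Handling.AUTOMATIC:
        if researcher.skip_automatic_approvals or asset_type in suppressed_types:
            return "administrator task"  # assumption: falls back to manual review
        return "approved automatically"
    if handling is Handling.SELECTED_RESEARCHERS and researcher.enable_automatic_request:
        return "request sent to researcher"
    # Administrator rank, or Selected Researchers without the profile option.
    return "administrator task"


config = {"Matched on ID": Handling.AUTOMATIC,
          "Very Strong": Handling.AUTOMATIC,
          "Strong": Handling.SELECTED_RESEARCHERS}  # "Strong" is a hypothetical rank
suppressed = frozenset({"book chapter", "conference proceeding"})
```

    With this configuration, a Very Strong match on a journal article is approved automatically, while the same rank on a book chapter becomes an administrator task, matching the example in the paragraph above.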


    Select CDI Resource Types to Harvest

    By default, Smart Harvesting requests the following CDI resource types: Journal Articles, Books, Book Chapters, Reviews, Conference Proceedings, Reports, and Datasets. You can request that only specific types be harvested.


    This may be useful in one of the following scenarios:

    • You prefer not to get a certain resource type
    • A new resource type has been added (e.g., the dataset type that was added recently) and you want to run a retrospective Smart Harvesting for specific researchers for the resource type that can now be harvested
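    A minimal Python sketch of restricting the harvest to a subset of the default CDI resource types, for example to harvest only Datasets for specific researchers. The function name and the validation rules are assumptions for illustration; in Esploro, the selection is made in the Smart Harvesting profile.

```python
from collections.abc import Iterable
from typing import Optional

# Default CDI resource types requested by Smart Harvesting (from this page).
DEFAULT_CDI_TYPES = ("Journal Articles", "Books", "Book Chapters", "Reviews",
                     "Conference Proceedings", "Reports", "Datasets")


def resource_types_to_request(selected: Optional[Iterable[str]] = None) -> list[str]:
    """Return the types to harvest; None means the full default list."""
    if selected is None:
        return list(DEFAULT_CDI_TYPES)
    chosen = set(selected)
    unknown = chosen - set(DEFAULT_CDI_TYPES)
    if unknown:
        raise ValueError(f"not a harvestable CDI type: {sorted(unknown)}")
    if not chosen:
        raise ValueError("select at least one resource type")
    # Keep the default ordering for readability.
    return [t for t in DEFAULT_CDI_TYPES if t in chosen]
```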

    Running Retrospective Smart Harvesting

    • Select the Retrospective Smart Harvesting profile in Repository > Smart Harvesting > Manage Profiles.
    • Make sure that the status in the General Details section is Active.
    • Run Smart Harvesting either on a Set of Researchers (see Creating a Set of Researchers, below) or on a Single Researcher by selecting the researcher from the list. For Run For Single Researcher, you can also add a year from which to start Smart Harvesting. For a set of researchers, the Last Smart Harvesting date from the researcher record is used.

    Each researcher has a field called Last Smart Harvesting date. The value of this field determines the point in time from which assets are brought in. For the first (retrospective) run, this field is empty. For ongoing runs, this date is updated automatically.

    • If you want researchers to receive an email listing the assets that were harvested for them, select the “Notify Researcher” option. By default, this option is disabled, because institutions usually prefer to start notifying researchers only once their profiles have been populated and ongoing Smart Harvesting has begun.


    • Select Run Now. The settings are saved, and the process is started.
    • To monitor the progress of the process, select Monitor Captures on the Manage Profiles page (Repository > Smart Harvesting > Manage Profiles).
    • After the process has completed, assets that are pending approval display in the list at Repository > Author Matching Approval Task List (from the top right of the persistent menu). See Author Matching Approval Task List for more information.

    Tasks list including the assets for approval after running Smart Harvesting.

    The time the job takes depends on many parameters; for a single researcher, it can range from a couple of minutes to over an hour.
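    How the start point of a retrospective run is resolved can be sketched in Python as follows. It assumes, hypothetically, that a start year entered for Run For Single Researcher means 1 January of that year, that a single-researcher run without a year falls back to the Last Smart Harvesting date, and that None stands for the empty date on a first run (no lower bound).

```python
from datetime import date
from typing import Optional


def harvest_start(last_harvest: Optional[date],
                  single_run_start_year: Optional[int] = None) -> Optional[date]:
    """Return the date from which assets are requested (None = no lower bound)."""
    if single_run_start_year is not None:
        # Assumption: a start year means 1 January of that year.
        return date(single_run_start_year, 1, 1)
    # Otherwise use the researcher's Last Smart Harvesting date, which is
    # empty (None) before the first retrospective run.
    return last_harvest
```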

    Creating a Set of Researchers

    First decide who to run Smart Harvesting for. This can be an opportunity to start engaging specific researchers with Esploro. For information on creating search queries and sets in Esploro see Managing Search Queries and Sets.

    Make sure that Include in Smart Harvesting is set to Yes for these researchers. This flag can be set via the user/researcher loader ("SIS loader") or in the Researcher Settings section of a researcher profile (Researchers > Manage Researchers > [select researcher] > Researcher Profile tab).

    Researcher settings dialog pane with Active Researcher Profile option set to yes.

    Since the Last Smart Harvesting Date can be updated manually or in bulk, a check was added: if the date is more than 3 years in the past, Ongoing Smart Harvesting does not run for the researcher. The check compares years, not days. For example, if it is currently June 2022 and the date in the record is February 2019, Smart Harvesting runs; if the date in the record is December 2018, it does not. In the latter case, the job report contains an event of type Skipped Researcher with the description Last Smart Harvesting date is too old.
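    The year-based comparison is easy to misread as a rolling 3-year window. The sketch below, a hypothetical Python helper rather than Esploro code, compares only calendar years and reproduces the June 2022 example from the paragraph above.

```python
from datetime import date

MAX_YEARS_BACK = 3


def ongoing_check(last_harvest: date, today: date) -> tuple[bool, str]:
    """Apply the 3-year rule using calendar years only (illustrative)."""
    if today.year - last_harvest.year > MAX_YEARS_BACK:
        return False, "Skipped Researcher: Last Smart Harvesting date is too old"
    return True, "run"


today = date(2022, 6, 15)
print(ongoing_check(date(2019, 2, 1), today))   # (True, 'run')
print(ongoing_check(date(2018, 12, 1), today))  # (False, 'Skipped Researcher: Last Smart Harvesting date is too old')
```

    A day-based check would have skipped the February 2019 case, since more than three full years separate it from June 2022.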

    The Skip automatic approvals in Smart Harvesting flag exists for specific researchers who are problematic for Smart Harvesting. For example, if two researchers with the same name work in the same department or research domain, the algorithm has a very hard time distinguishing them. If you want to approve author-researcher matches automatically in general (for example, Very Strong matches) but such matches are problematic for specific researchers, turn this flag on for those researchers to disable automatic approval for them.

    To create a set:

    1. Access Admin > Manage Jobs and Sets > Manage Sets. Add a new set of type "Itemized". The Set content type should be Researchers.

    Manage Sets window with the "Set name" option as "Set of researchers for Smart Harvesting".

    2. Select Add Members to Set. The list of researchers is displayed.
    3. Search for and select the researchers you want to add, then select Add Members to Set. A set can contain up to 50 researchers.
    4. Select Save.
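    If you have more eligible researchers than fit in one set, you can plan several sets of up to 50. The Python sketch below assumes a hypothetical export of researcher records with an include_in_smart_harvesting field that mirrors the Include in Smart Harvesting setting; it only plans the sets, which you would still create in Manage Sets.

```python
def build_researcher_sets(researchers: list[dict], set_size: int = 50) -> list[list[str]]:
    """Split eligible researcher IDs into sets of at most 50 (hypothetical record shape)."""
    if not 1 <= set_size <= 50:
        raise ValueError("a Smart Harvesting set can contain 1 to 50 researchers")
    # Keep only researchers flagged Include in Smart Harvesting = Yes.
    eligible = [r["id"] for r in researchers if r.get("include_in_smart_harvesting")]
    return [eligible[i:i + set_size] for i in range(0, len(eligible), set_size)]
```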

    Running Ongoing Smart Harvesting

    After running the initial Smart Harvesting job for researchers, you can schedule jobs on an ongoing basis to harvest new assets for them. The job runs daily, but updates each researcher only once a week, based on the "Last Smart Harvesting Date" field. The job runs with the parameters defined in the Smart Harvesting profile.
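    A hedged Python sketch of which researchers a daily run would pick up under the once-a-week rule. It assumes that a researcher is due when at least 7 days have passed since their Last Smart Harvesting Date; the exact comparison Esploro uses is not documented here, and the function name is hypothetical.

```python
from datetime import date, timedelta

INTERVAL = timedelta(days=7)  # "updates each specific researcher once a week"


def researchers_due(last_dates: dict[str, date], today: date) -> list[str]:
    """IDs whose Last Smart Harvesting Date is at least a week old (assumed rule)."""
    return sorted(rid for rid, last in last_dates.items() if today - last >= INTERVAL)
```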

    To activate the job:

    Select Configuration menu > General > Research Jobs Configuration.

    Scheduled Smart Harvesting Jobs window.

    To activate the ongoing job, select Active and then Save.

    To run the job immediately, select the Run Now button.

    See also Initial and Ongoing Smart Harvesting and Last Smart Harvesting Date.

    For information about setting up ongoing Smart Harvesting via a scheduled job, watch the How to Set Up Ongoing Smart Harvesting video (3:42).

    Monitoring Ongoing Smart Harvesting

    You can monitor the running of the job via Monitor Captures in Smart Harvesting Profiles (see Monitoring the Running Jobs in the Monitor Captures Page). The report and events are the same as for the ad hoc Smart Harvesting job.

    You can also receive an email when the job run completes. Pending author matching approval tasks are listed in the Tasks widget.

    To receive an email notification with the job run status, add your email address to the notification list, which is accessed via Admin > Manage Jobs and Sets > Monitor Jobs. Filter for the scheduled Research jobs and select Email Notifications from the actions of the Smart Harvesting ("New") job. Adding an email address requires the Administrator role.

    Run now option selected for Smart Harvesting (new) job.

    The email indicates how many "records" were processed, that is, the number of researchers for whom ongoing Smart Harvesting was run. The email also includes the job events, which are the same as those of the ad hoc job. The events indicate the number of researchers for whom the job ran and how many researchers with matching candidate assets were found. If this number is greater than 0, new assets have been added and there may be pending author matching tasks.

    Monitoring the Running Jobs in the Monitor Captures Page

    This page displays a list of the Smart Harvesting job runs and can be accessed from Repository > Smart Harvesting > Manage Profiles by selecting Monitor Captures.
    Six jobs are run, and all of them are displayed on the Monitor Captures page. All jobs should be completed before you access the approval task list.

    Smart Harvesting Monitor Captures.


    The Status column displays the status of the currently running job. When the job finishes and the next job starts, the status of the next job is displayed. 

    The operator who invoked the Smart Harvesting run should receive an email with a job report (see Job Report) once all the jobs have finished running. Once the job has been completed, the author matching approval tasks for the captured assets can be displayed in the Author Matching Approval Task List (see Author Matching Approval Task List).

    Job Report

    To view the report on the Monitor Captures page, select View from the row actions menu, or select the job name.

    Smart Harvesting job report.

    The following information is available as "events":

    Job Report Events

    • Smart harvesting ran for researcher (N) – The number of researchers for whom Smart Harvesting ran. The event includes a list.
    • Skipped researcher due to insufficient data (N) – The number and a list of the researchers for whom Smart Harvesting could not run because the researcher record does not include sufficient information: meaningful affiliations, research topics, areas of interest, or approved assets.
    • Skipped researcher due to exceeding allowed candidate assets (N) – The number and list of researchers who were skipped by the job because more than 12,000 candidate assets were found in CDI. In this case, try the Run for a Single Researcher option.
    • Number of assets retrieved from CDI for researcher (N) – The number of candidate assets found in CDI for researchers.
    • Matching assets found for researcher (N) – The number of matching assets found per researcher.
    • Automatically approved author-researcher matches (N) – The number of automatically approved author-researcher matches.
    • New non-affiliated researchers created (N) – The number of new non-affiliated researchers created by the job.
    • Automatically approved assets added (N) – The number of assets that were automatically approved by the job. Assets are approved when one of the author matches is approved.
    • Skipped assets - asset already in the repository (N) – This is a new event. It counts and lists the assets that were not added because they were found to be duplicates of assets already in the repository.
    • Skipped assets - too many authors (N) – The number and list of candidate assets that were skipped because they had too many authors (over 6,000).
    • Failed assets (N) – The number and list of assets that failed, usually for a system reason such as a time-out.
    • General Error (N) – The number and list of general errors. For example:
      • Author Matching Engine is not alive – the author matching algorithm is not available.
      • An error occurred while getting candidate assets for researcher – for some reason, CDI did not send any assets.
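    To show how the main skip events relate to the limits mentioned on this page (12,000 candidate assets per researcher, over 6,000 authors per asset), here is an illustrative Python sketch that tallies events from hypothetical per-researcher results. The data classes, the order of the checks, and the assumption that skipped researchers are not counted as "ran" are not taken from Esploro's reporting logic.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_CANDIDATES = 12_000  # candidate assets per researcher (set run)
MAX_AUTHORS = 6_000      # authors per candidate asset


@dataclass
class CandidateAsset:
    authors: int
    already_in_repository: bool = False


@dataclass
class ResearcherResult:
    researcher_id: str
    sufficient_data: bool
    candidates: list[CandidateAsset] = field(default_factory=list)


def tally_events(results: list[ResearcherResult]) -> Counter:
    """Count a subset of job report events from hypothetical per-researcher results."""
    events = Counter()
    for result in results:
        if not result.sufficient_data:
            events["Skipped researcher due to insufficient data"] += 1
            continue
        if len(result.candidates) > MAX_CANDIDATES:
            events["Skipped researcher due to exceeding allowed candidate assets"] += 1
            continue
        events["Smart harvesting ran for researcher"] += 1
        events["Number of assets retrieved from CDI for researcher"] += len(result.candidates)
        for asset in result.candidates:
            # Assumption: duplicates are detected before the author-count check.
            if asset.already_in_repository:
                events["Skipped assets - asset already in the repository"] += 1
            elif asset.authors > MAX_AUTHORS:
                events["Skipped assets - too many authors"] += 1
    return events
```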

    Additional References

    • Video: Smart Harvesting in Action
    • How to Approve and Review Smart Harvesting Author Matches
    • Video: How to Setup Ongoing Smart Harvesting Using a Scheduled Job
    • Esploro Smart Harvesting Framework
    • Esploro Smart Expansion
    • Working with the Esploro Research Hub
    • Quick Guide for Administrators in Esploro
    © Copyright 2026 Ex Libris Knowledge Center