
    Working with Smart Harvesting

    This page describes how to work with Smart Harvesting and assumes that you have a basic understanding of what Smart Harvesting is. For a general overview of Smart Harvesting see here. For a video on Smart Harvesting in action see here. For working with Smart Expansion see here.

    High Level Overview of the Flow when Working with Smart Harvesting

    Use Smart Expansion when you have a list of assets known to belong to one or more researchers, for example a list of citations. When you use Smart Harvesting, Esploro brings in assets that are potential matches for a set of researchers that you select.

    The first time you run Smart Harvesting/Expansion, the process is called a retrospective run. After that, Smart Harvesting runs on an ongoing basis to bring in new assets for all the researchers that were used in the retrospective run.

    1. Select the Central Discovery Index job in Repository > Manage Profiles.
    2. Select a set of researchers (see Creating a Set of Researchers).
    3. Leave the General Details section as is.
    4. For the Author Matching Approval Configuration section, see Running the Smart Harvesting Job on a Set of Researchers.
    5. Each researcher has a field called Last Smart Harvesting date. The value of this field determines the point in time from which assets will be brought in. For the first (retrospective) run, this field is empty. For ongoing runs this is updated automatically.
    6. Select Run Now. The settings are saved and the process is started. 
    7. To monitor the progress of the process, select Monitor Captures in the Manage Profiles page (Repository > Manage Profiles).
    8. After the process has completed, assets that are pending approval display in the list at Repository > Author Matching Approval Task List (from the top right of the persistent menu). See Author Matching Approval Task List for more information.
      Tasks list including the assets for approval after running Smart Harvesting.

    See here for the list of asset types that are supported by Smart Harvesting.

    Creating a Set of Researchers

    First decide who to run Smart Harvesting for. This can be an opportunity to start engaging specific researchers with Esploro. For information on creating search queries and sets in Esploro see Managing Search Queries and Sets.

    Make sure that these researchers are flagged to be included in Smart Harvesting. This flag can be set via the user/researcher loader ("SIS loader") or in the Researcher Settings section of a researcher (Researchers > Manage Researchers).

    Researcher settings dialog pane with Active Researcher Profile option set to yes.

    Because the Last Smart Harvesting Date can be updated manually or in bulk, a check was added: if the date is more than 3 years in the past, ongoing Smart Harvesting does not run for the researcher. The check compares years, not days. For example, if it is currently June 2022 and the date in the record is February 2019, Smart Harvesting still runs, because 2019 is not more than 3 years before 2022. When a researcher is skipped by this check, the job report contains an event of type Skipped Researcher with the description Last Smart Harvesting date is too old.
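    The year-based check can be illustrated with a minimal sketch. It assumes that only the calendar years of the two dates are compared; the function name and constant are hypothetical and are not part of Esploro.

```python
from datetime import date

MAX_AGE_YEARS = 3  # threshold stated in the text above


def is_skipped_as_too_old(last_harvest: date, today: date) -> bool:
    """Return True if ongoing Smart Harvesting would skip the researcher.

    Assumption: only calendar years are compared; months and days are ignored.
    """
    return today.year - last_harvest.year > MAX_AGE_YEARS


# June 2022 vs. February 2019: the year difference is 3, so harvesting still runs.
print(is_skipped_as_too_old(date(2019, 2, 1), date(2022, 6, 15)))   # False

# January 2022 vs. December 2018: the year difference is 4, so the researcher
# is skipped, although the dates are only slightly more than 3 years apart.
print(is_skipped_as_too_old(date(2018, 12, 31), date(2022, 1, 1)))  # True
```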

    The "Skip automatic approvals in Smart Harvesting" flag is intended for specific researchers who are problematic for Smart Harvesting. For example, if two researchers have the same name and work in the same department or research domain, the algorithm has a very hard time distinguishing between them. If you want to automatically approve author-researcher matches in general (for example, Very Strong matches) but not for these specific researchers, set the flag for them to disable automatic approval.

    To create a set:

    1. Access the Admin > Manage Jobs and Sets > Manage Sets option.  Add a new set of type "Itemized". The Set content type should be Researchers.

    Manage Sets window with the "Set name" option as "Set of researchers for Smart Harvesting".

    2. Click on the option Add Members to Set. This will display the list of researchers.
    3. Search for and select the researchers you want to add by selecting the Add Members to Set option. There can be up to 50 researchers per set.
    4. Click Save.
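    If you prepare the list of researchers outside Esploro, the 50-researcher limit means that a longer list has to be divided across several sets. The following is a minimal sketch of that split; the function name and the identifier format are hypothetical, and the sets themselves are still created in the UI as described above.

```python
from typing import Iterator, List, Sequence

MAX_SET_SIZE = 50  # limit stated in the steps above


def split_into_sets(researcher_ids: Sequence[str],
                    size: int = MAX_SET_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` researcher identifiers."""
    for start in range(0, len(researcher_ids), size):
        yield list(researcher_ids[start:start + size])


# 120 hypothetical identifiers are divided into sets of 50, 50 and 20.
ids = [f"researcher-{n}" for n in range(120)]
print([len(group) for group in split_into_sets(ids)])  # [50, 50, 20]
```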

    Running the Smart Harvesting Job on a Set of Researchers

    1. Access the Smart Harvesting profile: Repository > Smart Harvesting > Manage Profiles. Enable and edit the Central Discovery Index profile.
    2. Edit the profile via the row actions button and enable the profile by setting it to Active.
    3. Leave the information in the General Details section as is - do not edit this section.
    4. In the Asset Approval section, select the relevant option:
    • Never automatically approve the asset - The asset is never approved. Operators must approve before the asset will display in the public portal and profiles (similar to a manual deposit).
    • Always approve the asset when first author is approved - This option is selected by default.
    • Conditionally approve the asset when the first author match is approved - The asset is approved only if the specified conditions are met. The following conditions are possible:
       
      • If asset type is any of: Multiple asset types can be selected.
        AND/OR
      • If asset has a DOI or PMID
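    The three Asset Approval options can be summarized as a small decision function. This is only a sketch of the logic described above, not Esploro code: all names are hypothetical, and the behavior when no condition is configured is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class ApprovalMode(Enum):
    NEVER = "never"
    ALWAYS = "always"            # the default option
    CONDITIONAL = "conditional"


@dataclass(frozen=True)
class Asset:
    asset_type: str
    doi: Optional[str] = None
    pmid: Optional[str] = None


@dataclass(frozen=True)
class Conditions:
    asset_types: FrozenSet[str] = frozenset()  # empty: no asset type condition
    require_doi_or_pmid: bool = False
    combine_with_and: bool = True              # False: combine with OR


def asset_is_auto_approved(mode: ApprovalMode, author_match_approved: bool,
                           asset: Asset,
                           conditions: Conditions = Conditions()) -> bool:
    """Decide whether an asset is approved without operator action."""
    if mode is ApprovalMode.NEVER or not author_match_approved:
        return False
    if mode is ApprovalMode.ALWAYS:
        return True
    checks = []
    if conditions.asset_types:
        checks.append(asset.asset_type in conditions.asset_types)
    if conditions.require_doi_or_pmid:
        checks.append(bool(asset.doi or asset.pmid))
    if not checks:
        return False  # assumption: with no condition set, nothing is approved
    return all(checks) if conditions.combine_with_and else any(checks)


article = Asset("journal article", doi="10.1000/example")
book = Asset("book")
rule = Conditions(frozenset({"journal article"}), require_doi_or_pmid=True)
print(asset_is_auto_approved(ApprovalMode.CONDITIONAL, True, article, rule))  # True
print(asset_is_auto_approved(ApprovalMode.CONDITIONAL, True, book, rule))     # False
```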

    Author matching tasks are created for all asset authors. Assets that are pending approval can be approved in the task list at Repository > Author Matching Approval Task List and in the Asset Approval page at Repository > Smart Harvesting > Asset Approval (see here for more information). In addition, these assets can be accessed via the Smart Expansion via CSV – Asset approval task in the Tasks Widget (see Managing Widgets).

    5. Enter the information for the Author Matching Approval Configuration section. In this section you can determine per rank if the author-researcher match (for each author) should be approved or not. Keep in mind that as soon as one author is approved (manually or automatically) the asset is approved (depending on what option was selected in the Asset Approval section). In the first runs you may prefer to not automatically approve regardless of rank.

    Author matching approval configuration.

    It is possible to define automatic approval but suppress it for one or more asset types. This is useful if you feel that specific asset types have problematic metadata and you want to review them before they are automatically added to the repository. For example, you can configure the Smart Harvesting profile to automatically approve 'Very Strong' author matches for all asset types except for books and book chapters.

     

    Suppress automatic approval for asset types.
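    The combination of per-rank approval and per-asset-type suppression can be sketched as follows. The function and parameter names are hypothetical, and "Strong" is used only as an illustrative rank name; the only rank named in this page is Very Strong.

```python
from typing import AbstractSet


def match_is_auto_approved(rank: str, asset_type: str,
                           auto_approved_ranks: AbstractSet[str],
                           suppressed_asset_types: AbstractSet[str]) -> bool:
    """Return True if an author-researcher match is approved automatically.

    Suppression by asset type takes priority over the rank setting.
    """
    if asset_type in suppressed_asset_types:
        return False
    return rank in auto_approved_ranks


ranks = {"Very Strong"}
suppressed = {"book", "book chapter"}
print(match_is_auto_approved("Very Strong", "journal article", ranks, suppressed))  # True
print(match_is_auto_approved("Very Strong", "book chapter", ranks, suppressed))     # False
print(match_is_auto_approved("Strong", "journal article", ranks, suppressed))       # False
```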

    6. In order to run the job, add the set to the Run section.
      Run smart harvesting with "Select set of researchers" set to "SH set".
    7. Select the set you created and click Run Now.
    8. Select the Notify Researchers checkbox in order to notify researchers about new assets that were added to their profile. Note that this option is only enabled when the NEW ASSETS ADDED TO RESEARCHER PROFILE NOTIFICATION job is active. See Letter for New Assets Added.

    See also Initial and Ongoing Smart Harvesting and Last Smart Harvesting Date.

    For information on manually approving imported assets or referring them to the researcher for approval, see Author Matching Approval Task List.

    The time the job takes to run depends on many different parameters and can range from a couple of minutes to over an hour for a single researcher.

    Running Ongoing Smart Harvesting

    After running the initial Smart Harvesting job, you can schedule jobs on an ongoing basis to harvest new assets for the researcher. The job runs daily, but updates each specific researcher once a week, based on the "Last Smart Harvesting Date" field.
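    The relationship between the daily job and the weekly update per researcher can be sketched as a simple due check. This assumes that "once a week" means at least 7 days since the Last Smart Harvesting Date and that researchers with an empty date are left to the initial run; the actual scheduling rule may differ, and all names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

HARVEST_INTERVAL = timedelta(days=7)  # assumption: "once a week" = 7 days


def is_due(last_harvest: Optional[date], today: date) -> bool:
    """Return True if the daily job should process the researcher today."""
    if last_harvest is None:
        return False  # assumption: no initial (retrospective) run yet
    return today - last_harvest >= HARVEST_INTERVAL


def researchers_due(last_dates: Dict[str, Optional[date]], today: date) -> List[str]:
    """Select the researchers that one daily run would update."""
    return [name for name, last in last_dates.items() if is_due(last, today)]


last_dates = {
    "researcher-a": date(2024, 3, 1),   # 7 days before the run: due
    "researcher-b": date(2024, 3, 4),   # 4 days before the run: not due
    "researcher-c": None,               # never harvested: not due
}
print(researchers_due(last_dates, date(2024, 3, 8)))  # ['researcher-a']
```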

    To activate the job:

    Select Configuration menu > General > Research Jobs Configuration.

    Scheduled Smart Harvesting Jobs window.

    To activate the ongoing job select Active and then Save.

    To run the job immediately, click the Run Now button.

    See also Initial and Ongoing Smart Harvesting and Last Smart Harvesting Date.

    For a video showing how to set up ongoing Smart Harvesting via a scheduled job see here.

    Monitoring Ongoing Smart Harvesting

    You can monitor the running of the job via Monitor Captures in Smart Harvesting Profiles (see Monitor Captures Page). The report and events are the same as for the ad hoc Smart Harvesting job.

    You can also receive an email upon completion of the job run. Pending author matching approval tasks will be listed in the tasks widget.

    To receive an email notification about the job run status, add your email address to the notification list, which is accessed via Admin > Manage Jobs and Sets > Monitor Jobs. Filter for the scheduled Research jobs and select Email notifications from the actions of the Smart Harvesting ("New") job. Adding an email address requires the Administrator role.

    Run now option selected for Smart Harvesting (new) job.

    The email indicates how many "records" were processed, that is, the number of researchers for whom ongoing Smart Harvesting was run. The email also includes the job events, which are the same as those of the ad hoc job. The events indicate the number of researchers for whom the job ran and the number of researchers for whom matching candidate assets were found. If the latter number is greater than 0, new assets have been added and there may be pending author matching tasks.

    Monitor Captures Page

    This page displays a list of the Smart Harvesting job runs and can be accessed from Repository > Smart Harvesting > Manage Profiles. Two jobs actually run. One job runs on the affiliated author ("Smart Harvesting"); a separate job ("Smart Harvesting Co Authors") then checks any additional authors in the record and attempts to match them to affiliated or non-affiliated researchers. Both jobs are displayed in the monitoring page, and both should be completed before you access the approval task list.

    The list has the following columns:

    • Job ID
    • Job Name
    • Status
    • User – the user who ran the job
    • Time started
    • Time Ended
    • Number finished – the number of researchers for whom the Smart Harvesting job ran successfully.
    • Number Failed – the number of researchers for whom the Smart Harvesting job failed.

    The operator that invoked the Smart Harvesting run should get an email – one per job – when the job has been completed. Once the job has been completed the author matching approval tasks for the captured assets can be displayed in the Author Matching Approval Task List (see here for more information).

    Job Report

    To view the report, click View from the job run actions.

    Smart Harvesting job report.

    The following information is available as "events":

    Job Report Events

    Event - Description

    • Smart harvesting ran for researcher(N) - The number of researchers for whom Smart Harvesting ran. The event includes a list.
    • Skipped researcher due to insufficient data(N) - The number and a list of the researchers for whom Smart Harvesting could not run because the researcher record does not include sufficient information (meaningful affiliations, research topics, area of interest or approved assets).
    • Skipped researcher due to exceeding allowed candidate assets(N) - The number and list of researchers who were skipped by the job because more than 12,000 candidate assets were found in CDI.
    • Number of assets retrieved from CDI for researcher(N) - The number of candidate assets found in CDI for researchers.
    • Matching assets found for researcher(0) - The number of matching assets found per researcher.
    • Automatically approved author-researcher matches(0) - The number of automatically approved author-researcher matches.
    • New non-affiliated researchers created(0) - The number of new non-affiliated researchers created by the job.
    • Automatically approved assets added(0) - The number of assets that were automatically approved by the job. Assets are approved when one of the author matches is approved.
    • Skipped assets - asset already in the repository(0) - This is a new event. The number and list of assets that were not added because they were found to be duplicates of assets that are already in the repository.
    • Skipped assets - too many authors(0) - The number and list of candidate assets that were skipped because there were too many authors (over 6000).
    • Failed assets(0) - The number and list of assets that failed for some reason. This is usually a system reason like a time-out.
    • General Error(0) - The number and list of general errors. For example:
      • Author Matching Engine is not alive - this means that the author matching algorithm is not available.
      • An error occurred while getting candidate assets for researcher - for some reason CDI did not send any assets.
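
    The skip events and their limits can be illustrated with a minimal sketch. The limits (12,000 candidate assets, 6000 authors) and the event names come from the events listed above; the function names, the order of the checks, and the tally helper are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional

MAX_CANDIDATE_ASSETS = 12_000  # "more than 12,000 candidate assets"
MAX_AUTHORS = 6_000            # "too many authors (over 6000)"


def researcher_skip_event(candidate_assets: int,
                          has_sufficient_data: bool) -> Optional[str]:
    """Return the skip event for a researcher, or None if the job proceeds."""
    if not has_sufficient_data:
        return "Skipped researcher due to insufficient data"
    if candidate_assets > MAX_CANDIDATE_ASSETS:
        return "Skipped researcher due to exceeding allowed candidate assets"
    return None


def asset_skip_event(author_count: int,
                     already_in_repository: bool) -> Optional[str]:
    """Return the skip event for a candidate asset, or None if it is kept."""
    if already_in_repository:
        return "Skipped assets - asset already in the repository"
    if author_count > MAX_AUTHORS:
        return "Skipped assets - too many authors"
    return None


def tally(events: Iterable[Optional[str]]) -> List[str]:
    """Format events as they appear in the report, for example 'Name(2)'."""
    counts = Counter(event for event in events if event is not None)
    return [f"{name}({count})" for name, count in counts.items()]


events = [
    asset_skip_event(author_count=7_500, already_in_repository=False),
    asset_skip_event(author_count=12, already_in_repository=True),
    asset_skip_event(author_count=6_000, already_in_repository=False),  # not over the limit
]
print(tally(events))
```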