This page describes how to work with Smart Harvesting and assumes that you have a basic understanding of what Smart Harvesting is. For a general overview of Smart Harvesting see here. For a video on Smart Harvesting in action see here. For working with Smart Expansion see here.
High Level Overview of the Flow when Working with Smart Harvesting
You use Smart Expansion when you have a list of assets known to belong to one or more researchers, for example a list of citations. When you use Smart Harvesting, Esploro brings in assets that are potential matches for a set of researchers that you select.
The first time you run Smart Harvesting/Expansion, the process is called a retrospective run. After that, Smart Harvesting runs on an ongoing basis to bring in new assets for all the researchers that were used in the retrospective run.
- Select the Central Discovery Index job in Repository > Manage Profiles.
- Select a set of researchers (see Creating a Set of Researchers).
- Leave the General Details section as is.
- For the Author Matching Approval Configuration section, see Running the Smart Harvesting Job on a Set of Researchers.
- Each researcher has a field called Last Smart Harvesting date. The value of this field determines the point in time from which assets will be brought in. For the first (retrospective) run, this field is empty. For ongoing runs this is updated automatically.
- Select Run Now. The settings are saved and the process is started.
- To monitor the progress of the process, select Monitor Captures in the Manage Profiles page (Repository > Manage Profiles).
- After the process has completed, assets that are pending approval display in the list at Repository > Author Matching Approval Task List (from the top right of the persistent menu). See Author Matching Approval Task List for more information.
See here for the list of asset types that are supported by Smart Harvesting.
Creating a Set of Researchers
First decide who to run Smart Harvesting for. This can be an opportunity to start engaging specific researchers with Esploro. For information on creating search queries and sets in Esploro see Managing Search Queries and Sets.
Make sure that these researchers are flagged to be included in Smart Harvesting. This flag can be set via the user/researcher loader ("SIS loader") or in the Researcher Settings section of a researcher (Researchers > Manage Researchers).
Because the Last Smart Harvesting Date can be updated manually or in bulk, a check was added: if the date is more than 3 years in the past, ongoing Smart Harvesting does not run for the researcher. The check compares years, not days. For example, if it is currently June 2022 and the date in the record is February 2019, Smart Harvesting still runs, because the difference counted in years is 3 and not more than 3. When a researcher is skipped, the job report contains an event of type Skipped Researcher with the description Last Smart Harvesting date is too old.
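The year-based check can be illustrated with a minimal Python sketch. This is not Esploro's implementation; the function name and the exact comparison (current year minus the year of the stored date, skipped only when the result exceeds 3) are assumptions inferred from the example in the text.

```python
from datetime import date

MAX_AGE_YEARS = 3  # threshold stated in the text


def is_skipped_as_too_old(last_harvest: date, today: date) -> bool:
    """Hypothetical helper: compare calendar years only, ignoring month and day."""
    return today.year - last_harvest.year > MAX_AGE_YEARS


# Example from the text: June 2022 vs. February 2019 differs by 3 calendar
# years, so the researcher is still harvested (the day values are arbitrary).
print(is_skipped_as_too_old(date(2019, 2, 1), date(2022, 6, 15)))   # False
# A date in 2018 differs by 4 calendar years, so the researcher is skipped.
print(is_skipped_as_too_old(date(2018, 12, 31), date(2022, 1, 1)))  # True
```

Note that a day-based comparison would give a different answer for the first case, since February 2019 is more than three full years before June 2022.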
The "Skip automatic approvals in Smart Harvesting" flag is intended for specific researchers who may be problematic for Smart Harvesting. For example, if two researchers have the same name and work in the same department or research domain, the algorithm will have a very hard time distinguishing them. If you generally want to automatically approve author-researcher matches (for example, those ranked as a Very Strong match), but automatic approval is problematic for specific researchers, you can use this flag to disable automatic approval for those researchers only.
To create a set:
- Access the Admin > Manage Jobs and Sets > Manage Sets option. Add a new set of type "Itemized". The Set content type should be Researchers.
- Click on the option Add Members to Set. This will display the list of researchers.
- Search for and select the researchers you want to add by selecting the Add Members to Set option. There can be up to 50 researchers per set.
- Click Save.
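If you have more than 50 researchers to harvest, you need several sets. The following Python sketch shows one way to plan the split outside Esploro before creating the sets manually. The identifiers and the helper name are hypothetical; only the limit of 50 comes from the text.

```python
from typing import Iterator, Sequence

MAX_SET_SIZE = 50  # limit per set stated in the text


def split_into_sets(researcher_ids: Sequence[str],
                    size: int = MAX_SET_SIZE) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` researcher identifiers."""
    for start in range(0, len(researcher_ids), size):
        yield list(researcher_ids[start:start + size])


# Hypothetical identifiers, used only to show the batching.
ids = [f"researcher-{n}" for n in range(120)]
batches = list(split_into_sets(ids))
print([len(batch) for batch in batches])  # [50, 50, 20]
```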
Running the Smart Harvesting Job on a Set of Researchers
- Access the Smart Harvesting profile: Repository > Smart Harvesting > Manage Profiles. Enable and edit the Central Discovery Index profile.
- Edit the profile via the row actions button and enable the profile by setting it to Active.
- Leave the information in the General Details section as is - do not edit this section.
- In the Asset Approval section, select the relevant option:
- Never automatically approve the asset - The asset is never approved. Operators must approve before the asset will display in the public portal and profiles (similar to a manual deposit).
- Always approve the asset when first author is approved - This option is selected by default.
- Conditionally approve the asset when the first author match is approved - The asset is approved only if the specified conditions are met. The following conditions are possible:
- If asset type is any of: Multiple asset types can be selected.
- If asset has a DOI or PMID
Author matching tasks are created for all asset authors. Assets that are pending approval can be approved in the task list at Repository > Author Matching Approval Task List and in the Asset Approval page at Repository > Smart Harvesting > Asset Approval (see here for more information). In addition, these assets can be accessed via the Smart Expansion via CSV – Asset approval task in the Tasks Widget (see Managing Widgets).
- Enter the information for the Author Matching Approval Configuration section. In this section you can determine per rank if the author-researcher match (for each author) should be approved or not. Keep in mind that as soon as one author is approved (manually or automatically) the asset is approved (depending on what option was selected in the Asset Approval section). In the first runs you may prefer to not automatically approve regardless of rank.
It is possible to define automatic approval but suppress it for one or more asset types. This is useful if you feel that specific asset types have problematic metadata and you want to review them before they are automatically added to the repository. For example, you can configure the Smart Harvesting profile to automatically approve 'Very Strong' author matches for all asset types except for books and book chapters.
- In order to run the job, add the set to the Run section.
- Select the set you created and click Run Now.
- Select the Notify Researchers checkbox in order to notify researchers about new assets that were added to their profile. Note that this option is only enabled when the NEW ASSETS ADDED TO RESEARCHER PROFILE NOTIFICATION job is active. See Letter for New Assets Added.
For information on manually approving imported assets or referring them to the researcher for approval, see Author Matching Approval Task List.
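The way the profile settings in this section interact can be sketched in Python. This is an illustrative model only, not Esploro's logic: the class, field, and function names are hypothetical, the asset type labels are placeholders, and the rule that all configured conditions must hold for conditional approval is an assumption, since the text does not say how multiple conditions combine.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetApproval(Enum):
    NEVER = "never"
    ALWAYS = "always"            # the default option according to the text
    CONDITIONAL = "conditional"


@dataclass
class Profile:
    """Hypothetical model of the Smart Harvesting profile settings."""
    auto_approve_ranks: set = field(default_factory=set)
    suppressed_asset_types: set = field(default_factory=set)
    asset_approval: AssetApproval = AssetApproval.ALWAYS
    conditional_asset_types: set = field(default_factory=set)
    require_doi_or_pmid: bool = False


def auto_approve_match(profile: Profile, rank: str, asset_type: str,
                       skip_auto_approval: bool) -> bool:
    """Decide whether one author-researcher match is approved automatically."""
    if skip_auto_approval:                       # researcher-level flag
        return False
    if asset_type in profile.suppressed_asset_types:
        return False
    return rank in profile.auto_approve_ranks


def approve_asset(profile: Profile, author_match_approved: bool,
                  asset_type: str, has_doi_or_pmid: bool) -> bool:
    """Decide whether the asset is approved once an author match is approved."""
    if not author_match_approved or profile.asset_approval is AssetApproval.NEVER:
        return False
    if profile.asset_approval is AssetApproval.ALWAYS:
        return True
    # CONDITIONAL. Assumption: every configured condition must be met.
    type_ok = (not profile.conditional_asset_types
               or asset_type in profile.conditional_asset_types)
    id_ok = not profile.require_doi_or_pmid or has_doi_or_pmid
    return type_ok and id_ok


profile = Profile(
    auto_approve_ranks={"Very Strong"},
    suppressed_asset_types={"book", "book chapter"},
    asset_approval=AssetApproval.CONDITIONAL,
    require_doi_or_pmid=True,
)
print(auto_approve_match(profile, "Very Strong", "journal article", False))  # True
print(auto_approve_match(profile, "Very Strong", "book", False))             # False
print(auto_approve_match(profile, "Very Strong", "journal article", True))   # False
print(approve_asset(profile, True, "journal article", has_doi_or_pmid=False))  # False
```

The example mirrors the configuration described above: Very Strong matches are approved automatically except for books and book chapters, and a researcher with the skip flag is always left for manual review.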
Running Ongoing Smart Harvesting
After running the initial Smart Harvesting job, you can schedule jobs on an ongoing basis to harvest new assets for the researcher. The job runs daily, but updates each specific researcher once a week, based on the "Last Smart Harvesting Date" field.
Select Configuration menu > General > Research Jobs Configuration.
To activate the ongoing job select Active and then Save.
To run the job immediately click the Run Now button.
For a video showing how to set up ongoing Smart Harvesting via a scheduled job see here.
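The cadence described in this section (a daily job that updates each researcher once a week) can be sketched as follows. This is a simplified illustration, not the actual scheduler: the function name is hypothetical, the 7-day interval is an interpretation of "once a week", and treating researchers without a Last Smart Harvesting Date as not yet eligible for ongoing runs is an assumption.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

HARVEST_INTERVAL = timedelta(days=7)  # "once a week" per the text


def researchers_due(last_dates: Dict[str, Optional[date]],
                    today: date) -> List[str]:
    """Return the researchers that a daily run would update today."""
    return [
        researcher_id
        for researcher_id, last in last_dates.items()
        # Assumption: an empty date means no retrospective run has happened yet.
        if last is not None and today - last >= HARVEST_INTERVAL
    ]


# Hypothetical researchers and dates.
last_dates = {
    "researcher-a": date(2022, 6, 8),   # 7 days ago, due
    "researcher-b": date(2022, 6, 10),  # 5 days ago, not due yet
    "researcher-c": None,               # never harvested
}
print(researchers_due(last_dates, today=date(2022, 6, 15)))  # ['researcher-a']
```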
Monitoring Ongoing Smart Harvesting
You can monitor the running of the job via Monitor Captures in Smart Harvesting Profiles (see Monitor Captures Page). The report and events are the same as for the ad hoc Smart Harvesting job.
You can also receive an email upon completion of the job run. Pending author matching approval tasks will be listed in the tasks widget.
In order to receive an email notification for the job run status, add your email to the list that can be accessed via Admin > Manage Jobs and Sets > Monitor Jobs. Filter the scheduled Research jobs and run Email notifications from the Smart Harvesting ("New") actions. Adding an email requires the Administrator role.
The email indicates how many "records" were processed. This is the number of researchers for whom ongoing Smart Harvesting was run. The email also includes job events, which are the same events as for the ad hoc job. The events indicate the number of researchers for whom the job ran and the number of researchers for whom matching candidate assets were found. If the latter number is greater than 0, new assets have been added and there may be pending author matching tasks.
Monitor Captures Page
This page displays a list of the Smart Harvesting job runs and can be accessed from Repository > Smart Harvesting > Manage Profiles. Two jobs actually run. The first job runs on the affiliated author ("Smart Harvesting"). A separate job then checks any additional authors in the record and attempts to match them to affiliated or non-affiliated researchers ("Smart Harvesting Co Authors"). Both jobs are displayed in the monitoring page, and both should be completed before you access the approval task list.
The list has the following columns:
- Job ID
- Job Name
- User – the user who ran the job
- Time started
- Time Ended
- Number finished – the number of researchers for whom the Smart Harvesting job ran successfully.
- Number Failed – the number of researchers for whom the Smart Harvesting job failed.
The operator that invoked the Smart Harvesting run should get an email – one per job – when the job has been completed. Once the job has been completed the author matching approval tasks for the captured assets can be displayed in the Author Matching Approval Task List (see here for more information).
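As a rough illustration of the rule that both jobs should be completed before you open the approval task list, the sketch below models a row of the Monitor Captures list and checks for completion. The class, its field names, and the idea that a run with an end time counts as completed are assumptions made for this example; Esploro does not expose this structure as shown.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Job names as given in the text.
REQUIRED_JOBS = {"Smart Harvesting", "Smart Harvesting Co Authors"}


@dataclass
class JobRun:
    """Hypothetical record mirroring the columns of the Monitor Captures list."""
    job_id: str
    job_name: str
    user: str
    time_started: datetime
    time_ended: Optional[datetime]  # None while the job is still running
    number_finished: int
    number_failed: int


def approval_tasks_ready(runs: List[JobRun]) -> bool:
    """True once both required jobs have an end time (assumed to mean completed)."""
    completed = {run.job_name for run in runs if run.time_ended is not None}
    return REQUIRED_JOBS <= completed


runs = [
    JobRun("101", "Smart Harvesting", "operator1",
           datetime(2022, 6, 15, 9, 0), datetime(2022, 6, 15, 9, 20), 48, 2),
    JobRun("102", "Smart Harvesting Co Authors", "operator1",
           datetime(2022, 6, 15, 9, 20), None, 0, 0),
]
print(approval_tasks_ready(runs))  # False, the co-author job is still running
```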
To view the report, click View from the job run actions.
The following information is available as "events":