Reports and Notifications for the Smart Harvesting Framework
Overview
This page describes how to work with reports in Smart Harvesting. You can see a video showing how to approve and review Smart Harvesting author matches here. In addition to the author matching report, there is also a letter for new assets added to the researcher profile - see the New Research Outputs Added to Profile Letter for more details.
For working with researchers see here. For working with Smart Harvesting see here. For working with Smart Expansion see here.
Author Matching Report
This report lists author-researcher matches for assets captured by Smart Harvesting as well as those imported via an Import Profile (migration) or via SWORD. The report can be used to automatically approve assets via the import option. Any rejections must be managed manually in the Administrative interface - see Author Matching Approval Task List for more information. For information on SWORD deposits see here.
The report is accessed via Researchers > Researchers > Author Matching Report/Update Approved Matches.
There are two operations:
- Export report – export a list of author matches to Excel.
- Update approvals – use the updated report to automatically approve matches.
These options are further described in the following sections. In addition to the author matching report, there is also a letter for new assets added to the researcher profile - see Letter List for more details.
Export Reports
The report has the following parameters:
- Researcher population – the report can include all researchers or only researchers included in a set of researchers.
- Include non-affiliated researchers – a flag that controls whether non-affiliated researchers are included. By default, they are excluded.
- Input method – Smart Harvesting, Import Profile, SWORD, or Smart Harvest via Citation Lists.
- Export Format:
  - Full – includes all fields.
  - Researcher – a format intended for a researcher, in which most of the researcher info (which the researcher already knows) is not included.
- Asset import dates – from/to
- Email report to – add the email to which the report should be sent. The system defaults to the email of the operator.
After selecting the parameters, select Run now. A job ID is displayed, and the job appears in Monitor Jobs. The report is emailed to the address defined in Email report to.
The format of the report is described in the following section.
Report Format
The report is sorted by the researcher's internal ID and has the following columns:
- Control info – used by the update approvals operation. All of these fields must be present for the update:
  - Researcher User ID – this is the internal researcher ID.
  - Author type (CREATED/CONTRIBUTED) – this indicates if the author is a creator or contributor. This information is important for the update process.
  - Asset ID – this is the asset ID.
- Researcher info – most of these columns are not included in the "Researcher" format:
  - Researcher name
  - Researcher Type (affiliated/non-affiliated)
  - Researcher email
  - Researcher Primary ID
  - Researcher affiliation/s
  - Researcher keywords – research topics and keywords from the researcher record.
  - Researcher Area of Interest – from the researcher record.
  - Status – this is the match status. Approved matches are not included in the report.
- Asset info
  - Author name – the name of the author the research has been matched to.
  - Asset Type
  - Title
  - Publication details
  - Publication date
  - DOI
  - PMID
  - Research topics/keywords
  - Abstract
- Approval column – must be present for the update.
  - Add an X to approve a match. Any other text in this column is not treated as an approval.
Update Approvals
The input file must contain at least the following four mandatory fields:
- Researcher User ID
- Author Type
- Asset ID
- Approval column
See below for an example.
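As a rough illustration of a minimal input, the following Python sketch checks that a file still contains the four mandatory columns and lists the matches marked for approval. It assumes the Excel sheet has been saved as CSV for inspection; the header strings, the sample values, and the function name approved_matches are hypothetical, and the real report headers may differ. Treating the X check as exact and case-sensitive is also an assumption.

```python
import csv
import io

# Assumed header names; the real report's column headers may differ.
MANDATORY_COLUMNS = ("Researcher User ID", "Author Type", "Asset ID", "Approval")


def approved_matches(csv_text):
    """Return (researcher_id, author_type, asset_id) for rows marked with X."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing = [col for col in MANDATORY_COLUMNS if col not in headers]
    if missing:
        raise ValueError("missing mandatory columns: " + ", ".join(missing))

    approved = []
    for row in reader:
        # Only an X counts as an approval; any other text is ignored.
        # Exact, case-sensitive comparison is an assumption of this sketch.
        if (row["Approval"] or "").strip() == "X":
            approved.append(
                (row["Researcher User ID"], row["Author Type"], row["Asset ID"])
            )
    return approved


# Made-up sample rows, for illustration only.
sample = """Researcher User ID,Author Type,Asset ID,Approval
1001,CREATED,9900001,X
1001,CONTRIBUTED,9900002,
1002,CREATED,9900003,maybe
"""

print(approved_matches(sample))  # [('1001', 'CREATED', '9900001')]
```

Only the first sample row is selected: the second has an empty approval column and the third contains text other than X.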
Checking and Bulk Merging for Duplicate Researchers
Duplicate researchers can be created by the various ingest flows, typically because of problematic data or incorrect matching by the Author Matching algorithm. There are two scenarios:
- A new non-affiliated researcher is created instead of correctly matching to an affiliated researcher
- A new non-affiliated researcher is created instead of correctly matching to an existing non-affiliated researcher
It is recommended to run the Potentially Duplicate Researcher Report on an ongoing basis, check whether the potential duplicates it lists are indeed duplicates, and then merge them. For general information on managing researchers, see Working with Researchers.
For a video showing this functionality, see How to Bulk Merge Duplicate Researchers.
Running the Potentially Duplicate Researcher Report
Options for Running the Report
There are two ways to merge the duplicates identified by the report:
- Use the Move all assets and grants option (available only from non-affiliated researchers) to move all assets and other entities from the non-affiliated researcher to an affiliated researcher or another non-affiliated researcher, and then delete the non-affiliated researcher. See Managing a Researchers Assets for instructions on how to use this option.
- Use the option to Bulk Merge Duplicate Researchers in the Duplicate Researchers/Bulk Merge Duplicate Researchers feature. This performs the same actions as the Move all assets and grants option but in bulk mode.
What the Report Checks
The report checks for duplicates among affiliated and non-affiliated researchers using:
- Identifier
- Normalized names (case insensitive, punctuation ignored, middle names ignored unless different, wildcards used for initials, and diacritics compared to ASCII characters)
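The normalization rules above can be illustrated with a short Python sketch. This is one possible reading of the rules, not the product's actual matching algorithm, which is not documented here. It assumes names are written as "First [Middle ...] Last" separated by spaces, that a single-letter token is an initial, and that last names must match exactly; the function names are hypothetical.

```python
import string
import unicodedata


def normalize(name):
    """Lowercase a name, fold diacritics, drop ASCII punctuation, split on spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Remove combining marks so that, for example, "é" compares equal to "e".
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch for ch in folded if ch not in string.punctuation)
    return cleaned.lower().split()


def tokens_match(a, b):
    """A single-letter token acts as a wildcard for any name with that initial."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_match(name_a, name_b):
    """Compare two names under the assumed reading of the normalization rules."""
    a, b = normalize(name_a), normalize(name_b)
    if len(a) < 2 or len(b) < 2:
        return a == b
    # First names may match through an initial; last names must be equal.
    if not tokens_match(a[0], b[0]) or a[-1] != b[-1]:
        return False
    mid_a, mid_b = a[1:-1], b[1:-1]
    # Middle names are ignored unless both names have them and they differ.
    if mid_a and mid_b:
        return len(mid_a) == len(mid_b) and all(
            tokens_match(x, y) for x, y in zip(mid_a, mid_b)
        )
    return True


print(names_match("José García", "jose garcia"))      # True
print(names_match("J. Smith", "John Smith"))          # True
print(names_match("John A. Smith", "John B. Smith"))  # False
```

Deleting punctuation means a hyphenated or apostrophized surname such as "Smith-Jones" is compared as a single token, which is a simplification chosen for this sketch.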
Running the Report
- Navigate to Research > Researchers > Duplicate Researcher Report.
- Optionally enter the job name.
- Select the type of researcher to check:
  - Check affiliated researchers – check affiliated against non-affiliated
  - Check non-affiliated – check non-affiliated against non-affiliated
  - Check both
  It is recommended to first check affiliated researchers, merge as needed, and only then check non-affiliated researchers.
- Optionally filter the list of researchers by affiliation (relevant to affiliated researchers).
- Select whether to check researchers added from the last time this job was run, from a specific date, or for all researchers. If you select to run from a specific date, enter the date.
- Select Submit to run the job. You are redirected to the Running tab on the Monitor Jobs page. You can select Refresh to view the latest status of the job.
- When the job has completed, select the History tab on the Monitor Jobs page.
- Open the report by selecting the report Name, or by selecting Report from the actions menu. The report shows the number of duplicates found. If this number is more than one, select Click to download report to list the potential duplicates in a CSV file.
Report Structure
The report contains the following columns of information:
- Group Number
- Researcher name
- Affiliation type (affiliated/non-affiliated/previously affiliated)
- Academic units
- Primary ID
- Number of assets
- Subjects from the Researcher record including Research topics and Keywords
- Asset information from one of the researcher's assets: research topics and keywords, or title
- Internal Researcher ID
- ORCID
- Other Identifiers
- Merge Status – this is required for the bulk Merge option (it is explained in the following section).
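When reviewing the downloaded CSV, it can help to look at the rows group by group. The following Python sketch groups rows by group number using only the standard library. The header strings are taken from the column list above but are assumptions (the exact headers in the file may differ), and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_candidates(csv_text, group_column="Group Number"):
    """Return {group number: [rows]} so each group can be reviewed together."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_column]].append(row)
    return dict(groups)


# Made-up sample rows; only a few of the report's columns are shown.
sample_report = """Group Number,Researcher name,Affiliation type,Number of assets,Internal Researcher ID
1,Jane Doe,affiliated,12,5001
1,J. Doe,non-affiliated,1,5002
2,Ali Khan,non-affiliated,3,5003
2,Ali Khan,non-affiliated,2,5004
"""

for number, rows in group_candidates(sample_report).items():
    print("Group", number)
    for row in rows:
        print("  ", row["Researcher name"], "|", row["Affiliation type"],
              "|", row["Number of assets"], "assets")
```

Seeing the affiliation type and asset count side by side makes it easier to decide which researcher in a group should be kept.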
Performing a Bulk Merge for Duplicate Researchers
When to Perform a Bulk Merge
The report can be used to check groups of researchers for duplicates. If you decide that any groups do include duplicates, you can then use the Bulk Merge Duplicate Researcher job to merge them in bulk.
What Fields to Merge
There are three mandatory fields for the bulk merge:
- Group ID – do not change this column.
- Internal Researcher ID – do not change this column.
- Merge Status – there are three possible values:
  - K or k – indicates "Keep" this researcher, i.e., merge other researchers into this researcher. Note that a group can have only one "Keep" researcher.
  - M or m – indicates "Merge" this researcher, i.e., merge this researcher into the "Keep" researcher. Only a non-affiliated researcher can be merged.
  - Blank – no value is added, i.e., do not change this researcher.
Performing the Merge for Duplicates
- Navigate to Research > Researchers > Duplicate Researcher Report.
- Select the Bulk merge option.
- Load the file.
- Select Submit. The Monitor Jobs page displays.
- The job report lists the groups that were successfully merged and the invalid groups. A group can be invalid if:
  - Any of the mandatory columns is missing.
  - The internal ID is invalid.
  - The merge status is invalid.
  - The group has more than one Keep.
  - The same researcher is a Keep in one group and a Merge in another.
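Before loading a file, the validity rules above can be pre-checked locally. The following Python sketch is an approximation under stated assumptions: the header strings are hypothetical, an internal ID is treated as invalid only when it is empty (the real job presumably checks it against existing researchers), and the rule that only non-affiliated researchers can be merged is not checked because it requires data outside the three mandatory columns.

```python
from collections import defaultdict

# Assumed header names; the real file's column headers may differ.
MANDATORY = ("Group ID", "Internal Researcher ID", "Merge Status")


def invalid_groups(rows):
    """Return {group ID: [reasons]} for groups that break the listed rules."""
    problems = defaultdict(list)
    keep_count = defaultdict(int)
    keeps = defaultdict(set)   # researcher ID -> groups where it is a Keep
    merges = defaultdict(set)  # researcher ID -> groups where it is a Merge

    for row in rows:
        group = row.get("Group ID") or ""
        if any(col not in row for col in MANDATORY):
            problems[group].append("mandatory column missing")
            continue
        rid = (row["Internal Researcher ID"] or "").strip()
        status = (row["Merge Status"] or "").strip().upper()
        if not rid:
            # Assumption: only an empty ID can be detected locally.
            problems[group].append("internal ID invalid")
            continue
        if status not in ("K", "M", ""):
            problems[group].append("merge status invalid")
        elif status == "K":
            keep_count[group] += 1
            keeps[rid].add(group)
        elif status == "M":
            merges[rid].add(group)

    for group, count in keep_count.items():
        if count > 1:
            problems[group].append("more than one keep")
    for rid in keeps.keys() & merges.keys():
        for group in keeps[rid] | merges[rid]:
            problems[group].append("researcher is both Keep and Merge")
    return dict(problems)


# Made-up rows: group 1 is valid, group 2 has two keeps, group 3 has a bad status.
rows = [
    {"Group ID": "1", "Internal Researcher ID": "5001", "Merge Status": "K"},
    {"Group ID": "1", "Internal Researcher ID": "5002", "Merge Status": "m"},
    {"Group ID": "2", "Internal Researcher ID": "5003", "Merge Status": "K"},
    {"Group ID": "2", "Internal Researcher ID": "5004", "Merge Status": "k"},
    {"Group ID": "3", "Internal Researcher ID": "5005", "Merge Status": "Z"},
]
print(invalid_groups(rows))
```

A group that does not appear in the result passed every check in this sketch; the job report remains the authority on what was actually merged.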
Report of Author Matching Approval Task List
You can export a report of the Author Matching Approval Task List. For details see List Actions in Author Matching Approval Task List.