This page describes how to work with the following reports in Smart Harvesting:
- Author Matching report
- Potential Duplicate Researchers report
- Bulk Merge Duplicate Researchers report
For background information, see:
Author Matching Report
The Author Matching report lists author-researcher matches for assets captured by Smart Harvesting or Smart Expansion, as well as those imported via an import profile (migration) or via SWORD. The report can be used to automatically approve assets, in accordance with the following workflow:
- Export a list of potential author-researcher matches as an Excel file; see Exporting the Report, below.
- In the Excel file, mark all the matches that you want to approve, and then save the file; see Approving Matches, below.
- Import the marked file back into Esploro to implement the approvals; see Updating Approvals, below.
Any matches that are not marked for automatic approval in the imported file must be managed manually in the Administrative interface; see Author Matching Approval Task List for more information. For information about exporting a report of the Author Matching Approval Task List, see List Actions in Author Matching Approval Task List.
When new assets are approved for a researcher and added to their profile, the New Research Outputs Added to Profile Letter is sent to the researcher.
The Author Matching report can be managed at Researchers > Researchers > Author Matching Report/Update Approved Matches. Both the export and import operations are managed from this page.
Exporting the Report
- Under Operation, select Export report.
- Select the parameters, as follows:
- Researchers population – The report can include all researchers or only researchers included in a set of researchers.
- Include non-affiliated researcher – Select this checkbox to include non-affiliated researchers in the report. The default is to exclude non-affiliated researchers.
- Input method – The source from which the assets were imported into Esploro: Smart Harvesting, Smart Expansion via Alma, Import profile, Smart Expansion via CSV/Excel, Smart Harvest via Citation Lists, or SWORD. (For information on the various types of Smart Expansion, see Working with Smart Expansion. For information on SWORD deposits, see SWORD Deposits in the Developer Network.)
- Export Format:
- Full – Includes all fields.
- Researcher – A format intended for a researcher in which most of the researcher information (which the researcher already knows) is not included.
- From/To – Select the range of asset import dates to include in the report.
- Email report to – Add the email address to which the report should be sent. By default, the email of the operator is used.
- Select Run now. A job ID is displayed. The job can be monitored in the Monitor Jobs page (Admin > Manage Jobs and Sets > Monitor Jobs). When the job finishes running, the report is emailed to the email address specified. See the next section for information about the format of the report.
Author Matching Report Format
The Author Matching report lists all the potential matches between asset authors and researchers that were identified by Esploro and not yet approved. It is sorted by the researcher's internal ID, and has the following columns:
- Control info – Information required in order to perform the update approvals operation (see Approving Matches, below):
- Researcher User ID – The internal researcher ID
- Author type (CREATED/CONTRIBUTED) – Indicates if the author is a creator or contributor; this information is important for the update process
- Asset ID – The ID of the asset to which the match applies
- Researcher info – most of these columns are not included in the "Researcher" format:
- Researcher name
- Researcher Type (affiliated/non-affiliated)
- Researcher email
- Researcher Primary ID
- Researcher affiliation/s
- Researcher keywords – Research topics and keywords from the researcher record
- Researcher Area of Interest – From the researcher record.
- Status – The status of the match: whether the researcher is affiliated or unaffiliated, and the strength of the match (Matched on ID, Very Strong match, Strong match, Uncertain match)
- Asset info
- Author name
- Asset Type
- Publication details
- Publication date
- Research topics/keywords
- Approved column – The column in which you can approve a match (initially blank); see Approving Matches, below
Approving Matches
Once you have received the exported report, review it and, in the Approved (add X if approved) column, insert an X (uppercase or lowercase) for each match you want to approve. (Any other text in this column is not treated as an approval.) When you are finished, save the file as an Excel file.
In order for the import of the file to be performed successfully, the column titles must not be modified. In addition, the input file must contain at least the following four mandatory columns: Researcher User ID, Author Type, Asset ID, Approved. The other columns can be removed, if desired.
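Before importing the file, it can help to check it against the rules above. The following is a minimal Python sketch, not part of Esploro. It assumes the spreadsheet rows have already been loaded into dictionaries keyed by column title (for example, with a spreadsheet library of your choice). The function names are hypothetical, and the column titles are copied from the list of mandatory columns above; the exact header text in your export may differ, so pass the titles you actually see.

```python
# Assumption: titles copied from the documentation; adjust to match your exported file.
MANDATORY_COLUMNS = ("Researcher User ID", "Author Type", "Asset ID", "Approved")


def missing_columns(header, mandatory=MANDATORY_COLUMNS):
    """Return the mandatory column titles that are absent from the header row."""
    present = set(header)
    return [title for title in mandatory if title not in present]


def classify_rows(rows, approved_column="Approved"):
    """Split rows into (approved, ignored).

    approved: the Approved cell is exactly "X" or "x".
    ignored:  the cell holds some other text, which is not treated as an approval.
    Rows with a blank Approved cell appear in neither list.
    """
    approved, ignored = [], []
    for row in rows:
        cell = row.get(approved_column)
        text = "" if cell is None else str(cell)
        if text in ("X", "x"):
            approved.append(row)
        elif text.strip():
            ignored.append(row)
    return approved, ignored
```

Reviewing the ignored list before the import is a simple way to spot cells such as "yes" or " X " that were probably meant as approvals. Whether Esploro tolerates surrounding spaces is not stated here, so the sketch treats only an exact X or x as an approval.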
Updating Approvals
To apply the approvals that you marked in the Approved column of the Excel file, you must import the file back into Esploro.
- Under Operation, select Update Approvals.
- Under File, select the file.
- Select Run now. A job ID is displayed. The job can be monitored in the Monitor Jobs page (Admin > Manage Jobs and Sets > Monitor Jobs). When the job completes, an email containing information about the job and the number of records it processed is sent to the operator.
Checking and Bulk Merging for Duplicate Researchers
Duplicate researchers can be created by the various ingest flows. This can happen due to problematic data and incorrect matching by the Author Matching algorithm. There are two scenarios:
- A new non-affiliated researcher is created instead of correctly matching to an affiliated researcher
- A new non-affiliated researcher is created instead of correctly matching to an existing non-affiliated researcher
It is recommended to run the Potential Duplicate Researcher Report on an ongoing basis, check whether the potential duplicates it lists are indeed duplicates, and then merge them. For general information on managing researchers, see Working with Researchers.
For a video showing this functionality, see How to Bulk Merge Duplicate Researchers.
Running the Potential Duplicate Researcher Report
What the Report Checks
The report checks for duplicates among affiliated and non-affiliated researchers using:
- Normalized names (case insensitive, punctuation ignored, middle names ignored unless different, wildcards used for initials, and diacritics compared to ASCII characters)
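To illustrate the kind of comparison described above, here is a rough Python sketch. It is not Esploro's matching algorithm, which is not documented here. The function names are hypothetical, and the sketch assumes both names are written in the same order (given name first, family name last).

```python
import string
import unicodedata


def normalize_name(name):
    """Lowercase a name, map diacritics to plain letters, and ignore punctuation."""
    # Decompose accented characters, then drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat punctuation as a separator, then collapse whitespace.
    cleaned = "".join(" " if ch in string.punctuation else ch for ch in plain)
    return " ".join(cleaned.casefold().split())


def tokens_match(a, b):
    """A single-letter token acts as an initial matching any token with that first letter."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def likely_same(name_a, name_b):
    """Compare two names; middle names are ignored unless both are present and differ."""
    ta, tb = normalize_name(name_a).split(), normalize_name(name_b).split()
    if not ta or not tb:
        return False
    if len(ta) == len(tb):
        # Same number of parts: every part, including any middle name, must agree.
        return all(tokens_match(x, y) for x, y in zip(ta, tb))
    # Different number of parts: compare only the first and last parts.
    return tokens_match(ta[0], tb[0]) and tokens_match(ta[-1], tb[-1])
```

With these definitions, "José M. García" and "jose garcia" are treated as a potential match, while "John A. Smith" and "John B. Smith" are not, because their middle initials differ.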
Running the Report
Navigate to Research > Researchers > Potential Duplicate Researcher Report.
Optionally enter the job name.
Select the type of researcher to check:
- Check affiliated researchers – Check affiliated researchers against non-affiliated researchers.
- Check non-affiliated – Check non-affiliated researchers against other non-affiliated researchers.
It is recommended to first check affiliated researchers, merge as needed and only then check non-affiliated researchers.
Optionally filter the list of researchers by affiliation (relevant to affiliated researchers).
Select whether to check researchers added from the last time this job was run, from a specific date, or for all researchers. If you select to run from a specific date, enter the date.
Select Submit to run the job. You are redirected to the Running tab on the Monitor Jobs page. You can select Refresh to view the latest status of the job.
When the job has completed, select the History tab on the Monitor Jobs page.
Open the report by selecting the report Name, or by selecting Report from the actions menu. The report shows the number of duplicates found. If this number is more than one, select Click to download report to list the potential duplicates in a CSV file.
The report contains the following columns of information:
- Group Number
- Researcher name
- Affiliation type (affiliated/non-affiliated/previously affiliated)
- Academic units
- Primary ID
- Number of assets
- Subjects from the Researcher record including Research topics and Keywords
- Asset information from one of the assets belonging to the researcher: research topics and keywords, or the title
- Internal Researcher ID
- Other Identifiers
- Merge Status – this is required for the bulk Merge option (it is explained in the following section).
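When reviewing a large report, it can be convenient to look at the rows one group at a time. The following Python sketch groups the rows of the downloaded CSV by group. The column title "Group Number" is taken from the list above, but the exact header text in the file is an assumption, as is the function name.

```python
import csv
import io
from collections import defaultdict


def group_potential_duplicates(csv_text, group_column="Group Number"):
    """Map each group value to the list of report rows that share it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_column]].append(row)
    return dict(groups)
```

Each value in the returned dictionary is the list of researchers that the report suggests may be the same person, which you can then compare side by side before filling in the Merge Status column.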
Options for Merging Duplicate Researchers
There are two ways to merge researchers that were identified as duplicates:
- Use the Move all assets and grants option (available only from non-affiliated researchers) to move all assets and other entities from the non-affiliated researcher to an affiliated researcher or another non-affiliated researcher, and then delete the non-affiliated researcher. See Managing a Researchers Assets for instructions on how to use this option.
- Use the option to Bulk Merge Duplicate Researchers in the Duplicate Researchers/Bulk Merge Duplicate Researchers feature. This performs the same actions as the Move all assets and grants option but in bulk mode.
Performing a Bulk Merge for Duplicate Researchers
When to Perform a Bulk Merge
The report can be used to check groups of researchers for duplicates. If you decide that any groups do include duplicates, you can then use the Bulk Merge Duplicate Researcher job to merge them in bulk.
What Fields to Merge
There are three mandatory fields for the bulk merge:
- Group ID – do not change this column.
- Internal Researcher ID – do not change this column.
- Merge Status – there are three possible values:
- K or k – indicates “Keep” this researcher, i.e., merge other researchers to this researcher. Note that a group can have only one “Keep” researcher.
- M or m – indicates “Merge” this researcher i.e., merge this researcher to the “Keep” researcher. Only a non-affiliated researcher can be merged.
- Blank – no value is added i.e., do not change this researcher.
Performing the Merge for Duplicates
Navigate to Research > Researchers > Duplicate Researcher Report.
Select the Bulk merge option.
Load the file.
Select Submit. The Monitor Jobs page is displayed.
The job report lists the groups that were successfully merged and the groups that were invalid. A group can be invalid if:
- Any of the mandatory columns is missing.
- The internal ID is invalid.
- The merge status is invalid.
- The group has more than one Keep.
- The same researcher is a Keep in one group and a Merge in another.
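Several of these conditions can be checked before the file is loaded. The following Python sketch is only an illustration: the function name and the default column titles are assumptions (use the titles that appear in your file), and the rows are assumed to have been read into dictionaries already. It cannot tell whether an internal ID actually exists in Esploro or whether a researcher is non-affiliated; it only flags empty IDs.

```python
from collections import defaultdict

VALID_STATUSES = {"", "K", "M"}  # blank, Keep, Merge (case-insensitive)


def find_invalid_groups(rows, group_col="Group ID",
                        id_col="Internal Researcher ID",
                        status_col="Merge Status"):
    """Return {group: [reasons]} for groups that break the rules listed above."""
    problems = defaultdict(list)
    keeps_in_group = defaultdict(list)  # group -> researchers marked Keep
    kept_in = defaultdict(set)          # researcher -> groups where marked Keep
    merged_in = defaultdict(set)        # researcher -> groups where marked Merge

    for row in rows:
        absent = [c for c in (group_col, id_col, status_col) if c not in row]
        if absent:
            group = row.get(group_col, "<no group>")
            problems[group].append("missing column(s): " + ", ".join(absent))
            continue
        group = row[group_col]
        researcher = (row[id_col] or "").strip()
        status = (row[status_col] or "").strip().upper()
        if not researcher:
            problems[group].append("empty internal researcher ID")
        if status not in VALID_STATUSES:
            problems[group].append(f"invalid merge status: {row[status_col]!r}")
        elif status == "K":
            keeps_in_group[group].append(researcher)
            kept_in[researcher].add(group)
        elif status == "M":
            merged_in[researcher].add(group)

    for group, keeps in keeps_in_group.items():
        if len(keeps) > 1:
            problems[group].append("more than one Keep researcher")
    # A researcher marked Keep somewhere and Merge somewhere else invalidates those groups.
    for researcher in kept_in.keys() & merged_in.keys():
        for group in kept_in[researcher] | merged_in[researcher]:
            problems[group].append(f"researcher {researcher} is both Keep and Merge")
    return dict(problems)
```

An empty result means that none of the checks above failed; the job report remains the authoritative source for which groups were actually merged.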
After the Bulk Merge job has completed, it will trigger the Delete redundant non-affiliated researchers job that completes the deletion of all merged researchers. Once that job has completed, you can run the Potential Duplicate Researchers Report again. The report should reflect the latest changes.