Configuring Import Profiles for Primo VE
Adding a Discovery Import Profile
An Import profile enables you to define an external data source, apply normalization rules, configure delivery links, and schedule the execution of the import profile job.
Primo VE Administration Certification > External Data Sources (12 min)
During the import, the contents of any field must not exceed the maximum field size of 32,766 characters (which is set by Apache Solr).
-
Open the Import Profiles page (Configuration Menu > Discovery > Loading External Data Sources > Discovery Import Profiles), which lists all the Discovery import profiles.
-
Select Add New Profile to open the Import Profile Details page.
Import Profile - Choose Discovery Type (Step 1) -
Ensure that the Discovery option has been selected and then select Next.
Import Profile Details Page (Step 2 - Profile Details Tab) -
Specify the following fields and then select Next.
Discovery Import Profile - Profile Details Field Description Profile Details Profile name (required)
The unique profile name. After it has been defined, it cannot be changed. This name appears in the list of import profiles and also appears at the top of the page.
Profile description
A free text description of the profile that shows on the list of import profiles.
Data Source Code (required)
The unique code used to identify the external data source to the system. After it has been defined, it cannot be changed.
Data Source Label
Defines the data source's display name in the UI and its custom scope name on the Define a Custom Scope page (Configuration > Discovery Search Configuration > Search Profiles). See Defining a Local Data Scope to add search conditions for your external data source.
Search Scope for External Data Sources-
You must save and reopen new search profiles before you can use the Globe icon to translate this field.
-
Changes to this label update its associated code in the Data Source code table.
File name patterns
A file name pattern (such as *.xml) filters out records that do not conform to the pattern you specify. Use this when the FTP directory contains additional files that should not be imported.
Alma looks for this pattern as a sub-string in the file name, exactly like a regular expression, except for two changes:
-
A period . matches only a period (as if you had entered \\.).
-
An asterisk * matches zero or more characters of any type (as if you had entered .*).
Otherwise, use regular expressions.
Ensure that the regular expression you use exactly matches only the files that you want to import. For example:
-
The following retrieves all files ending in ".xml" (without the quotes): .xml$
-
The following retrieves all files containing the string “yLk” (without the quotes): yLk
-
The following retrieves all files beginning with “YLK” (without the quotes): ^YLK
-
The following retrieves all files beginning with “YLK” (without the quotes) followed by a space: ^YLK\s
-
The following retrieves any file that contains at least one of the following words (without the quotes): “harry” or “potter” or “rowling”: \b(harry|potter|rowling)\b
-
The following retrieves any file that contains all the following words (without the quotes): “harry” and “potter” and “rowling”: (?=*?\bharry\b)(?=*?\bpotter\b)(?=*?\browling\b)
Originating system (required)
The type of system from which the records originated.
When using the MARC21 Bibliographic source format, set this field to Other.
Import protocol
The protocol used to retrieve the file containing the records. The options are:
-
FTP – Retrieve by FTP. You must enter fields under Import Processing and FTP Information (see below).
-
Upload File/s – Upload the file from a local or network drive.
-
OAI (Repository and Authority profiles only) – Retrieve using OAI. You must define additional fields associated with the OAI server (see below).
Physical source format
The file format of the import file. The available options are:
-
XML
-
Binary – Used with MARC21 Bibliographic source format only.
Source format
The format of the source records in the import file. Ensure that you base your selection on the source format of the incoming records, not the physical format of the import file or OAI feed:
-
Dublin Core – Dublin Core records encoded in oai-dc format.
-
Generic XML – Generic XML records (including Dublin Core records encoded in oai-dc format).
-
MARC21 Bibliographic – MARC21 records encoded in XML or binary format. With this format, your source records must include either an 035.a or 001 field, which is used as the Source control number to identify the record.
-
UNIMARC Bibliographic – UNIMARC records encoded in XML or binary format. With this format, your source records must include either an 035.a or 001 field, which is used as the Source control number to identify the record.
MARCXML files must be inside an OAI envelope because Discovery Import Profiles use an OAI File Splitter.
Target format
The format in which to save the records in Primo VE. The available options are:
-
Dublin Core – Select this option if Source format is Dublin Core or Generic XML.
-
MARC21 Bibliographic – This option is selected for you if Source format is MARC21 Bibliographic.
-
UNIMARC Bibliographic – This option is selected for you if Source format is UNIMARC Bibliographic.
Status
The default is Active. Select Inactive if you do not want the import profile to be available for use at this time.
Share with Network
For consortia environments only, this parameter indicates whether the external resource is included in the 'Entire network' search scope and shared with other member institution's catalogs.
Enabling/disabling this feature requires you to republish the external data source.
File Splitter Parameters (Generic XML only) Root element tag
The name of the first tag in the file before the list of the records to process (for example, ListRecords).
Record elements tag
The name of the tag representing the beginning of a record. A file may contain one or more records.
XPath to the identifier tag
The XPath to the identifier tag of the record. This should be the tag that contains the unique identifier for the record.
XPath to the location of the deleted status
The XPath to the location of the deleted status of the record. Default value for a record is: deleted = false.
Delete record regular expression
This regular expression is applied to the value found under the XPath to the location of the deleted status field. If the regular expression matches the value found, the record is marked as deleted.
Scheduling Appears if you selected FTP or OAI. When the profile is scheduled, a Metadata Import job appears in the list of scheduled jobs in Alma. For more information on this job, see Viewing Scheduled Jobs and Viewing Running Jobs.
Files to import
Select All for all files found in the FTP location. Select New to select only those files that have not yet been imported.
Scheduler status
Whether the scheduling is active or inactive.
Scheduler
Select one of the scheduling options from the drop-down list. Times depend on your time zone and the server you are using.
Email Notifications
Which users and email addresses receive email notifications when the publishing profile completes. Opens the Email Notifications for Scheduled Jobs page. You can choose whether to send the notifications for successful jobs and/or jobs that contain errors.
FTP Information Appears only if you selected FTP as the Import Protocol.
Description
A description of the FTP submission format that is defined in this section.
Server/Port
The IP address and port of the FTP server sending or receiving the files.
Username/Password
The username and password for logging on to the server that is sending or receiving the files.
Input directory
The path of the submission format’s input directory.
Max. number of files
Not in use. Accept the default value.
Max file size/Size type
Not in use. Accept the default value.
FTP server secured
Whether to use a secure FTP transfer (SFTP)
FTP passive mode
Whether to use FTP passive mode or not. This depends on the setting in your FTP server.
Test Connection
Select to run a test of the FTP connection
OAI DetailsAppears only if you selected OAI as the Import Protocol.
OAI Base URL
The OAI provider’s URL that is used to harvest the metadata from the OAI repository.
Authentication
Indicates whether the OAI server requires you to enter a username and password.
Username
The username if authentication is required to access the server.
Password
The password if authentication is required to access the server.
Connect and Edit
Select to refresh the page with the associated OAI fields after entering the base URL of the server and authentication information if necessary.
Before saving an OAI import profile, you must first select this button to test the OAI connection when adding or updating fields in the OAI Details section.
The following fields appear after you select Connect and Edit. This enables you to specify information specific to the OAI repository, which is provided by the OAI provider.
Repository Name
The name of the repository.
Granularity
Indicates the granularity of the repositories date stamp.
Earliest Date Stamp
The earliest that data exists in the OAI provider records.
Admin's E-Mail
The e-mail address of an administrator of the OAI repository.
Metadata Prefix
The metadata prefix from the OAI provider. Currently, Primo VE supports oai_dc and qdc formats only. There is no current plan to support oai_marc.
Set
If selective harvesting by group is necessary, specify the set name.
Identifier Prefix
The shared prefix that appears before the actual unique record identifier when harvesting by the identifier.
Harvest Start Date
When submitting a new import job and after the job completes successfully, the Harvest Start Date is updated automatically with the job’s ending time.
For contentDM harvests, it is recommended that you leave this field empty due to an issue with contentDM.
Encode Date
Whether the repository supports encoding dates.
Encode data
Whether the repository encodes the data.
Open Test Page
Select to test the OAI connection and flow. See Testing OAI Import Protocol Flow.
-
-
In the Normalization tab, select a normalization rules process if this data source requires additional normalization. For more information, see Creating Normalization Processes for External Data Sources.
Import Profile Details Page (Step 3 - Normalization Tab) -
In the Delivery tab, configure the following types of delivery links in their respective sections:
-
Link to Resource – Displays digital resources that include a link to the digital repository file. This configuration provides unrestricted access to full text and displays the link to full text in both the View Online and Links section on the Full Display page. In addition, records of this type display the Available online status on the Brief Results page and are included with the Available online facet.
When both this option and the Link to Resource - Other Statuses option are configured, the import uses only one configuration. For more details, see Link to Resource - Other Statuses. -
Link to Resource - Other Statuses – Displays digital resources that may or may not include a link to the digital repository file. This configuration provides either restricted access to full text or no access to full text. When restricted access is given to the record, a link to full text appears in both the View Online and Links sections on the Full Display page. In addition, records of this type display one of the following restricted availability statuses on the Brief Results page: Not available online, Check for availability, Not available online + Check holdings, or Check for availability + Check holdings. For more information regarding these statuses, see Configuring Display Labels for Availability Statuses and Full Text Links.
You can configure both the Link to Resource and Link to Resource - Other Statuses options, but the import uses only one option to create the link to full text and to determine which availability status appears for the external record. The following table shows which configuration is given preference when both are configured. If the conditions for the preferred configuration are not met, the import uses the other configuration option if its conditions are met.
Link to Resource Link to Resource - Other Statuses Method Used to Create Link to Full Text Template Template If its conditions are met, use only the template from the Link to resource section. Template Static URL from source If its conditions are met, use only the static URL from the Link to resource - Other Statuses section. Static URL from source Template If its conditions are met, use only the static URL from the Link to resource section. Static URL from source Static URL from source If its conditions are met, use only the static URL from the Link to resource - Other Statuses section. -
Link to Request – Displays physical records that are loaded from an external data source and includes a link to the source system for the physical item's delivery options. This type of link appears in the Links section on the Full Display page, and records with this type of link display the Check holdings status on the Brief Results page and are included with the Held by library facet.
-
Link to Thumbnail – Displays the thumbnails for resource type images.
The link must return an image file of a valid type (such as .bmp, .png, .jpg, and so forth).
Import Profile Details Page (Step 4 - Delivery Tab)-
For each type of link (resource, request, and thumbnail), specify the following information:
-
Select the method used to create the delivery links:
-
Template – Select this option if it is necessary to generate the delivery links using a combination of static text and linking parameters, which include normalized data taken from the source record. In addition to the source record's ID ($$SourceRecordID), the linking parameters ($$LinkingParameter1 - $$LinkingParameter5) are defined in the Linking Parameters section.
-
Static URL from source – This method enables you to use the source record's static URL for delivery, which may be stored in a Dublin Core tag (such as dc:uri) or a MARC21 field.
-
-
For the Template method only, specify the relevant linking parameters and static text to build the delivery link in the Template field. For example:
http://hdl.handle.net/$$LinkingParameter1 -
For the Static URL method only, enter the following fields:
-
Dublin core tag – (DC targets only) Specify the Dublin Core tag from which to retrieve the record's static URL.
-
Field, Ind1, Ind2, and Sub field – (MARC targets only) Specify the MARC field, matching indicators, and the MARC subfield from which to retrieve the record's static URL.
-
-
Link Label – For resource links only, specify the label that displays for all resource links in Primo VE.
You must save and reopen new search profiles before you can use the Globe icon to translate this field.
-
Create a link – This checkbox appears only for the for the Link to Resource - Other Statuses option. If you are using the Static URL method, you can omit the full text links in the record's full display by clearing this checkbox. There is no option to omit the full text links when using the Template method.
-
-
In the Linking Parameters section, edit each linking parameter that was assigned to a template above. The Edit Linking Parameter dialog box opens:
Edit Linking Parameter Dialog Box -
Specify the following fields and then select Save.
-
Source Tag (DC only) - Search for and select the DC source tag from which to get the delivery URL.
-
Field, Ind1, Ind2, and Sub field (MARC only) - Search for and select the MARC field, matching indicators, and the MARC subfield from which to get the delivery URL.
-
Use source tag - Select one of the following options to indicate whether the tag must contain a specific value:
-
Always - No matching string is necessary. The value is always used.
-
Matching string - The value of the tag must match the string specified in the Matching String field.
-
Matching string using a regular expression - The value of the tag must match the result of the regular expression specified in the Matching string field.
-
-
Matching String - If matching is necessary, specify the string that you want to match.
-
Normalization source tag - Select one of the following options to indicate whether normalization is necessary before loading the value into Primo VE.
-
No normalization - No normalization is needed. The value of the tag is taken as is.
-
Normalization using a regular expression - A regular expression is used to extract part of the value before saving to Primo VE.
-
-
Normalization pattern - If you need to extract parts of the tag, use regex groups to capture the necessary parts, which are concatenated to form the returned value from the source tag.
The following configuration matches dc:identifier fields with values that begin only with http or https, and then concatenates the values extracted with groups part1 and part2. For example, if dc:identifier contained the URL https://knowledge.exlibrisgroup.com/part1=whatever&part2=_you_want, the returned value would be whatever_you_want.
Field Value Source Tag
dc:identifier
Use source tag
Matching string using a regular expression
Matching String
^(?:http(s)?:\/\/).*
Normalization source tag
Normalization using a regular expression
Normalization pattern
.*part1=(.+)&part2=(.+)
-
-
-
Select Save to save the import profile.
Configuring Display Labels for Availability Statuses and Full Text Links
The Calculated Availability Text Labels code table enables you to customize/translate the display labels used for restricted/unrestricted availability statuses and full text links. The following table lists the supported codes.
Code | Description |
---|---|
Link to Resource - Other Statuses (Restricted): For more information, see the Link to Resource - Other Statuses option. | |
delivery.code.ext_restrictedNoLink |
This code displays the following availability status label when there is no access to full text: Not available online |
delivery.code.ext_restrictedWithLink |
This code displays the following availability status label when there is restricted access to full text: Check for availability |
delivery.code.ext_restrictedNoLink_and_physical |
This code displays the following availability status label when there is no access to full text, but there may be access to physical inventory: Not available online + Check holdings |
delivery.code.ext_restrictedWithLink_and_physical |
This code displays the following availability status label when there is restricted access to full text, but there may be access to physical inventory: Check for availability + Check holdings |
delivery.code.ext_restrictedNoLabel |
This code displays the following label for full text links if no label is configured for restricted resources: Link to external resource |
Link to Resource (Unrestricted): For more information, see the Link to Resource option. | |
delivery.code.ext_not_restricted |
This code displays the following availability status label when there is unrestricted access to full text: Available Online |
delivery.code.ext_not_restricted_and_physical |
This code displays the following availability status label when there is unrestricted access to full text, and there may be access to physical inventory: Available online + Check holdings |
Running a Discovery Import Profile
Discovery import profiles enable you to import records from external sources or update records that you have already imported from an external source.
-
Because the import assigns a unique MMS ID to each record based on the specified identifier in the import profile, subsequent imports of the same record (regardless of the import profile used) updates the existing record and does not create a new record.
-
If records are imported with one discovery import profile and updated with a second discovery import profile, the search scope for those records are changed to reflect the data source code and label of the second discovery import profile. If you had created conditions for the original data source in a local data scope, they must be updated to reflect the change to the data source.
To run a discovery import profile:
-
Open the Import Profiles page (Configuration Menu > Discovery > Loading External Data Sources > Discovery Import Profiles), which lists all of the Discovery import profiles.
-
From the list of profiles, select one of the following row actions for the external source:
-
Run – Execute this command the first time that you want to load external data into Primo. If you use this option to reload data, permalinks for the records are not retained.
-
Reload – This action enables you to reload all records from an external data source without having to reharvest files from the external data source. This is useful if you need to apply indexing changes to existing records.
Any records added before the Primo VE July 2020 Release will not be processed with this option.
-
Reload and Delete – This action enables you to reload existing data and retain the permalinks for each existing record. This option removes any records that are not included in the harvest or import files.
-
Any records added before the Primo VE July 2020 Release will not be processed with this option.
-
For OAI, the records that are imported is based on the Harvest Start Date field. Any record that is harvested before that date is not included in the import. To import all records from the repository, clear the Harvest Start Date field.
-
-
-
If you are loading records from files stored to your workstation, specify the files and select Submit. For example:
Select Files for Import -
On the Job History page, select Refresh to view the progress of the import. For example:
Monitor Discovery Import Profile