
    Configuring Pipes

    Harvesting source records and creating PNX records are managed by the Publishing Platform. The publishing platform supports scheduled and unattended harvesting and processing of various data formats, allowing interactive monitoring and control over the entire set of activities.
    Within the publishing platform, PNX records are created by publishing pipes. Every data source has its own pipe. Each data source may have its own set of normalization rules, or several data sources may be linked to one set of normalization rules.
    This section covers the following aspects of pipes: defining a pipe, editing a pipe, and deleting a pipe.

    Defining a Pipe

    The Define Pipe page allows you to add and update pipes. After you have created or updated a pipe, you will need to execute the pipe to create or update the PNX records. For information on executing and monitoring pipes, see Monitoring Pipe Status.
    To create an effective pipe for your system, first create your data sources, normalization mapping sets, and enrichment sets.
    Define Pipe Page
    To define a new pipe:
    1. Click Pipe Configuration Wizard on the Ongoing Configuration Wizard page.
      The Pipe Configuration Wizard page opens.
      You can also access the Define Pipe page by clicking Create new pipe on the Primo Home > Monitor Primo Status > Pipe Monitoring page.
    2. Click Pipes Configuration.
      The Pipes Configuration page opens.
      Pipe Configuration Page
    3. Click Define Pipe.
      The Define Pipe page opens (see Define Pipe Page).
    4. Select the name of the institution from the Owner drop-down list. For institution-level staff users, your institution will already be selected.
      For installation-level users, you must select an institution before the associated values appear in the drop-down lists that display the Select Institution value.
    5. In the Pipe Name field, enter the name of the new pipe.
      The Pipe name is composed of letters, numbers, and/or the underscore character.
    6. In the Pipe Description field, enter a description for the new pipe.
    7. Fill in the remaining fields as described in the following table.
      Define Pipe Details
      Field name Description
      Pipe Type
      Indicates the type of pipe. The following types are valid:
      • Regular – This type of pipe uses records harvested from the data source to create, update, and delete PNX records. For more information on the stages of pipe execution, see Configuring the Publishing Platform Pipe Flow.
      • Delete Data Source – This type of pipe is used to delete a data source from the Primo database, including data from dedup and FRBR groups. It removes all previously harvested records from the P_PNX and P_SOURCE_RECORD tables for the specified data source. In addition, it removes all tags and reviews.
      • No Harvesting – Update Data Source – This pipe is similar to a “Regular” pipe, but records are not harvested from the data source. It uses all of the previously harvested source records from the P_SOURCE_RECORD table instead of the data source. This type of pipe is typically used when it is necessary to re-normalize and/or enrich all records from a specific data source (for example, due to a change in normalization rules).
      • Delete Data Source and Reload – This pipe is similar to the Regular pipe, but it first removes all harvested records from the P_PNX and P_SOURCE_RECORD tables before reloading the PNX records from the data source. This option is intended for data sources (such as MetaLib) that have to harvest the entire database each time. This ensures that records deleted from the data source are also removed from Primo.
      The default value is Regular.
      When running pipes that add or change a large amount of data (such as pipes set to No Harvesting – Update Data Source), it is recommended that you stop Oracle archiving, because archiving slows down the process and fills up the disk. Immediately after the process completes, perform a full cold backup and then turn archiving back on.
      Records that are deleted and re-inserted using the Delete Data Source and Reload option may be included with the tally of the updated records (instead of the deleted and inserted records) in the pipe’s log.
      Data Source
      The data source of the pipe.
      Normalization Mapping Set
      The normalization set used to map the source records to the PNX.
      Priority
      This field defines the priority of the pipe: Low, Medium, High, or Critical.
      Pipes with the highest priority run first. The default setting is Medium.
      Maximum error threshold
      The maximum percentage of errors allowed before the system stops running the pipe.
      Harvesting method
      The method used to harvest the source information. The following methods can be selected: FTP, Copy, OAI, and SFTP.
      If Copy is selected, the user must have read permission for the directory.
      Enrichment Set
      The enrichment set used to enrich the records.
      Harvested File Format
      Indicates the format of the harvested file. The following values are valid: *.tar.gz, *.tar, *.gz, *.warc, *.warc.gz, and *.zip.
      This field is not available with all types of pipes, such as Delete Data Source.
      The *.gz, *.warc, *.warc.gz, and *.zip formats require the data source to use the WARC file splitter.
      Start harvesting files/records from
      The date from which to harvest the records.
      • For FTP/Copy, this is the date and time of the file to harvest. Following harvesting, this date is updated with the date of the latest harvest file.
      • For OAI, this is the date and time on which the file is to be updated. Following harvesting, this date is updated with the date of the request.
      This date is updated after each successful run of the pipe to ensure that all harvested files have been processed completely.
      Start time
      The time from which to harvest the records.
      System Last Stage
      This field allows you to change the last stage that is run during the execution of a pipe. By default, this field is set to FRBR, the last stage of pipe execution. The following values are valid:
      • PERSISTENCE – This option stops the execution of the pipe after loading records to the database. Note that the Dedup and FRBRization stages are not executed.
      • DEDUP – This option stops the execution of the pipe after the Dedup stage. Note that the FRBRization stage is not executed.
      • FRBR – This default option stops the execution of the pipe after the FRBRization process completes.
      • FRBR WITHOUT DEDUP – This option skips the Dedup stage and stops the execution of the pipe after the FRBRization process completes.
      This field does not display when the Parallel Processing of Pipes mode is set to Harvesting, NEP on the General Configuration page.
      Include DEDUP
      Indicates whether the Dedup stage will be executed when the Parallel Processing of Pipes mode is set to Harvesting, NEP on the General Configuration page.
      Include FRBR
      Indicates whether the FRBR stage will be executed when the Parallel Processing of Pipes mode is set to Harvesting, NEP on the General Configuration page.
      Force DEDUP
      Indicates whether Dedup processing is performed on PNX records that have no changes to the dedup section. This allows you to apply changes made to the Dedup rules.
      If the pipe is not configured to run the Dedup stage, Dedup processing will not be forced regardless of this setting.
      Force FRBR
      Indicates whether FRBR processing is performed on PNX records that have no changes to the frbr section. This allows you to apply changes made to the FRBR rules.
      If the pipe is not configured to run the FRBR stage, FRBR processing will not be forced regardless of this setting.
      Server
      The IP used to access the server.
      This field appears only if the harvesting method is OAI, FTP, or SFTP.
      For OAI, the system supports the HTTPS protocol for harvesting.
      Username
      The user name used to access the server.
      This field appears only if the harvesting method is FTP or SFTP.
      Password
      The password used to access the server.
      This field appears only if the harvesting method is FTP or SFTP.
      Metadata format (OAI only)
      All OAI-PMH compliant repositories can return records in Dublin Core format. The Dublin Core format is usually expressed as oai_dc, but some repositories use a different code. Enter the term used by your repository.
      This field appears only if the harvesting method is OAI.
      Set (OAI only)
      OAI repositories may organize items into sets, allowing you to selectively harvest information. Specify the name of the set if you want to harvest only a specific part of the OAI repository.
      This field appears only if the harvesting method is OAI.
      Encode Resumption Token (OAI only)
      Indicates whether to encode the resumption token (for example, characters such as @) within the OAI protocol. The valid values are true and false. The default value is false.
      This field appears only if the harvesting method is OAI.
      Source directory
      The directory of the source record. This is used for copy only.
      This field appears only if the harvesting method is Copy, FTP, or SFTP.
      Delete after copy
      Indicates whether the system should delete the source files after the harvest. If selected, the files are deleted as follows, per Harvesting method:
      • Copy – The files are removed from the directory on the Primo server.
      • FTP/SFTP – The files are removed from the directory on the source server. If the staff user does not have write permissions to the source files, the system will stop the pipe and log the following error:
      stop harvest error
      If this check box is not selected, the source files are not removed from their respective directories after harvesting.
      After the harvest, the system stores a copy of the source files in the harvest directory. To view the harvested files, enter the following commands:
      • be_pipes
      • cd <pipe_name>/<data_source>/<timestamp-of-the-pipe_run>/harvest
      Configure Server Locale
      When this check box is selected, the Server Locale field appears on the page.
      This field appears only if the harvesting method is FTP.
      Server Locale
      Select a locale from the drop-down list.
      This field appears only if the harvesting method is FTP and the Configure Server Locale check box is selected.
      By default, the harvester assumes the locale of the server is English. If the locale of your server is different, you must select the relevant locale.
    8. For FTP, OAI, and SFTP harvesting methods, click Test Connection to verify the connection to the server.
    9. Click Save.
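
    The conditional rules in the Define Pipe Details table can be summarized in a short sketch. The Python below is illustrative only and is not part of Primo. All names in it (PIPE_NAME_RE, FIELDS_BY_METHOD, STAGES_BY_LAST_STAGE, is_valid_pipe_name, unexpected_fields) are hypothetical. The pattern assumes that "letters" means ASCII letters, which the table does not state, and the FRBR entry assumes that the default value runs both the Dedup and FRBRization stages, which is inferred from the description of FRBR WITHOUT DEDUP.

```python
import re

# Assumption: "letters" means ASCII letters; the source does not specify this.
PIPE_NAME_RE = re.compile(r"[A-Za-z0-9_]+")

# Method-specific fields that the table says appear for each harvesting method.
FIELDS_BY_METHOD = {
    "Copy": {"Source directory"},
    "FTP": {"Server", "Username", "Password", "Source directory",
            "Configure Server Locale"},
    "SFTP": {"Server", "Username", "Password", "Source directory"},
    "OAI": {"Server", "Metadata format", "Set", "Encode Resumption Token"},
}

# Optional stages executed for each System Last Stage value.
# The "FRBR" entry is inferred (see the note above), not stated outright.
STAGES_BY_LAST_STAGE = {
    "PERSISTENCE": (),
    "DEDUP": ("Dedup",),
    "FRBR": ("Dedup", "FRBR"),
    "FRBR WITHOUT DEDUP": ("FRBR",),
}


def is_valid_pipe_name(name):
    """Return True if the name uses only letters, numbers, and underscores."""
    return PIPE_NAME_RE.fullmatch(name) is not None


def unexpected_fields(method, filled):
    """Return the filled-in fields that do not apply to the harvesting method."""
    return set(filled) - FIELDS_BY_METHOD[method]
```

    For example, unexpected_fields("OAI", {"Server", "Username"}) returns {"Username"}, because the Username field appears only for the FTP and SFTP harvesting methods.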

    Editing a Pipe

    You can edit the pipe details if the pipe is not running.
    To edit a pipe:
    1. On the Primo Home > Monitor Primo Status > Pipe Monitoring page, click Edit next to the pipe that you want to update.
      The Define Pipe page opens, showing the details of the specified pipe (see Define Pipe Page).
    2. Edit the fields according to Define Pipe Details.
    3. Click Save to update the pipe's settings.

    Deleting a Pipe

    You can delete a pipe that has not been executed. After it has been executed, you must open a Support ticket to have it deleted.
    When a pipe is deleted, the system will also delete any schedules created for the pipe.
    To delete a pipe:
    1. On the Primo Home > Monitor Primo Status > Pipe Monitoring page, click Edit next to the pipe that you want to delete.
      The Define Pipe page displays the specified pipe's details (see Define Pipe Page).
    2. Click Delete Pipe to delete the pipe.