OAI-PMH Harvester best practices

Question

What are the OAI-PMH Harvester best practices, limitations and known issues?

Prior to setting up OAI-PMH Harvester for migration legacy repository see article on the developer network:

https://developers.exlibrisgroup.com/blog/Migrating-from-Your-Legacy-Digital-Repository-to-Rosetta

Make sure you understand the purpose of the OAI-PMH Harvester use. You can use it:

To synchronize the metadata of IEs already ingested in Rosetta with an external system:
- Objects originating from Rosetta
- Objects created in external system which still manage the metadata
- From v5.3. the OAI-PMH harvester can match records by any DC or DCTERMS identifier, not only based on external origin identifier (OAI Header ID) or the Rosetta origin (dc:identifier)
- Schedule an Update Metadata Job to perform the on-going updates
To load new objects (files + metadata) into Rosetta for the first time:
- The OAI-PMH Harvester will create SIPs (Dublin Core or METS xml with metadata)
- Use Do not match (duplicate) in Match parameter in the OAI-PMH Harvester configuration
- When user name and password are left empty in the OAI-PMH Harvester configuration no authentication will be performed
- The Submission job will upload the files referenced in these SIPs into Rosetta
- File references in the SIP xml must point to an actual file (not a resolver, e.g. DigiTool's Delivery Manager).
  If using an URL, it must contain a legitimate filename.
  Note that the "Migrating Your Digital Repository to Rosetta" article includes XSL transformation examples for the dc:identifier (e.g. DSpace, DigiTool, DigitalCommons, ContentDM)
- In the Content structure be sure to define a stream source origin (typically dc:identifier)

Testing phase:

Use the OAI-PMH Harvester configuration Test area to verify the connectivity and your XSL Transformation
Use 'ignore last run time' checkbox when repeating test ingests
Test the submission job on larger sets to see if the source files can be downloaded

Other recommendations:

Downloading files via http can take time.
For larger migrations the file references in SIP xml should point to a NFS location mounted to Rosetta
Rosetta will access all URLs provided the SIP xml stream origin (dc:identifier).
Be sure that there are valid objects on these URLs.
Think about how the source system handles versioning or delete of the files.
Placing the set and metadata prefix configuration into the BaseURL is not recommended.
The set and metadata prefix will be appended to the base URL from the configuration form fields.
Rosetta cannot harvest multiple sets at a time.

Note: OAI Transformation xsl currently supports XSLT 1.0