FTP/SFTP Harvesting

To enable FTP/SFTP harvesting, the source system must be able to extract the entire database for the initial load. For ongoing harvesting, it must be able to extract new, changed, and deleted records, and you must be able to schedule the harvesting.

When performing FTP/SFTP harvesting:

Every record should be extracted as a separate XML file, structured using the OAI-PMH protocol ListRecords response format (see OAI-PMH ListRecords and Header Format).
All files should be added to a .tar file, which should be gzipped.

Do not include more than 10 MB in a single .tar file. This limit applies to the size before compression.
File names should be unique.
It is recommended to add the timestamp to the file name.
It is recommended to process the files in a separate directory and, only when a file has been fully processed, transfer it to a dedicated directory from which Primo will retrieve it by FTP/SFTP. If the site has an NFS server, it is recommended to place the file on the NFS server.
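
The packaging guidelines above can be sketched in Python as follows. This is a minimal illustration, not an Ex Libris tool: the function names, the `records` file-name prefix, and the directory arguments are hypothetical, and the 10 MB limit is interpreted here as 10 * 1024 * 1024 bytes. The size estimate covers the tar header and padding per member only, so treat it as approximate. The final move uses `os.replace`, which assumes the processing and pickup directories are on the same file system.

```python
import os
import tarfile
import time
from pathlib import Path

# Assumed reading of the 10 MB limit on the uncompressed .tar size.
MAX_TAR_BYTES = 10 * 1024 * 1024


def tar_member_size(file_size):
    """Approximate bytes one file occupies in a tar: 512-byte header plus data padded to 512."""
    return 512 + ((file_size + 511) // 512) * 512


def batch_records(paths, max_bytes=MAX_TAR_BYTES):
    """Group record files so each batch stays within max_bytes before compression."""
    batches, current, size = [], [], 0
    for path in paths:
        member = tar_member_size(os.path.getsize(path))
        if current and size + member > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(path)
        size += member
    if current:
        batches.append(current)
    return batches


def package(paths, staging_dir, pickup_dir, prefix="records"):
    """Build gzipped tar files in staging_dir, then move each finished file to pickup_dir."""
    staging, pickup = Path(staging_dir), Path(pickup_dir)
    staging.mkdir(parents=True, exist_ok=True)
    pickup.mkdir(parents=True, exist_ok=True)
    # Timestamp plus a sequence number keeps the file names unique within a run.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    published = []
    for number, batch in enumerate(batch_records(paths), start=1):
        name = f"{prefix}_{stamp}_{number:04d}.tar.gz"
        work_file = staging / name
        with tarfile.open(work_file, "w:gz") as tar:
            for path in batch:
                tar.add(path, arcname=os.path.basename(path))
        # Move only after the archive is complete, so a partial file is never picked up.
        final_file = pickup / name
        os.replace(work_file, final_file)
        published.append(final_file)
    return published
```

A scheduled extraction job could call `package(xml_files, "/data/work", "/data/pickup")`, where both paths are placeholders for the site's own processing and pickup directories.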

For Primo to retrieve files by FTP/SFTP, the Publishing Platform requires access to the server and directory; that is, it needs the server IP address, directory name, and user name and password. The Publishing Platform harvests all files with a server timestamp later than the last harvesting date. Optionally, the file can be deleted once it is successfully harvested.
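
To check in advance which files a harvest would pick up, the timestamp rule can be approximated locally. The sketch below is not the Publishing Platform's implementation. It assumes that the "server timestamp" corresponds to the file's modification time and that the comparison is strictly greater than the last harvesting date. The function name `files_newer_than` is hypothetical.

```python
import os
from datetime import datetime, timezone


def files_newer_than(directory, last_harvest):
    """Return names of files in directory modified strictly after last_harvest."""
    cutoff = last_harvest.timestamp()
    selected = []
    for entry in os.scandir(directory):
        # Assumption: modification time stands in for the server timestamp.
        if entry.is_file() and entry.stat().st_mtime > cutoff:
            selected.append(entry.name)
    return sorted(selected)
```

For example, `files_newer_than("/data/pickup", datetime(2024, 1, 1, tzinfo=timezone.utc))` lists the files in a placeholder pickup directory that were modified after that date.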