How to – Force records from external data sources to be updated or deleted in Primo VE
When harvesting external data sources via OAI, it can happen, for example:
- that records are not processed exactly as expected (some Normalization Rule has not been applied as intended);
- that your Normalization Rule has to be updated and improved to achieve better display or indexing results;
- that records are deleted from the source in a way that is not reflected in Primo VE (so Primo keeps a record for an item that no longer exists in the source database).
In these cases, you do not necessarily want to run a full reload of the entire source (a full reload from the Earliest Date Stamp, or a delete job followed by a full reload), especially if the harvested collection is a large source with hundreds of thousands of records. This how-to explains how to run a Discovery Import Profile job (Import Data to Primo VE) on a single record or on a small group of records.
Step 1: Prepare your file
Scenario A: Force a record to be removed from Primo discovery
In OAI-PMH, a deleted record must include a header with the attribute status="deleted" and must not include metadata or about parts. If a repository does not keep track of deletions, such records simply vanish from its responses, and a harvester has no way to discover the deletions through continued incremental harvesting. As a consequence, records deleted from the source repository remain visible in Primo. How can you force such a record to be removed from Primo discovery?
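Before forcing anything, you can check how your repository handles deletions with the OAI-PMH Identify verb (https://myrepository.be/oai/request?verb=Identify, with the fictitious base URL used throughout this how-to). The response below is trimmed and purely illustrative; a repository that reports <deletedRecord>no</deletedRecord> does not announce deletions, which is exactly the situation this scenario works around:
<Identify>
  <repositoryName>My Repository</repositoryName>
  <baseURL>https://myrepository.be/oai/request</baseURL>
  <protocolVersion>2.0</protocolVersion>
  <deletedRecord>no</deletedRecord>
  ...
</Identify>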
(1) Find the source record ID that you want to remove from Primo (here: oai:myrepository.be:1234/208835).
(2) Create an XML file like this one:
<ListRecords>
  <record>
    <header status="deleted">
      <identifier>oai:myrepository.be:1234/208835</identifier>
    </header>
  </record>
</ListRecords>
The <header> element must carry the attribute status="deleted", and <identifier> must contain the ID of the record to remove from Primo.
If your Source format is "Dublin Core", use the structure of the example above. If your Source format is "Generic XML", the structure of your XML file must match the File splitter parameters configured in your import profile.
(In my case, the Generic XML structure is identical to Dublin Core, since we process DC records as XML because XML Normalization Rules are more powerful.)
(3) Save the file on your Desktop (.txt or .xml extension).
(4) Move to Step 2.
Scenario B: Force a record or a small group of records to be updated
When loading external data, it has sometimes happened to me that a record was not processed as expected (especially with local discovery fields) or that the source record contained incorrect data that was worth correcting.
This scenario uses incorrect language codes as an example: a repository did not always use correct language codes in its records:
- <dc:language>pa</dc:language> was sometimes used instead of <dc:language>spa</dc:language> (the item was announced as being in Punjabi instead of Spanish)
- <dc:language>ta</dc:language> was sometimes used instead of <dc:language>ita</dc:language> (Tamil instead of Italian)
- <dc:language>ng</dc:language> was sometimes used instead of <dc:language>eng</dc:language> (Ndonga instead of English)
In these cases, you can either:
- Create a file with the records concerned, correct the language codes directly in the file, and upload the file for an update as described under Step 2
- Update your Normalization Rule, then create a file with the records concerned and reload them as they are (with the incorrect language codes) as described under Step 2
The second option is probably better in the long term, because any new or updated record containing one of the unwanted language codes will then be harvested and processed as expected, and the correct language will appear in Primo. The process described below applies to records whose Source format is "Generic XML".
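If you go with the first option instead, you simply edit the language element of each record pasted into your upload file before uploading it. A minimal, purely illustrative fragment (the namespace declarations and the other Dublin Core fields of the record are omitted for brevity):
<record>
  <header>
    <identifier>oai:myrepository.be:1234/208835</identifier>
  </header>
  <metadata>
    ...
    <dc:language>spa</dc:language> <!-- corrected manually, was: pa -->
    ...
  </metadata>
</record>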
(1) Identify the records that should be corrected.
(2) Copy and paste each of these records from the OAI server (e.g. https://myrepository.be/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:myrepository.be:1234/208835) into an XML file structured like this one:
<ListRecords>
  <record> ... </record>
  <record> ... </record>
  <record> ... </record>
  ...
</ListRecords>
The structure of your XML file must match the File splitter parameters that you have configured; adapt the hierarchy of the file if needed.
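Note that the GetRecord response wraps each record in an <OAI-PMH><GetRecord> envelope: copy only the <record> element itself (from <record> to </record>) into your file. As a sketch, here is what the assembled file could look like with one record pasted in. The identifier is the one used throughout this how-to, while the datestamp, title and namespace prefixes are illustrative (keep whatever your repository actually returns); the language code is left uncorrected because, with the second option, the updated Normalization Rule will fix it at load time:
<ListRecords>
  <record>
    <header>
      <identifier>oai:myrepository.be:1234/208835</identifier>
      <datestamp>2024-03-01T10:15:00Z</datestamp>
    </header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An example title</dc:title>
        <dc:language>ng</dc:language>
      </oai_dc:dc>
    </metadata>
  </record>
</ListRecords>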
(3) Update the Normalization Rule:
I replaced my default language NR rule:
rule "Copy language" when true then copy "//*[local-name()='language']/text()" to "dc"."language" end
with these new rules:
rule "Correct language ng > eng" when exist "//*[local-name()='language'][contains(.,'ng')]" then set "eng" in "dc"."language" end rule "Correct language ta > ita" when exist "//*[local-name()='language'][contains(.,'ta')]" then set "ita" in "dc"."language" end rule "Correct language pa > spa" when exist "//*[local-name()='language'][contains(.,'pa')]" then set "spa" in "dc"."language" end rule "Copy language ALL" when exist "//*[local-name()='language'][not(contains(., 'ng')) and not(contains(., 'ta')) and not(contains(., 'pa'))]/text()" then copy "//*[local-name()='language']/text()" to "dc"."language" end
(4) Save the file on your Desktop.
(5) Move to Step 2.
Step 2: Run the job
(1) In your Discovery Import Profile configuration, switch your Import Protocol from "OAI" to "Upload File/s".
Before doing so, copy and save your "OAI Base URL" from the "OAI Details" section, since switching the import protocol from "OAI" to "Upload File/s" empties the "OAI Details" section.
Save the new configuration.
(2) Click on "Run" in the Action list.
You can now select the file(s) to upload.
Click on Submit.
Once the "Import Data to Primo VE" job is completed, you can check the Report and see that your record(s) has/have been updated.
Clear your cache if necessary and check in Primo.
Set your Import Protocol back to "OAI" and put your "OAI Base URL" back in the "OAI Details" section, ready for the next scheduled job.