Structure of the XML File
If you are working with Primo VE and not Primo, see Understanding the Dedup and FRBR Processes (Primo VE).
The XML file that defines the dedup algorithm contains the following sections:
-
Handlers—these are the fields or group of fields that the algorithm uses for matching. The handler includes the program ("class"), which compares the fields and calculates match points by adding and subtracting points based on the comparisons.
-
Thresholds—this section defines the matching stages. The standard algorithm includes two thresholds: quick match and full match.
-
Steps—this section specifies the stages of the match.
-
Common title list—this section defines a list of additional files used by the algorithm to determine matches.
Handlers
Each <handler> element in the <handlers> section contains the following:
-
<handler id> – the handler ID. This ID is used in the steps section.
-
<fieldID> – the field or group of fields from the dedup section of the PNX. Separate multiple fields with commas (for example, f1,f2,f3,f4).
-
<name> – the name of the program used to match the fields. The programs are explained below.
-
<arguments> – the parameters passed to the specified program. These parameters also define how many match points are added or subtracted for each comparison result.
<handlers>
  <handler id="CDLID">
    <fieldID>f1,f2,f3,f4</fieldID>
    <name>com.exlibris.primo.publish.platform.dedup.cdlimpl.CDLIDComparator</name>
    <arguments>
      <argument name="recID_match">+200</argument>
      <argument name="recID_recIDInvalid_match">+100</argument>
      <argument name="recIDInvalid_match">+50</argument>
      <argument name="recID_mismatch">-470</argument>
      <argument name="recID_recIDInvalid_mismatch">-50</argument>
      <argument name="ISBN_match">+85</argument>
      <argument name="ISBN_ISSN_match">+30</argument>
      <argument name="ISSN_ISSN_match">+10</argument>
      <argument name="ISSN_ISBN_mismatch">-225</argument>
    </arguments>
  </handler>
  <!-- additional handlers omitted -->
</handlers>
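To make the handler elements concrete, the following Python sketch reads a <handlers> section with the standard library's xml.etree.ElementTree. It turns each handler into its list of PNX fields, its class name, and its signed point values. The function name parse_handlers and the returned dictionary layout are hypothetical illustrations. Primo loads this file internally and does not expose such a function.

```python
import xml.etree.ElementTree as ET

# The CDLID handler from the example above; the other handlers are omitted.
HANDLERS_XML = """
<handlers>
  <handler id="CDLID">
    <fieldID>f1,f2,f3,f4</fieldID>
    <name>com.exlibris.primo.publish.platform.dedup.cdlimpl.CDLIDComparator</name>
    <arguments>
      <argument name="recID_match">+200</argument>
      <argument name="recID_recIDInvalid_match">+100</argument>
      <argument name="recIDInvalid_match">+50</argument>
      <argument name="recID_mismatch">-470</argument>
      <argument name="recID_recIDInvalid_mismatch">-50</argument>
      <argument name="ISBN_match">+85</argument>
      <argument name="ISBN_ISSN_match">+30</argument>
      <argument name="ISSN_ISSN_match">+10</argument>
      <argument name="ISSN_ISBN_mismatch">-225</argument>
    </arguments>
  </handler>
</handlers>
"""


def parse_handlers(xml_text: str) -> dict:
    """Hypothetical helper: map each handler ID to its fields, class, and points."""
    root = ET.fromstring(xml_text.strip())
    handlers = {}
    for handler in root.findall("handler"):
        # <fieldID> holds a comma-separated list of PNX dedup fields.
        raw_fields = handler.findtext("fieldID", default="")
        fields = [f.strip() for f in raw_fields.split(",") if f.strip()]
        # Strip whitespace in case </name> sits on its own line.
        class_name = handler.findtext("name", default="").strip()
        # int() accepts a leading sign, so "+200" -> 200 and "-470" -> -470.
        points = {
            arg.get("name"): int(arg.text.strip())
            for arg in handler.findall("arguments/argument")
        }
        handlers[handler.get("id")] = {
            "fields": fields,
            "class": class_name,
            "points": points,
        }
    return handlers


if __name__ == "__main__":
    for handler_id, info in parse_handlers(HANDLERS_XML).items():
        print(handler_id, info["fields"], info["points"]["recID_mismatch"])
```

The point values keep their sign after parsing. A match argument such as recID_match adds to a record pair's running total, and a mismatch argument such as recID_mismatch subtracts from it.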
Thresholds
-
<upper_threshold> – records whose point totals meet or exceed this value are considered duplicates, and processing stops.
-
<lower_threshold> – records whose point totals fall below this value are not considered duplicates, and processing stops. Records whose totals fall between the two values continue to the next step.
<thresholds>
  <threshold id="tr1">
    <upper_threshold>+850</upper_threshold>
    <lower_threshold>0</lower_threshold>
  </threshold>
  <threshold id="tr2">
    <upper_threshold>+875</upper_threshold>
  </threshold>
</thresholds>
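The threshold rules can be expressed as a small decision function. This Python sketch is a hypothetical illustration, not Primo code. It makes two assumptions: a total exactly equal to lower_threshold continues to the next step (only totals strictly below it stop as non-duplicates), and a threshold without a lower_threshold, such as tr2, never stops a record as a non-duplicate by itself.

```python
from typing import Optional


def check_threshold(total: int, upper: int, lower: Optional[int] = None) -> str:
    """Hypothetical helper: classify a running point total at one threshold step."""
    if total >= upper:
        # Meets or exceeds <upper_threshold>: duplicate, stop processing.
        return "duplicate"
    if lower is not None and total < lower:
        # Below <lower_threshold>: not a duplicate, stop processing.
        return "not duplicate"
    # Between the two values, or no lower threshold defined: go to the next step.
    return "continue"


# tr1: upper +850, lower 0. tr2: upper +875 only.
print(check_threshold(900, 850, 0))  # duplicate
print(check_threshold(-70, 850, 0))  # not duplicate
print(check_threshold(600, 850, 0))  # continue
print(check_threshold(600, 875))     # continue
```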
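Steps
Each <step> element either runs a handler (type="handler") or checks a threshold (type="threshold"), identified by its ID. The steps run in the order in which they are listed.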
<steps>
  <step type="handler">CDLID</step>
  <step type="handler">CDLShortTitle</step>
  <step type="handler">CDLDate</step>
  <step type="threshold">tr1</step>
  <step type="handler">CDLSubShortTitle</step>
  <step type="handler">CDLLongTitle</step>
  <step type="handler">CDLCountryOfPub</step>
  <step type="handler">CDLPagination</step>
  <step type="handler">CDLPublisher</step>
  <step type="handler">CDLMainEntry</step>
  <step type="handler">PhysicalFormat</step>
  <step type="handler">Edition</step>
  <step type="threshold">tr2</step>
</steps>
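The interplay of handlers and thresholds can be simulated as a loop over the steps. The following Python sketch is hypothetical. It replaces each handler class with a made-up number of points for a single record pair, and it assumes that a pair that never meets an upper threshold is not a duplicate. It shows that the quick match (tr1) can stop processing before the remaining handlers run.

```python
from typing import Dict, List, Optional, Tuple

# Step order from the <steps> section above: ("handler" | "threshold", ID).
STEPS: List[Tuple[str, str]] = [
    ("handler", "CDLID"), ("handler", "CDLShortTitle"), ("handler", "CDLDate"),
    ("threshold", "tr1"),
    ("handler", "CDLSubShortTitle"), ("handler", "CDLLongTitle"),
    ("handler", "CDLCountryOfPub"), ("handler", "CDLPagination"),
    ("handler", "CDLPublisher"), ("handler", "CDLMainEntry"),
    ("handler", "PhysicalFormat"), ("handler", "Edition"),
    ("threshold", "tr2"),
]

# (upper_threshold, lower_threshold); tr2 defines no lower threshold.
THRESHOLDS: Dict[str, Tuple[int, Optional[int]]] = {
    "tr1": (850, 0),
    "tr2": (875, None),
}


def run_steps(scores: Dict[str, int]) -> Tuple[str, int]:
    """Hypothetical simulation: run the steps in order for one record pair.

    `scores` holds made-up points per handler ID; in Primo, the points
    come from the handler classes, not from a dictionary.
    """
    total = 0
    for step_type, step_id in STEPS:
        if step_type == "handler":
            total += scores.get(step_id, 0)
        elif step_type == "threshold":
            upper, lower = THRESHOLDS[step_id]
            if total >= upper:
                return "duplicate", total
            if lower is not None and total < lower:
                return "not duplicate", total
        else:
            raise ValueError(f"unknown step type: {step_type}")
    # Assumption: no upper threshold was met, so the pair is not a duplicate.
    return "not duplicate", total


if __name__ == "__main__":
    pair_a = {"CDLID": 200, "CDLShortTitle": 300, "CDLDate": 100,
              "CDLSubShortTitle": 50, "CDLLongTitle": 150, "CDLCountryOfPub": 20,
              "CDLPagination": 30, "CDLPublisher": 20, "CDLMainEntry": 50}
    print(run_steps(pair_a))  # ('duplicate', 920)
    pair_b = {"CDLID": -470, "CDLShortTitle": 300, "CDLDate": 100}
    print(run_steps(pair_b))  # ('not duplicate', -70)
```

In the first pair, the total after CDLDate (600) falls between the tr1 values, so the full match runs and reaches 920 at tr2. In the second pair, the record ID mismatch pulls the total below 0, so processing stops at tr1.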
Common Title List
<common_title_list>
  <file_name>CDLSeCommonTitleList.txt</file_name>
</common_title_list>
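The section above does not describe the format of the common title file. The following Python sketch works from assumptions only. It treats the file as plain text with one title per line, reads the <file_name> entries from the section with xml.etree.ElementTree, and builds a set of normalized titles that a matching program could look up. The helper names, the normalization (lowercasing and collapsing whitespace), and the sample titles are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

SECTION_XML = """
<common_title_list>
  <file_name>CDLSeCommonTitleList.txt</file_name>
</common_title_list>
"""


def common_title_files(xml_text: str) -> List[str]:
    """Return the file names listed in a <common_title_list> section."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.findall("file_name") if el.text]


def normalize(title: str) -> str:
    # Hypothetical normalization: lowercase and collapse runs of whitespace.
    return " ".join(title.lower().split())


def load_common_titles(lines: Iterable[str]) -> Set[str]:
    """Assumed format: one title per line; blank lines are ignored."""
    return {normalize(line) for line in lines if normalize(line)}


def is_common_title(title: str, common: Set[str]) -> bool:
    return normalize(title) in common


if __name__ == "__main__":
    print(common_title_files(SECTION_XML))  # ['CDLSeCommonTitleList.txt']
    # In practice, the lines would come from open(file_name, encoding="utf-8").
    sample = ["Annual report\n", "Proceedings\n", "\n"]  # made-up sample titles
    common = load_common_titles(sample)
    print(is_common_title("ANNUAL  Report", common))  # True
```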