Matching Programs
If you are working with Primo VE and not Primo, see Understanding the Dedup and FRBR Processes (Primo VE).
General Programs
DedupStringComparator
<handler id="CDLSubShortTitle">
<fieldID>f9</fieldID>
<name>com.exlibris.primo.publish.platform.dedup.comparator.DedupStringComparator</name>
<arguments>
<argument name="match">-450</argument>
</arguments>
</handler>
1. If the values for both records are null, return the value of both_missing. Otherwise, continue with the next step.
2. If one of the values is null, return the value of one_missing. Otherwise, continue with the next step.
3. If both values match exactly, return the value of match (-450). Otherwise, continue with the next step.
4. If one of the values is a substring of the other, return the value of within. Otherwise, return the value of mismatch (or 0 if mismatch is not specified).
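The four steps above can be sketched as a small function. This is an illustrative Python sketch, not the Primo implementation: the function name `string_compare` and the dictionary of arguments are hypothetical, and it assumes that every argument missing from the handler (not only mismatch) scores 0.

```python
from typing import Dict, Optional


def string_compare(a: Optional[str], b: Optional[str], args: Dict[str, int]) -> int:
    """Score two field values using the four steps described above."""

    def points(name: str) -> int:
        # Assumption: an argument missing from the handler scores 0.
        return args.get(name, 0)

    if a is None and b is None:
        return points("both_missing")
    if a is None or b is None:
        return points("one_missing")
    if a == b:
        return points("match")
    if a in b or b in a:
        return points("within")
    return points("mismatch")


# Arguments taken from the CDLSubShortTitle handler shown earlier.
args = {"match": -450}
print(string_compare("annual report", "annual report", args))  # -450
print(string_compare("annual report", "report", args))         # 0 (within is not configured)
```

With only match configured, every outcome other than an exact match contributes nothing to the score under this assumption.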
CDLMainEntrySerialComparator
<fieldID>f11</fieldID>
<name>com.exlibris.primo.publish.platform.dedup.cdlimpl.CDLMainEntrySerialComparator</name>
<arguments>
<argument name="match">+200</argument>
<argument name="keywords_weight_factor" param="59">75</argument>
<argument name="keywords_order_base_weight" param="59">25</argument>
<argument name="mismatch">-250</argument>
</arguments>
</handler>
1. If one of the values is null, return 0.
2. If both values are identical, return the value of match (+200).
3. Count the number of words that are equal in both entries and perform the following checks:
   - If more than 60% of the words are equal, take the ratio between the number of equal words and the number of words in the longest title, and multiply it by the value of keywords_weight_factor (+75).
   - If the words that are common to the two titles are a substring of the short title, or vice versa, add the value of keywords_order_base_weight (+25) to the value found in the previous check.
4. If no match is found, return the value of mismatch (-250).
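The keyword scoring described above leaves several details open, so the following Python sketch fixes them by assumption; it is not the Primo implementation. The function name is hypothetical. It assumes that words are separated by whitespace, that the 60% threshold is measured against the longest entry, that an entry below the threshold counts as a mismatch, and that the order bonus applies when the common words, joined in their original order, are contained in the shorter entry (or contain it).

```python
from typing import Optional


def main_entry_serial_compare(
    a: Optional[str],
    b: Optional[str],
    match: int = 200,
    keywords_weight_factor: int = 75,
    keywords_order_base_weight: int = 25,
    mismatch: int = -250,
    threshold: float = 0.60,  # assumption: share of the longest entry
) -> float:
    if a is None or b is None:
        return 0
    if a == b:
        return match

    words_a, words_b = a.split(), b.split()
    other = set(words_b)
    common = [w for w in words_a if w in other]  # equal words, in the order of entry a
    longest = max(len(words_a), len(words_b))
    if longest == 0 or len(common) / longest <= threshold:
        return mismatch  # assumption: too few equal words means no match

    score = len(common) * keywords_weight_factor / longest
    # Assumption: order bonus when the common words form a contiguous phrase
    # inside the shorter entry, or the shorter entry sits inside that phrase.
    shorter = a if len(words_a) <= len(words_b) else b
    phrase = " ".join(common)
    if phrase in shorter or shorter in phrase:
        score += keywords_order_base_weight
    return score


print(main_entry_serial_compare("journal of applied physics letters",
                                "journal of applied physics"))  # 85.0
```

In the example, four of the five words in the longest entry are equal, so the ratio is 4/5 and the weighted score is 60; the common words appear in order in the shorter entry, which adds 25.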
DedupNumericComparator
<fieldID>f6</fieldID>
<name>com.exlibris.primo.publish.platform.dedup.comparator.DedupNumericComparator</name>
<arguments>
<argument name="match">+200</argument>
<argument name="within" param="2">-25</argument>
<argument name="mismatch">-250</argument>
</arguments>
</handler>
1. If one of the values is null, return 0.
2. If both values match, return the value of match (+200).
3. If the difference between the two date fields is within the value of the param attribute (2), return the value of within (-25).
4. Otherwise, return the value of mismatch (-250).
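As a rough illustration of these steps, the sketch below scores two numeric values in Python. The function and parameter names are hypothetical (`within_range` stands for the param attribute of the within argument), and "within" is assumed to mean a difference less than or equal to that value.

```python
from typing import Optional


def numeric_compare(
    a: Optional[int],
    b: Optional[int],
    match: int = 200,
    within: int = -25,
    within_range: int = 2,  # the param attribute of the "within" argument
    mismatch: int = -250,
) -> int:
    if a is None or b is None:
        return 0
    if a == b:
        return match
    # Assumption: "within" is inclusive of the configured range.
    if abs(a - b) <= within_range:
        return within
    return mismatch


print(numeric_compare(1999, 1999))  # 200
print(numeric_compare(1999, 2001))  # -25
print(numeric_compare(1999, 2005))  # -250
```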
CDLMainEntryComparator
<fieldID>f11</fieldID>
<name>com.exlibris.primo.publish.platform.dedup.cdlimpl.CDLMainEntryComparator</name>
<arguments>
<argument name="match">+125</argument>
<argument name="both_missing">+75</argument>
<argument name="one_missing">+25</argument>
<argument name="keywords_weight_factor" param="49">80</argument>
<argument name="keywords_order_base_weight" param="49">10</argument>
<argument name="mismatch">-200</argument>
</arguments>
</handler>
1. If both of the values are missing, return the value of both_missing (+75). If both_missing is not specified, return 0.
2. If one of the values is missing, return the value of one_missing (+25). If one_missing is not specified, return 0.
3. If both values match, return the value of match (+125).
4. Count the number of words that are equal in both entries and perform the following checks:
   - If more than 60% of the words are equal, take the ratio between the number of equal words and the number of words in the longest title, and multiply it by the value of keywords_weight_factor (+80).
   - If the words that are common to the two titles are a substring of the short title, or vice versa, add the value of keywords_order_base_weight (+10) to the value found in the previous check.
5. If no matches are found, return the value of mismatch (-200).
Specific Programs
CDLIDSerialComparator
This is a complex program that compares the record ID (usually the LCCN for MARC data sources) and the ISSN of a candidate record with the corresponding fields of the original record and assigns a point value based on the checks performed.
<fieldID>f1,f2,f3,f4,f5</fieldID>
<name>com.exlibris.primo.publish.platform.dedup.cdlimpl.CDLIDSerialComparator</name>
<arguments>
<argument name="recID_match">+200</argument>
<argument name="recID_recIDInvalid_match">+100</argument>
<argument name="recIDInvalid_match">+50</argument>
<argument name="recID_mismatch">-470</argument>
<argument name="recID_recIDInvalid_mismatch">-50</argument>
<argument name="ISSN_match">+200</argument>
<argument name="ISSNInvalid_match">+50</argument>
<argument name="ISSNCanceled_match">+10</argument>
<argument name="ISSN_ISSNInvalid_match">+100</argument>
<argument name="ISSN_ISSNCanceled_match">+50</argument>
<argument name="ISSNInvalid_ISSNCanceled_match">+30</argument>
<argument name="ISSN_ISSN_mismatch">-250</argument>
</arguments>
</handler>
1. The program performs the RECID comparisons listed in the following table:

   | # | Original | Candidate | Return Value (# of Points) |
   |---|----------|-----------|----------------------------|
   | 1 | RECID (f1) | RECID (f1) | recID_match (+200) |
   | 2 | RECID (f1) | RECID_INVALID (f2) | recID_recIDInvalid_match (+100) |
   | 3 | RECID_INVALID (f2) | RECID (f1) | recID_recIDInvalid_match (+100) |
   | 4 | RECID_INVALID (f2) | RECID_INVALID (f2) | recIDInvalid_match (+50) |

2. If the program finds a match, it saves the corresponding value from the Return Value column and continues with Step 5 to check the ISSNs. Otherwise, the program continues with the next step.
3. If the original RECID (f1) and the candidate RECID (f1) exist, the program saves the value recID_mismatch (-470) and continues with Step 5 to check the ISSNs. Otherwise, the program continues with the next step.
4. If either of the following statements is true, the program saves the value recID_recIDInvalid_mismatch (-50) and continues with the next step to check the ISSNs. Otherwise, the program continues with the next step.
   - The original RECID (f1) and the candidate RECID_INVALID (f2) exist.
   - The original RECID_INVALID (f2) and the candidate RECID (f1) exist.
5. The program performs the ISSN comparisons listed in the following table:

   | Test | Original | Candidate | Return Value (# of Points) |
   |------|----------|-----------|----------------------------|
   | 1 | ISSN (f3) | ISSN (f3) | ISSN_match (+200) |
   | 2 | ISSN_INVALID (f4) | ISSN_INVALID (f4) | ISSNInvalid_match (+50) |
   | 3 | ISSN_CANCELED (f5) | ISSN_CANCELED (f5) | ISSNCanceled_match (+10) |
   | 4 | ISSN (f3) | ISSN_INVALID (f4) | ISSN_ISSNInvalid_match (+100) |
   | 5 | ISSN (f3) | ISSN_CANCELED (f5) | ISSN_ISSNCanceled_match (+50) |
   | 6 | ISSN_INVALID (f4) | ISSN_CANCELED (f5) | ISSNInvalid_ISSNCanceled_match (+30) |

6. If the program finds a match, it saves the corresponding value from the Return Value column and continues with Step 8. Otherwise, the program continues with the next step.
7. If the original ISSN (f3) and the candidate ISSN (f3) exist, the program saves the value ISSN_ISSN_mismatch (-250).
8. The program compares the return values from the RECID and ISSN checks and returns the highest value, disregarding the sign of the number (for example, a return value of -650 is higher than +50).
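The flow above (table lookup, mismatch fallback, then picking the value with the largest magnitude) can be sketched in Python as follows. This is a simplified illustration, not the Primo implementation: records are modeled as hypothetical dictionaries keyed by field ID with single values, table rows are assumed to be tested in order with the first match winning, and a check that neither matches nor mismatches is assumed to contribute 0.

```python
ARGS = {
    "recID_match": 200, "recID_recIDInvalid_match": 100, "recIDInvalid_match": 50,
    "recID_mismatch": -470, "recID_recIDInvalid_mismatch": -50,
    "ISSN_match": 200, "ISSNInvalid_match": 50, "ISSNCanceled_match": 10,
    "ISSN_ISSNInvalid_match": 100, "ISSN_ISSNCanceled_match": 50,
    "ISSNInvalid_ISSNCanceled_match": 30, "ISSN_ISSN_mismatch": -250,
}

# (original field, candidate field, argument) rows from the two tables.
RECID_TESTS = [("f1", "f1", "recID_match"), ("f1", "f2", "recID_recIDInvalid_match"),
               ("f2", "f1", "recID_recIDInvalid_match"), ("f2", "f2", "recIDInvalid_match")]
ISSN_TESTS = [("f3", "f3", "ISSN_match"), ("f4", "f4", "ISSNInvalid_match"),
              ("f5", "f5", "ISSNCanceled_match"), ("f3", "f4", "ISSN_ISSNInvalid_match"),
              ("f3", "f5", "ISSN_ISSNCanceled_match"),
              ("f4", "f5", "ISSNInvalid_ISSNCanceled_match")]


def first_match(original, candidate, tests):
    """Return the points of the first table row whose two fields are equal."""
    for orig_field, cand_field, arg in tests:
        value = original.get(orig_field)
        if value is not None and value == candidate.get(cand_field):
            return ARGS[arg]
    return None


def id_serial_compare(original, candidate):
    recid = first_match(original, candidate, RECID_TESTS)
    if recid is None:
        if original.get("f1") and candidate.get("f1"):
            recid = ARGS["recID_mismatch"]
        elif (original.get("f1") and candidate.get("f2")) or \
             (original.get("f2") and candidate.get("f1")):
            recid = ARGS["recID_recIDInvalid_mismatch"]
        else:
            recid = 0  # assumption: nothing to compare scores 0

    issn = first_match(original, candidate, ISSN_TESTS)
    if issn is None:
        both_issn = original.get("f3") and candidate.get("f3")
        issn = ARGS["ISSN_ISSN_mismatch"] if both_issn else 0  # 0 is an assumption

    # Highest value disregarding the sign.
    return max(recid, issn, key=abs)


original = {"f1": "sn78001234", "f3": "1234-5678"}
candidate = {"f1": "sn99004321", "f3": "1234-5678"}
print(id_serial_compare(original, candidate))  # -470
```

In the example the ISSNs match (+200) but the record IDs differ (-470), so the RECID mismatch wins because its magnitude is larger.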
CDLIDComparator
<fieldID>f1,f2,f3,f4</fieldID>
<name>com.exlibris.primo.publish.platform.dedup.cdlimpl.CDLIDComparator</name>
<arguments>
<argument name="recID_match">+200</argument>
<argument name="recID_recIDInvalid_match">+100</argument>
<argument name="recIDInvalid_match">+50</argument>
<argument name="recID_mismatch">-470</argument>
<argument name="recID_recIDInvalid_mismatch">-50</argument>
<argument name="ISBN_match">+85</argument>
<argument name="ISBN_ISSN_match">+30</argument>
<argument name="ISSN_ISSN_match">+10</argument>
<argument name="ISSN_ISBN_mismatch">-225</argument>
</arguments>
</handler>
1. The program performs the RECID comparisons listed in the following table:

   | # | Original | Candidate | Return Value (# of Points) |
   |---|----------|-----------|----------------------------|
   | 1 | RECID (f1) | RECID (f1) | recID_match (+200) |
   | 2 | RECID (f1) | RECID_INVALID (f2) | recID_recIDInvalid_match (+100) |
   | 3 | RECID_INVALID (f2) | RECID (f1) | recID_recIDInvalid_match (+100) |
   | 4 | RECID_INVALID (f2) | RECID_INVALID (f2) | recIDInvalid_match (+50) |

2. If the program finds a match, it saves the corresponding value from the Return Value column and continues with Step 5 to check the ISBNs. Otherwise, the program continues with the next step.
3. If the original RECID (f1) and the candidate RECID (f1) exist, the program saves the value recID_mismatch (-470) and continues with Step 5 to check the ISBNs. Otherwise, the program continues with the next step.
4. If either of the following statements is true, the program saves the value recID_recIDInvalid_mismatch (-50) and continues with the next step to check the ISBNs. Otherwise, the program continues with the next step.
   - The original RECID (f1) and the candidate RECID_INVALID (f2) exist.
   - The original RECID_INVALID (f2) and the candidate RECID (f1) exist.
5. The program performs the ISBN comparisons listed in the following table:

   | # | Original | Candidate | Return Value (# of Points) |
   |---|----------|-----------|----------------------------|
   | 1 | ISBN (f3) | ISBN (f3) | ISBN_match (+85) |
   | 2 | ISBN (f3) | ISSN_INVALID (f4) | ISBN_ISSN_match (+30) |
   | 3 | ISSN_INVALID (f4) | ISBN (f3) | ISBN_ISSN_match (+30) |
   | 4 | ISSN_INVALID (f4) | ISSN_INVALID (f4) | ISSN_ISSN_match (+10) |

6. If the program finds a match, it saves the corresponding value from the Return Value column and continues with Step 8. Otherwise, the program continues with the next step.
7. If any of the following statements is true, the program saves the value ISSN_ISBN_mismatch (-225) and continues with the next step. Otherwise, the program continues with the next step.
   - The original ISSN_INVALID (f4) and the candidate ISBN (f3) exist.
   - The original ISSN_INVALID (f4) and the candidate ISSN_INVALID (f4) exist.
   - The original ISBN (f3) and the candidate ISSN_INVALID (f4) exist.
   - The original ISBN (f3) and the candidate ISBN (f3) exist.
8. The program compares the return values from the RECID and ISBN checks and returns the highest value, disregarding the sign of the number (for example, a return value of -470 is higher than +85).
CDLTitleSerialComparator
1. If the f7 fields from the original and candidate records are equal, perform the following checks. Otherwise, continue with the next step.
   - If a word is from the common word list (see CDLSeCommonTitleList.xml File), return the value of full_common_match (+135).
   - If a word is not part of the common word list, return the value of full_match (+600).
2. If the f8 fields from the original and candidate records are equal, perform the following checks. Otherwise, continue with the next step.
   - If a word is in the common word list (see CDLSeCommonTitleList.xml File), return the value of full_truncated_common_match (+135).
   - If a word is not in the common word list, return the value of full_truncated_match (+175).
3. If any words are common to both titles, perform the following checks. Otherwise, return the value of mismatch (-600).
   - If more than half of the words are common, divide the number of common words by the number of words in the longest title, and then multiply the result by the value of keywords_weight_factor (+75).
   - If any of the common words are a substring of the short title, or vice versa, return the sum of the previous value and the value of keywords_order_base_weight (+50).
CDLTitleComparator
1. If the titles are equal, perform the following checks. Otherwise, continue with the next step.
   - If the length of the title is less than nine characters, return a value of 0.
   - Otherwise, return the value of match (+600).
2. If one title is a substring of the other title, return the value of within (+350). Otherwise, continue with the next step.
3. If any words are common to both titles, perform the following checks. Otherwise, return the value of mismatch (+350).
   - If more than half of the words are common, divide the number of common words by the number of words in the longest title, and then multiply the result by the value of keywords_weight_factor (+450).
   - If any of the common words are a substring of the short title, or vice versa, return the sum of the previous value and the value of keywords_order_base_weight (+50).
CDLDateSerialComparator
1. If the year does not exist for either of the records, return a value of 0. Otherwise, continue with the next step.
2. If the year is the same for both records, return the value of match (+225). Otherwise, continue with the next step.
3. If the difference between the year values from both records is at most 1, return the value of within1 (+50). Otherwise, continue with the next step.
4. If the difference between the year values from both records is at most 2, return the value of within2 (+25). Otherwise, continue with the next step.
5. If the year values from both records are from the same decade and either of the year values ends with a 0, return the value of last_digit_zero (+20). Otherwise, return the value of mismatch (-150).
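The date checks above form a simple cascade, sketched here in Python for illustration only. The function name is hypothetical, years are modeled as integers, and the first step is read as "return 0 if the year is missing from at least one record", which is an assumption about the wording.

```python
from typing import Optional


def date_serial_compare(
    year_a: Optional[int],
    year_b: Optional[int],
    match: int = 225,
    within1: int = 50,
    within2: int = 25,
    last_digit_zero: int = 20,
    mismatch: int = -150,
) -> int:
    # Assumption: a year missing from either record ends the comparison.
    if year_a is None or year_b is None:
        return 0
    diff = abs(year_a - year_b)
    if diff == 0:
        return match
    if diff <= 1:
        return within1
    if diff <= 2:
        return within2
    same_decade = year_a // 10 == year_b // 10
    if same_decade and (year_a % 10 == 0 or year_b % 10 == 0):
        return last_digit_zero
    return mismatch


print(date_serial_compare(1985, 1986))  # 50
print(date_serial_compare(1990, 1995))  # 20
print(date_serial_compare(1985, 1999))  # -150
```

The last_digit_zero case presumably covers records cataloged with an approximate year such as 1990 for a publication from "the 1990s"; that reading is an inference, not something the steps state.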
CDLPageHandlerComparator
1. If one of the values is null, return a value of 0. Otherwise, continue with the next step.
2. If both values match, perform the following checks. Otherwise, continue with the next step.
   - If both values are greater than 10, return the value of matchgt (+100).
   - If both values are less than 10, return the value of matchlt (+100).
3. If the difference between the two numbers is less than 10, perform the following checks. Otherwise, return the value of mismatch (-225).
   - If both values are greater than 10, return the value of withingt (+50).
   - If both values are less than 10, return the value of withinlt (+20).
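A minimal Python sketch of the page-count checks follows. It is illustrative only: the function name is hypothetical, and the steps do not say what happens when a value is exactly 10 or when one value is above 10 and the other below, so the sketch assumes a score of 0 in those cases.

```python
from typing import Optional


def page_compare(
    a: Optional[int],
    b: Optional[int],
    matchgt: int = 100,
    matchlt: int = 100,
    withingt: int = 50,
    withinlt: int = 20,
    mismatch: int = -225,
) -> int:
    if a is None or b is None:
        return 0
    both_gt = a > 10 and b > 10
    both_lt = a < 10 and b < 10
    if a == b:
        if both_gt:
            return matchgt
        if both_lt:
            return matchlt
        return 0  # assumption: a value of exactly 10 is not covered by the steps
    if abs(a - b) < 10:
        if both_gt:
            return withingt
        if both_lt:
            return withinlt
        return 0  # assumption: mixed or boundary values are not covered by the steps
    return mismatch


print(page_compare(250, 250))  # 100
print(page_compare(250, 255))  # 50
print(page_compare(4, 6))      # 20
print(page_compare(250, 400))  # -225
```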