If you are working with Primo VE and not Primo, see Understanding the Dedup and FRBR Processes (Primo VE).
In the serials and non-serials dedup algorithms, the system attempts to match records by comparing the fields in their dedup vectors. During the deduplication process, the system adds or subtracts points for each field comparison and matches the records if their total score passes the required threshold.
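To illustrate the point-and-threshold idea, here is a minimal sketch in Python. It is not Primo's implementation. The field names (`isbn`, `short_title`, `year`, `publisher`), the point values, and the threshold are all hypothetical and were invented for this example. The sketch also simplifies by using exact equality per field, treats a field missing from either record as worth zero points, and assumes that "passing" the threshold means a score greater than or equal to it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldRule:
    """Points for one dedup-vector field (hypothetical values, not Primo's)."""
    match_points: int     # added when both records have the field and the values are equal
    mismatch_points: int  # subtracted when both records have the field and the values differ

# Hypothetical field names and weights, invented for illustration only.
# The real fields and points are defined in CDLMatchingProfile.xml /
# CDLSeMatchingProfile.xml and are not reproduced here.
RULES = {
    "isbn": FieldRule(match_points=100, mismatch_points=200),
    "short_title": FieldRule(match_points=200, mismatch_points=200),
    "year": FieldRule(match_points=100, mismatch_points=100),
    "publisher": FieldRule(match_points=50, mismatch_points=0),
}
THRESHOLD = 350  # hypothetical

def score(a: dict, b: dict, rules: dict = RULES) -> int:
    """Sum the points gained or lost for each field of two dedup vectors."""
    total = 0
    for field, rule in rules.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            # Assumption: a field missing from either vector contributes nothing.
            continue
        total += rule.match_points if va == vb else -rule.mismatch_points
    return total

def is_duplicate(a: dict, b: dict, rules: dict = RULES, threshold: int = THRESHOLD) -> bool:
    # Assumption: "passing" the threshold means score >= threshold.
    return score(a, b, rules) >= threshold

rec1 = {"isbn": "9780000000001", "short_title": "history of rome", "year": "1999", "publisher": "acme"}
rec2 = {"isbn": "9780000000001", "short_title": "history of rome", "year": "1999", "publisher": "acme press"}
rec3 = {"isbn": "9780000000001", "short_title": "history of rome", "year": "2005", "publisher": "acme"}

print(score(rec1, rec2), is_duplicate(rec1, rec2))  # 400 True
print(score(rec1, rec3), is_duplicate(rec1, rec3))  # 250 False
```

In this toy setup, a differing publisher costs nothing, so records 1 and 2 still match. A differing year subtracts points and pulls records 1 and 3 below the threshold, even though their publishers agree.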
The points and the thresholds are defined in the following XML files, which are stored under the ng/primo/home/profile/publish/publish/production/conf directory:
CDLMatchingProfile.xml—used for the non-serials algorithm
CDLSeMatchingProfile.xml—used for the serials algorithm
The CDLArticlesMatchingProfile.xml file is used for the articles dedup algorithm. It cannot be customized because that algorithm is much simpler and uses only two keys to match records.
Refer to Files Used by the Dedup Algorithm to view each of these files.
For additional information, refer to the following sections: