  • Ex Libris Knowledge Center

    Alma Community Zone: Match logic, duplicate Bibliographic records, and portfolios linked to the wrong Bibliographic record

    The Alma Community Zone (CZ) contains more than 13 million Bibliographic MARC records describing electronic content available on various platforms.

    The Alma CZ MARC repository follows a “one Bibliographic record” policy: each resource should be represented by a single bibliographic record. The portfolios linked to that record represent the different platforms on which the resource is available.

    Title lists are sent to the CZ by various providers and content sources so that their holdings are updated in the relevant CZ collections. Our processes match each title in a provider’s title list to the correct record describing that resource, and create a new bibliographic record when needed.

     

    This article explains:

    ·         The match logic used when ingesting providers’ titles into the CZ.

    ·         Why the CZ sometimes contains duplicate Bibliographic records describing the same resource.

    ·         Why some portfolios are linked to the wrong Bibliographic record.

    ·         The measures we take to prevent and fix these cases.

     

    What match logic does the CZ use when ingesting providers’ titles?

    Until April 2024, titles ingested into the CZ were matched in two levels: first by identifiers and then, if no identifier matched, by title. The main advantage of this method was that it overcame cataloging differences and associated portfolios from different providers with a single Bibliographic record, even when the providers used different identifiers or none at all. The downside was that it could bring together portfolios of different resources under one bibliographic record: for example, different editions of the same book, different parts of a series that all carry the series name, or even unrelated resources that happen to share a title.

    After consideration, we updated the match logic in April 2024. Under the new method, if a title has identifiers and none of them match an existing record, no title comparison is performed and a new bibliographic record is created. This may increase the risk of duplicates in the Alma CZ, but it minimizes the risk of linking the wrong portfolios to the same Bib record.
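
    To make the difference between the two methods concrete, here is a minimal Python sketch of both match orders. It is an illustration only, not the actual CZ ingest code: the `BibRecord` and `IncomingTitle` types, the `normalize` helper, the exact normalized-title comparison and the placeholder identifiers are all hypothetical. The sketch also assumes that titles with no identifiers at all still fall back to title matching under the new logic, since that case is not described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BibRecord:
    """Hypothetical, simplified CZ bibliographic record."""
    mms_id: str
    title: str
    identifiers: set[str] = field(default_factory=set)  # ISBNs/ISSNs


@dataclass
class IncomingTitle:
    """Hypothetical row from a provider's title list."""
    title: str
    identifiers: set[str] = field(default_factory=set)


def normalize(title: str) -> str:
    # Hypothetical normalization: case-fold and collapse whitespace.
    return " ".join(title.casefold().split())


def match_by_identifier(incoming: IncomingTitle, records: list[BibRecord]) -> BibRecord | None:
    # First level in both methods: any shared identifier counts as a match.
    return next((r for r in records if incoming.identifiers & r.identifiers), None)


def match_by_title(incoming: IncomingTitle, records: list[BibRecord]) -> BibRecord | None:
    return next((r for r in records if normalize(r.title) == normalize(incoming.title)), None)


def match_legacy(incoming: IncomingTitle, records: list[BibRecord]) -> BibRecord | None:
    """Until April 2024: identifiers first, title as the second level."""
    hit = match_by_identifier(incoming, records)
    if hit is not None:
        return hit
    return match_by_title(incoming, records)


def match_current(incoming: IncomingTitle, records: list[BibRecord]) -> BibRecord | None:
    """From April 2024: a title with identifiers is never matched by title."""
    hit = match_by_identifier(incoming, records)
    if hit is not None:
        return hit
    if incoming.identifiers:
        # Identifiers present but none matched: skip title comparison;
        # the caller creates a new bibliographic record.
        return None
    # ASSUMPTION: titles without any identifiers still fall back to title match.
    return match_by_title(incoming, records)


first_edition = BibRecord("9901", "Introduction to Chemistry", {"ISBN-1ST-ED"})
second_edition = IncomingTitle("Introduction to Chemistry", {"ISBN-2ND-ED"})

match_legacy(second_edition, [first_edition])   # returns first_edition (title fallback)
match_current(second_edition, [first_edition])  # returns None, so a new record is created
```

    In this example the second edition carries its own identifier, so the legacy logic attaches it to the first-edition record purely because the titles are equal, while the current logic treats the unmatched identifier as a reason to create a separate record.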

     

    Duplicate Bibliographic records in the CZ

    In most cases, duplicates in the CZ result from the method used to match the title lists coming from providers.

    When there is no clear identifier match, we prefer to create a new record rather than link the provider’s portfolio to a record that may not describe this resource.

    Since CZ records are first created as brief records and enriched in a later process, customers may sometimes see brief-level records in enriched collections. We mitigate this through continuous work to find and merge duplicate records, especially brief ones, while enriching the CZ records that are kept.

    The Provider Zone, which allowed providers to update their holdings and send MARC records directly to the CZ, also created duplicates intentionally, since that process was an exception to the one-bib policy. As the Provider Zone solution is phased out, the duplicate records it created are being merged with their CZ versions. Where the metadata is not sufficient for a safe match, duplicate records may be kept in the CZ.

     

    How does this affect my AutoHoldings process?

    The match method used until April 2024 may have caused another scenario. When a match was made by title, a portfolio with ISBN (A) could be linked to a record with ISBN (B). In some cases, a record with ISBN (A) also exists in the CZ, created after the match was made.

    This scenario can cause errors in the AutoHoldings process. In this process, the customer-specific title list coming from the provider is matched against the CZ records of a specific collection, based on ISBN. As a result, the ISBN in the provider’s title list may not match the ISBN on the record that carries the relevant portfolio; the job then reports the error “No MMS FOUND” and the title is not activated. To mitigate this, we are changing the AutoHoldings match from checking ISBNs on the bibliographic records to checking the provider’s unique ID on the portfolios. This change is gradual and is done one provider at a time, to make sure the matches are highly accurate. We see that the new method prevents many of the errors users experience when running the AutoHoldings job, and ensures full, automatic activation of CZ titles according to the customer’s holdings.
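
    The following Python sketch illustrates why moving from ISBN matching to provider-ID matching avoids the “No MMS FOUND” case described above. It is a simplified model, not Alma’s AutoHoldings implementation: the `Portfolio` and `BibRecord` types, the `provider_id` field, the representation of the title list as `(provider_id, isbn)` pairs and all placeholder values are hypothetical. Here `collection` stands for the CZ records of one collection, that is, the records carrying that collection’s portfolios.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NOT_FOUND = "No MMS FOUND"


@dataclass
class Portfolio:
    provider_id: str  # hypothetical: the provider's unique ID for the title


@dataclass
class BibRecord:
    mms_id: str
    isbns: set[str]
    portfolios: list[Portfolio] = field(default_factory=list)


def match_by_isbn(title_list, records):
    """Previous approach: look up each row's ISBN on the bib records."""
    results = {}
    for provider_id, isbn in title_list:
        hit = next((r for r in records if isbn in r.isbns), None)
        results[provider_id] = hit.mms_id if hit is not None else NOT_FOUND
    return results


def match_by_provider_id(title_list, records):
    """New approach: look up each row's provider ID on the portfolios."""
    results = {}
    for provider_id, _isbn in title_list:
        hit = next(
            (r for r in records if any(p.provider_id == provider_id for p in r.portfolios)),
            None,
        )
        results[provider_id] = hit.mms_id if hit is not None else NOT_FOUND
    return results


# The portfolio for the title with ISBN (A) was linked, by title match,
# to a record that carries ISBN (B).
collection = [BibRecord("9902", {"ISBN-B"}, [Portfolio("P-123")])]
title_list = [("P-123", "ISBN-A")]

match_by_isbn(title_list, collection)         # {'P-123': 'No MMS FOUND'}
match_by_provider_id(title_list, collection)  # {'P-123': '9902'}
```

    In this model the provider-ID lookup does not move the portfolio to a better record; it only stops activation from depending on whichever ISBN happens to be on the bib record that holds the portfolio.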

     

    Why are some portfolios linked to the wrong Bibliographic record?

    A portfolio can end up linked to the wrong bibliographic record for several reasons.

    The first is the ingestion match method described above, which fell back to title match when no identifiers matched. As explained above, this may have matched portfolios of different editions of the same book, or even of different resources that happen to share a title, to the same Bibliographic record. This is even more frequent when the record is not enriched and has only partial metadata, since it is hard to tell the portfolios apart. In some cases, the discrepancy between the record’s metadata and some of its attached portfolios only stands out once the record is enriched in a later phase.

    The second reason is manual merges done by our teams, either proactively or in response to customers’ requests. When two bibliographic records are merged, the portfolios of the merged record move to the preferred record. Unfortunately, this manual process may cause errors, such as portfolios linked to the wrong record. Duplicates are usually identified by a shared ISBN/ISSN, but records sometimes carry incorrect identifiers, which can lead to two different resources being merged incorrectly.
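
    A minimal Python sketch of the merge step, under simplifying assumptions: the `BibRecord` type, the identifier-overlap rule in `candidate_duplicates` and the `merge` function are hypothetical and are not the tools our teams use. It shows how a single incorrect identifier is enough to make two unrelated resources look like duplicates, and how merging every flagged pair without review, as the loop below does, moves a portfolio onto the wrong record.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BibRecord:
    mms_id: str
    title: str
    identifiers: set[str]
    portfolios: list[str] = field(default_factory=list)  # portfolio IDs


def candidate_duplicates(records: list[BibRecord]) -> list[tuple[BibRecord, BibRecord]]:
    """Flag every pair of records that shares at least one ISBN/ISSN."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if a.identifiers & b.identifiers:
                pairs.append((a, b))
    return pairs


def merge(preferred: BibRecord, merged: BibRecord) -> BibRecord:
    """Move all portfolios of `merged` onto `preferred`."""
    preferred.portfolios.extend(merged.portfolios)
    merged.portfolios.clear()
    return preferred


# Hypothetical data: "ISSN-X" was catalogued by mistake on the biology record.
chemistry = BibRecord("9903", "Organic Chemistry", {"ISSN-X"}, ["PF-1"])
biology = BibRecord("9904", "Marine Biology", {"ISSN-X"}, ["PF-2"])

for preferred, other in candidate_duplicates([chemistry, biology]):
    merge(preferred, other)

chemistry.portfolios  # ['PF-1', 'PF-2']: the biology portfolio is now on the wrong record
```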

    Given these risks, merges are done carefully and not automatically. Fixing bad merges and relinking portfolios to the correct Bibliographic record is a complex manual task, so we try to minimize such actions.
     

    Summary

    Keeping the CZ aligned with the one Bibliographic record policy, while accurately representing all providers’ holdings in a timely manner, is challenging. Given the different cataloging standards and the varying levels of metadata we receive from more than 5,000 providers, these challenges have many technical and quality aspects. When building the match rules for all ingest tasks, our policy is to prefer duplicate records in the CZ over portfolios linked to the wrong Bib record. We continue to improve the ingest processes, the quality and accuracy of metadata, our automatic processes supporting the single Bibliographic record policy, and portfolio corrections. We believe these efforts are building a better Community Zone for the benefit of our community.

     

     

     


    • Article last edited: 05-Jun-2024