    AI Bibliographic Records Enrichment

    Within the Alma Community Zone, bibliographic records undergo enrichment from various sources, predominantly content providers' metadata. However, since certain book bibliographic records lack comprehensive MARC feeds, Ex Libris is exploring alternative methods for enrichment. Artificial intelligence presents an opportunity to enrich a greater number of records at scale. 

    How does it work? 

    The AI-driven enrichment process utilizes the full text of a book, or existing informative metadata on the record, to create additional metadata about the publication and populate MARC fields in the descriptive Bibliographic record. 

    Since February 2024, AI-generated metadata has been added to Alma Community Zone records, focusing on the following MARC fields: Summary (520), Table of Contents (505), and LC Subject Headings (650). The primary focus has been on enriching Ebook Central books that lacked these metadata elements.

    Starting November 2025, Ebook Central (EBC) bibliographic records are enriched by AI and provided to the Community Zone (CZ) with this additional metadata. The EBC records have additional fields, such as Classification (050/082) and Personal Name Subjects (600), which will also be added to the CZ records. More information on the EBC process can be found here.

     

    What will the records look like? 

    MARC records enriched by AI will be explicitly marked as follows (or variations of this text):  

    588$a Part of the metadata in this record was created by AI, based on the text of the resource. 

    The enhanced fields will contain a new subfield, $7, indicating that the text in the field was generated by AI (e.g. the 520 field in the example below).  

    Records enriched with AI by Ex Libris will have an additional 035 field with the prefix (Exl-AI) followed by the MMS ID or the EBC record number. The 035 field is searchable by "Other System Number" or by a simple keyword search.

    For example, MMS ID 996190000000011458:  

    clipboard_e227c4af7cd91a7cd6055017cf88075a9.png

    clipboard_e63c52ca16427040c01ac3398eacc53e8.png
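The three markers described above (the 588 note, the $7 subfield, and the 035 prefix) could be checked in a locally exported record along these lines. This is only a minimal sketch: the record is modeled as a plain list of dictionaries rather than with any particular MARC library, the `ai_markers` helper and the `ENRICHED_TAGS` set are hypothetical, the placement of the prefix in 035 $a and the exact formatting of its value are assumptions, and the content of $7 is not given in the text, so only its presence is tested.

```python
import re
from typing import List

# Assumed prefix placement: the article gives the prefix, not the subfield.
AI_SYSTEM_NUMBER_PREFIX = "(Exl-AI)"
# Hypothetical set of tags to inspect, taken from the fields named in the article.
ENRICHED_TAGS = {"505", "520", "650"}


def subfield_values(field: dict, code: str) -> List[str]:
    """Return all values of one subfield code within a field."""
    return [value for c, value in field["subfields"] if c == code]


def ai_markers(record: List[dict]) -> dict:
    """Report which AI-enrichment markers are present on a record."""
    # The 588 wording may vary, so look only for the standalone word "AI".
    note = any(
        field["tag"] == "588"
        and any(re.search(r"\bAI\b", v) for v in subfield_values(field, "a"))
        for field in record
    )
    # Only the presence of $7 is checked; its value is not inspected.
    fields = sorted({
        field["tag"]
        for field in record
        if field["tag"] in ENRICHED_TAGS and subfield_values(field, "7")
    })
    system_numbers = [
        v
        for field in record
        if field["tag"] == "035"
        for v in subfield_values(field, "a")
        if v.startswith(AI_SYSTEM_NUMBER_PREFIX)
    ]
    return {"note": note, "fields": fields, "system_numbers": system_numbers}


# Illustrative record; the summary text and the $7 value are placeholders.
sample = [
    {"tag": "035", "subfields": [("a", "(Exl-AI)996190000000011458")]},
    {"tag": "520", "subfields": [("a", "An illustrative summary."), ("7", "placeholder")]},
    {"tag": "588", "subfields": [("a", "Part of the metadata in this record was created by AI, based on the text of the resource.")]},
    {"tag": "650", "subfields": [("a", "Libraries")]},
]
print(ai_markers(sample))
```

In this sample only the 520 is reported as AI-generated, because the 650 carries no $7.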


    Title search results display an AI indication icon (AI Indication icon) for Community Zone records enriched by the Community Zone AI generator. 

    AI Indication Icon Displayed in Search Results

    The AI enriched records are also searchable by using the search index (in title search): Enriched with AI metadata by CZ

    Title Search Index - Enriched with AI Metadata By CZ

    More on the process for these specific fields: 

    • The Summary is produced from the book's text, aiming to provide a concise and detailed summary of the book’s main subjects. The Summary will always be in English.
    • The Table of Contents is generated from the PDF structure. The 505 field will have a single subfield $a containing all the chapter or section names, delimited with two dashes, with up to 30 entries. The generated table of contents may not always have meaningful chapter names and may instead list sections and book parts.
    • The Subject Headings are generated from the book’s text and are formalized to align with the LCSH vocabulary. The Subjects are in English. The new subjects may be partial and describe only some of the book's possible subjects. While in some cases the subject heading will contain subdivisions (subfields $v, $x, $y, $z), in most cases it will have only the topical term (subfield $a). Currently, entries from LCNAMES are not added as 600/610. Efforts are underway to enhance the subject functionality. 
      The Subjects can also be generated from metadata that already exists on the record: the full title, the description, and the table of contents. This process is used only when the existing metadata is informative enough.
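The 505 layout described above (one $a, entries joined with two dashes, at most 30 entries) can be illustrated with a short sketch. The `build_505` function is hypothetical and is not the Ex Libris implementation; the spaces around the two dashes follow common MARC 505 punctuation and are an assumption, and the $7 marker is left out because its value is not given in the text.

```python
from typing import List

MAX_ENTRIES = 30     # limit stated in the article
DELIMITER = " -- "   # two dashes; the surrounding spaces are an assumption


def build_505(chapter_names: List[str]) -> dict:
    """Build a 505 field with one $a from a list of chapter or section names."""
    # Drop empty names, then keep at most the first 30 entries.
    cleaned = [name.strip() for name in chapter_names if name.strip()]
    kept = cleaned[:MAX_ENTRIES]
    return {"tag": "505", "subfields": [("a", DELIMITER.join(kept))]}


field = build_505(["Introduction", "  ", "Methods", "Results"])
print(field["subfields"][0][1])
```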

     

    Metadata Quality

    As we are generating these four fields in this phase, records may still lack additional metadata elements such as additional subjects, authors, publishing information, classification and more.

    AI has its limitations, and especially when working at scale some errors are expected.
    The tool was tested on many records and is tuned to create relevant metadata. We continue to develop and use new AI models to make sure we minimize these errors as much as possible.
    We will remove any AI generated metadata which is reported as incorrect. Metadata which is not specific enough (for example, subjects with no subdivisions) will not be removed as long as it describes the title.
    The AI Bib enrichment project does not replace the existing enrichment process we have with providers, and other metadata sources like Library of Congress. In addition, we will not override existing rich metadata which came from other sources with the AI generated metadata. 

    If content providers add these fields to their metadata feeds, the provider metadata will override the AI-generated metadata. If all AI-generated metadata on a record is overridden or removed, the relevant indicators will be removed as well.
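The override behavior can be pictured with a small sketch. It is an illustration under stated assumptions, not the actual Community Zone rules: records are plain lists of dictionaries, `apply_provider_feed` and `AI_MANAGED_TAGS` are hypothetical names, an AI-generated field is recognized only by the presence of $7, and the "indicators" are taken to be the 588 note and the (Exl-AI) 035 field.

```python
import re
from typing import List

# Hypothetical set of tags the AI process may populate.
AI_MANAGED_TAGS = {"505", "520", "650"}


def is_ai_field(field: dict) -> bool:
    """Assumption: an AI-generated field is one of the managed tags carrying $7."""
    return field["tag"] in AI_MANAGED_TAGS and any(
        code == "7" for code, _ in field["subfields"]
    )


def is_ai_indicator(field: dict) -> bool:
    """Assumption: the indicators are the 588 AI note and the (Exl-AI) 035."""
    values = [v for code, v in field["subfields"] if code == "a"]
    if field["tag"] == "588":
        return any(re.search(r"\bAI\b", v) for v in values)
    if field["tag"] == "035":
        return any(v.startswith("(Exl-AI)") for v in values)
    return False


def apply_provider_feed(record: List[dict], provider_fields: List[dict]) -> List[dict]:
    """Let provider fields replace AI-generated fields with the same tag."""
    provider_tags = {f["tag"] for f in provider_fields} & AI_MANAGED_TAGS
    # Drop AI-generated fields for every tag the provider now supplies.
    kept = [f for f in record if not (f["tag"] in provider_tags and is_ai_field(f))]
    kept.extend(f for f in provider_fields if f["tag"] in provider_tags)
    # If no AI-generated field is left, remove the AI indicators too.
    if not any(is_ai_field(f) for f in kept):
        kept = [f for f in kept if not is_ai_indicator(f)]
    return sorted(kept, key=lambda f: f["tag"])


record = [
    {"tag": "035", "subfields": [("a", "(Exl-AI)996190000000011458")]},
    {"tag": "245", "subfields": [("a", "An example title")]},
    {"tag": "520", "subfields": [("a", "AI summary."), ("7", "placeholder")]},
    {"tag": "588", "subfields": [("a", "Part of the metadata in this record was created by AI.")]},
]
provider = [{"tag": "520", "subfields": [("a", "Publisher summary.")]}]
print([f["tag"] for f in apply_provider_feed(record, provider)])
```

Here the provider 520 replaces the only AI-generated field, so the 588 note and the (Exl-AI) 035 are dropped as well. If an AI-generated 650 were also present, both indicators would stay.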

    For any questions regarding this project, please write to:  AI.enriched@clarivate.com

    For reporting metadata corrections please open a support case.

     

    AI Bibliographic Records Enrichment Webinar 

    Following the webinar given on March 27, 2024, we received many questions, which we answer in the following section.

    Use of Full text: 

    The AI metadata generator for the Alma CZ generates the data from the book’s full text. We are working with publishers to cooperate on this project, as we believe it presents an opportunity for them too. We started this project with ProQuest’s Ebook Central books and plan to expand it to more providers and more collections in the Alma Community Zone.  

    Types of books used in this project: 

    There are a variety of books in Alma Community Zone in many genres. The AI tool currently works better with non-fiction books. Thus, for this phase, we will prioritize this genre. In the future, we will adjust our priorities for using the AI tool based on availability of full text and on customer usage. Based on our experience, we see that the prompt itself and the post-processing need to be adjusted for optimized results for different types of books. For example, while reading the first 30 pages of a scholarly book can provide a good summary of its content thanks to the preface and table of contents, in fiction books, the first 30 pages would generate a partial and even misleading summary. 

    Metadata generated: 

    The three fields we have chosen to focus on in this release are: 

    • Language (041) 
    • Description (520) 
    • LC Subject Headings (650) 

    We are now focusing on improving the quality of these metadata fields and planning to enhance the tool to generate additional metadata fields. The fields we will focus on in the future are Classification (050, 082), Table of Contents (505), additional Authors, and identifiers. Some of these metadata elements can be found in the books themselves, and the challenge is to locate and extract them. In discussion with the working groups, we decided to begin with the 520 (summary) rather than the 505 (table of contents): the 520 is generated by AI, which is the tool we are focusing on at this stage of the project, while the 505 can be extracted using non-AI tools.

    LCSH: As explained in the webinar, we decided to start with LCSH because they are essential for many libraries and library patrons. The first batch of AI-generated LCSH subjects consists primarily of 650$a main headings, and only a small number of the AI-generated subjects have subdivisions. Assigning the correct subdivisions to subjects is an interesting challenge, and more work lies ahead before we can add subdivisions for most subjects. At this stage, general LCSH terms in 650$a improve record quality and contribute to discoverability. Currently, we are focusing on 650, 651 and 655, and in the future, we hope to add 600/610 (from LC Names) and also consider working with additional vocabularies, such as FAST. 

    Summary: The summary's quality and tone are a result of what we ask in the AI prompt. Currently, we have achieved informative and accurate summaries for non-fiction books. As we expand to other genres, we will adjust the prompt and post processing for different types of books.  

    The tool: 

    The AI engine we mostly use is GPT-4o (by OpenAI), but we are also able to test other LLMs, and we will consider working with them, or with a combination of models, in the future, based on quality and cost. 
    Part of the solution is to process the results generated by the AI tool and improve them. Sometimes we use AI again to improve and verify the results, and sometimes the results are adjusted and processed by non-AI tools, for example to confirm that the subject headings are valid LCSH.  
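As an illustration of the non-AI post-processing step mentioned above, candidate subject headings could be checked against an authorized vocabulary. The article does not describe how this validation is done, so the following is only a sketch: `validate_subjects` is a hypothetical helper, the normalization rules are assumptions, and a small in-memory set stands in for the LCSH authority data.

```python
from typing import Iterable, List


def normalize(heading: str) -> str:
    """Collapse whitespace, drop a trailing period, and ignore case."""
    return " ".join(heading.split()).rstrip(".").casefold()


def validate_subjects(candidates: Iterable[str], authorized: Iterable[str]) -> List[str]:
    """Keep only candidates that match an authorized heading, in authorized form."""
    lookup = {normalize(h): h for h in authorized}
    accepted: List[str] = []
    for candidate in candidates:
        match = lookup.get(normalize(candidate))
        # Skip unknown headings and duplicates of an already accepted heading.
        if match is not None and match not in accepted:
            accepted.append(match)
    return accepted


# Stand-in vocabulary; a real check would use the full LCSH authority data.
authorized_headings = {"Machine learning", "Libraries"}
generated = ["machine learning.", "Made-up heading", "Machine  Learning", "Libraries"]
print(validate_subjects(generated, authorized_headings))
```

Returning the authorized form rather than the generated string keeps capitalization and punctuation consistent with the vocabulary.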

    Text formats:  

    To date we have tested PDF, EPUB and TXT formats. Overall, we see good results in how the AI can process the text. We are working internally on optimization to ensure cost and effort efficiencies where possible.  
     

    What the enriched records look like in Alma: 

    The records are clearly marked as enriched by AI (field 588), and the specific fields are marked as well (with subfield $7; see screenshot). We will not detail exactly which AI engine was used to generate the data, as we work with several LLMs and tools to achieve the best quality. Also, we are not planning to indicate whether a metadata element was reviewed by humans. Human review is an important part of our early work on this project, but automated quality assurance (QA) processes will guarantee sufficient quality and will be optimized for the number of records we hope to process. As the tool develops with better quality and more fields, we may revisit some of the titles we have already enriched if necessary. 

    Ex Libris is taking steps to ensure the AI-generated metadata will interact well with other sources of record metadata in the CZ. In the Alma CZ we have one Bibliographic record for each resource, and when its MARC is sufficiently high quality, we will choose not to update it with additional changes. We can block or allow changes based on a set of rules to achieve this. We have configured these rules to allow keeping authorized fields (metadata created by content providers or other cataloging sources such as Library of Congress) rather than overriding them with AI generated fields, as well as replacing AI generated fields with future better metadata provided by authorized sources. 

     

    Copyright: 
    Copyright is an important issue that needs to be considered in any use of LLM. We are working under the guidance of legal advisors, and we take action to ensure we do not risk using copyrighted material for any unauthorized purpose.  

     

    Working with the community:  

    As we work to improve the quality of existing fields and add more fields to be generated by AI, the results are reviewed by an internal group of librarians from different teams, as well as by community members. We are happy to receive your feedback on the records we have already released (and answer additional questions) via the email ai.enriched@clarivate.com. 

     

    Additional uses suggested during the webinar:  

    We had several questions and interesting ideas about additional features and uses that AI and specifically this metadata generator can help with. For example, enriching metadata in CDI, enriching local bib data via a tool in the Alma MD Editor, improving the way search is done in Alma and in other discovery products, and more. There are indeed many future possibilities for implementing AI in our products, and Clarivate as a company is focused very much on responsibly researching and developing in this area.

     

    Finding these records in the Community Zone

    Search for "All titles" -> "Other System Number" -> Exl-AI.

    Make sure you search in the Community Zone!

    clipboard_eaf79e5f0f86096f6c40777c4737059d8.png
