AI Metadata Enrichment for Libraries
As the scope of cataloging grows and cataloging standards become deeper and more complex, Ex Libris, part of Clarivate, is working with our community to build innovative solutions that ensure content quality at scale. We are undertaking initiatives to harness advanced technologies to assist catalogers and libraries in their quest for better data and discoverability.
Libraries across the world, as well as content providers and aggregators, manage vast volumes of new and constantly changing content, at a scale that is almost impossible to maintain manually. The varied catalogs of many libraries pose additional challenges: catalogers must often describe subjects in which they are not experts and that require extensive research, on top of handling growing numbers of resources and resource types. At the same time, metadata quality remains critical to collection management and development, as well as to discovery and patron services, so every library needs to uphold it. Resource types such as images, sound recordings and videos often require even more time and effort to catalog and expose in discovery, as they cannot be easily handled by traditional tools.
Using AI technology enables us – Ex Libris and our community members – to ensure that records contain the relevant metadata needed by the different stakeholders in the library, from assessing the collection and collaborating with partner libraries in resource sharing and shared print initiatives, to providing the best possible services for patrons.
AI Metadata Enrichment Principles
Ethics and Fair Use
We take great care to make sure that our metadata enrichment not only conforms to legal requirements, but also to ethical considerations. Our main focus areas for this are:
Respecting Copyrights
Our plans include supporting enrichment for various resource types, focused on allowing the library to use its own collections to generate the metadata.
Our enrichment processes for community records are only performed on materials for which we have the rights to do so, according to licenses and agreements with vendors.
Protecting Proprietary Information
The information shared with the AI Metadata Assistant is used only to generate the metadata required for the library’s needs. It is not used to train AI models, and is not saved by us for later use.
Furthermore, the metadata returned by the AI is filtered to prevent accidental use of proprietary data such as system numbers or local fields.
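As an illustration of this kind of filtering, the sketch below drops system-number and local fields from a list of suggested MARC 21 fields. It is a minimal example under stated assumptions, not a description of the actual filter: the dict layout, the `SYSTEM_TAGS` set, and the function names are hypothetical, and the tag rules follow the general MARC 21 convention that 9XX and X9X tags (other than 490) are reserved for local use.

```python
# Assumption: a suggested field is a dict like {"tag": "650", "value": "..."}.
# Tags treated as system identifiers in this sketch: control number (001),
# control number identifier (003), and system control number (035).
SYSTEM_TAGS = {"001", "003", "035"}


def is_local_or_system(tag: str) -> bool:
    """Return True for tags this sketch treats as proprietary."""
    if tag in SYSTEM_TAGS:
        return True
    # 9XX tags are conventionally reserved for local use.
    if tag.startswith("9"):
        return True
    # X9X tags are also conventionally local; 490 (series statement) is a
    # standard field, so it is kept.
    return len(tag) == 3 and tag[1] == "9" and tag != "490"


def filter_suggestions(fields: list[dict]) -> list[dict]:
    """Drop system-number and local fields from AI-suggested metadata."""
    return [f for f in fields if not is_local_or_system(f["tag"])]


suggested = [
    {"tag": "001", "value": "99123456789"},
    {"tag": "245", "value": "An example title"},
    {"tag": "490", "value": "Example series"},
    {"tag": "520", "value": "A short summary."},
    {"tag": "590", "value": "Local note"},
    {"tag": "650", "value": "Cataloging"},
    {"tag": "901", "value": "Local data"},
]
kept_tags = [f["tag"] for f in filter_suggestions(suggested)]
print(kept_tags)
```

Keeping the rule in a single predicate makes it easy for a library to adjust which tags count as local in its own environment.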
High Quality of Metadata
The measures taken to maintain metadata quality are evaluated, assessed and refined in collaboration with the Alma community.
The available enrichments depend on many factors, such as the type of resource, the data provided to the AI, the requested information and formatting, and whether the data validation is automated or done by a cataloging expert. Clarivate is putting a lot of effort into assessing these differences and tailoring solutions to libraries’ needs, so that libraries get the most out of the AI capabilities.
In addition, the metadata generated by the AI Metadata Assistant is filtered and validated using automated tools, such as checking it against the authority vocabularies available in our platform, to maintain a high quality of data.
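A check against an authority vocabulary could look roughly like the following sketch. It assumes the vocabulary is available as a plain list of authorized headings and matches on a normalized form; the function names, the normalization rules, and the sample headings are all hypothetical and are not taken from the platform.

```python
def normalize(heading: str) -> str:
    """Fold case, collapse whitespace, and drop a trailing period."""
    return " ".join(heading.casefold().split()).rstrip(".")


def validate_subjects(suggested: list[str], authorized: list[str]):
    """Split suggested headings into authorized matches and leftovers.

    Matches are returned in their authorized form; anything without a
    match is returned unchanged so a cataloger can review it.
    """
    index = {normalize(h): h for h in authorized}
    accepted, needs_review = [], []
    for heading in suggested:
        match = index.get(normalize(heading))
        if match is not None:
            accepted.append(match)
        else:
            needs_review.append(heading)
    return accepted, needs_review


# Hypothetical sample vocabulary and AI suggestions.
AUTHORIZED = ["Machine learning", "Library science", "Cataloging"]
accepted, needs_review = validate_subjects(
    ["machine  learning.", "Robot librarians"], AUTHORIZED
)
print(accepted, needs_review)
```

Returning the unmatched headings instead of silently dropping them keeps the final decision with the cataloger.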
Leaving Control in the Library’s Hands
The choice of whether or not to use the metadata suggested using AI is the library’s. Whether by adding review and correction capabilities to enrichment workflows, or marking the enriched fields in community records for easy identification – we design the AI Metadata generation and enrichment processes to provide the library and its catalogers as much control over the use of this data as possible.
AI Metadata Assistant Plans
Enriching Community Records
In February 2024, we launched an AI-based metadata generator for select bibliographic records in the Alma Community Zone. It improves record quality, making the records more discoverable and accessible for all who use the library management system.
Our AI metadata generator is live and enriches select ProQuest EBook Central records, by adding language, summary, and subject headings fields in alignment with the Library of Congress standards. These additions are mentioned in a Source of Description Note detailing the AI enrichment, to allow libraries to search for the enriched records. Our plans are to grow both the scope of records and the number of fields generated by the innovative AI metadata generator.
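To show how such a note can be used to find enriched records, here is a small sketch. In MARC 21 the Source of Description Note is field 588; everything else is an assumption made for the example, including the record layout, the function names, and in particular the marker phrase, since the actual wording of the note is not given here.

```python
# Hypothetical marker phrase; the real note text may differ.
AI_NOTE_MARKER = "generated by ai"


def is_ai_enriched(record: dict) -> bool:
    """Return True if any 588 note in the record contains the marker."""
    return any(
        field["tag"] == "588" and AI_NOTE_MARKER in field["value"].casefold()
        for field in record["fields"]
    )


def find_enriched(records: list[dict]) -> list[str]:
    """Return the ids of records flagged as AI-enriched."""
    return [r["id"] for r in records if is_ai_enriched(r)]


# Assumed record layout: an id plus a list of {"tag", "value"} fields.
records = [
    {"id": "rec-1", "fields": [
        {"tag": "520", "value": "A short summary."},
        {"tag": "588", "value": "Summary and subjects generated by AI."},
    ]},
    {"id": "rec-2", "fields": [
        {"tag": "588", "value": "Description based on print version record."},
    ]},
]
enriched_ids = find_enriched(records)
print(enriched_ids)
```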
More information is available in the Community Center Knowledge Article and in the Metadata Enrichment using AI 2024 Content Webinar.
AI enriched data in community records is clearly marked for easy identification by libraries
Assisting Catalogers’ Workflows in the Metadata Editor
The AI Metadata Assistant in Alma’s Metadata Editor will help catalogers in their work by suggesting metadata they can use when cataloging a resource, saving the time and effort spent researching and searching for information and freeing catalogers to handle more complex cataloging tasks.
The cataloger will be able to easily create a new bibliographic record or enrich an existing brief record using the resource’s metadata (such as title, author, content note, etc.) and/or images (such as a book’s back cover or title page). After receiving the AI’s suggested metadata generated from the provided information, they will be able to accept, correct or discard the suggested changes.
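The accept, correct or discard step can be pictured with the following sketch. It is only an illustration of the review pattern, not the Metadata Editor's implementation; the `Decision` and `Suggestion` types, the tuple-based record layout, and `apply_review` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    CORRECT = "correct"
    DISCARD = "discard"


@dataclass
class Suggestion:
    tag: str
    value: str


def apply_review(record, reviewed):
    """Return a new record with the cataloger's decisions applied.

    record:   list of (tag, value) tuples already in the record
    reviewed: list of (Suggestion, Decision, corrected value or None)
    """
    result = list(record)
    for suggestion, decision, corrected in reviewed:
        if decision is Decision.ACCEPT:
            result.append((suggestion.tag, suggestion.value))
        elif decision is Decision.CORRECT:
            if corrected is None:
                raise ValueError("a corrected value is required")
            result.append((suggestion.tag, corrected))
        # Decision.DISCARD: the suggestion is dropped and nothing is added.
    return result


brief = [("245", "An example title")]
reviewed = [
    (Suggestion("520", "A short summary."), Decision.ACCEPT, None),
    (Suggestion("650", "Catalogueing"), Decision.CORRECT, "Cataloging"),
    (Suggestion("650", "Unrelated topic"), Decision.DISCARD, None),
]
final_record = apply_review(brief, reviewed)
print(final_record)
```

Because nothing enters the record without an explicit decision, the cataloger stays in control of every suggested field.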
Alma’s AI Metadata Assistant enrichment workflow embedded in the Metadata Editor
AI generated metadata is clearly marked for the cataloger to review and accept, correct or reject
Creating a new record using the AI Metadata Assistant in Alma’s Metadata Editor
The first phase of the Metadata Editor AI assistant will help with creating MARC 21 records for English books, with support for more formats and languages to follow as we collaborate with our global community to ensure quality metadata generation for more varied resources.
For more information, see Alma’s roadmap plans:
Supporting Digital Collections
Digital collections often need original cataloging, a labor-intensive process. GenAI is opening new opportunities to speed up this process by allowing subject matter experts to focus on sharing knowledge rather than on repetitive tasks, thus making collections available faster.
Specto, a new digital asset management solution, will equip staff with an AI Metadata Assistant to analyze assets, break them into distinct entities, and connect related materials. Since digital collections include various materials like historical documents, images or videos, Specto will offer customized workflows for each type.
For images, for example, Specto’s AI will create descriptive metadata, extract distinct entities such as faces or buildings using Named Entity Recognition (NER), and connect materials to others through Linked Open Data.
For text, Specto’s Metadata Assistant will read the text using OCR, extract metadata and entities, and connect them to related materials.
Specto’s AI Metadata Assistant will also support group cataloging capabilities: catalogers will be able to tag an entity once, and Specto will apply it to all other objects sharing the same entity in the collection.
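The group cataloging idea, tagging an entity once and propagating the tag, can be sketched as follows. This is a simplified illustration under assumptions and does not reflect Specto's data model: the object layout, the entity identifiers, and the `tag_entity` function are hypothetical.

```python
def tag_entity(objects: list[dict], entity_id: str, label: str) -> list[str]:
    """Apply a label to every object that contains the given entity.

    Returns the ids of the objects that were updated.
    """
    updated = []
    for obj in objects:
        if entity_id in obj["entities"]:
            obj["tags"][entity_id] = label
            updated.append(obj["id"])
    return updated


# Assumed layout: each object lists the entity ids detected in it and
# holds a mapping from entity id to the label a cataloger assigned.
collection = [
    {"id": "photo-1", "entities": {"face-17", "building-3"}, "tags": {}},
    {"id": "photo-2", "entities": {"building-3"}, "tags": {}},
    {"id": "letter-1", "entities": {"face-17"}, "tags": {}},
]
updated_ids = tag_entity(collection, "face-17", "Example Person")
print(updated_ids)
```

A single call labels every object that shares the entity, which is where the time saving over tagging objects one by one comes from.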