Go VE: Full Text Indexing in Primo VE
Primo VE can index externally held full text in PDF, TXT, or HTML files for discovery. To use this functionality, you must store the link to the externally held full text in a local search field. During the indexing of the local field, Primo VE performs the following actions on each full-text record:
- Remove stop words based on language.
- Remove HTML tags.
- Index up to 10,000 terms.
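The three actions above can be pictured with a minimal sketch. This is an illustration only, not Primo VE's implementation: the stop-word lists, the function names, the order of the steps, and the reading of the limit as "the first 10,000 remaining terms" are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical, tiny stop-word lists; the lists Primo VE actually uses
# are not described in this document.
STOP_WORDS = {
    "en": {"a", "an", "and", "the", "of", "in", "to", "is"},
    "fr": {"le", "la", "les", "et", "de", "un", "une"},
}
MAX_TERMS = 10_000  # limit stated in the list above


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards the surrounding tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text content of `raw` with HTML tags removed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def prepare_terms(raw: str, language: str = "en",
                  max_terms: int = MAX_TERMS) -> list[str]:
    """Strip tags, drop stop words for `language`, and cap the term count."""
    text = strip_html(raw)
    tokens = re.findall(r"\w+", text.lower())
    stop = STOP_WORDS.get(language, set())
    terms = [t for t in tokens if t not in stop]
    # Assumption: the cap keeps the first `max_terms` remaining terms.
    return terms[:max_terms]
```

For example, `prepare_terms("<p>The history of <b>the</b> library</p>")` yields `["history", "library"]` under these assumed stop words.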
In addition, the system automatically indexes full-text files of digital items (Alma-D).
In general, external full-text indexing occurs when a URL in the metadata leads to a full-text target, which may be a file of type PDF, TXT, or HTML. If the URL leads to a PDF file that contains searchable text, Primo will index that full-text target for searching. However, if the PDF file contains only an image of text, such as a scanned image of a newspaper article, Primo will not index that full-text target because the text is not searchable within the PDF file. This limitation does not apply to Alma-D files that contain text within images, because OCR is first used to convert the text, which allows it to be indexed and searched in Primo VE.
Background
Main Principles and Differences between Primo and Primo VE
Full-text indexing is used the same way in both Primo and Primo VE, but it is configured differently in each:
- Primo: Full-text indexing is handled with file splitters in the pipe. For more details on how it is done in Primo, see the File Splitters page in the Ex Libris Developer Network.
- Primo VE:
  - Full-text indexing is configured by mapping the full text's URL in the source to a local field.
  - For digital items (Alma-D) that contain full-text files, no additional configuration is needed in Primo VE to index the full text, but it may be necessary to run the Extract Fulltext job if the digital files are images or if the PDFs cannot be indexed for some reason (such as poor quality). See Extracting Full Text (OCR) in the Alma Digital Repository for more information.
Documentation and Training Videos
In preparation for this task, it is recommended that you familiarize yourself with the following documentation and training:
Indexing Full Text (2 min)
Preparation: Check for Current Usage of Full Text Indexing
Before configuring Primo VE, check to see what full text is indexed in Alma and external sources. In addition, you can check to see if a file splitter exists or if your data contains a link to the full text.
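One way to check whether your data contains links to full text is to tally the links in an export by file type. The sketch below is a hypothetical helper, not part of Primo VE or Alma: it assumes you have already extracted the candidate URLs into a list, and it judges the type only by the file extension in the URL path. A link without an extension is reported as "unknown", even though its target may still be a PDF, TXT, or HTML file.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types named in this document, keyed by common extensions
# (the extension list itself is an assumption).
SUPPORTED = {".pdf": "PDF", ".txt": "TXT", ".html": "HTML", ".htm": "HTML"}


def classify_link(url: str) -> str:
    """Guess the full-text file type from the extension in the URL path."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return SUPPORTED.get(suffix, "unknown")


def summarize_links(urls) -> Counter:
    """Count how many links fall into each file type."""
    return Counter(classify_link(u) for u in urls)
```

Calling `summarize_links` on the links taken from a sample of records gives a quick view of how many point at supported file types. It cannot tell whether a PDF contains searchable text or only scanned images.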
Configuration: Creating a Local Field for Full Text Indexing
The Define a Local Field page allows you to create local fields to enhance discovery via additional display fields, facets, and search indexes. For this specific case, Primo maps the full-text link in a source record to a local field, where the full-text file is indexed for search.
You cannot configure multiple local fields to handle full-text indexing.
- Open the Manage Display and Local Fields page (Configuration Menu > Discovery > Display Configuration > Manage Display and Local Fields).
- Select Add field > Add local field to open the Define a Local Field page.
- Specify the following fields:
  - Field to edit – Specify which local field will be used for full-text indexing.
  - Display label – Specify the display label for this local field.
  - Enable field for search – Clear this option.
  - Use full text links for indexing – Select this check box to instruct the system to fetch and index externally held full text. This option can be assigned to only one local field and cannot be selected if it is already in use for another local field.
    Only the last link will be indexed if multiple full-text links are defined for a record.
- Define from which field the full-text link should be taken for indexing. If the full-text link has been mapped to the same local field in your externally loaded DC records, select the Use the parallel Local Field 01/50 links from the Dublin Core record option. Otherwise, use the MARC Bibliographic Field mapping method and select the MARC field containing the full-text link. For more information, see Adding a Local Field in Primo VE.
The link must not have a reCAPTCHA mechanism or any other script that blocks direct access to the full-text file to read its text.
In the following example, the link to the full text is mapped from the MARC 979 field to the local field using the MARC21 Bibliographic Field mapping method.
Full-Text Indexing - MARC21 Bibliographic
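Because only the last link is indexed when a record defines several full-text links, it can help to preview which link that will be. The following is a minimal sketch under stated assumptions: the record is represented as a simple list of (tag, value) pairs, the function name is hypothetical, and the 979 tag is taken from the example above. It does not reflect how Primo VE reads MARC records internally.

```python
def link_to_index(fields, tag="979"):
    """Return the last non-empty link found in `tag`, or None if there is none.

    `fields` is an assumed, simplified record: a list of (tag, value) pairs
    in the order they appear in the record.
    """
    links = [value for field_tag, value in fields
             if field_tag == tag and value.strip()]
    return links[-1] if links else None


# Hypothetical record with two full-text links in the 979 field.
record = [
    ("245", "Annual report"),
    ("979", "https://example.org/report-draft.pdf"),
    ("979", "https://example.org/report-final.pdf"),
]
selected = link_to_index(record)  # the second 979 link
```

If the selected link is not the file you want searched, reorder or remove the extra links in the source record before indexing.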