    Create an ALTO xml file

    Created By: Dave Allen
    Created on: 17/05/2019



    ALTO (Analyzed Layout and Text Object) is an open XML Schema developed by the EU-funded METAe project. The standard was initially developed to describe the OCR text and layout information of pages of digitized material. The goal was to describe the layout and text in a form that allows the original appearance to be reconstructed from the digitized information, similar to lossless image storage.
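To make the "layout plus text" idea concrete, here is a minimal Python sketch that reads word text and bounding boxes out of an ALTO document. The sample XML is hand-written for illustration (it is not output from a real scan), and the names `SAMPLE_ALTO`, `local_name` and `extract_words` are hypothetical helpers, not part of any library. It assumes the usual ALTO structure in which each recognised word is a `String` element with `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; real ALTO files carry more metadata.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="page_0" WIDTH="2480" HEIGHT="3508">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="2480" HEIGHT="3508">
        <TextBlock ID="block_0" HPOS="100" VPOS="200" WIDTH="600" HEIGHT="60">
          <TextLine ID="line_0" HPOS="100" VPOS="200" WIDTH="600" HEIGHT="60">
            <String ID="string_0" HPOS="100" VPOS="200" WIDTH="250" HEIGHT="60" CONTENT="Hello"/>
            <SP HPOS="350" VPOS="200" WIDTH="30"/>
            <String ID="string_1" HPOS="380" VPOS="200" WIDTH="320" HEIGHT="60" CONTENT="world"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return one dict per String element: its text and (HPOS, VPOS, WIDTH, HEIGHT)."""
    root = ET.fromstring(alto_xml)
    words = []
    for elem in root.iter():
        # Matching on the local name keeps the sketch independent of the ALTO version.
        if local_name(elem.tag) == "String":
            box = tuple(
                float(elem.get(key, 0))
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            words.append({"text": elem.get("CONTENT", ""), "box": box})
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word["text"], word["box"])
```

Because every word carries its own position and size, a viewer can overlay the recognised text on the page image, which is what makes the format useful for search highlighting.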

    There are many options for creating an ALTO file; the open-source route is to use Tesseract.

    You will need to install Tesseract version 4 and a language pack (the alto output option was added in Tesseract 4.1.0).
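Before running OCR it can help to confirm that the installation meets these requirements. The following Python sketch is one possible check, not an official procedure: it calls `tesseract --version` and `tesseract --list-langs` and inspects the text they print. The function names are hypothetical, and the parsing assumes the usual output shape (a first line such as `tesseract 4.1.1`, and a language list headed by a line starting with `List of available languages`).

```python
import re
import shutil
import subprocess


def parse_tesseract_version(version_output):
    """Return (major, minor) from the text printed by `tesseract --version`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a Tesseract version number")
    return int(match.group(1)), int(match.group(2))


def installed_languages(list_langs_output):
    """Return the language codes printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in list_langs_output.splitlines() if line.strip()]
    # Skip the header line, e.g. 'List of available languages (2):'.
    return [line for line in lines if not line.lower().startswith("list of")]


def check_tesseract(language="eng"):
    """Return True if Tesseract 4 or later and the given language pack are found."""
    exe = shutil.which("tesseract")
    if exe is None:
        return False
    version = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some builds print to stderr rather than stdout, so both streams are read.
    major, _minor = parse_tesseract_version(version.stdout + version.stderr)
    langs = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    return major >= 4 and language in installed_languages(langs.stdout + langs.stderr)
```

Calling `check_tesseract("eng")` returns `False` when the executable is missing from the PATH, when the version is older than 4, or when the `eng` language pack is not listed.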

    The command to create an ALTO file is as follows, where output-filename is the output base name (Tesseract appends .xml to it):

    tesseract /permanent_storage/archive/images/slq/pub/2019-05-14/archive/690444-v012n006/690444-v012n006-s0002.tif output-filename -l eng alto
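When a whole folder of page images needs converting, the same command can be run once per file. The sketch below is one way to do that from Python, under stated assumptions: the helper names `build_alto_command` and `convert_folder` are hypothetical, the images are assumed to have a `.tif` extension, and `tesseract` is assumed to be on the PATH. It builds exactly the argument order shown in the command above.

```python
import subprocess
from pathlib import Path


def build_alto_command(image_path, output_base, language="eng"):
    """Build the argument list: tesseract <image> <output base> -l <lang> alto."""
    return ["tesseract", str(image_path), str(output_base), "-l", language, "alto"]


def convert_folder(image_dir, output_dir, language="eng"):
    """Run Tesseract on every .tif in image_dir and return the expected XML paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(image_dir).glob("*.tif")):
        # Tesseract adds the .xml extension itself, so only the base name is passed.
        output_base = output_dir / image_path.stem
        subprocess.run(
            build_alto_command(image_path, output_base, language),
            check=True,  # raise CalledProcessError if Tesseract reports a failure
        )
        written.append(Path(str(output_base) + ".xml"))
    return written
```

For example, `convert_folder("scans", "alto-out")` (both directory names are placeholders) would write one ALTO file per TIFF into `alto-out`. Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.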

    I hope you find this article of value.

     

     

     

     

     




    Article type: Topic
    Community Content Type: How To
    Language: English
    Product: Rosetta
    1. © Copyright 2025 Ex Libris Knowledge Center