
    WARC

    This information is not applicable to Primo VE environments. For more details on Primo VE configuration, see Primo VE.
    The WARC template is based on the standard metadata elements that the WARC file splitter writes to the XML file it creates. Source paths usually take the form metadata/<tag>.
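As a rough illustration of the metadata/<tag> path pattern, the sketch below reads tags from a sample record with Python's standard xml.etree.ElementTree module. The <record> root element, the sample values, and the read_metadata helper are assumptions made for this example; the actual layout of the splitter's XML file is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical splitter output: the <record> root and the sample values are
# assumptions; only the metadata/<tag> path pattern comes from the text above.
SAMPLE_XML = """
<record>
  <warc_record>
    <warc-target-uri>http://example.org/</warc-target-uri>
  </warc_record>
  <metadata>
    <title>Example page</title>
    <keywords>archives</keywords>
  </metadata>
</record>
"""


def read_metadata(record, tag):
    """Return the trimmed text at metadata/<tag>, or None if absent or empty."""
    text = record.findtext(f"metadata/{tag}")
    if text is None:
        return None
    return text.strip() or None


record = ET.fromstring(SAMPLE_XML)
print(read_metadata(record, "title"))
print(read_metadata(record, "rights"))
```

Returning None for a missing or empty tag lets later rules (such as a fallback to the target URI) test for absence in one place.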

    Control

    Control Section
    | Control field | Source/Content | Additional Normalization Rules |
    |---|---|---|
    | Source ID | From configuration file. | Required field. |
    | Record ID | Source ID + Source Record-ID | Required field. |
    | Source system | From configuration file. | |
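The Control rules can be sketched as follows. The config dictionary, its keys, the output key names, and the sample values are hypothetical, and joining Source ID and Source Record-ID without a separator is an assumption based on the "+" in the rule.

```python
def build_control(config, source_record_id):
    """Build the control fields for one record (illustrative key names)."""
    source_id = config.get("source_id", "")
    if not source_id:
        raise ValueError("Source ID is a required field")
    if not source_record_id:
        raise ValueError("a source record ID is needed to build the Record ID")
    return {
        "sourceid": source_id,
        # Assumption: plain concatenation with no separator.
        "recordid": source_id + source_record_id,
        "sourcesystem": config.get("source_system", ""),
    }


# Hypothetical configuration values, for illustration only.
config = {"source_id": "WARC", "source_system": "Other"}
control = build_control(config, "000123")
print(control["recordid"])
```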
     

    Display

    Display Section
    | Display Field | Source/Content | Additional Normalization Rules |
    |---|---|---|
    | Type | Constant – “website” | |
    | title | metadata/title; OR if not present the URI is taken: warc-target-uri | |
    | creator | metadata/author | |
    | contributor | metadata/producer | |
    | creation date | metadata/created | |
    | format | metadata/content-type and metadata/resource-type | The two fields are merged |
    | subject | metadata/keywords | |
    | description | metadata/description | |
    | language | metadata/language | |
    | rights | metadata/rights | |
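Two of the Display rules carry logic beyond a straight copy: the title falls back to the target URI, and format merges two source fields. A minimal sketch of both, assuming the metadata has already been read into a dictionary keyed by tag name; the function names and the "; " separator used for the merge are assumptions, since the way the two fields are merged is not specified here.

```python
def display_title(metadata, target_uri):
    """Use metadata/title, or the WARC target URI when no title is present."""
    title = (metadata.get("title") or "").strip()
    return title or target_uri


def display_format(metadata, separator="; "):
    """Merge content-type and resource-type, skipping whichever is missing."""
    parts = []
    for tag in ("content-type", "resource-type"):
        value = (metadata.get(tag) or "").strip()
        if value:
            parts.append(value)
    # Assumption: the merged value is a simple join of the non-empty parts.
    return separator.join(parts)


metadata = {"content-type": "text/html", "resource-type": "response"}
print(display_title(metadata, "http://example.org/"))
print(display_format(metadata))
```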
     

    Links

    Links Section
    | Link | Source/Content | Additional Normalization Rules |
    |---|---|---|
    | link to resource | warc_record/warc-target-uri | |
     

    Search

    Search Section
    | Search Field | Source/Content | Additional Normalization Rules |
    |---|---|---|
    | creatorcontrib | metadata/author; metadata/producer | |
    | title | metadata/title; OR if not present the URI is taken: warc-target-uri | |
    | description | metadata/description | |
    | subject | metadata/keywords | |
    | fulltext | content | This tag includes the content of the harvested web page. |
    | recordid | from PNX control/recordid | |
    | resource type | from PNX display/type | |
    | creation date | metadata/created | |
    | format | metadata/content-type and metadata/resource-type | The two fields are merged |
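In the Search section, creatorcontrib draws on two source tags, while recordid and resource type are copied from PNX fields that have already been built. The sketch below shows one way to express that; the build_search function, the dictionary layout, the "rsrctype" key, and the choice to keep author and producer as separate values are all assumptions for illustration.

```python
def build_search(metadata, content, control, display):
    """Assemble a few search fields (illustrative key names)."""
    # creatorcontrib collects metadata/author and metadata/producer.
    creatorcontrib = [
        value.strip()
        for value in (metadata.get("author"), metadata.get("producer"))
        if value and value.strip()
    ]
    return {
        "creatorcontrib": creatorcontrib,
        # fulltext carries the content of the harvested web page.
        "fulltext": content,
        # Copied from PNX sections built earlier.
        "recordid": control["recordid"],
        "rsrctype": display["type"],
    }


search = build_search(
    metadata={"author": "Jane Doe", "producer": "Example Crawler"},
    content="<html>...</html>",
    control={"recordid": "WARC000123"},
    display={"type": "website"},
)
print(search["creatorcontrib"])
```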

    Sort

    Sort Section
    | Sort | Source/Content | Additional Normalization Rules |
    |---|---|---|
    | title | Copied from PNX display/title | |
    | author | Copied from PNX display/creator | |
     

    Facets

    Facets Section
    | Facet | Source/Content | Additional Normalization Rules |
    |---|---|---|
    | language | metadata/language | |
    | topic | metadata/keywords | |
    | toplevel | Constant: online_resources | |
    | prefilter | from PNX display/type | |
    | resource type | from PNX display/type | |
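The Facets rules mix three kinds of source: metadata tags, a constant, and a value copied from the display section. A short sketch under the same assumptions as before (a metadata dictionary keyed by tag name, illustrative key names, and a hypothetical build_facets helper):

```python
TOPLEVEL_CONSTANT = "online_resources"  # constant given in the Facets table


def build_facets(metadata, display):
    """Assemble the facet fields (illustrative key names)."""
    facets = {
        "toplevel": TOPLEVEL_CONSTANT,
        # prefilter and resource type both reuse the PNX display/type value.
        "prefilter": display["type"],
        "rsrctype": display["type"],
    }
    # language and topic are filled only when the source tag has a value.
    for facet, tag in (("language", "language"), ("topic", "keywords")):
        value = (metadata.get(tag) or "").strip()
        if value:
            facets[facet] = value
    return facets


facets = build_facets({"language": "eng"}, {"type": "website"})
print(facets)
```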
     

    Duplicate Record Detection

    No dedup vectors are predefined.

    FRBR

    No FRBR vectors are predefined.

    Delivery and Scoping

    Delivery and Scoping Section
    | Delivery Field | Source | Additional Normalization Notes |
    |---|---|---|
    | Delivery category | Online Resource | Modify as relevant |

    Ranking

    No Ranking fields are predefined.

    Enrichment

    No enrichment fields are predefined.

    Additional Data

    No additional data fields are predefined.