
    SIP Processing, Configuration, and Routing Rules

    Understanding SIP Processing

    SIP processing settings define how a submission information package (SIP) is moved between processing stages on the Staging Server, and how this SIP is processed at each stage.
    The SIP processing workflow begins when a SIP is moved from the Deposit Server to the Staging Server, and ends when a SIP is moved from the Staging Server to the Permanent Repository. At each stage, the Rosetta system performs a series of tasks, as described in the table below.
    SIP Processing Stages
    Stage | Description
    Pre-Approval | After files are uploaded to the Staging Server, the Rosetta system runs a series of tasks, known as a validation stack, to identify technical problems, such as viruses or corrupted files, and extract technical metadata. For more information, see Pre-Approval Stage.
    Approval | The SIP is reviewed by Assessors, Arrangers, and Approvers. These staff users decide whether the SIP should be approved, returned to the Producer Agent, or declined. For more information, see Approval Stage. For general information on the review of SIPs by staff users, see Assessors, Arrangers, and Approvers in the Rosetta Staff User's Guide.
    Enrichment | The Rosetta system prepares the SIP for storage in the Permanent Repository. For more information, see Enrichment Stage.
    Move to Permanent | The Rosetta system moves the SIP to the Permanent Repository. For more information, see Move to Permanent Stage.
    A Deposit Manager defines the following settings to configure SIP processing:
    • SIP processing configuration, which specifies how the SIP is processed at each stage. A Deposit Manager can select a task chain for each stage from the predefined list.
      A Deposit Manager can create multiple SIP processing configurations for different SIPs.
      For more information about SIP processing configuration, see Defining SIP Processing Configuration.
    • SIP routing rules, which specify criteria for choosing the SIP processing configuration that must be applied to the specific SIP.
      When creating SIP routing rules, a Deposit Manager defines input parameters (such as material type and Producer) and corresponding output parameters (such as the SIP processing configuration that must be applied to the SIP and the approval group that reviews the SIP).
      For more information about SIP routing rule configuration, see Configuring SIP Routing Rules.

    Pre-Approval Stage

    At the pre-approval stage, the Rosetta system performs a task chain known as a validation stack. The validation stack tasks verify that the files uploaded to the Staging Server do not have any technical problems such as viruses or corruption.
    The validation stack task chain can contain the following tasks:
    • Fixity check
    • Virus check
    • Format check
    • Technical metadata extraction
    • Risk extraction
    A Deposit Manager can configure multiple validation stack task chains for different SIP configurations. For example, one SIP configuration can use a validation stack that performs all the tasks, while another SIP configuration can use a validation stack that does not perform a virus check.
    The Rosetta system moves only those files that successfully pass the validation stack checks to the next processing stage.
    Otherwise, the Rosetta system marks the failed files as problematic and forwards them to a Technical Analyst. (For more information on Technical Analysts, see Technical Analysts in the Staff User's Guide.)
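
    The behavior described above, where different configurations use different validation stacks and only files that pass every check move on, can be sketched as follows. This is a minimal illustration, not Rosetta code: the check functions, the marker string, and the file names are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A check receives the file content and returns an error message,
# or None when the file passes.
Check = Callable[[bytes], Optional[str]]


def not_empty(data: bytes) -> Optional[str]:
    """Hypothetical stand-in for an integrity check."""
    return None if data else "file is empty"


def no_marker(data: bytes) -> Optional[str]:
    """Hypothetical stand-in for a virus check (looks for a made-up marker)."""
    return "suspicious marker found" if b"FAKE-VIRUS-MARKER" in data else None


def run_validation_stack(
    files: Dict[str, bytes], stack: List[Check]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (files that passed, {problematic file: error messages})."""
    passed: List[str] = []
    problematic: Dict[str, List[str]] = {}
    for name, data in files.items():
        errors = []
        for check in stack:
            message = check(data)
            if message is not None:
                errors.append(message)
        if errors:
            problematic[name] = errors  # would be forwarded to a Technical Analyst
        else:
            passed.append(name)  # would move on to the next processing stage
    return passed, problematic


files = {"a.tif": b"II*\x00...", "b.pdf": b"", "c.doc": b"xx FAKE-VIRUS-MARKER"}
full_stack = [not_empty, no_marker]
stack_without_virus_check = [not_empty]

passed, problematic = run_validation_stack(files, full_stack)
```

    Running the same files through stack_without_virus_check lets c.doc pass, which mirrors the example of a configuration whose validation stack omits the virus check.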

    Fixity Check

    The fixity check task verifies that the files uploaded to the Staging Server are not corrupted. This task generates a checksum, which is stored in the file metadata. When the file is moved to the Staging Server, the Rosetta system compares the actual checksum with the original checksum using hash algorithms, such as CRC32 or MD5.
    A fixity check can be run without providing an algorithm as a parameter. In this case Rosetta simply verifies the file with the expected name exists in storage without accessing the file to determine its integrity.
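
    The checksum comparison at the heart of a fixity check can be sketched with the Python standard library. This is an illustration of the general technique only; the function names are hypothetical and do not correspond to Rosetta plugin interfaces.

```python
import hashlib
import zlib


def compute_checksum(data: bytes, algorithm: str) -> str:
    """Return the hex checksum of data for the given algorithm (MD5 or CRC32)."""
    algorithm = algorithm.upper()
    if algorithm == "MD5":
        return hashlib.md5(data).hexdigest()
    if algorithm == "CRC32":
        # Mask to an unsigned 32-bit value and format as 8 hex digits.
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    raise ValueError(f"unsupported fixity type: {algorithm}")


def fixity_matches(data: bytes, algorithm: str, expected: str) -> bool:
    """Compare the actual checksum of data with the originally recorded one."""
    return compute_checksum(data, algorithm) == expected.strip().lower()


original = compute_checksum(b"hello", "MD5")
unchanged = fixity_matches(b"hello", "MD5", original)
corrupted = fixity_matches(b"hellp", "MD5", original)
```

    Here unchanged is True and corrupted is False, because a single changed byte produces a different checksum.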
    Checksums may be provided in the deposited METS, CSV, or BagIt content. In such cases, Rosetta validates the checksum values. If the checksum is of the same type that Rosetta runs and the value is found to be valid, Rosetta overwrites the DNX section with information from the internal outcome. If validation fails, the SIP is routed to the Technical Analyst (TA) work area with an appropriate error message.
    An example of such a checksum value in the deposited METS is as follows:

    <mets:techMD ID="fid1-1-amd-tech">
      <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="dnx">
        <mets:xmlData>
          <dnx xmlns="http://www.exlibrisgroup.com/dps/dnx" xmlns:out="http://www.loc.gov/METS/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xlin="http://www.w3.org/1999/xlink">
            <section id="fileFixity">
              <record>
                <key id="agent">REG_SA_JAVA5_FIXITY</key>
                <key id="fixityType">MD5</key>
                <key id="fixityValue">4179a95cc8635d84690a87e782c3ace4</key>
              </record>
            </section>
          </dnx>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
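
    As an illustration of how a depositor-supplied checksum could be read back from such a record, the following sketch parses the dnx element with Python's xml.etree.ElementTree. The sample string is a trimmed copy of the example above, and read_fixity_records is a hypothetical helper, not part of Rosetta.

```python
import xml.etree.ElementTree as ET

# Namespace of the dnx element, taken from the example above.
DNX_NS = {"dnx": "http://www.exlibrisgroup.com/dps/dnx"}

SAMPLE_DNX = """<dnx xmlns="http://www.exlibrisgroup.com/dps/dnx">
  <section id="fileFixity">
    <record>
      <key id="agent">REG_SA_JAVA5_FIXITY</key>
      <key id="fixityType">MD5</key>
      <key id="fixityValue">4179a95cc8635d84690a87e782c3ace4</key>
    </record>
  </section>
</dnx>"""


def read_fixity_records(dnx_xml: str) -> list:
    """Return one dict per <record> in the fileFixity section."""
    root = ET.fromstring(dnx_xml)
    records = []
    for section in root.findall("dnx:section", DNX_NS):
        if section.get("id") != "fileFixity":
            continue
        for record in section.findall("dnx:record", DNX_NS):
            records.append(
                {
                    key.get("id"): (key.text or "").strip()
                    for key in record.findall("dnx:key", DNX_NS)
                }
            )
    return records


records = read_fixity_records(SAMPLE_DNX)
```

    Each returned dict carries the fixityType and fixityValue keys, which a validation step could then compare against a freshly computed checksum.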

    You can configure custom fixity checks. First, install the plugin for the fixity type. Then add the plugin to the Fixity Type code table (see Appendix A Code Tables in the Rosetta Configuration Guide). The custom fixity type is then available when adding parameters to a task, and it is indexed and searchable.

    Virus Check

    The virus check task verifies that the submitted files do not contain any viruses. This task can run any deployed virus check plugin. For more information, see the Virus Check section of the Rosetta Configuration Guide.

    Format Check

    The format check task automatically identifies the format of the file by analyzing its content. If the extension of the file does not correspond to the format that the task identified, the Rosetta system generates an error.
    To perform the format check, the task uses the format identification task. Multiple format identification tools can run in this stage; however, only the first PRONOM-based tool has operational impact on the system. Output from any additional tool is only saved and indexed for searching.
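
    The idea of identifying a format from file content and comparing it with the file extension can be sketched as follows. This is a deliberately simplified illustration: it recognizes a handful of well-known file signatures, whereas the format check itself relies on a PRONOM-based identification tool. The tables and function names are hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Leading byte signatures for a few common formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]

# Extensions considered consistent with each identified format.
EXTENSIONS = {
    "PDF": {".pdf"},
    "PNG": {".png"},
    "JPEG": {".jpg", ".jpeg"},
    "TIFF": {".tif", ".tiff"},
}


def identify_format(data: bytes) -> Optional[str]:
    """Identify the format from the first bytes of the content."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def format_check(filename: str, data: bytes) -> Optional[str]:
    """Return an error message if the extension contradicts the content."""
    identified = identify_format(data)
    if identified is None:
        return "format could not be identified"
    extension = PurePath(filename).suffix.lower()
    if extension not in EXTENSIONS[identified]:
        return f"extension {extension!r} does not match identified format {identified}"
    return None


ok = format_check("scan.tif", b"II*\x00rest-of-file")
mismatch = format_check("report.pdf", b"\x89PNG\r\n\x1a\nrest-of-file")
```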

    Technical Metadata Extraction

    The technical metadata generation task produces technical metadata about a file (such as file size and creation date). The Rosetta system generates this metadata based on the metadata embedded into the file, as well as on the information that the system identifies automatically.
    To generate technical metadata, the task uses a utility such as JHOVE or the NLNZ MD Extractor. Each of these utilities generates technical metadata for different formats. Deposit Managers associate the metadata generation utilities with formats using the Format to MD Extraction mapping table.

    Risk Extraction

    As part of the validation stack phase in SIP processing, Rosetta determines whether the extracted technical metadata has a risk associated with its format. If it does, the system runs the risk extractor tool and saves the output in the HDeStreamRef table. The extracted technical metadata is stored in a way that allows the risk analysis job to gather the information and summarize it in risk reports.
    After a validation stack is performed, the system modifies information about the SIP in the DNX section of the METS file.

    Approval Stage

    After a SIP successfully passes the validation stack checks (see Pre-Approval Stage), the Rosetta system forwards this SIP to one of the following routes:
    • An Assessor and Arranger
    • An Approver. The amount of content to be reviewed by an Approver is determined by the sampling rate parameter, which is defined at the material flow level. Staff users can mandate one of the following:
      • 100% - All content must be reviewed by an Approver
      • Less than 100% - The specified amount of content must be reviewed by an Approver. The rest of the content is moved to the next processing stage without an Approver's review.
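
    The effect of the sampling rate can be sketched as follows. How the system chooses which content an Approver sees is not described here, so this sketch assumes simple random selection and rounds the count up; select_for_review and its parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def select_for_review(
    item_ids: List[str], sampling_rate: float, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Split items into (to be reviewed by an Approver, passed on without review).

    sampling_rate is a percentage between 0 and 100.
    """
    if not 0 <= sampling_rate <= 100:
        raise ValueError("sampling_rate must be between 0 and 100")
    if sampling_rate == 100:
        return list(item_ids), []
    rng = random.Random(seed)
    # Assumption: round up, so any non-zero rate selects at least one item.
    count = math.ceil(len(item_ids) * sampling_rate / 100)
    chosen = set(rng.sample(list(item_ids), count))
    to_review = [item for item in item_ids if item in chosen]
    skipped = [item for item in item_ids if item not in chosen]
    return to_review, skipped


items = [f"IE{i}" for i in range(20)]
review_all, skip_none = select_for_review(items, 100)
review_some, skip_rest = select_for_review(items, 25, seed=1)
```

    With 20 items and a 25% rate, five items are selected for review and the remaining 15 continue to the next stage.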

    Enrichment Stage

    At the post-approval stage, the Rosetta system prepares the SIPs for storage in the Permanent Repository. For example, the Rosetta system can generate derivative copies and thumbnails for intellectual entities (IEs), as well as synchronize metadata stored in a collection management system with the IE metadata.
    If a SIP fails to pass the post-approval stage, the Rosetta system forwards this SIP to a Technical Analyst. (For more information on Technical Analysts, see Technical Analysts in the Rosetta Staff User's Guide.)

    Move to Permanent Stage

    After the Rosetta system performs the enrichment, the SIP is moved to the Permanent Repository. The Permanent Repository is intended to store Producer Agent content that was approved by staff users for permanent preservation. As a result, SIPs that are stored in the Permanent Repository cannot be updated, deleted, or rearranged.
    For general information on storing the SIPs in the Permanent Repository, see Storage Components in the Rosetta Overview Guide.

    SIP Routing Rules

    Deposit Managers can determine how SIP submission errors are handled by the system. An error in a SIP submission may cause the SIP to be rejected or it may be routed to a Technical Analyst for further evaluation.
    To access the list of SIP routing rules, follow the path from the Rosetta rollover menu: Submissions > Advanced Tools > SIP Routing Rules.
    Figure: SIP Routing Rules
    The default rule for error handling appears in the list along with any other SIP error handling rules your institution has added. You can work with rules in one of the following ways, all of which take you to the Rule Details page (SIP Routing Rules).
    • To add a new rule, click the Add Rule button.
    • To edit an existing rule, click the Update text of the rule's row.
    • To create a rule based closely on an existing rule, click the Duplicate text of the rule's row.
    You can also delete a rule by clicking the Delete text of the rule's row.
    Figure: SIP Routing Rules - Rule Details
    On the Rule Details page, you can add or edit the Name or Description fields in the Rule Editor section.
    For the Input General Parameters section, create or edit the conditions for the input that will cause the rule to take effect. Include the values that define the parameters.
    Refer to the Operators Used in Rule Parameters section for detailed information on commonly used operators.

    Operators Used in Rule Parameters

    The following operators are used for specific types of parameter data.

    String Values

    String values are single values that contain no comma (,), for example, one Producer name (John Smith), one MIME type (audio/mp3), one error code, or one Format ID. String values use the following operators:
    • Equal - The string and the input value must match exactly.
    • Contains - The input value must partially match the string. Use the '*' character as a wildcard in the rule value (for example, Demo*).

    List of Strings

    A list of strings is a list of string values separated by commas (,), sometimes populated by a widget. Lists of strings use the following operators:
    • List Contains - Used when each error returned should exactly match a single given error in the rule.
    • List Equals - Used when the order of the items in the list and the list itself should match exactly. For example, a rule defined as "Invalid page dictionary object, Invalid object number in cross-reference stream" matches the actual output from JHOVE: "Invalid page dictionary object, Invalid object number in cross-reference stream."

    Numeric Fields

    Numeric fields (for example, file size) use numbers as matching and comparison values.
    • Greater Than (>) - The input value should be greater than the parameter value.
    • Less Than (<) - The input value should be less than the parameter value.
    • Equal (=) - The input value should be equal to the parameter value.
    • Not Equal (!=) - The input value should not be equal to the parameter value.

    Date Fields

    Date fields (such as Creation Date) compare date values with time operators.
    • After - The input date should be later than the parameter date value.
    • Before - The input date should be earlier than the parameter date value.
    • Equal (=) - The input date should be the same as the parameter date value.
    • Not Equal (!=) - The input date should not be the same as the parameter date value.

    Any

    All fields can use this operator to indicate that any input value is accepted by the rule. For example, if the 'Any' operator is used in the Producer Name field, the rule can match all Producers.
    The following table summarizes the possibilities for matching between the rule parameter values and the run-time values:
    Possible Matches Between Rule Parameter and Run-Time Values
    Run-time Value | Operator | Possible Rule Values | Result
    Demo Producer | Equal | Demo Producer | Match
    Demo Producer | Contains | Demo* | Match
    image/tiff or image/bmp | In List | Image/tiff, image/bmp | Match
    image/tiff, image/bmp | List Equals | Image/tiff, image/bmp | Match
    grey or gray | In List with Regular Expression | gr[ea]y | Match
    12345 | <, >, =, != | 10000 | <: No match; >: Match; =: No match; !=: Match
    23/11/2011 | Before, After, =, != | 23/11/2011 | Before: No match; After: No match; =: Match; !=: No match
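
    The matches in the table above can be reproduced with a short sketch. It is not Rosetta's implementation: the function names are hypothetical, list and string comparison is assumed to be case-insensitive for list operators (so that image/tiff matches Image/tiff as in the table), and dates are assumed to use the day/month/year form shown in the table.

```python
import operator
import re
from datetime import datetime


def split_list(text: str) -> list:
    """Split a comma-separated list and normalize each item."""
    return [item.strip().casefold() for item in text.split(",")]


def op_equal(run_value: str, rule_value: str) -> bool:
    return run_value == rule_value


def op_contains(run_value: str, rule_value: str) -> bool:
    """Treat '*' in the rule value as a wildcard; everything else is literal."""
    pattern = ".*".join(re.escape(part) for part in rule_value.split("*"))
    return re.fullmatch(pattern, run_value) is not None


def op_in_list(run_value: str, rule_list: str) -> bool:
    return run_value.strip().casefold() in split_list(rule_list)


def op_list_equals(run_list: str, rule_list: str) -> bool:
    """Same items in the same order."""
    return split_list(run_list) == split_list(rule_list)


def op_in_list_regex(run_value: str, rule_patterns: str) -> bool:
    """Assumes the individual patterns contain no commas."""
    return any(
        re.fullmatch(pattern.strip(), run_value)
        for pattern in rule_patterns.split(",")
    )


COMPARISONS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "!=": operator.ne,
    "Before": operator.lt,
    "After": operator.gt,
}


def op_numeric(run_value: float, symbol: str, rule_value: float) -> bool:
    return COMPARISONS[symbol](run_value, rule_value)


def op_date(run_value: str, name: str, rule_value: str, fmt: str = "%d/%m/%Y") -> bool:
    run_date = datetime.strptime(run_value, fmt)
    rule_date = datetime.strptime(rule_value, fmt)
    return COMPARISONS[name](run_date, rule_date)


numeric_results = {s: op_numeric(12345, s, 10000) for s in ("<", ">", "=", "!=")}
date_results = {
    n: op_date("23/11/2011", n, "23/11/2011") for n in ("Before", "After", "=", "!=")
}
```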
    To define Boolean logic when using multiple conditions, select one of the following options between conditions:
    • OR
    • AND (default)
    The Boolean connector between different types of attributes (for example, IE Attributes and File Attributes) is always AND.
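
    The combination of conditions can be sketched as follows. The text does not state how mixed AND and OR connectors within one attribute type are grouped, so this sketch assumes plain left-to-right evaluation; the group names and function names are illustrative only.

```python
from typing import Dict, List, Tuple

# Each condition is (connector to the previous condition, condition result).
# The connector of the first condition in a group is ignored.
Condition = Tuple[str, bool]


def evaluate_group(conditions: List[Condition]) -> bool:
    """Combine the conditions of one attribute type, left to right."""
    if not conditions:
        return True
    result = conditions[0][1]
    for connector, value in conditions[1:]:
        if connector == "AND":
            result = result and value
        elif connector == "OR":
            result = result or value
        else:
            raise ValueError(f"unknown connector: {connector}")
    return result


def evaluate_rule(groups: Dict[str, List[Condition]]) -> bool:
    """Different attribute types are always connected with AND."""
    return all(evaluate_group(conditions) for conditions in groups.values())


rule_a = {
    "IE Attributes": [("AND", True), ("OR", False)],    # True OR False -> True
    "File Attributes": [("AND", False), ("AND", True)],  # False AND True -> False
}
rule_b = {
    "IE Attributes": [("AND", True), ("OR", False)],
    "File Attributes": [("AND", True), ("AND", True)],
}
result_a = evaluate_rule(rule_a)
result_b = evaluate_rule(rule_b)
```

    In this sketch rule_a does not match, because its File Attributes group fails even though its IE Attributes group passes, while rule_b matches.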

    Defining SIP Processing Configuration

    The SIP processing configuration determines how the SIP is processed at each stage. Deposit Managers define this on the SIP Processing Configuration List page.
    To access this page, follow the path from the Rosetta rollover menu: Submissions > Advanced Tools > SIP Processing Configuration.
    Figure: SIP Processing Configuration List Page
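    The following actions can be performed on the SIP Processing Configuration List page:
    • Adding a SIP Processing Configuration
    • Updating a SIP Processing Configuration
    • Duplicating a SIP Processing Configuration
    • Deleting a SIP Processing Configuration
    • Activating and Deactivating a SIP Processing Configuration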

    Adding a SIP Processing Configuration

    Deposit Managers can add a new SIP processing configuration.
    To add a SIP processing configuration:
    1. On the List of SIP Processing Configuration page (see Defining SIP Processing Configuration), click Add Processing Configuration. The SIP Processing Configuration page opens.
    Figure: Add SIP Processing Configuration Page
    2. Complete the fields as described in the following table:
      SIP Processing Configuration Page Fields
      Field Description
      Name The name of the SIP processing configuration.
      Description The description of the SIP processing configuration.
      Priority Select a priority:
      • High - Process ASAP
      • Normal - SIPs are queued up to 1 hour
      • Low - SIPs are queued up to 6 hours
      Validation Stack Routine The list of available routines that can be executed when the SIP enters the validation stage.
      Approval The list of available options for the human stage of the SIP processing (for example, reviewing a SIP by an Approver). The following options are available:
      • Assessor + Arranger
      • Approver
      Allow Split / Merge Allows IEs to be split and merged. For more information, see Organizing IEs.
      Enrichment Routine The list of available routines that can be executed when the SIP enters the enrichment stage.
      Delete SIP when processing complete Deletes SIPs immediately after processing is completed.
      Send Notification Specifies whether notification of the results is emailed to the staff user.
      All fields with an asterisk (*) are mandatory.
    3. Click Save.
    The new SIP processing configuration is saved in the Rosetta system.

    Updating a SIP Processing Configuration

    Deposit Managers can update an existing SIP processing configuration. The parameters that can be changed include the validation stack routine, the human approval process, and the enrichment routine.
    To update a SIP processing configuration:
    1. On the List of SIP Processing Configuration page (see Defining SIP Processing Configuration), locate the SIP processing configuration with which you want to work and click Update. The SIP Processing Configuration page opens.
    2. Modify the fields as described in the table above.
    3. Click Save.
    The updated SIP processing configuration is saved in the Rosetta system.

    Duplicating a SIP Processing Configuration

    Deposit Managers can duplicate an existing SIP processing configuration. This is especially helpful when creating a new SIP processing configuration. It is often faster to duplicate an existing SIP processing configuration and then modify it than to create a new configuration.
    To duplicate a SIP processing configuration:
    On the List of SIP Processing Configuration page (see Defining SIP Processing Configuration), locate the SIP processing configuration that you want to duplicate and click Duplicate.
    An exact copy of the SIP processing configuration is added to the List of SIP Processing Configuration page. The Rosetta system automatically labels the new SIP processing configuration with the name Copy of followed by the name of the original SIP processing configuration.

    Deleting a SIP Processing Configuration

    Deposit Managers can delete an existing SIP processing configuration.
    Any SIPs in progress that are associated with a deleted SIP processing configuration will complete their processing according to the deleted configuration's instructions.
    To delete a SIP processing configuration:
    1. On the List of SIP Processing Configuration page (see Defining SIP Processing Configuration), locate the SIP processing configuration that you want to delete and click More. Additional options are displayed.
    2. Click Delete. The Delete Confirmation page opens.
    3. Click OK.
    The system removes the SIP processing configuration from the List of SIP Processing Configuration page.

    Activating and Deactivating a SIP Processing Configuration

    Deposit Managers can activate or deactivate an existing SIP processing configuration.
    On the List of SIP Processing Configuration page, the current status is indicated by the check mark in the Active column:
    • Yellow - The SIP processing configuration is active.
    • Grey - The SIP processing configuration is inactive.
      Any SIPs in progress that are associated with a deactivated SIP processing configuration do not get promoted to the next processing stage.
    To activate or deactivate a SIP processing configuration:
    1. On the List of SIP Processing Configuration page (see Defining SIP Processing Configuration), locate the SIP processing configuration that you want to activate or deactivate.
    2. In the Active column, click the check mark. The check mark in the Active column indicates the new status.
    The SIP processing configuration is changed from active to inactive or inactive to active.

    Configuring SIP Routing Rules

    SIP routing rules define the SIP processing configuration that must be applied to the specific SIP.
    Deposit Managers can define SIP routing rules using the List of SIP Routing Rules page. To access this page, follow the path from the Rosetta rollover menu: Submissions > Advanced Tools > SIP Routing Rules.
    Figure: SIP Routing Rules List Page
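    The following actions can be performed on the SIP Routing Rules List page:
    • Adding a SIP Routing Rule
    • Updating a SIP Routing Rule
    • Duplicating a SIP Routing Rule
    • Deleting a SIP Routing Rule
    • Activating and Deactivating a SIP Routing Rule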

    Adding a SIP Routing Rule

    Deposit Managers can add a new SIP routing rule. When adding a new SIP routing rule, Deposit Managers provide information in two panes:
    • In the Input Parameters pane, matching criteria parameters are defined.
    • In the Output Parameters pane, result parameters are defined.
    The Rosetta system determines the input and output parameters. Deposit Managers cannot add or delete these parameters.
    The logical relationship between the input parameters or the output parameters is AND. Deposit Managers cannot change the logical relationship between the parameters.
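
    The relationship between input and output parameters can be sketched as a small data structure. This is an illustration under stated assumptions: the class and field names are hypothetical, rules are assumed to be tried in list order with the first active match winning, and the sample values (group names, configuration IDs) are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sip = Dict[str, str]


@dataclass
class RoutingOutput:
    approver_group: str
    department: str
    process_configuration_id: str


@dataclass
class RoutingRule:
    name: str
    conditions: List[Callable[[Sip], bool]]  # input parameters, combined with AND
    output: RoutingOutput                    # output parameters
    active: bool = True

    def matches(self, sip: Sip) -> bool:
        return all(condition(sip) for condition in self.conditions)


def route(sip: Sip, rules: List[RoutingRule]) -> Optional[RoutingOutput]:
    """Return the output of the first active rule whose inputs all match."""
    for rule in rules:
        if rule.active and rule.matches(sip):
            return rule.output
    return None


rules = [
    RoutingRule(
        name="Demo Producer images",
        conditions=[
            lambda sip: sip["producer"] == "Demo Producer",
            lambda sip: sip["material_type"] == "image",
        ],
        output=RoutingOutput("Academic Approvers", "Academic", "CFG-1"),
    ),
    # A rule without conditions matches every SIP and acts as a fallback.
    RoutingRule(
        name="Default",
        conditions=[],
        output=RoutingOutput("General Approvers", "General", "CFG-DEFAULT"),
    ),
]

matched = route({"producer": "Demo Producer", "material_type": "image"}, rules)
fallback = route({"producer": "Other Producer", "material_type": "image"}, rules)
```

    Setting active to False on the first rule makes every SIP fall through to the default rule, which corresponds to deactivating a routing rule.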
    To add a SIP routing rule:
    1. On the List of SIP Routing Rules page (see Configuring SIP Routing Rules), click Add Routing Rule. The Routing Rule Editor page opens.
    Figure: Routing Rule Editor Page
    2. In the Input General Parameters pane, complete the following fields:
      Input General Parameters Pane Fields
      Column Description
      Operator The list of available operators.
      The operator describes the logical relationship between the parameter and the value. The values in the drop-down list vary according to the type of parameter. For more details, see Operators Used in Rule Parameters.
      Value The value for the parameter. For example, the department output parameter must contain "Academic."
      The value is either a text input box or a drop-down list, depending on the type of parameter.
    3. In the Output Parameters pane, complete the following fields:
      Output Parameters Pane Fields
      Column Description
      Approver Group The approver group to which the SIP must be forwarded.
      Department The department to which the SIP is connected.
      Process Configuration ID The process configuration ID that is applied to the SIP. For more information about SIP processing configuration, see Defining SIP Processing Configuration.
    4. Click Save.
    The new SIP routing rule is saved in the Rosetta system.

    Updating a SIP Routing Rule

    Deposit Managers can update both the input and output parameter information of an existing SIP routing rule.
    To update a SIP routing rule:
    1. On the List of SIP Routing Rules page (see Configuring SIP Routing Rules), locate the SIP routing rule with which you want to work and click Update. The Routing Rule Editor page is displayed.
    2. In the Input General Parameters pane, modify the fields that you want to update.
    3. In the Output Parameters pane, modify the fields that you want to update.
    4. Click Save.
    The updated SIP routing rule information is saved in the Rosetta system.

    Duplicating a SIP Routing Rule

    Deposit Managers can duplicate an existing SIP routing rule. This is especially helpful when creating a new SIP routing rule. It is often faster to duplicate an existing SIP routing rule and then modify it than to create a new SIP routing rule.
    To duplicate a SIP routing rule:
    On the List of SIP Routing Rules page (see Configuring SIP Routing Rules), locate the SIP routing rule you want to duplicate and click Duplicate.
    An exact copy of the SIP routing rule is added to the List of SIP Routing Rules page. The Rosetta system automatically labels the new SIP routing rule with the name Copy of followed by the name of the original SIP routing rule.

    Deleting a SIP Routing Rule

    Deposit Managers can delete an existing SIP routing rule.
    To delete a SIP routing rule:
    1. On the List of SIP Routing Rules page (see Configuring SIP Routing Rules), locate the SIP routing rule you want to delete and click Delete. The confirmation window is displayed.
    2. Click OK.
    The SIP routing rule is deleted from the Rosetta system.

    Activating and Deactivating a SIP Routing Rule

    Deposit Managers can activate or deactivate an existing SIP routing rule. After a routing rule is deactivated, it is no longer used by the Rosetta system.
    On the List of SIP Routing Rules page, the current status is indicated by the check mark in the Active column:
    • Yellow - The SIP routing rule is active.
    • Grey - The SIP routing rule is inactive.
    To activate or deactivate a SIP routing rule:
    1. On the List of SIP Routing Rules page (see Configuring SIP Routing Rules), locate the SIP routing rule that you want to activate or deactivate.
    2. In the Active column, click the check mark. The check mark in the Active column indicates the new status.
    The SIP routing rule is changed from active to inactive, or inactive to active, depending on the previous state of the rule.