Introducing Rosetta Preservation
Purpose of the Preservation Module
Rosetta’s Preservation module provides an environment in which large volumes of digital data can be stored and managed.
The Preservation module supports all the preservation-related activities in Rosetta. The purpose of these activities is to:
- provide the institution with the tools to describe the possible risks in the repository
- locate the population at risk
- create plans to eliminate these risks, and
- execute these plans for the material at risk.
Preservation Sub-modules
The solution contains the following sub-modules:
Format Library
The Format Library allows the institution to describe in detail the information related to formats, format properties, and applications. Using this network of information, the institution can then determine the risks associated with each format.
Another motivation for the Format Library is to establish a global knowledge base that is maintained by all Rosetta users and holds the format-related information. Every institution that implements Rosetta has a local copy of the set of libraries, in which the relations between the formats, applications, and risk identifiers can be managed. This way, there is no need to recreate the entire knowledge base for each institution.
Risk Analysis
The Risk Analysis sub-module controls the processes (both automated and manual) that are performed to measure the risk status of the repository. Based on the results of these processes, users can create sets of objects that will be handled by the Preservation Planning sub-module.
Preservation Planning
The Preservation Planning sub-module provides Preservation Analysts with the tools to:
- gather and track information regarding preservation activities
- perform tests and evaluations
- make decisions regarding the best approach to take in order to preserve objects that are at risk
Preservation Execution
The Preservation Execution sub-module creates new representations for the IEs that contain files at risk. At the end of the process, a new version of each IE is created and the latest version of each representation is risk-free.
This functionality is also used in the greater Rosetta application for adding one representation to one IE at a time. This is called Add Representation and is performed in the Web Editor. (For more information, see the Rosetta Staff User’s Guide.)
Preservation Terminology
The following key terms are used throughout this guide.
Application Library
The Application Library contains all of the data regarding applications: name, ID, license end-date, and so forth. Each application can be related to one or more formats. Some of the information is managed globally (exposed to all installations of Rosetta) and some is managed in each installation. The connections between the Format Library and the Application Library are managed locally, but new applications (like new formats) should be added or removed in the global libraries.
Bitstreams
Since files at risk can be stored in Rosetta as part of byte streams (for example, a WARC file that contains many HTML and image files), the preservation plan should be able to create sets that can include these files and process them.
Classification Group
Since significant properties are usually shared between multiple formats, the classification group is a way to aggregate these common properties so that they will be connected to all the relevant formats.
Format Library
The Format Library contains all of the data regarding formats: name, description, related applications, related risks, and sustainability factors. Some of the information is managed globally (exposed to all installations of Rosetta) and some of it can be managed locally (stored within the local DB of the institution). In the local library, data can be added but not removed.
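The split between global and local data can be pictured with a minimal sketch. It is not Rosetta's data model or API: the class names, field names, and the `add_application` method are hypothetical, and the sketch only illustrates the stated rule that local data can be added but not removed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FormatEntry:
    """Hypothetical global record for one format (field names are illustrative)."""
    name: str
    description: str = ""
    related_applications: set[str] = field(default_factory=set)
    related_risks: set[str] = field(default_factory=set)


class LocalFormatLibrary:
    """Local view of the global library: additions only, no removal method."""

    def __init__(self, global_entries: dict[str, FormatEntry]) -> None:
        self._global = global_entries                 # globally managed data
        self._local_apps: dict[str, set[str]] = {}    # local additions per format

    def add_application(self, format_name: str, application: str) -> None:
        """Record a local format-to-application relation."""
        if format_name not in self._global:
            raise KeyError(f"unknown format: {format_name}")
        self._local_apps.setdefault(format_name, set()).add(application)

    def applications(self, format_name: str) -> set[str]:
        """Effective view: global relations plus local additions."""
        entry = self._global[format_name]
        return entry.related_applications | self._local_apps.get(format_name, set())


# Example with made-up format and application names.
library = LocalFormatLibrary(
    {"FormatA": FormatEntry("FormatA", related_applications={"ViewerA"})}
)
library.add_application("FormatA", "ViewerB")
```

In this sketch the global entry is never modified; the local additions are kept in a separate structure and combined only when the library is read.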
Preservation Plan
The preservation plan is a structured workflow used by the Preservation Analyst (PA) to handle objects that are at risk. The workflow takes the PA through the stages of gathering documentation and general information, creating the preservation set, defining the suggested alternatives, running tests, and summarizing the results.
Preservation Plan Alternative
Advanced preservation plans can have more than one alternative in order to ensure success. For example, the same plan may have two migration utilities that convert the source format to the target format. Each one of these is saved as an alternative, and the workflow allows the user to evaluate each utility and add information that is relevant for the utility being evaluated.
For basic preservation plans, no alternatives are needed.
Preservation Plan Execution
After the institution signs off on the plan, the plan can be executed with no need to go through testing and defining the exact material. The plan’s execution can be scheduled in advance or launched immediately.
Preservation Set
A preservation set is a set of intellectual entities (IEs) that is defined during the first stage of preservation planning. The set starts as a logical set and becomes an itemized set every time a preservation plan is executed.
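The difference between a logical set and an itemized set can be sketched as follows. This is an illustration only, assuming a logical set is a saved condition and an itemized set is the list of IEs matching it at execution time; the `IE` and `LogicalSet` classes and their fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class IE:
    """Hypothetical, simplified intellectual entity."""
    ie_id: str
    file_format: str
    at_risk: bool


@dataclass
class LogicalSet:
    """A saved search condition rather than a fixed list of members."""
    condition: Callable[[IE], bool]

    def itemize(self, repository: List[IE]) -> Tuple[str, ...]:
        """Resolve the condition against the repository as it is right now."""
        return tuple(ie.ie_id for ie in repository if self.condition(ie))


repository = [IE("IE1", "FormatA", True), IE("IE2", "FormatB", False)]
risky_format_a = LogicalSet(lambda ie: ie.file_format == "FormatA" and ie.at_risk)

first_run = risky_format_a.itemize(repository)    # members at the first execution
repository.append(IE("IE3", "FormatA", True))     # new material arrives later
second_run = risky_format_a.itemize(repository)   # re-itemized at the next execution
```

Because the set is itemized on each execution, the same plan can pick up newly loaded material that matches the condition.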
Risk Grading
This routine processes each file that is loaded into the repository and checks whether or not it is at risk. Risk grading is part of SIP processing, but it can also run separately as part of the process automation mechanism.
Risk Identifiers List
The Risk Identifiers List displays all the risks that can be identified by the system. The risk can be either a query on the file attributes that put the file at risk (existing technical metadata) or a tool that extracts the technical metadata that describes the problem. Each format can be related to one or more risk identifiers.
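The two kinds of risk identifier described above can be sketched as a pair of small functions. This is a minimal illustration, not Rosetta's implementation: the record layout, attribute names, format names, and risk values are all made up.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]               # hypothetical technical metadata for one file
RiskCheck = Callable[[Record], bool]


def query_identifier(attribute: str, risky_value: Any) -> RiskCheck:
    """Risk expressed as a query on technical metadata that already exists."""
    def check(record: Record) -> bool:
        return record.get(attribute) == risky_value
    return check


def tool_identifier(extract: Callable[[Record], Any],
                    is_problem: Callable[[Any], bool]) -> RiskCheck:
    """Risk expressed as a tool that first extracts metadata, then tests it."""
    def check(record: Record) -> bool:
        return is_problem(extract(record))
    return check


# Each format can be related to one or more risk identifiers (values are made up).
RISKS_BY_FORMAT: Dict[str, List[RiskCheck]] = {
    "FormatA": [query_identifier("encrypted", True)],
    "FormatB": [
        tool_identifier(lambda r: r.get("codec", ""), lambda v: v == "legacy-codec"),
        query_identifier("version", "1.0"),
    ],
}


def is_at_risk(record: Record) -> bool:
    """A file is at risk if any identifier related to its format matches."""
    checks = RISKS_BY_FORMAT.get(record.get("format", ""), [])
    return any(check(record) for check in checks)
```

A risk grading routine could then call `is_at_risk` once per file, whether during loading or as a separate batch run.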
Process Flow
There are two preservation workflows available:
- Advanced – Includes performing a format risk analysis and defining alternative preservation plans.
- Basic – You create a set of files, select a migration tool, and execute the plan. No format risks or alternative plans are necessary.
Advanced Preservation Workflow
The following diagram shows the various aspects of the Advanced Preservation Workflow:
Figure: Advanced Preservation Workflow
The following is an explanation of the stages depicted in the above diagram:
- Populate Libraries (Format, Application, Classification, and Risk) – Each of these libraries can be managed through the UI.
- Perform a Risk Analysis – The risk analysis process is part of the SIP processing so that each file that is inserted into the repository is measured for risks. The process can be run for the files that are already in the repository as well.
- Generate a Risk Report – The report is generated on demand or by schedule and shows the risk status of the repository. Opening this report is the first step in creating a preservation plan.
- Create a Preservation Set – From the risk report, the flow moves the analyst toward creating a preservation set that is based on a format and a risk. The set can be honed or narrowed down by adding conditions to the search query.
- Create a Preservation Plan – Once a preservation set is created, the analyst can create a plan that is built around the preservation set.
- Execute the Preservation Plan – After the plan has been tested and the preferred alternative has been selected, the plan is signed off. It is then ready to be executed on the entire preservation set. The plan can be executed whenever there is a need (that is, when there is a population of files at risk that match the preservation set search criteria). Once the plan is signed off, it cannot be changed.
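The sign-off rule in the last stage can be sketched as a small state model: a plan is editable until sign-off, then locked but executable any number of times. The class, method, and exception names are hypothetical, and requiring a preferred alternative before sign-off is an assumption drawn from the stage description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PlanLockedError(Exception):
    """Raised when a signed-off plan is modified."""


@dataclass
class PreservationPlan:
    name: str
    alternatives: List[str] = field(default_factory=list)
    preferred: Optional[str] = None
    signed_off: bool = False
    executions: int = 0

    def _require_editable(self) -> None:
        if self.signed_off:
            raise PlanLockedError(f"plan {self.name!r} is signed off")

    def add_alternative(self, alternative: str) -> None:
        self._require_editable()
        self.alternatives.append(alternative)

    def select_preferred(self, alternative: str) -> None:
        self._require_editable()
        if alternative not in self.alternatives:
            raise ValueError(f"unknown alternative: {alternative}")
        self.preferred = alternative

    def sign_off(self) -> None:
        # Assumption: a preferred alternative must be chosen before sign-off.
        if self.preferred is None:
            raise ValueError("select a preferred alternative before sign-off")
        self.signed_off = True

    def execute(self) -> None:
        """Run the plan; allowed repeatedly, but only after sign-off."""
        if not self.signed_off:
            raise PermissionError("plan must be signed off before execution")
        self.executions += 1
```

For example, after `sign_off()` a call to `execute()` succeeds each time it is made, while `add_alternative()` raises `PlanLockedError`.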
Basic Preservation Workflow
The following diagram shows the various aspects of the Basic Preservation Workflow:
Figure: Basic Preservation Workflow
- Create a Preservation Set – The workflow starts by creating a preservation set of files.
- Create a Preservation Plan – Once a preservation set is created, the analyst can create a plan that is built around the preservation set.
- Execute the Preservation Plan – After the plan is created, it can be signed off and executed. No testing or alternatives are involved.
For more information on the Basic Preservation Workflow, see Basic Preservation Plan.
Staff Roles
Tasks in the Preservation module are performed by several different roles. Each role has its own set of rights and limitations. Specifically:
- Preservation Analysts can do everything except Sign Off and Reject Plan
- Preservation Managers can do everything including Sign Off and Reject Plan
- Editors (View, Typical, or Full) perform Test Evaluation
- Technical Analysts are needed for handling technical issues (this role is not automatically assigned to a Preservation Analyst or Preservation Manager)
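The role rules above can be summarized in a small permission table. This is a sketch only: the role keys and action names are hypothetical, and the set of "everything" actions is an assumed, abbreviated list rather than Rosetta's actual privilege model.

```python
from typing import Dict, FrozenSet, Iterable

# Assumed, abbreviated list of preservation actions (names are illustrative).
PLAN_ACTIONS: FrozenSet[str] = frozenset({
    "create_set", "create_plan", "run_test", "evaluate_test",
    "execute_plan", "sign_off", "reject_plan",
})

ROLE_ACTIONS: Dict[str, FrozenSet[str]] = {
    # Everything except Sign Off and Reject Plan.
    "preservation_analyst": PLAN_ACTIONS - {"sign_off", "reject_plan"},
    # Everything, including Sign Off and Reject Plan.
    "preservation_manager": PLAN_ACTIONS,
    # Editors perform Test Evaluation.
    "editor": frozenset({"evaluate_test"}),
    # A separate role, not implied by the analyst or manager roles.
    "technical_analyst": frozenset({"handle_technical_issue"}),
}


def can(roles: Iterable[str], action: str) -> bool:
    """A user may hold several roles; any one of them can grant the action."""
    return any(action in ROLE_ACTIONS.get(role, frozenset()) for role in roles)
```

Under these assumptions, `can({"preservation_manager"}, "handle_technical_issue")` is false unless the user also holds the `technical_analyst` role.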
Consortial Versus Institutional Scope
The Preservation module distinguishes between what users of a consortial scope versus users of an institutional scope can view and manage. It also distinguishes between what information remains exclusive to one institution and what information may be shared across institutions belonging to the same consortium.
A consortium login has access to consortium-level tests, plans, executions, and risk reports, including those of all member institutions.
An institution has access to its own tests, plans, executions, risk reports, and failures (of IEs to load).
Signed-off plans from one institution can be seen and used by other institutions from the same consortium.
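The visibility rules for plans can be sketched as a single filter. This is an illustration under stated assumptions, not Rosetta's access-control code: the `Plan` fields and the `visible_plans` function are hypothetical, and a consortium-level login is modeled by passing no institution.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    institution: str   # owning institution (hypothetical code)
    consortium: str    # consortium the institution belongs to
    signed_off: bool


def visible_plans(plans: Iterable[Plan], consortium: str,
                  institution: Optional[str] = None) -> List[str]:
    """Return the names of plans visible to a login.

    institution=None models a consortium login, which sees the plans of
    all member institutions. An institution login sees its own plans plus
    signed-off plans from other members of the same consortium.
    """
    visible = []
    for plan in plans:
        if plan.consortium != consortium:
            continue
        if institution is None or plan.institution == institution or plan.signed_off:
            visible.append(plan.name)
    return visible


plans = [
    Plan("a-draft", "INS_A", "C1", signed_off=False),
    Plan("a-final", "INS_A", "C1", signed_off=True),
    Plan("b-draft", "INS_B", "C1", signed_off=False),
    Plan("x-final", "INS_X", "C2", signed_off=True),
]
```

With this sample data, institution `INS_B` sees its own draft and the signed-off plan from `INS_A`, but not the draft from `INS_A` or anything from consortium `C2`.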
Format Library Updates
Prior to version 3.0 of Rosetta, Format Libraries were updated only during an upgrade or service pack release of the Rosetta software. To make updates more flexible and timely, the Format Library can now be updated whenever needed. Because the update frequency varies from one Format Library version to another, the Preservation module includes the means for managing Format Library updates.
As with previous versions, a global or master copy is maintained by a group of Rosetta users who consult on general format changes and specific requests from other Rosetta users for recognition of format changes. Once changes are implemented, a newer version of the master copy is packaged and distributed.
Individual libraries may customize their Format Libraries and retain their changes during upgrades.
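One way to picture how local changes can survive an update is a merge in which local values are layered over the newly distributed master copy. This is a sketch of the general idea only: the function name, the dictionary layout, and the rule that local values take precedence are assumptions, not a description of how Rosetta applies updates.

```python
from typing import Any, Dict

Library = Dict[str, Dict[str, Any]]   # format name -> properties (hypothetical layout)


def apply_update(new_master: Library, local_changes: Library) -> Library:
    """Build the effective library after an update.

    Start from the new master copy, then re-apply local customizations so
    they are retained. Assumption: on conflict, the local value wins.
    """
    merged: Library = {name: dict(props) for name, props in new_master.items()}
    for name, props in local_changes.items():
        merged.setdefault(name, {}).update(props)
    return merged


master_v2 = {
    "FormatA": {"description": "updated text", "risk": "low"},
    "FormatB": {"description": "new in this version"},
}
local = {
    "FormatA": {"risk": "high"},                    # local override
    "FormatLocal": {"description": "in-house"},     # local addition
}
effective = apply_update(master_v2, local)
```

The function copies the master entries before merging, so the distributed master copy itself is left unchanged.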