User:Phillipviana/sandbox/Self-contained Information Retention Format (SIRF)

Self-contained Information Retention Format (SIRF)

SIRF
The Self-contained Information Retention Format (SIRF) is a logical container format from the Storage Networking Industry Association (SNIA). It is designed as a free, vendor-neutral format for long-term data preservation that future preservation systems will be able to interpret, which is intended to reduce the cost of digital preservation. SIRF was developed by SNIA's Long-term Retention Workgroup. The specification is currently published as a draft and is available for public review.

SIRF has been designed to have the following characteristics:


 * Media agnostic
 * Vendor agnostic
 * Extensible
 * Self-described
 * Self-contained
 * Support for verifying correctness and completeness

Components
The following figure schematically depicts a SIRF container that includes:


 * A magic object that identifies the container as a SIRF container and states its version. The magic object is independent of the media and has an agreed name and a fixed size. It also includes the means to access the SIRF catalog.


 * Numerous preservation objects that are immutable. The container may include multiple versions of a preservation object and multiple copies of each version.


 * A catalog that is updatable and contains metadata needed to make the container and its preservation objects portable into the future without relying on functions external to the storage subsystem.
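The three components above can be pictured as a small data model. The following is only an illustrative sketch, not the SIRF specification or the OpenSIRF API: the class names, the field names (`sirf_version`, `catalog_id`) and the choice to key the catalog by name and version are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MagicObject:
    """Identifies the container as SIRF and tells a reader where the catalog is."""
    sirf_version: str  # hypothetical field: version of the SIRF format in use
    catalog_id: str    # hypothetical field: how the catalog is located


@dataclass(frozen=True)
class PreservationObject:
    """Immutable once stored; a change is stored as a new version instead."""
    name: str
    version: int
    content: bytes


@dataclass
class Container:
    magic: MagicObject
    # The catalog is the updatable part: metadata keyed by (name, version).
    catalog: Dict[Tuple[str, int], dict] = field(default_factory=dict)
    # Several versions, and several copies of one version, may coexist.
    objects: List[PreservationObject] = field(default_factory=list)

    def add(self, po: PreservationObject, metadata: dict) -> None:
        """Store a preservation object and record its catalog entry."""
        self.objects.append(po)
        self.catalog[(po.name, po.version)] = dict(metadata)

    def versions_of(self, name: str) -> List[int]:
        """Return the distinct versions stored under a given name."""
        return sorted({po.version for po in self.objects if po.name == name})
```

Freezing the `MagicObject` and `PreservationObject` classes mirrors the immutability described above, while the `Container` itself stays mutable so that the catalog can be updated.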

Categories
The SIRF catalog includes metadata related to the whole container and metadata related to each preservation object (PO) within the container. Both types of metadata are organized into *categories*.

The metadata for the whole container includes the following categories:


 * 1) Container Information
 * 2) Specification - includes information about the specification version used (e.g., 1.0, 1.1, 2.0)
 * 3) Container ID - the container identifier e.g. the tape id or cloud container id
 * 4) State - the current state of the container, which can be INITIALIZING, READY, NOT READY, or MIGRATING.
 * 5) Provenance - information about the history of the content in a SIRF container (e.g., its origins, chain of custody, preservation actions and effects).
 * 6) Audit Log - a place for recording any important information about how a container has been accessed or modified; its content is usually domain-dependent.
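As a rough illustration of the container-level categories, the sketch below models them as a single record. The four state names come from the list above; everything else (the class and field names, the `set_state` helper, and the format of the audit entries) is a hypothetical choice for this example and is not taken from the SIRF specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ContainerState(Enum):
    """The four container states named in the State category."""
    INITIALIZING = "INITIALIZING"
    READY = "READY"
    NOT_READY = "NOT READY"
    MIGRATING = "MIGRATING"


@dataclass
class ContainerInformation:
    specification_version: str  # e.g. "1.0"
    container_id: str           # e.g. a tape id or cloud container id
    state: ContainerState = ContainerState.INITIALIZING
    provenance: List[str] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    def set_state(self, new_state: ContainerState) -> None:
        """Change the state and note the change in the audit log."""
        # The entry format is an assumption; SIRF leaves it domain-dependent.
        self.audit_log.append(f"state {self.state.value} -> {new_state.value}")
        self.state = new_state
```

For example, creating `ContainerInformation("1.0", "tape-0001")` and calling `set_state(ContainerState.READY)` leaves one entry in `audit_log` describing the transition.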

The metadata for each PO includes the following categories:


 * 1) Object Information
 * 2) Object IDs - a set of identifiers for the preservation object, including name, version, logical and parent identifiers.
 * 3) Related objects - contains references to other preservation objects in the container.
 * 4) Dates - holds information about the creation, last accessed and last modified dates for the preservation object.
 * 5) Packaging Format - used to denote the format of the manifest of the preservation object, e.g. PREMIS, XIP, XFDU.
 * 6) Fixity - used to demonstrate that the particular content information has not been altered in an undocumented or unauthorized manner. This is usually done with checksums computed by algorithms such as SHA-1 or MD5.
 * 7) Retention - used to bring retention management disciplines, such as retention policies and hold policies, into the system management functionality. While SIRF provides support for basic retention metadata, the implementation depends on the underlying system. For example, a system implemented on CDMI may take advantage of CDMI’s retention mechanisms.
 * 8) Audit Log - a place for recording any important information about how a preservation object has been accessed or modified; its content is usually domain-dependent.
 * 9) Extension - a placeholder for data store-specific information. Each organization using SIRF may use this reserved, general-purpose category to add private information or metadata that is specific to its own domain or data store.
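The Fixity category lends itself to a short example. The sketch below uses Python's standard `hashlib` module to compute a checksum and to check it again later. The function names and the dictionary layout (`algorithm`, `digest`) are assumptions for illustration only, and SHA-256 is used as the default here as a substitute for the SHA-1 and MD5 algorithms mentioned above.

```python
import hashlib


def compute_fixity(content: bytes, algorithm: str = "sha256") -> dict:
    """Return a fixity record holding the algorithm name and hex digest."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def verify_fixity(content: bytes, fixity: dict) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    actual = hashlib.new(fixity["algorithm"], content).hexdigest()
    return actual == fixity["digest"]
```

A record computed when the object is stored can be kept in the catalog; `verify_fixity` then returns `False` if the content has changed since, for instance after a single altered byte.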

Reference Implementation (OpenSIRF)
OpenSIRF is a free, open-source implementation of the SIRF format, written in Java and released under the MIT License. It consists of a core component (opensirf-core), which contains the model classes for the elements of the SIRF format, and a REST component (opensirf-jaxrs), which defines a RESTful API for manipulating the SIRF catalog and its associated objects. The RESTful API allows OpenSIRF to be integrated with user interfaces (e.g. websites) and web services. OpenSIRF is still under development: not all categories of the SIRF format have been implemented yet, and work on the source code is ongoing.