User:Mbruce21/Database preservation

Article Draft
With the prevalence of databases, different methods have been developed to aid in the preservation of databases and their contents. These methods vary depending on database characteristics and preservation needs.

There are three basic methods of database preservation: migration, XML, and emulation. Several tools, software products, and projects have also been created to aid in the preservation of databases, including SIARD, the Database Preservation Toolkit, CHRONOS, and RODA.

Database characteristics
The characteristics of a database are taken into consideration when preserving it. Relational databases are the most common type. A relational database is made up of tables that hold data in records, and the tables connect to one another through common data points stored in those records. With the emergence of big data, however, NoSQL databases are also coming into use.

Databases are characterized as open or closed and as static or dynamic. An open database can still have data added to it, whereas a closed database is complete and accepts no new data. A static database contains records that are not edited after their initial inclusion, whereas a dynamic database contains records that may be edited in the future.

Whether a database is open and static, open and dynamic, closed and static, or closed and dynamic affects the methods used for preservation. A dynamic database is more difficult to preserve than a static one because its data keeps changing, and an open database is more difficult to preserve than a closed one because data keeps being added. The more often a database changes, whether through an edit to an existing record or the addition of a new one, the more often steps must be taken to capture that change for preservation.
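One way to decide when an open or dynamic database needs a fresh capture is to fingerprint its contents and compare the result with the fingerprint taken at the last capture. The following is a minimal sketch, not a method prescribed by any of the tools discussed here. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the database being preserved, and the names `table_fingerprint`, `needs_capture`, and the `records` table are hypothetical.

```python
import hashlib
import sqlite3


def table_fingerprint(conn: sqlite3.Connection, table: str) -> str:
    """Return a SHA-256 digest of all rows in `table`, independent of row order."""
    # Identifiers cannot be bound as SQL parameters, so quote the name instead.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
    digest = hashlib.sha256()
    # Sort the textual form of each row so storage order does not matter.
    for line in sorted(repr(row) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def needs_capture(conn: sqlite3.Connection, table: str, last_fingerprint: str) -> bool:
    """Report whether the table has changed since the last preserved capture."""
    return table_fingerprint(conn, table) != last_fingerprint


# Hypothetical example database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO records (title) VALUES ('First entry')")
baseline = table_fingerprint(conn, "records")

# Nothing has changed yet, so no new capture is needed.
unchanged = needs_capture(conn, "records", baseline)

# An open database gains a record.
conn.execute("INSERT INTO records (title) VALUES ('Second entry')")
after_insert = needs_capture(conn, "records", baseline)

# A dynamic database has an existing record edited.
baseline = table_fingerprint(conn, "records")
conn.execute("UPDATE records SET title = 'Revised entry' WHERE id = 1")
after_update = needs_capture(conn, "records", baseline)
```

A closed, static database would produce the same fingerprint indefinitely, so a single capture suffices. An open or dynamic database produces a new fingerprint after each addition or edit, and each new fingerprint signals that another capture is due.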

Database preservation methods
Three core methods of digital preservation can also be applied to databases: migration, XML, and emulation.

Migration
The migration method (also known as inactive archiving) involves transferring data from an obsolete database program to a newer format. There are three methods of migration: backward compatibility, interoperability, and conversion to standards. Backward compatibility involves using newer software or hardware versions to open, access, and read a document that was created with an older version. Interoperability involves reducing the risk of obsolescence by ensuring that a particular file can be accessed with more than one combination of software and hardware. Conversion to standards involves transferring stored data from a proprietary format to an open, widely used, and more readily accessible format.
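Conversion to standards can be illustrated with a short sketch that copies a table out of a database file and into CSV, an open plain-text format. This is only an illustration under stated assumptions: SQLite stands in for the source database, CSV stands in for the target standard, and the function name `export_table_to_csv` and the `records` table are hypothetical.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn: sqlite3.Connection, table: str, out) -> int:
    """Write `table` to the text stream `out` as CSV and return the row count."""
    # Identifiers cannot be bound as SQL parameters, so quote the name instead.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out)
    # The first CSV line carries the column names taken from the query result.
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# Hypothetical source database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO records (title) VALUES (?)",
    [("First entry",), ("Second entry",)],
)

buffer = io.StringIO()
count = export_table_to_csv(conn, "records", buffer)
```

Plain CSV keeps the values and column names but not the column types, keys, or relationships between tables, so a real migration would need to record that information separately.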

XML
The XML method (also known as XML normalization) involves converting the original database information to the XML standard format. XML does not require particular hardware or software (beyond a text editor or word processor) and is both human and machine readable, making it a sustainable format for preservation and storage. However, converting data to XML means that certain interactive functionality of the database, such as the ability to query, is lost.
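A minimal sketch of XML normalization is given here, assuming SQLite as the source database and Python's standard `xml.etree.ElementTree` module for output. The element and attribute names (`table`, `row`, `column`, `null`) are invented for this illustration and do not follow the SIARD or DBML schemas. The function name `table_to_xml` is likewise hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Return an XML element holding every row of `table`."""
    # Identifiers cannot be bound as SQL parameters, so quote the name instead.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    columns = [column[0] for column in cursor.description]
    root = ET.Element("table", name=table)
    for row in cursor:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            cell = ET.SubElement(row_el, "column", name=column)
            if value is None:
                # Mark SQL NULL explicitly so it differs from an empty string.
                cell.set("null", "true")
            else:
                cell.text = str(value)
    return root


# Hypothetical source database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO records (title) VALUES ('First entry')")
conn.execute("INSERT INTO records (title) VALUES (NULL)")

root = table_to_xml(conn, "records")
# Serialize to a plain-text string that any text editor can open.
xml_text = ET.tostring(root, encoding="unicode")
```

The result can be read without a DBMS, but it can no longer be queried with SQL. Finding a value now means parsing the XML and searching it with general-purpose tools.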

Emulation
The emulation method involves recreating an older computing environment with newer technologies and software. This allows obsolete software, hardware, or file formats to remain accessible on new systems. An outdated database could therefore be run on an emulator that mimics the environment in which the database was originally created.

CHRONOS
CHRONOS (CSP Chronos Archiving) is a proprietary software product that serves as a database preservation tool. It was developed from 2004 to 2006 by CSP in partnership with the department of computer science at the University of Applied Sciences Landshut. CHRONOS pulls data from a database management system (DBMS) and stores it in a CHRONOS archive as text or XML files. Because the archive is in plain text format, all data can be accessed and read without a DBMS or CHRONOS itself. This eliminates the need to maintain a DBMS solely for reading preserved static databases, as well as the need to migrate database files to newer database formats, which carries risk. Although CHRONOS stores data in plain text format, its querying capabilities are considered comparable to those of a relational database.

RODA
RODA, the Repository of Authentic Digital Objects, is a project launched in Portugal in 2006 by the Portuguese National Archives to preserve the digital objects produced by Portugal’s government institutions. The project aimed to combine several types of digital objects, including relational databases, into one repository. As a single repository of many differing types of digital objects, RODA aims to normalize all ingested objects, that is, to minimize the number of formats used to store documents and to preserve like documents in like formats.

The RODA project emphasized the creation of a standardized method for preserving databases as digital objects. Database preservation poses a unique challenge in that the preservation process is split into three layers: data, structure (logic), and semantics (interface). That is, the project determined that a database’s data, structure, and semantics all need to be preserved. To preserve all three of these elements, the RODA project developed the Database Preservation Toolkit.
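To make the distinction between the data layer and the structure layer concrete, the sketch below reads only the structure of a database: its tables, columns, types, and key relationships. It is an illustration under stated assumptions, not the RODA project's implementation. SQLite stands in for the DBMS, and the function name `describe_structure` and the `authors` and `books` tables are hypothetical. The semantics (interface) layer is not covered.

```python
import sqlite3


def describe_structure(conn: sqlite3.Connection) -> dict:
    """Return tables, columns, and foreign keys, without any of the row data."""
    structure = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk.
        columns = [
            {"name": info[1], "type": info[2], "primary_key": bool(info[5])}
            for info in conn.execute(f"PRAGMA table_info({quoted})")
        ]
        # PRAGMA foreign_key_list yields: id, seq, table, from, to, ...
        foreign_keys = [
            {"column": fk[3], "references": f"{fk[2]}.{fk[4]}"}
            for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        structure[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return structure


# Hypothetical two-table database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, title TEXT, "
    "author_id INTEGER REFERENCES authors(id))"
)

structure = describe_structure(conn)
```

A preserved copy that kept only the rows of `books` would lose the fact that `author_id` points at `authors`, which is why the structure has to be captured alongside the data.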

Database Preservation Toolkit
The Database Preservation Toolkit, or dbtoolkit, is a series of steps created by the RODA project to ingest and preserve relational databases in a normalized format. It is an instrument designed for preserving and providing access to archived databases. To normalize a relational database, the toolkit converts its data to DBML (Database Markup Language) or SIARD. Both are based on XML, a standard format that requires no specific or proprietary software or hardware, which makes it well suited to preservation.

In this conversion process, the toolkit extracts information unique to each DBMS using DBMS-specific connectors. Each connector pairs with a particular DBMS, extracts its data, and represents it in XML form, which then leads to representation in DBML and SIARD. New connectors can also be created to ingest databases from additional DBMSs.
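The connector arrangement can be sketched as a common interface with one implementation per DBMS. The code below is a hypothetical illustration of that design, not the Database Preservation Toolkit's actual API. The names `Connector`, `SQLiteConnector`, and `export_database` are assumptions, SQLite stands in for a supported DBMS, and the XML element names are invented rather than taken from DBML or SIARD.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple


class Connector(ABC):
    """Hypothetical interface that each DBMS-specific connector implements."""

    @abstractmethod
    def tables(self) -> List[str]:
        """Return the names of the tables to export."""

    @abstractmethod
    def columns(self, table: str) -> List[str]:
        """Return the column names of `table`."""

    @abstractmethod
    def rows(self, table: str) -> Iterable[Tuple]:
        """Return the rows of `table` as tuples."""


class SQLiteConnector(Connector):
    """Connector for SQLite, used here as the example DBMS."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    @staticmethod
    def _quote(name: str) -> str:
        # Identifiers cannot be bound as SQL parameters, so quote them.
        return '"' + name.replace('"', '""') + '"'

    def tables(self) -> List[str]:
        query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        return [name for (name,) in self.conn.execute(query)]

    def columns(self, table: str) -> List[str]:
        cursor = self.conn.execute(f"SELECT * FROM {self._quote(table)} LIMIT 0")
        return [column[0] for column in cursor.description]

    def rows(self, table: str) -> Iterable[Tuple]:
        return self.conn.execute(f"SELECT * FROM {self._quote(table)}")


def export_database(connector: Connector) -> ET.Element:
    """Build one XML tree from any connector, whatever DBMS sits behind it."""
    root = ET.Element("database")
    for table in connector.tables():
        table_el = ET.SubElement(root, "table", name=table)
        names = connector.columns(table)
        for row in connector.rows(table):
            row_el = ET.SubElement(table_el, "row")
            for name, value in zip(names, row):
                cell = ET.SubElement(row_el, "column", name=name)
                cell.text = None if value is None else str(value)
    return root


# Hypothetical source database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO authors (name) VALUES ('Example Author')")

export = export_database(SQLiteConnector(conn))
```

Because `export_database` depends only on the `Connector` interface, supporting another DBMS in this sketch means writing one more subclass while the XML output code stays unchanged.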