User:Pganza/Document storage reduction

Document Storage Reduction (DSR) decreases the amount of disk storage required to store large volumes of documents in digital archives and enterprise content management (ECM) systems.

The term generally refers to reducing the disk storage needed for high-volume document types such as bank and credit card statements, telephone and utility bills, invoices, insurance policies, insurance explanation of benefits statements, and tax notices. These documents are also classified as unstructured data.

DSR is a subset of single-instance storage and can be applied to documents that:
- Are produced regularly (for example, every week, every month, every quarter, or every year)
- Are produced in large batches for many recipients, from thousands to millions
- Contain transaction data as well as promotional and marketing data
- Contain graphics and images

DSR can also be applied when the ratio of image size to transaction data size in each document is high (for example, 10 to one) and each document in the batch shares common graphics and images.

A DSR strategy uses enterprise content management (ECM) technology that can separate the common composition resources from the unique transactional content that together make up a high-volume document. Only a single copy of the “common resources” is then stored for any given batch of documents, which reduces the amount of disk storage required.

Pointers are then inserted within the transactional document that link it to the composition resources. When the document is retrieved, the two pieces are integrated in near real-time to present a complete, reconstituted document as was originally produced.

Reconstitution works by replacing each link in the stored document with the object it points to. Objects can also be cached on servers so that they do not need to be retrieved from the document archive each time.
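The store-once, link, and reconstitute steps described above can be sketched as follows. This is a minimal illustration, not the design of any particular ECM product: the in-memory dictionaries, the function names, the ("resource", "data") part labels, and the use of a SHA-256 hash as the pointer are all assumptions made for the example.

```python
import hashlib
from functools import lru_cache

# Hypothetical in-memory stand-ins for the archive (assumption for this sketch).
resource_store = {}   # content hash -> resource bytes, stored once
document_store = {}   # document id -> list of stored parts


def store_document(doc_id, parts):
    """Store a document given as a list of (kind, payload) parts.

    kind is "resource" for a shared composition resource (for example a logo)
    or "data" for unique transactional content.
    """
    stored = []
    for kind, payload in parts:
        if kind == "resource":
            key = hashlib.sha256(payload).hexdigest()
            resource_store.setdefault(key, payload)  # keep a single copy
            stored.append(("ref", key))              # pointer, not the bytes
        else:
            stored.append(("data", payload))
    document_store[doc_id] = stored


@lru_cache(maxsize=128)
def fetch_resource(key):
    """Cached lookup so a shared resource is not re-read on every retrieval."""
    return resource_store[key]


def retrieve_document(doc_id):
    """Reconstitute a document by replacing each pointer with its object."""
    pieces = []
    for kind, value in document_store[doc_id]:
        pieces.append(fetch_resource(value) if kind == "ref" else value)
    return b"".join(pieces)


# Two statements from the same batch share one logo.
logo = b"LOGO" * 1000
store_document("stmt-1", [("resource", logo), ("data", b"Alice: $10")])
store_document("stmt-2", [("resource", logo), ("data", b"Bob: $20")])
statement_1 = retrieve_document("stmt-1")
```

After both statements are stored, the resource store holds one copy of the logo, and each retrieved statement is byte-for-byte identical to the original.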

Growth in Storage Demands
The information technology research firm International Data Corporation (IDC) forecasts that the amount of unstructured data will grow sixfold from 2009 to 2012.

Disk Space Reduction Example
Organizations have achieved up to a 94% storage reduction by applying some of the document storage techniques described herein.

For example, an organization that stores 100 gigabytes of documents each day would need only 6 gigabytes per day after a 94% reduction.
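The arithmetic can be checked with a short calculation. The sketch below also includes a simplified model of the saving, which is an assumption of this example and not a figure from any cited source: every document in a batch has the same transactional size and shares the same set of resources, and those resources are stored once per batch. The function and parameter names are hypothetical.

```python
def dsr_reduction(num_docs, data_size, resource_size):
    """Fraction of storage saved when shared resources are stored once per batch.

    Simplified model (assumption): all documents have the same sizes and
    share one identical set of resources.
    """
    full = num_docs * (data_size + resource_size)     # every copy stored
    reduced = num_docs * data_size + resource_size    # resources stored once
    return 1 - reduced / full


# 100 GB per day with a 94% reduction leaves about 6 GB per day.
daily_gb = 100
remaining_gb = round(daily_gb * (1 - 0.94), 6)

# With a 10-to-one image-to-data ratio and a large batch, the saving
# approaches 10/11, or roughly 91%.
saving_10_to_1 = dsr_reduction(1_000_000, data_size=1, resource_size=10)
```

Under this model, the saving for a large batch approaches resource_size / (data_size + resource_size), so a higher image-to-data ratio gives a larger reduction.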

Alternative Solutions
One solution is to compress the large elements of a document, namely the graphics and images. Compression has been used in the video, music and photography industries for many years through standards such as JPEG and MPEG, among others. These compression techniques have two benefits: they reduce the disk storage required for these objects, and they reduce transmission and response times when the objects are distributed electronically.

Compression on its own may not address the problem of storing large volumes of identical graphics and images. Each image may be compressed individually, without checking whether the same image already exists in the archive.
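The limitation can be illustrated with a small sketch. It uses zlib, a lossless general-purpose compressor, as a stand-in for image formats such as JPEG, and a synthetic byte pattern in place of a real image; both are assumptions made for the example.

```python
import zlib

# Synthetic stand-in for a logo that appears in every document of a batch.
image = bytes(range(256)) * 400          # about 100 KB
copies = [image] * 100                   # the same image in 100 documents

# Compressing each copy on its own: every document still pays for the image.
compressed_each = sum(len(zlib.compress(copy)) for copy in copies)

# Storing the compressed image once for the whole batch.
compressed_once = len(zlib.compress(image))
```

Each copy compresses to the same size, so the total stored still grows in proportion to the number of documents; only storing the image once removes that growth.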

Another solution is deduplication. This refers to finding duplicate objects in an archive and storing each only once, replacing the duplicates with references to the single stored object. This process is not effective for high-volume documents, because each document is treated as a single object.

For transpromotional documents that contain both transaction data and promotional images and messages, it is highly unlikely that any two documents will be identical.
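The difference between document-level and object-level duplicate detection can be shown with a short sketch. The sample documents, the digest helper, and the choice of SHA-256 as the fingerprint are assumptions made for illustration.

```python
import hashlib


def digest(payload):
    """Fingerprint used to detect identical content (SHA-256, an assumption)."""
    return hashlib.sha256(payload).hexdigest()


# Five hypothetical statements: the same logo, different transaction data.
logo = b"\x89LOGO" * 2000
documents = [
    (logo, f"Account {i}: balance {i * 10}".encode()) for i in range(5)
]

# Document-level deduplication hashes each whole document.
whole_document_hashes = {digest(image + data) for image, data in documents}

# Object-level detection hashes the image separately from the data.
image_hashes = {digest(image) for image, _ in documents}
```

Here the five whole-document fingerprints are all different, so document-level deduplication stores five full copies, while the image fingerprints collapse to a single value that could be stored once.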