Document Content Architecture

Document Content Architecture, or DCA for short, is a standard developed by IBM for text documents in the early 1980s. DCA was used on mainframe and IBM i systems and formed the basis of DisplayWrite's file format. DCA was later extended as MO:DCA (Mixed Object Document Content Architecture), which added embedded data files.

The original purpose of DCA was to provide a common document format that could be used across multiple IBM word processing platforms, such as the IBM PC, IBM mainframes, the Displaywriter System, and the IBM 5520 Administrative System.

DCA defines two types of documents:
 * Revisable-Form Text (DCA/RFT), which is editable.
 * Final-Form Text (DCA/FFT), which is "formatted for a particular output device and cannot be changed."

Description
DCA defines a data stream representing a document. "Documents may contain fonts, overlays and other resource objects required at presentation time to present the data properly. Finally, documents may contain resource objects, such as a document index and tagging elements supporting the search and navigation of document data, for a variety of application purposes."

MO:DCA is the wrapper or container for various objects that can make up the document. Each object is defined by its own subordinate architecture. The architectures are:
 * Presentation Text Object Content Architecture (PTOCA) describes formatted text, including text attributes such as font or color.
 * Image Object Content Architecture (IOCA) describes resolution-independent images.
 * Graphics Object Content Architecture (GOCA) describes vector graphic images. A variation of GOCA, AFP GOCA, is used in Advanced Function Presentation environments.
 * Bar Code Object Content Architecture (BCOCA) describes bar codes in a number of different formats.
 * Font Object Content Architecture (FOCA) describes fonts to be used in the document.
 * Color Management Object Content Architecture (CMOCA) describes the required color management information.

Each architecture uses a series of binary structured fields to describe its corresponding object.
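As an illustration of what a "binary structured field" looks like in practice, the following minimal Python sketch walks a byte stream and splits it into structured fields. It assumes the layout commonly described for MO:DCA data in AFP files: a X'5A' prefix byte, then an 8-byte introducer (2-byte big-endian length that counts the introducer and data but not the prefix, 3-byte identifier, 1-byte flags, 2 reserved bytes), then the data. The names `StructuredField`, `build_field`, and `iter_structured_fields` are hypothetical, the sketch ignores introducer extensions and padding, and the identifier values in the sample (X'D3A8A8' for Begin Document, X'D3A9A8' for End Document) should be checked against the MO:DCA reference before being relied on.

```python
import struct
from typing import Iterator, NamedTuple

PREFIX = 0x5A          # assumed prefix byte before each structured field
INTRODUCER_LEN = 8     # assumed: length(2) + identifier(3) + flags(1) + reserved(2)


class StructuredField(NamedTuple):
    identifier: bytes  # 3-byte structured field identifier
    flags: int         # flag byte (not interpreted in this sketch)
    data: bytes        # payload following the introducer


def build_field(identifier: bytes, data: bytes = b"") -> bytes:
    """Hypothetical helper: encode one field so the parser can be exercised."""
    if len(identifier) != 3:
        raise ValueError("identifier must be exactly 3 bytes")
    length = INTRODUCER_LEN + len(data)
    return (bytes([PREFIX]) + struct.pack(">H", length)
            + identifier + b"\x00" + b"\x00\x00" + data)


def iter_structured_fields(stream: bytes) -> Iterator[StructuredField]:
    """Yield each structured field found in `stream`, in order."""
    pos = 0
    while pos < len(stream):
        if stream[pos] != PREFIX:
            raise ValueError(f"expected prefix byte at offset {pos}")
        pos += 1
        if pos + INTRODUCER_LEN > len(stream):
            raise ValueError("truncated introducer")
        (length,) = struct.unpack_from(">H", stream, pos)
        if length < INTRODUCER_LEN or pos + length > len(stream):
            raise ValueError(f"bad field length {length} at offset {pos}")
        identifier = stream[pos + 2:pos + 5]
        flags = stream[pos + 5]
        data = stream[pos + INTRODUCER_LEN:pos + length]
        yield StructuredField(identifier, flags, data)
        pos += length  # length covers introducer plus data


# Sample stream: a begin/end document pair carrying an 8-byte name.
sample = (build_field(bytes.fromhex("D3A8A8"), b"DOC00001")
          + build_field(bytes.fromhex("D3A9A8"), b"DOC00001"))
fields = list(iter_structured_fields(sample))
for field in fields:
    print(field.identifier.hex().upper(), len(field.data))
```

Because every field announces its own length, a reader can skip any field whose identifier it does not recognize, which is how one container can carry objects defined by several subordinate architectures.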

Revisable-Form Text
Revisable-Form Text (abbreviated RFT or RFT-DCA) is part of DCA. It is sometimes referred to as Revisable Format Text. It was used by the IBM DisplayWrite 4 and 5 word processors, on System/360 and System/370 mainframe computers, and by OfficeVision/400 to allow transfer of formatted documents to other systems.

RFT has a counterpart, Final-Form Text (abbreviated FFT or FFT-DCA), which was output-only and not intended to be editable.

History
The drive to establish international standards for the DCAs began in 1980 at the IBM Rochester facility. A team of two MO:DCA architects, an RTOCA architect, and a PTOCA architect was assembled. These architects were responsible for building consensus within IBM on the design of the data streams and for taking the work into the international standards arena. There was a concerted effort to bring the international community into the development early. This decision was based in part on the experience of turning GML into the international SGML standard, which had taken many years; involving everyone from the start was meant to avoid a similarly long delay. IBM's work with document content had been driven by the needs of mainframe computers, where GML and DCA were in use, but that experience pointed to a need for standardized component architectures, for revisable and non-revisable text in particular.

In 1981, shortly after its inception, the group was moved along with the IBM 5280 Distributed Data System to IBM Austin, near Round Rock, TX, where the work continued with mixed success. As the architectures became more firmly positioned on the international stage, the team was moved again in 1987 to the IBM Dallas Programming Center. There, in 1998, the team was disbanded and work on the DCA architectures was discontinued, mainly because the PC community had, of necessity, gone in a different direction. After 18 years the DCA architectures were fully completed but not completely agreed upon, and no active implementations were in sight.

The PC world had settled on HTML (believed to be an application of the SGML international standard) and used portions of it for its own purposes. Microsoft Word eventually used a similar data stream as its internal working format for storing editable content. Although the SGML standard was available, a full SGML parser implementation was impractical, so a subset of it became the de facto standard for revisable text used today in the PC arena.

At about the same time, Adobe Systems designed and produced the printable document encoding PDF, which has become the standard for PC-produced printable documents. It became an international standard in 2008, with input from users, who had adopted the available products in great numbers. Adoption was driven by the need for the product, and the resulting solution proved far more acceptable than what the standards committees could design. Over 10 years of committee work had not produced an acceptable method, while the PC computing community created what it needed in less time.

The attempt to achieve a consensus document data stream was quickly outflanked by usable formats from companies that did not attempt to share their work with others, but instead created workable solutions and successfully sold them to users. The output of word processing software is 'printed' into the PDF format provided by the most widely used presentation product. For example, Microsoft Word offers the printer selection 'Microsoft Print to PDF' to produce a PDF document. A similar method could have been used to produce the international standard format, had one eventually arrived.

When IBM disbanded its Dallas Programming Center in 1998, the entire staff of architects retired and left the company, except the manager, who was moved, ending the DCA architecture project for the foreseeable future at IBM.