User:Daomeideren/sandbox/Data definition specification

A Data Definition Specification (DDS) serves as a guideline to ensure that a data definition is comprehensive and consistent. It represents the attributes required to quantify the quality of a data definition. A comprehensive data definition specification encompasses the enterprise use of data, the hierarchy of managing data, prescribed guidance enforcement, and criteria to determine compliance.

Overview
A Data Definition Specification can be developed for any organization or specialized field and improves the quality of resultant products through consistency and understandability. It eliminates redundancies, because all contributing areas reference the same specification, and provides standardization, making it easier and more efficient to create, modify, verify, analyze and share information across the enterprise. To understand how a Data Definition Specification functions within an enterprise, it helps to look first at the building blocks of a DDS. Writing data definitions and defining business terms or business rules within the context of a particular environment provides the structure to develop an organization’s data architecture. In the course of developing these definitions, the nouns used must be traceable to data, and that data must be unambiguously defined.

A DDS may be used by any of the following cross-functional activities to provide consistency and clarity of terms between the departments engaged in supporting the activity:


 * Business Intelligence
 * Business Process Modeling
 * Business Rules Management
 * Data Analysis
 * Data Architecture/Modeling
 * Information Architecture
 * Metadata Management
 * Report Generation

Criteria
The Data Definition Specification requires that data definitions be:
 * Atomic - singular, describing only one concept. Definitions should be included for all commonly used terms as well as those that could be misinterpreted. The term "Cartoon", for example, is considered legacy; it has been replaced by multiple terms that more clearly represent a particular medium for the concept, such as "Animated Feature" or "Comic Strip". While a term references a single concept, more than one word may be required to identify that concept: "File" is a single concept that can be identified with one word, whereas "File Extension" is a single concept that must be identified with more than one word.
 * Traceable - the term can be mapped to a specific data element. In business, a term may be traced to an entity such as a "Customer" or to an attribute such as the "Customer Name". Further, a term may be a value in a data set, such as "female" (gender), or may designate the data set itself, as in "Customer Type". Traceability represents the relationships within the data hierarchy.
 * Consistent - used in a standardized syntax; if the term is used in a specific context, that context is noted
 * Accurate - precise, correct and unambiguous, stating what the term is, not only what it is not
 * Clear - readily understood by the reader
 * Complete - containing the term, description and any necessary context references
 * Concise - limited to information about the term only, avoiding circular references
 * Declarative - stated as a descriptive phrase or sentence(s)
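
Some of the criteria above lend themselves to simple automated checks. The following is a minimal sketch in Python, not part of any published DDS: the `DataDefinition` record, its field names, and the heuristics in `check_definition` are all assumptions made for illustration. It covers only the "complete", "traceable", "concise" and "accurate" criteria, and only in a rough way; criteria such as "clear" still need human review.

```python
from dataclasses import dataclass


@dataclass
class DataDefinition:
    """Hypothetical record for one term in a DDS (field names are assumed)."""
    term: str
    description: str
    data_element: str = ""  # entity, attribute, or value set the term maps to
    context: str = ""       # noted only when the term is context-specific


def check_definition(d: DataDefinition) -> list[str]:
    """Return the names of criteria the definition appears to violate.

    These are rough heuristics, not an authoritative compliance test.
    """
    problems = []
    term = d.term.strip()
    desc = d.description.strip()

    # Complete: both the term and its description must be present.
    if not term or not desc:
        problems.append("complete")

    # Traceable: the term must map to some data element.
    if not d.data_element.strip():
        problems.append("traceable")

    # Concise: a description that repeats the term is treated as circular.
    if term and term.lower() in desc.lower():
        problems.append("concise")

    # Accurate: a description should say what the term is, not only what it is not.
    if desc.lower().startswith(("not ", "anything that is not", "anything other than")):
        problems.append("accurate")

    return problems


good = DataDefinition(
    term="File Extension",
    description="The suffix of a file name that indicates the format of its contents.",
    data_element="File.extension",
)
bad = DataDefinition(
    term="Customer",
    description="A customer is a customer of the company.",
)

print(check_definition(good))
print(check_definition(bad))
```

In this sketch the first definition passes every check, while the second is flagged as untraceable (no data element is given) and circular (the description reuses the term).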

Enterprise data
One example of a Data Definition Specification was produced by the Open Mobile Alliance to document charging data. The document provides definitions and relationships, serves as the centralized catalog of data elements defined for interfaces, and specifies the mapping of these data elements to protocol fields used in the protocol bindings for the interfaces. See the Open Mobile Alliance Charging Data Definition Specification.

A related effort, created specifically for the interchange of financial data, is the Market Data Definition Language (MDDL), an XML specification designed "to enable the interchange of information necessary to account, to analyze, and to trade financial instruments of the world's markets. It defines an XML-based interchange format and common data dictionary on the fields needed to describe: (1) financial instruments, (2) corporate events affecting value and tradability, and (3) market-related, economic and industrial indicators. The principal function of MDDL is to allow entities to exchange market data by standardizing formats and definitions. MDDL provides a common format for market data so that it can be efficiently passed from one processing system to another and provides a common understanding of market data content by standardizing terminology and by normalizing the relationships of various data elements to one another... From the user perspective, the goal of MDDL is to enable users to integrate data from multiple sources by standardizing both the input feeds used for data warehousing (i.e., define what's being provided by vendors) and the output methods by which client applications request the data (i.e., ensure compatibility on how to get data in and out of applications)." [adapted]

Clinical submissions
The Clinical Data Interchange Standards Consortium (CDISC), a global, multidisciplinary, non-profit organization, has established standards to support the acquisition, exchange, submission and archiving of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent and freely available via the CDISC website. The Case Report Tabulation Data Definition Specification (define.xml), now in draft version 2.0, is the most mature of the data definition specifications. The specification is part of the evolution from the 1999 FDA electronic submission (eSub) guidance and the electronic Common Technical Document (eCTD) documents, which specify that a document describing the content and structure of the included data should be provided within a submission. Define.xml was developed to help automate the review process by providing a means to generate a Data Definition Document in machine-readable format. Define.xml has significantly improved the review process by standardizing the numerous submissions to the FDA and enabling efficient data interchange and regulatory submission. The Define XML standard for transmitting Study Data Tabulation Model (SDTM), Standard for the Exchange of Non-clinical Data (SEND) and Analysis Data Model (ADaM) metadata has reduced review cycle times from over two years down to months.
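
To illustrate what "a Data Definition Document in machine-readable format" can mean in practice, the sketch below generates a small XML document describing the variables of one dataset. It is deliberately simplified: the element names `DataDefinition` and `Variable`, the attribute names, and the helper `build_definition_document` are invented for this example and are not the define.xml schema. The dataset name `DM` and variable names `USUBJID` and `AGE` follow common SDTM usage, but the metadata values shown are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative variable metadata for one dataset (values are assumptions).
VARIABLES = [
    {"name": "USUBJID", "label": "Unique Subject Identifier", "type": "text"},
    {"name": "AGE", "label": "Age", "type": "integer"},
]


def build_definition_document(dataset: str, variables: list[dict]) -> ET.Element:
    """Build a simplified, hypothetical data definition document.

    Each variable becomes one <Variable> element whose text is its label.
    """
    root = ET.Element("DataDefinition", {"dataset": dataset})
    for var in variables:
        item = ET.SubElement(
            root, "Variable", {"name": var["name"], "type": var["type"]}
        )
        item.text = var["label"]
    return root


doc = build_definition_document("DM", VARIABLES)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Because the description of the data is itself structured data, a reviewer's tools can read it directly instead of relying on a free-text document, which is the kind of automation the paragraph above describes.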

Archival data
A Data Definition Specification (DDS) forms the foundation for building the metadata within the schema for data archiving. While not a DDS, the Metadata Encoding & Transmission Standard (METS) does utilize some of the same principles as a DDS: consistent use of well-defined terms. METS utilizes these key terms to catalog digital objects for global use. The METS schema provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata. It can therefore provide a useful standard for the exchange of digital library objects between repositories.
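
The three kinds of metadata named above correspond to sections of a METS document: `dmdSec` (descriptive), `amdSec` (administrative) and `structMap` (structural), with a `div` in the structural map linking back to the other two through its `DMDID` and `ADMID` attributes. The following Python sketch builds a bare skeleton with those sections. The helper name `build_mets_skeleton`, the identifier values and the label are assumptions for illustration; the sections are left empty and the result has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(label: str) -> ET.Element:
    """Build an empty METS skeleton for one digital object (illustrative only)."""
    root = ET.Element(q("mets"), {"LABEL": label})

    # Descriptive and administrative metadata sections, left empty here.
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})

    # The structural map's div links to the sections above by their IDs.
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    ET.SubElement(
        struct_map, q("div"), {"LABEL": label, "DMDID": "DMD1", "ADMID": "AMD1"}
    )
    return root


mets = build_mets_skeleton("Example digital object")
print(ET.tostring(mets, encoding="unicode"))
```

The ID-based links are what allow a repository receiving the document to tell which descriptive and administrative records apply to which part of the object.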

A similar effort is underway to preserve the complex data associated with the archiving of video games. Preserving Virtual Worlds sought to address the deficiencies in current archival formats, citing the absence of suitable ways of documenting interactive fiction and games at the bit level: specifically, these formats failed to provide the “representation information” needed to map the raw bits into higher-level data constructs. Preserving Virtual Worlds 2 is the ongoing research project that continues and expands upon the initial effort in this field.