National Database for Autism Research

The National Database for Autism Research (NDAR) is a secure research data repository promoting scientific data sharing and collaboration among autism spectrum disorder (ASD) investigators. The project was launched in 2006 as a joint effort between five institutes and centers at the National Institutes of Health (NIH): the National Institute of Mental Health (NIMH), the National Institute of Child Health and Human Development (NICHD), the National Institute of Neurological Disorders and Stroke (NINDS), the National Institute of Environmental Health Sciences (NIEHS), and the Center for Information Technology (CIT). The goal of NDAR is to provide a shared common platform for data collection, retrieval, and archiving to accelerate the advancement of research on autism spectrum disorders. The largest repository of its kind, NDAR makes available data at all levels of biological and behavioral organization for all data types. As of November 2013, data from over 90,000 research participants are available to qualified investigators through the NDAR portal. Summary information about the available data is accessible through the NDAR public website.

Background
In response to the heightened societal concern over ASD, the United States Congress passed the Combating Autism Act (CAA) of 2006 (P.L. 109–416). Through this Act, Congress intended to rapidly increase and improve coordination of scientific discovery in ASD research. The CAA mandated the creation of the Interagency Autism Coordinating Committee (IACC), a federal government advisory panel charged with developing and annually updating a Strategic Plan for ASD research. This plan provides a blueprint for autism research and advises Congress, the Department of Health and Human Services, and other federal agencies on the needs and opportunities for autism research. The IACC Strategic Plan was designed to detail research opportunities centered on the six most pressing questions facing those affected by autism and to link them to specific research efforts. In 2009, the plan was finalized and submitted to the Secretary of the Department of Health and Human Services; a seventh question related to infrastructure and surveillance needs was added to the plan in 2010.

NDAR was developed by the NIH with the goal of increasing sample sizes and enabling researchers to share data for more extensive analyses. NDAR was already under development when the seventh question of the IACC Strategic Plan was added. Question 7, Objective H of the IACC Strategic Plan emphasizes the creation of mechanisms to specifically support the contribution of data into NDAR from 90 percent of newly initiated projects regardless of funding source, and the linking of NDAR with other existing data resources by 2012.

Oversight and governance
Thomas Insel, the Director of NIMH, oversees NDAR and its implementation and participates on a Governing Committee responsible for the ongoing management and stewardship of NDAR. This committee includes several other NIH Institute and Center directors or their designees.

The NDAR Implementation Team (NIT) is one of the groups providing direction on NDAR, specifically data submission and access, in order to promote consistent participant protections. The team is composed of program staff representing Institutes and Centers with autism research in their portfolios.

The Autism Informatics Consortium (AIC) was launched in 2011 with the goal of accelerating scientific discovery by making informatics tools and resources more useful to autism researchers. Current members include Autism Speaks, Kennedy Krieger Institute, Simons Foundation, Prometheus Research, and the NIH.

NDAR organization
The two key components that form the basis of NDAR are the Global Unique Identifier (GUID) for research subjects and the researcher-defined Data Dictionary to describe experiments. This platform requires that common data definitions and standards, as well as comprehensive and coherent informatics approaches, be developed for and with the involvement of the research community.

Global Unique Identifier (GUID)
The NDAR GUID is a subject identifier used to protect the confidentiality of a research subject. When submitting data, an investigator who has appropriate access to a subject's personally identifiable information (PII) uses the GUID Tool to create a unique identifier for each subject in their study. Although a GUID is based on PII, a subject's PII never leaves the local research site. The GUID Tool requires basic information typically found on a birth certificate, such as first name at birth, last name at birth, date of birth, gender at birth, and city/municipality of birth, none of which change throughout an individual's life. A one-way hash code sequence is then generated from this input, and the GUID Tool transmits only the resulting encrypted codes to NDAR. The submitted one-way hash code sequence is compared to the sequences previously submitted. If that sequence has already been registered and has an associated GUID, the same GUID is returned to the researcher. If the sequence has not been previously registered, a new GUID is created and returned to the researcher. The GUID for a subject is therefore the same regardless of where or when it is generated. If the same subject enrolls in another investigator's project or provides a biological specimen for a repository, the second investigator enters the same birth certificate information into the software and the same GUID is generated. Data are always submitted to NDAR in association with a GUID, and the data in NDAR are indexed by GUID. In this way, data from a de-identified individual subject can be aggregated, tracked, and linked across projects, time, databases, and biobanks, allowing for a more complete picture of the subject.
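The exchange described above can be sketched in Python. This is a minimal illustration only, not the GUID Tool's actual algorithm: the normalization rules, the use of a single SHA-256 digest (the text describes a sequence of hash codes), the GuidRegistry class, and the "GUID-" identifier format are all assumptions made for the example.

```python
import hashlib
import secrets
import unicodedata


def normalize(value: str) -> str:
    """Strip accents and whitespace and fold case so equivalent inputs hash alike."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ascii_only.upper().split())


def one_way_hash(first_name, last_name, date_of_birth, sex, city_of_birth) -> str:
    """Runs at the research site; only the returned digest is transmitted."""
    fields = [first_name, last_name, date_of_birth, sex, city_of_birth]
    joined = "|".join(normalize(field) for field in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class GuidRegistry:
    """Hypothetical server-side stand-in: it sees hash codes, never the PII."""

    def __init__(self):
        self._guid_by_hash = {}

    def lookup_or_create(self, hash_code: str) -> str:
        # Return the existing GUID for a known hash, otherwise register a new one.
        if hash_code not in self._guid_by_hash:
            # Assumed identifier format, chosen only for illustration.
            self._guid_by_hash[hash_code] = "GUID-" + secrets.token_hex(8).upper()
        return self._guid_by_hash[hash_code]


if __name__ == "__main__":
    registry = GuidRegistry()
    site_a = one_way_hash("Ana", "Silva", "2001-02-03", "F", "Lisbon")
    site_b = one_way_hash(" ana ", "SILVA", "2001-02-03", "f", "lisbon")
    # Two sites entering the same birth information receive the same GUID.
    print(registry.lookup_or_create(site_a) == registry.lookup_or_create(site_b))
```

Because the registry is keyed only by the digest, a second site can recover the same identifier without either site disclosing the underlying birth information.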

The GUID is the result of a collaboration between NDAR, the Simons Foundation, and a team of researchers from Columbia University. It has become the standard patient identifier for autism research and serves as a model for similar standards in other research areas.

Data dictionary and validation tool
NDAR has established a data dictionary with over 300 clinical, imaging, and genomic research definitions, which were created in close collaboration with the ASD research community. To submit data to NDAR, researchers are required to format their data in accordance with an existing data definition or to define a new data definition, which then becomes available for use by other researchers. As of May 2012, NDAR contains over 35,000 discrete data elements.

Researchers confirm that their data conform to the existing definitions by using the Validation Tool. The Validation Tool ensures that naming conventions are followed, GUIDs are properly registered, and the reported data are consistent with the value ranges defined in the dictionary. NDAR requires minimal adjustments to the way raw data are entered, and multiple web tutorials and demos are available for researchers wishing to submit data. All data contributed and shared must pass validation before they are submitted.
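The three kinds of checks named above (element naming, GUID registration, and value ranges) can be illustrated with a short Python sketch. It is not the Validation Tool itself: the ElementDefinition structure, the reserved "subject_guid" key, and the example elements are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementDefinition:
    """One hypothetical data dictionary entry."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    allowed: Optional[frozenset] = None


def validate_record(record, definitions, registered_guids):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    known_names = {d.name for d in definitions} | {"subject_guid"}

    # Naming: every submitted element must exist in the dictionary.
    for key in record:
        if key not in known_names:
            errors.append(f"unknown element: {key}")

    # GUID: the subject identifier must already be registered.
    if record.get("subject_guid") not in registered_guids:
        errors.append("GUID is not registered")

    # Values: each defined element must be present and within its constraints.
    for d in definitions:
        if d.name not in record:
            errors.append(f"missing element: {d.name}")
            continue
        value = record[d.name]
        if d.allowed is not None and value not in d.allowed:
            errors.append(f"{d.name}: value {value!r} not allowed")
        if d.minimum is not None or d.maximum is not None:
            if not isinstance(value, (int, float)):
                errors.append(f"{d.name}: expected a number")
            elif d.minimum is not None and value < d.minimum:
                errors.append(f"{d.name}: below minimum {d.minimum}")
            elif d.maximum is not None and value > d.maximum:
                errors.append(f"{d.name}: above maximum {d.maximum}")
    return errors


if __name__ == "__main__":
    dictionary = [
        ElementDefinition("age_months", minimum=0, maximum=1200),
        ElementDefinition("sex", allowed=frozenset({"M", "F"})),
    ]
    row = {"subject_guid": "GUID-0001", "age_months": 54, "sex": "F"}
    print(validate_record(row, dictionary, {"GUID-0001"}))
```

Returning a list of problems rather than stopping at the first one lets a submitter correct a whole file in a single pass.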

Genomics tool
After thorough analyses of functional genomics acquisition and storage criteria, as well as a review of the needs of the research community, NDAR staff developed a tool to simply and clearly define the relationship between samples and data files. A predefined set of parameters was built to guarantee the consistency of raw experimental data while simplifying the data definition for submission and aggregation across federated repositories. The predefined set of parameters includes attributes specific to each experiment (such as molecule and sub-molecule), experiment technology, vendor and platform, extraction protocol and kit, processing protocol and kits, analysis software, and equipment.

Imaging tool
NDAR currently supports the receipt of unprocessed brain images in DICOM format, as well as processed images in a variety of formats, including DICOM, MINC 1.0 and 2.0, Analyze, NIfTI-1, AFNI, and SPM. Images can be visualized using NDAR's built-in image registration and visualization tool (MIPAV). Collaborations are planned with prominent ASD researchers in order to define data structures and develop standardization tools for functional neuroimaging, EEG, TMS, MEG, and eye-tracking.

Data submission
Investigators working on autism-related projects, regardless of their funding source, are strongly encouraged to submit any type of autism-related data generated in their laboratories. After extensive consultations with the research community, NDAR has established a two-tiered submission strategy for investigators receiving NIH funding. Descriptive (raw) data are expected to be submitted biannually, in January and July, and include non-proprietary behavioral and diagnostic data. Examples include standard clinical assessments, family medical history, demographics, unprocessed images, and genomic data. Making this information available early in the research process allows other investigators to understand the general characteristics of the participants enrolled. Experimental (analyzed) data are expected to be submitted within 12 months after accomplishment of each primary aim or objective (or set of interdependent aims or objectives) of the supported research, or at the time of publication of the results of the primary aim(s), whichever occurs first. Examples include outcome measures, analyzed genomics data, results from image analysis, and volumetric data.
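The timing rules for the two tiers reduce to simple date arithmetic, sketched below in Python. The function names are hypothetical, and two details are assumptions rather than stated policy: a submission window is taken to open on the first day of January or July, and an aim completed on 29 February is treated as falling due on 28 February of the following year.

```python
from datetime import date
from typing import Optional


def experimental_data_due(aim_completed: date, published: Optional[date] = None) -> date:
    """Latest expected date: 12 months after the aim, or publication if earlier."""
    try:
        twelve_months_later = aim_completed.replace(year=aim_completed.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following (non-leap) year.
        twelve_months_later = aim_completed.replace(year=aim_completed.year + 1, day=28)
    if published is None:
        return twelve_months_later
    return min(twelve_months_later, published)


def current_or_next_descriptive_window(today: date) -> date:
    """First day of the current or next January/July submission month."""
    if today.month <= 1:
        return date(today.year, 1, 1)
    if today.month <= 7:
        return date(today.year, 7, 1)
    return date(today.year + 1, 1, 1)


if __name__ == "__main__":
    print(experimental_data_due(date(2012, 3, 1), published=date(2012, 9, 15)))
    print(current_or_next_descriptive_window(date(2013, 3, 2)))
```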

Data sharing
NDAR's Ongoing Study capability allows investigators to work collaboratively on research studies in progress, sharing data, tools, and standards through the NDAR portal before they are shared with the rest of the ASD community. Qualified researchers can also request access to data stored in NDAR and/or data stored at federated repositories after the data are made public. To gain access to those data, an investigator must obtain NDAR data access privileges. By default, all data contained in NDAR have passed data validation, ensuring that all research participant data have an NDAR GUID, conform to the NDAR data standard, and meet standard value constraints. Beginning with the January 2011 submissions, NDAR developed and implemented automated quality procedures that are run against all incoming data to check for a variety of potential data discrepancies, such as duplicate data, inconsistent gender, inconsistent age across measures, and scoring errors on a number of measures. The new QA procedures are intended not only to raise the quality of data residing in NDAR but also to improve data accuracy within each contributing laboratory and project.
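Three of the discrepancy checks listed above (duplicate data, inconsistent gender, and inconsistent age across measures) can be sketched in Python as follows. This is an illustration under assumptions, not NDAR's actual QA code: the row fields, the issue labels, and the one-month tolerance for rounding in reported ages are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_discrepancies(rows, tolerance_months=1):
    """Return (issue, guid) pairs for rows that disagree with each other."""
    issues = []
    seen = set()
    sexes = defaultdict(set)
    implied_birth_months = defaultdict(list)

    for row in rows:
        # Duplicate: same subject, same measure, same interview date.
        key = (row["guid"], row["measure"], row["interview_date"])
        if key in seen:
            issues.append(("duplicate", row["guid"]))
        seen.add(key)

        sexes[row["guid"]].add(row["sex"])

        # Interview month minus age in months gives an implied birth month.
        d = row["interview_date"]
        implied_birth_months[row["guid"]].append(d.year * 12 + d.month - row["age_months"])

    for guid, values in sexes.items():
        if len(values) > 1:
            issues.append(("sex_mismatch", guid))

    # Every measure for one subject should imply roughly the same birth month.
    for guid, months in implied_birth_months.items():
        if max(months) - min(months) > tolerance_months:
            issues.append(("age_inconsistent", guid))
    return issues


if __name__ == "__main__":
    rows = [
        {"guid": "A", "measure": "m1", "interview_date": date(2010, 1, 15), "sex": "F", "age_months": 60},
        {"guid": "A", "measure": "m2", "interview_date": date(2011, 1, 20), "sex": "M", "age_months": 72},
    ]
    print(find_discrepancies(rows))
```

Comparing implied birth months rather than raw ages allows measures collected on different dates to be checked against one another.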

Federation
NDAR is federated with three other private databases: the Autism Genetic Resource Exchange (AGRE), the Autism Tissue Program (ATP), and the Interactive Autism Network (IAN). This federation allows the data to be kept in their respective locations while enabling users to search across the databases simultaneously. These repositories all use the NDAR GUID as well as common data definitions. NDAR is currently finalizing a federation agreement with the Simons Foundation.
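The federation pattern described above, in which each repository keeps its own records and a single query is sent to all of them at once, can be sketched in Python. The Repository class, its search method, and the record fields are hypothetical stand-ins; the actual interfaces of the federated databases are not described here.

```python
from concurrent.futures import ThreadPoolExecutor


class Repository:
    """Hypothetical repository: records stay here and are only queried."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def search(self, guid):
        # Tag each hit with its source so merged results remain traceable.
        return [dict(record, source=self.name)
                for record in self._records if record["guid"] == guid]


def federated_search(repositories, guid):
    """Query every repository concurrently and merge the hits by shared GUID."""
    with ThreadPoolExecutor(max_workers=max(1, len(repositories))) as pool:
        result_lists = pool.map(lambda repo: repo.search(guid), repositories)
        return [hit for hits in result_lists for hit in hits]


if __name__ == "__main__":
    repo_a = Repository("repo_a", [{"guid": "G1", "kind": "genotype"}])
    repo_b = Repository("repo_b", [{"guid": "G1", "kind": "tissue"},
                                   {"guid": "G2", "kind": "tissue"}])
    for hit in federated_search([repo_a, repo_b], "G1"):
        print(hit)
```

The shared GUID is what makes the merge possible: without a common subject identifier, results from separate repositories could not be attributed to the same individual.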

Federal databases
NDAR is linked to the following federal data repositories, providing a wealth of information in one central location: the Pediatric MRI Data Repository, dbGaP, dbVar, and the Sequence Read Archive.

NDAR Study
The NDAR Study allows researchers to record basic information about the cohort, measures, analysis, and results of a study, linking to data contained in NDAR as well as the resulting publication. This tool allows others to replicate results and understand the data analysis methods. NDAR data is associated with PubMed papers; readers are able to easily access the NDAR data from PubMed using this feature.

Awards

 * 2008 and 2009 NIH Award of Merit
 * 2011 HHS Innovates Award
 * NIMH's Top 10 Research Advances of 2011