User:TCIAteam/sandbox

History
The Cancer Imaging Archive (TCIA) began in 2011 as an expansion of the original National Biomedical Imaging Archive (NBIA). TCIA was formed to provide publicly accessible collections of DICOM images from which protected health information (PHI) has been removed in accordance with HIPAA, while preserving enough information for computational and image processing research. The initiative assembled collections of biomedical and clinically indicated CT, MRI and nuclear medicine (e.g. PET) diagnostic studies acquired during conventional clinical care. By 2008, NBIA use-case analysis had demonstrated growing demand for additional image collections, especially collections that could be analyzed in connection with tumor cases undergoing extensive genomic analysis by the NHGRI/NCI project The Cancer Genome Atlas (TCGA). With the research value thus justified, the NIH National Cancer Institute’s Cancer Imaging Program (CIP) convened an extramural, cross-disciplinary scientific gap analysis. That analysis motivated the announcement of a competitive CIP-funded contract, ultimately awarded to the ERL laboratory at Washington University in St. Louis, to host the NBIA (now TCIA) software and its multiple focused imaging collections, in original DICOM format, as a downloadable Internet service.

Mission
The National Cancer Institute’s Cancer Imaging Program (CIP) has committed TCIA to providing DICOM-based imaging datasets that are publicly accessible, downloadable, and organized in searchable, tumor-specific collections. The data-field tags that form part of every DICOM image are robustly de-identified so that they contain no PHI, but tags likely to be needed by clinical and computational researchers (e.g. scan parameters) are carefully preserved untouched. These data, together with their links to clinical and tissue-related metadata, are essential for developing standardized methods for quantitative imaging (QI), radio-genomics and image interpretation. Numerous special collections (labeled with the prefix “TCGA-…”) connect, through a common unique identifier, to cases that have been genomically analyzed and made publicly available in the TCGA DataPortal. That DataPortal also links to, and allows download of, extensive case report form clinical data and demographics for the individual donors from whom the tissues originated.
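The idea of removing identifying tags while leaving scan parameters untouched can be illustrated with a minimal sketch. It treats a DICOM header as a plain Python dictionary keyed by standard DICOM keywords. The set of keywords removed here is an assumption chosen for illustration and is not TCIA's actual rule set; a real de-identification process also has to handle dates, unique identifiers, private tags and free-text fields.

```python
# Illustrative sketch only. IDENTIFYING_KEYWORDS is a hypothetical,
# deliberately short list, not the rule set used by TCIA.
IDENTIFYING_KEYWORDS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
}


def deidentify(header: dict) -> dict:
    """Return a copy of the header without the identifying elements."""
    return {
        keyword: value
        for keyword, value in header.items()
        if keyword not in IDENTIFYING_KEYWORDS
    }


# A made-up header mixing identifying fields and scan parameters.
header = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19600101",
    "Modality": "CT",
    "SliceThickness": "2.5",
    "KVP": "120",
}

clean = deidentify(header)
print(sorted(clean))
```

In this sketch the scan parameters (Modality, SliceThickness, KVP) pass through unchanged, which is the property the paragraph above describes as necessary for quantitative imaging research.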

Organization of Archive
The master listing consists of named collections that support simple or advanced dynamic free-text searches. Case indexing follows the conventional DICOM hierarchy of patient -> study -> series. One of the archive’s most important properties is its service-oriented architecture, which offers many organized collections of indexed DICOM clinical images for public download without logon requirements or cost. It provides the robust de-identification, quality control, curation, and helpdesk services that are essential for users of such a system. The diverse collections are an open-access research resource that has been described as a leading example of Big Data. A key aspect is that it serves as a repository of clinical diagnostic images that are HIPAA-compliant and have undergone multi-level de-identification to remove protected health information (PHI). Unique aspects of TCIA, not readily found in other image databases to date, include a common unique-identifier link to the NIH/NHGRI database known as The Cancer Genome Atlas (TCGA) Data Portal.

That linkage enables researchers to explore the genotype-phenotype relationships of 10,000+ donor tissue cases analyzed from 22+ human cancers. The resource is intended for studying prognosis, survival, drug development and pathway hypotheses. It has sustained a multi-institutional network of more than 100 geographically dispersed investigators who volunteer their time to use TCIA-TCGA data to develop methods for determining prognosis or predicting response to therapy. It also serves as a common storage site for images obtained during clinical trials for which restricted access is scientifically justified. Its well-established, robust de-identification and curation process provides safe-harbor storage for clinical images contributed by otherwise risk-averse investigators and institutions. It lowers the cost burden for new researchers and NCI grantees who wish to comply with the newer, more demanding NIH data-sharing policies. It implements a digital object identifier (DOI) process, which sets a new standard for scientific reproducibility for journal publishers because it offers journal readers direct access to the original images of the research subjects studied in a specific published manuscript. Using TCIA resources, NCI has also implemented a unique and cost-effective research strategy to encourage national and international scientific societies (RSNA, AAPM, SPIE, WMIC, SNM, and MICCAI) to participate in software “Challenges”. Results from these efforts should facilitate selection of optimal quantitative imaging clinical decision support tools for clinical trials and co-clinical trials.

TCIA resources are therefore intended to support: development of computer-aided diagnosis methods (quantitative imaging); evaluation of scientific reproducibility, free of bias, by accepted standard statistical methods; research correlating clinical diagnostic medical images with digital microscopic histological images; exploratory biomarker research in which imaging is a key element; and collaboration between cross-disciplinary investigators where imaging is crucial to research on tumor heterogeneity (between patients and within a tumor), tracking of tissue response over time (objective measurements of tumor progression), and imaging genomics and Big Data linkage and analysis (clinical, histopathology, genomics).

Most collections on The Cancer Imaging Archive can be accessed without a log-on. Access to some collections is, with scientific justification, limited to specific users who have been given permission to access them. That enables TCIA to: host supporting data collections for private or internal projects; protect data while the original investigators have time to publish results; and offer access to just those individuals directly involved in a project such as a clinical trial.
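The patient -> study -> series indexing mentioned in this section can be pictured as a nested mapping. The following is a minimal sketch that groups flat records into that hierarchy; the identifiers are invented for illustration, and the function name `build_index` is hypothetical rather than part of any TCIA software.

```python
from collections import defaultdict


def build_index(records):
    """Group flat (patient, study, series) records into nested mappings."""
    index = defaultdict(lambda: defaultdict(list))
    for patient_id, study_uid, series_uid in records:
        index[patient_id][study_uid].append(series_uid)
    return index


# Made-up identifiers; real DICOM study and series UIDs are long dotted numbers.
records = [
    ("PATIENT-001", "study-A", "series-1"),
    ("PATIENT-001", "study-A", "series-2"),
    ("PATIENT-001", "study-B", "series-3"),
    ("PATIENT-002", "study-C", "series-4"),
]

index = build_index(records)
for patient_id, studies in index.items():
    for study_uid, series_list in studies.items():
        print(patient_id, study_uid, series_list)
```

Each patient maps to one or more studies, and each study maps to the list of series acquired in it, which mirrors how a case is browsed from the top level down.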

Downloading or viewing data
Illustrated instructions for searching for and downloading images and image studies are available on the TCIA wiki (https://wiki.cancerimagingarchive.net/display/Public/Wiki). Many free and commercial DICOM viewers for all operating systems are available for download. A well-organized, annotated listing of such software, with descriptions, is available online (http://idoimaging.com).

Downloading associated data (non-image metadata, etc.)
Analysis results, clinical trial data and other non-image information are also managed by TCIA. A full list of the supporting data that complement the available images can be found through the Collections wiki page, on the wiki page associated with each specific collection. The data themselves are sometimes hosted directly on the wiki. In other cases they are provided as additional series or annotations that are downloaded in the same way as the image data, or they are hosted on external web sites.

Supporting metadata that resides on the TCIA Wiki
In some cases the supporting data are provided to TCIA as a CSV/XLS file or another relatively small file that can simply be attached to the collection’s wiki page. For example, the Prostate-Diagnosis wiki page provides links to files that contain clinical metadata and multiple NRRD 3DSlicer segmentations.

Supporting data stored on other web sites
In addition to hosting supporting data directly, TCIA sometimes links to external web sites. A prime example is its partnership with The Cancer Genome Atlas (TCGA). The TCGA Data Portal stores extensive genomic and clinical data and microscopic pathology images for patients in the TCGA-related TCIA image collections. The unique patient identifiers are maintained across both the TCIA and TCGA systems so that their information can be correlated. Other collections, such as NSCLC Radiogenomics and NSCLC-Radiomics-Genomics, use the Gene Expression Omnibus (GEO) to store the related gene sequencing information, which connects to the images hosted in TCIA. Again, the patient IDs are kept consistent between both systems so that researchers can connect the data sets from each site.
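Because the patient identifiers are kept consistent across systems, correlating the two sources amounts to a join on patient ID. The sketch below shows this with two small in-memory tables. The identifiers, file names and the `link_cases` function are all hypothetical and stand in for whatever a researcher has actually downloaded from each site.

```python
def link_cases(imaging: dict, genomic: dict) -> dict:
    """Pair imaging and genomic records that share a patient identifier."""
    shared_ids = sorted(imaging.keys() & genomic.keys())
    return {
        patient_id: {
            "imaging": imaging[patient_id],
            "genomic": genomic[patient_id],
        }
        for patient_id in shared_ids
    }


# Made-up records keyed by a shared patient identifier.
imaging = {
    "CASE-0001": ["CT series", "MR series"],
    "CASE-0002": ["MR series"],
}
genomic = {
    "CASE-0001": {"expression_file": "case-0001-expression.txt"},
    "CASE-0003": {"expression_file": "case-0003-expression.txt"},
}

linked = link_cases(imaging, genomic)
print(linked)
```

Only cases present in both tables appear in the result, so a case with images but no genomic record (or the reverse) is left out rather than paired incorrectly.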

TCIA - TCGA linked databases
A number of named TCIA collections share their nomenclature with the TCGA genomic Data Portal (e.g. TCGA-BRCA, TCGA-GBM). Shared unique identifiers allow researchers to correlate the clinical images in each study with the corresponding TCGA case, for which TCGA offers extensive ‘omic analysis, digital pathology slides, and bulk download of individual demographic and clinical data.

TCIA for “Challenge” competitions
TCIA has also provided specific data sets used for ‘Challenge’ competitions run by international professional societies focused on digital imaging, such as MICCAI, SPIE, or ISBI. Where applicable, TCIA links to the challenge management systems (such as Top Coder) employed by Challenge organizers, which helps promote awareness of these competitions and encourages participation by TCIA’s user community.

Related software (NBIA, CTP, Confluence Wiki, caMicroscope)
The software components that complement the TCIA resource can be explored further by following the links named in the heading above.

Digital Object Identifiers (DOIs)
To facilitate data sharing, many publications encourage authors to cite the data they used to produce the results described in their scholarly papers. In addition, new journals are now available for describing data collections outright (e.g., Nature Scientific Data). TCIA can create persistent identifiers linked to subsets of data held within TCIA, which authors may use as data citations in their scholarly papers. TCIA uses the DataCite system to manage references. DataCite leverages the Digital Object Identifier (DOI) infrastructure, which is widely used for citing scholarly articles. TCIA users may request that a DOI be created for subsets of data stored within TCIA. By definition, only publicly available data may be included. Currently, DOIs created by TCIA may reference only static (unchanging) subsets of data. In other words, if someone later changes the content of the shared list from which a DOI was created, the change will not be reflected in the data returned by that existing DOI.
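The static behaviour described above can be sketched as a snapshot taken when the DOI is created. This is only a model of the semantics, not a description of how TCIA or DataCite store anything. The DOI string is a made-up placeholder, and `create_snapshot` and `doi_url` are hypothetical helper names. The only external fact assumed is that a DOI can be resolved through the public resolver at https://doi.org/.

```python
def create_snapshot(shared_list):
    """Freeze the current contents of a shared list as an immutable tuple."""
    return tuple(shared_list)


def doi_url(doi: str) -> str:
    """Build the resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


# A shared list of series at the moment the DOI is requested.
shared_list = ["series-1", "series-2"]

# Hypothetical placeholder DOI; not a real TCIA identifier.
doi = "10.1234/example-subset"
snapshot_by_doi = {doi: create_snapshot(shared_list)}

# A later change to the shared list does not alter what the DOI returns.
shared_list.append("series-3")

print(doi_url(doi))
print(snapshot_by_doi[doi])
```

Copying the list at creation time is what keeps a citation reproducible: a reader who follows the DOI sees the same subset the authors analyzed, whatever happens to the shared list afterwards.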