User:Tiemeyer/sandbox

GlyGen is a knowledge base for glycans, glycoconjugates and related gene, protein and other molecular biology information. GlyGen retrieves information from multiple international data sources such as PDB, RefSeq, and UniProt, and integrates and harmonizes content to allow unique searches that cannot be executed in any of the integrated databases alone.

Organization
The GlyGen project is an international, multi-institutional effort led by the University of Georgia (UGA) and the George Washington University (GW), which collaborate on the development of the GlyGen portal. GlyGen also works with international organizations such as the European Bioinformatics Institute (EMBL-EBI), the National Center for Biotechnology Information (NCBI), Georgetown University, Soka University, and Griffith University (Institute for Glycomics) to gather and integrate data relevant to glycoscience.

Integrated databases
Currently, GlyGen integrates data from the following publicly available databases:
 * BioXpress
 * BioMuta
 * Disease Ontology
 * GlyTouCan
 * Mouse Genome Database
 * NCBI PubChem
 * NCBI PubMed
 * NCBI RefSeq
 * NCBI Taxonomy
 * Orthologous MAtrix
 * Protein Ontology
 * RCSB Protein Data Bank
 * The Monarch Initiative
 * UniCarbKB
 * UniProt Knowledgebase

Content and features
The goal of the GlyGen project is to integrate and disseminate data describing glycoconjugate and complex carbohydrate structure, biosynthesis, and function. GlyGen retrieves data from international sources, integrates and harmonizes these data, and provides an interface for exploring them. The GlyGen web portal allows users to run searches across the integrated datasets and so find knowledge that cannot be obtained by querying the individual databases in isolation.
 * Data Collection - Data are collected with intensive data quality control. Metadata are captured using the BioCompute Object schema.
 * Data Integration - Data from the different resources are downloaded in resource-specific formats (e.g., RDF, FASTA, CSV) and mapped to common identifiers (e.g., accession numbers).
 * Quick Search - Complex multi-domain search queries can be performed using the "Quick Search" option, which is based on user-supplied use cases.
 * Explore Searches - Filtered Glycan, Protein, and Glycoprotein lists are generated using simple or advanced search options.
 * Data Visualization - GlyGen integrates Homo sapiens, Mus musculus, and Rattus norvegicus proteins, glycans, and glycoproteins.
 * Resources - A library of glycobiology resources, including databases, informatics tools, learning material, and tutorials, is provided.
 * SPARQL Endpoint - All datasets are also RDFized using standard ontologies (e.g., UniProt RDF schema, GlycoCoO, FALDO) and made available via a public SPARQL endpoint.
 * Feedback - An integrated feedback system allows users to submit comments and suggestions on every web page.
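
The Data Integration step above, in which records arriving in resource-specific formats are mapped to a common identifier, can be illustrated with a minimal sketch. This is not GlyGen's actual pipeline. The sample FASTA and CSV contents, the column names (uniprot_ac, glytoucan_ac, site), the accession values, and the function names are all hypothetical placeholders chosen for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample inputs; accessions and columns are placeholders,
# not real GlyGen source data.
CSV_TEXT = """uniprot_ac,glytoucan_ac,site
P12345,G00001AA,88
P12345,G00002BB,120
Q67890,G00001AA,45
"""

FASTA_TEXT = """>sp|P12345|EXAMPLE_ONE
MKTAYIAKQR
QISFVKSHFS
>sp|Q67890|EXAMPLE_TWO
MSDNELQKLA
"""


def parse_fasta(text):
    """Return {accession: sequence}, assuming '>db|ACCESSION|NAME' headers."""
    sequences = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession = line[1:].split("|")[1]
            sequences[accession] = ""
        elif accession is not None:
            # Sequence lines are concatenated until the next header.
            sequences[accession] += line
    return sequences


def parse_sites(text):
    """Return {accession: [(glycan accession, site position), ...]} from CSV."""
    sites = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        sites[row["uniprot_ac"]].append((row["glytoucan_ac"], int(row["site"])))
    return dict(sites)


def integrate(fasta_text, csv_text):
    """Join both sources on the shared protein accession."""
    sequences = parse_fasta(fasta_text)
    sites = parse_sites(csv_text)
    merged = {}
    for accession in sorted(set(sequences) | set(sites)):
        merged[accession] = {
            "sequence": sequences.get(accession),
            "glycosylation": sites.get(accession, []),
        }
    return merged


RECORDS = integrate(FASTA_TEXT, CSV_TEXT)
```

Keying every record on a single accession lets a protein sequence from one source and glycosylation annotations from another be retrieved together, which is the kind of cross-resource query an integrated knowledge base is meant to support.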

Availability
The Creative Commons Attribution 4.0 International (CC BY 4.0) license applies to all GlyGen datasets, permitting users to copy, distribute, display, and make commercial use of the data in all jurisdictions, provided appropriate credit is given. Project source code is released under the GNU General Public License v3 and is available at the GlyGen GitHub repository. GlyGen data are available without cost and can be accessed via the GlyGen GitHub repository, the portal, the data pages, the API, and the SPARQL endpoint.
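
As a rough illustration of programmatic access through a SPARQL endpoint, the sketch below builds a query request in the form defined by the standard SPARQL 1.1 Protocol (an HTTP GET with a query parameter). The endpoint URL is a hypothetical placeholder, not GlyGen's real address, and the query is a generic triple pattern that assumes nothing about GlyGen's RDF vocabulary. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; substitute the endpoint URL published by GlyGen.
ENDPOINT = "https://sparql.example.org/sparql"

# Generic query: fetch any five triples, without assuming a specific schema.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


REQUEST = build_request(ENDPOINT, QUERY)
```

Passing REQUEST to urllib.request.urlopen would submit the query. With a JSON response, the result rows would be expected under results.bindings, as specified by the SPARQL JSON results format.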

Funding
GlyGen is funded by the National Institutes of Health (NIH) of the United States of America through the NIH Glycoscience Common Fund Program and is managed by the NIH Office of Strategic Coordination (grant # 1U01GM125267-01).