Glycan nomenclature

Glycan nomenclature is the systematic naming of glycans, which are carbohydrate-based polymers made by all living organisms. In general, glycans can be represented in (i) text formats, which include the commonly used CarbBank and IUPAC formats and several others; and (ii) symbol formats, which consist of the Symbol Nomenclature for Glycans (SNFG) and the Oxford notation.

History
At the beginning of the nineteenth century, the names of sugar molecules were derived from their source. For example, glucose was called grape sugar (Traubenzucker) and saccharose was called cane sugar (Rohrzucker). In 1838, the name glucose was coined; subsequently, in 1866, Kekulé proposed the name 'dextrose' because glucose is dextrorotatory. The scientific community decided that sugars should be named with the ending '-ose', which was then combined with the French word 'cellule' (cell) to give the term cellulose. Because the empirical composition of monosaccharides can be expressed as Cn(H2O)n, they were termed 'carbohydrates' (French 'hydrate de carbone').

Text formats
To represent the structural information of glycans more accurately and to serve specific purposes of the community, several distinct formats have been designed and used in the carbohydrate databases developed by different research groups and organizations.

CarbBank
The CarbBank format originates from CarbBank, a database management system for the Complex Carbohydrate Structure Database (CCSD). CarbBank was created by researchers at the Complex Carbohydrate Research Center (CCRC) of the University of Georgia. An example of an N-glycan, Man-3-Core F, is shown below:

In general, this format is human-readable, but the vertical bars make it difficult for a computer to parse.

IUPAC
The International Union of Pure and Applied Chemistry (IUPAC) proposed a nomenclature for representing complex carbohydrates, known as 2-Carb. The IUPAC nomenclature provides three forms for representing glycans.

The example glycan above can be represented in each of the three forms, which are described and then shown below:
 * Extended form: In this form, each monosaccharide unit is represented by its symbol, preceded by the anomeric descriptor and the configuration symbol. An italic letter represents the ring size, e.g. f for furanose and p for pyranose. The parentheses between the symbols give the locants of the linkage, and a double-headed arrow shows a linkage between two anomeric positions.
 * Condensed form: This form omits both the configurational symbol and the letter denoting ring size. In general, the configuration is taken to be D (except for fucose and iduronic acid, which are generally in the L configuration) and the rings are taken to be in the pyranose form (unless another form is explicitly stated). The parentheses contain the anomeric descriptor along with the locants.
 * Short form: It is often desirable to shorten the notation further by omitting the locants of the anomeric carbon atoms, the parentheses around the locants of the linkage, and the hyphens. Moreover, branches can be shown on the same line with the aid of appropriate enclosing marks, including parentheses and square brackets.

Extended form: α-D-Manp-(1→3)-[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-[α-L-Fucp-(1→6)]-β-D-GlcpNAc-(1→NASN-protein

Condensed form: Man(α1-3)[Man(α1-6)]Man(β1-4)GlcNAc(β1-4)[Fuc(α1-6)]GlcNAc(β1-ASN

Short form: Manα3(Manα6)Manβ4GlcNAcβ4(Fucα6)GlcNAcβASN

Note: a modified condensed IUPAC form is also in use.

Modified Condensed IUPAC: Manα1-3(Manα1-6)Manβ1-4GlcNAcβ1-4(Fucα1-6)GlcNAcβ1-Asn
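
The relationship between the condensed and short forms can be illustrated with a small Python sketch. It applies the three short-form rules described above (drop the anomeric carbon locants, drop the parentheses and hyphens of each linkage, and write branches in parentheses) as plain text substitutions. The function name condensed_to_short is hypothetical, and the sketch assumes single-digit locants and only the α and β anomeric descriptors; it is not a general IUPAC parser.

```python
import re

def condensed_to_short(condensed: str) -> str:
    """Shorten a condensed-form string using the short-form rules (illustrative only)."""
    # "(α1-3)" -> "α3": drop the parentheses, the anomeric carbon locant and the hyphen.
    short = re.sub(r"\(([αβ])\d-(\d)\)", r"\1\2", condensed)
    # Open linkage at the reducing end, e.g. "(β1-" -> "β".
    short = re.sub(r"\(([αβ])\d-", r"\1", short)
    # Branches are kept on the same line, enclosed in parentheses instead of brackets.
    return short.replace("[", "(").replace("]", ")")

condensed = "Man(α1-3)[Man(α1-6)]Man(β1-4)GlcNAc(β1-4)[Fuc(α1-6)]GlcNAc(β1-ASN"
print(condensed_to_short(condensed))
```

Applied to the condensed form of the example glycan, this reproduces the short form listed above. The linkage substitutions run before the bracket replacement so that linkage parentheses are never confused with branch parentheses.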

LINUCS
Linear Notation for Unique description of Carbohydrate Sequences (LINUCS) is a format used in Glycosciences.de. It is designed to describe a glycan structure uniquely. The example glycan in LINUCS format is:

Linear Code
Linear Code is a linear notation proposed by GlycoMinds Ltd. and is one of the most compact formats. In this format, (i) the common monosaccharides are indicated by a code of at most two letters, (ii) the anomer of each linkage is indicated by “a” or “b”, (iii) the number that follows the anomer letter gives the carbon position of the linkage, and (iv) branches are indicated by parentheses.
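
The four rules above are regular enough to be tokenized with a single pattern. The following Python sketch is a minimal illustration, not the GlycoMinds implementation: the function name tokenize_linear_code is hypothetical, and the input string, with M for mannose, GN for N-acetylglucosamine and F for fucose, is an assumed Linear Code rendering of the example glycan rather than one taken from a database.

```python
import re

# A residue is a one- or two-letter uppercase code, optionally followed by an
# anomer letter (a or b) and the carbon position of the linkage; a branch is
# delimited by parentheses.
TOKEN = re.compile(r"([A-Z]{1,2})(?:([ab])(\d))?|([()])")

def tokenize_linear_code(code):
    """Split a Linear Code string into residue tuples and branch delimiters."""
    tokens = []
    pos = 0
    while pos < len(code):
        match = TOKEN.match(code, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {code[pos]!r}")
        if match.group(4):
            tokens.append(match.group(4))  # "(" or ")"
        else:
            carbon = int(match.group(3)) if match.group(3) else None
            tokens.append((match.group(1), match.group(2), carbon))
        pos = match.end()
    return tokens

# Assumed Linear Code string for the example glycan (illustrative only).
example = "Ma3(Ma6)Mb4GNb4(Fa6)GN"
for token in tokenize_linear_code(example):
    print(token)
```

Each residue is returned as a tuple of code, anomer and linkage position; the residue at the reducing end has no linkage, so its anomer and position are None.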

GlycoCT
GlycoCT is a format designed and developed under the EuroCarbDB project. It uses a connection table approach to describe the full complexity of carbohydrate sequence data and is widely used by the bioinformatics community through the database GlycomeDB. The example glycan in GlycoCT format is shown below:

WURCS
The Web3 Unique Representation of Carbohydrate Structures (WURCS) format was initially developed for GlyTouCan, the international glycan structure repository. Because GlyTouCan was developed using Semantic Web technologies, it requires a linear string to represent each glycan. The example glycan in WURCS format is shown below:

KCF
The KEGG Chemical Function (KCF) format was designed for, and is used in, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. It also uses a connection table approach. The example glycan in KCF format is shown below:

CSDB Linear
The Carbohydrate Structure Database (CSDB) comprises Bacterial (BCSDB) and Plant and Fungal (PFCSDB) parts. It uses a connection table for internal storage of structures and the CSDB linear code for input and output.
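
The split between a connection table for storage and a linear string for input and output can be sketched in Python. The layout below is hypothetical and does not reproduce the syntax of CSDB, GlycoCT or KCF: residues are numbered nodes, linkages are rows that name the parent and child carbons, and a recursive function renders the table as a condensed IUPAC string. Treating the child on the lowest parent carbon as the main chain is an assumption chosen to match the example glycan.

```python
# Hypothetical connection table for the example N-glycan (not a real database format).
# Residues: id -> (name, anomer)
RESIDUES = {
    1: ("GlcNAc", "β"),
    2: ("GlcNAc", "β"),
    3: ("Man", "β"),
    4: ("Man", "α"),
    5: ("Man", "α"),
    6: ("Fuc", "α"),
}
# Linkages: (parent id, parent carbon, child id, child anomeric carbon)
LINKAGES = [
    (1, 4, 2, 1),
    (2, 4, 3, 1),
    (3, 3, 4, 1),
    (3, 6, 5, 1),
    (1, 6, 6, 1),
]

def to_condensed(residue_id=1):
    """Render the subtree rooted at residue_id as a condensed IUPAC string."""
    name, _ = RESIDUES[residue_id]
    # Children sorted by the parent carbon they attach to.
    children = sorted(
        (parent_carbon, child_id, child_carbon)
        for parent_id, parent_carbon, child_id, child_carbon in LINKAGES
        if parent_id == residue_id
    )
    text = ""
    for index, (parent_carbon, child_id, child_carbon) in enumerate(children):
        anomer = RESIDUES[child_id][1]
        branch = f"{to_condensed(child_id)}({anomer}{child_carbon}-{parent_carbon})"
        # Assumption: the first child continues the main chain, the rest are bracketed.
        text += branch if index == 0 else f"[{branch}]"
    return text + name

print(to_condensed())
```

The table stores each residue and linkage exactly once, which makes it convenient for storage and comparison, while the rendered string is easier for people to read and type.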

GLYCAM Condensed
The GLYCAM Condensed format, like the GLYCAM format, is provided by GLYCAM-Web, which is produced by the research group of Professor Robert J. Woods at the Complex Carbohydrate Research Center of the University of Georgia in Athens, GA.

Glyde and Glyde II
The GLYcan Data Exchange (GLYDE) format is an XML-based representation format for glycomics data. It was part of the Integrated Technology Resources for Biomedical Glycomics, established by a team from the Complex Carbohydrate Research Center of the University of Georgia. GLYDE II, the successor of GLYDE, was designed to overcome the limitations of GLYDE and uses a connection table approach.

CabosML
Carbohydrate sequence markup language (CabosML) is an XML-based description of carbohydrate structures.

Symbol formats
Many glycobiologists use figures to depict the complex glycan structures. Currently, there are two major ways to represent glycans using symbols: Symbol Nomenclature For Glycans (SNFG) and Oxford Notation.

Oxford notation
The Oxford notation was designed and developed by researchers from the Oxford Glycobiology Institute at the University of Oxford in 2009.

Hybrid notation
To comply with the SNFG notation while respecting the Oxford notation, some drawing tools generate hybrid cartoons that use the SNFG symbols for monosaccharides and the linkage orientation defined by the Oxford notation.



Format conversion tools
The scientific community has developed a number of software tools to convert glycans from one format to another. Some of the most commonly used tools are listed below:


 * GlycanFormatConverter: A core library of glycan text conversion tools, which encodes WURCS from IUPAC-Extended, KCF and LinearCode® for the great majority of glycans registered in GlyTouCan.
 * RINGS: A web resource providing algorithmic and data mining tools to aid glycobiology research.
 * glypy: An open source glycoinformatics library.