User:Kinkreet/Glycobiology

Glycobiology is the study of saccharides, or sugars. This includes sugar metabolism such as glycosylation and gluconeogenesis, but also glycans' molecular roles in cellular localization of macromolecules, modulating structural conformation of macromolecules; cellular roles in signalling, recognition/receptor binding, cell-cell/matrix adhesion, cell structure; and systemic roles such as in the immune system, inflammation and pathogenicity. Glycosylation is an important process within a cell and for whole organisms. The pathways of glycan synthesis are conserved.

Structure
Monosaccharides are characterized by two abilities: the ability to form rings, and the ability to form linkages. Monosaccharides can exist in a linear or a cyclic form. To form a ring from the linear form, the aldehyde group must react with a hydroxyl group on the same chain, forming a hemiacetal and closing the ring. To form a linkage, the hydroxyl group on the hemiacetal carbon is displaced by a hydroxyl group from another monosaccharide acting as the nucleophile, linking the two monosaccharides into a disaccharide. Repeating this process gives an oligosaccharide.

Monosaccharides are polar in the sense of being directional: they have a reducing and a non-reducing end. Consequently, an oligosaccharide chain is also directional, with one reducing end and one non-reducing end. When a glycan is conjugated to a protein or lipid, it is the reducing end that is attached to the amino acid or lipid, leaving the non-reducing end at the terminus; thus, if a sugar chain branches, a glycan moiety can have many non-reducing ends.

Glucose is a six-carbon sugar (hexose), and its most common ring form is the pyranose, where the ring has 6 members (5 carbons of the sugar and the oxygen of the hemiacetal). The pyranose form arises when the hydroxyl group on C5 nucleophilically attacks the aldehyde group on C1. Alternatively, but rarely, if the hydroxyl group on C4 is the attacking group, the ring contains only 4 carbons and is a five-membered furanose ring. In both cases, C1 has a coordination number of 4 and is therefore tetrahedral and sp3-hybridized. Carbons outside the ring (C6 in the pyranose; C5 and C6 in the furanose) have different properties from the carbons in the ring.
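The relationship between the attacking hydroxyl and the ring size can be sketched in a few lines of Python. This is only an illustration: the function name `ring_size` and the `RING_NAMES` table are hypothetical, and the sketch assumes a single unbranched carbon chain.

```python
def ring_size(attacking_carbon: int, carbonyl_carbon: int = 1) -> int:
    """Number of ring atoms formed when the hydroxyl on `attacking_carbon`
    attacks the carbonyl carbon of the same open-chain sugar."""
    if attacking_carbon <= carbonyl_carbon:
        raise ValueError("attacking hydroxyl must lie further along the chain")
    # Ring carbons run from the carbonyl carbon to the attacking carbon;
    # the attacking oxygen contributes one more ring atom.
    ring_carbons = attacking_carbon - carbonyl_carbon + 1
    return ring_carbons + 1


RING_NAMES = {5: "furanose", 6: "pyranose"}

for carbon in (5, 4):
    size = ring_size(carbon)
    print(f"C{carbon}-OH attacks C1: {size}-membered {RING_NAMES[size]} ring")
```

The default `carbonyl_carbon=1` corresponds to an aldose such as glucose; a different value can be passed for sugars whose carbonyl is not on C1.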

Anomers
Stereoisomers are isomeric molecules that have the same molecular formula and sequence of bonded atoms (constitution), but that differ only in the three-dimensional orientations of their atoms in space. Enantiomers are special cases of stereoisomers where the enantiomers are mirror-images of each other; diastereomers are stereoisomers that are not enantiomers. Epimers are diastereomers that differ in configuration of only one stereogenic center and anomers are a special case of epimer, where the stereogenic center is the hemiacetal carbon (a.k.a. anomeric carbon).
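The hierarchy of terms above can be written as a small decision procedure. The sketch below is a simplification under stated assumptions: each sugar is encoded as a tuple giving the side ('L' or 'R') of the hydroxyl at each stereocentre in a Fischer projection, listed from C1 to C5 for a cyclic aldohexose. The encoding, the function name `relationship` and the constants are hypothetical and chosen only for illustration.

```python
def relationship(a, b, anomeric_index=0):
    """Classify two sugars that share the same constitution.

    `a` and `b` are equal-length tuples of 'L'/'R' labels, one per
    stereocentre; `anomeric_index` marks the hemiacetal carbon.
    """
    if len(a) != len(b):
        raise ValueError("configurations must cover the same stereocentres")
    differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if not differing:
        return "identical"
    if len(differing) == len(a):
        return "enantiomers"  # every centre inverted: mirror images
    if len(differing) == 1:
        # Exactly one centre differs: epimers, or anomers if that
        # centre is the anomeric carbon.
        return "anomers" if differing[0] == anomeric_index else "epimers"
    return "diastereomers"


# Stereocentres C1..C5 (assumed encoding, see lead-in).
ALPHA_D_GLC = ("R", "R", "L", "R", "R")
BETA_D_GLC = ("L", "R", "L", "R", "R")
ALPHA_D_GAL = ("R", "R", "L", "L", "R")
ALPHA_D_MAN = ("R", "L", "L", "R", "R")
ALPHA_L_GLC = ("L", "L", "R", "L", "L")

print(relationship(ALPHA_D_GLC, BETA_D_GLC))
print(relationship(ALPHA_D_GLC, ALPHA_D_GAL))
print(relationship(ALPHA_D_GAL, ALPHA_D_MAN))
print(relationship(ALPHA_D_GLC, ALPHA_L_GLC))
```

Anomers and epimers are reported separately here even though both are diastereomers, so the function returns the most specific term that applies.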

For monosaccharides, the different anomers are denoted α if the hydroxyl group on the anomeric carbon is in an axial ('perpendicular' to the ring) projection, and β if the projection is equatorial. In glucose, all hydroxyl groups on the ring are equatorial, with the possible exception of the one on C1, the anomeric carbon.

Common monosaccharides
Glucose can be modified to produce other sugars. Epimerization at C4 gives D-galactose and epimerization at C2 gives D-mannose. D-Mannose can then be derivatized to L-fucose, one of the few monosaccharides found naturally which has the L absolute stereochemistry.

Addition of an acetamido group onto the C2 of glucose and galactose is common, yielding N-acetyl-D-glucosamine (GlcNAc) and N-acetyl-D-galactosamine (GalNAc). This process occurs in two steps: the first step replaces the 2-OH group with an amino group (protonated at physiological pH), and in the second step the nitrogen of this group is acetylated. GlcNAc and GalNAc are commonly found, whereas the intermediate sugars GlcN and GalN are not.

Oxidation of C6 of glucose yields D-glucuronic acid (GlcA), which is rarely found. Removal of C6 of glucose yields a pentose pyranose, which is not commonly found but is involved in some interesting biological processes.

Sialic acid is a general name for N-acetylneuraminic acid (NeuAc), which is formed by an aldol condensation reaction between N-acetyl-D-mannosamine and pyruvate. Note that the term sialic acid is also used to describe any derivative of NeuAc. Sialic acids contain 9 carbons and are often found at the terminal positions of glycans.

Structures
The term 'structure' refers to the covalent, 'primary' structure of the glycan, and conformation refers to the 3D structure, similar to 'tertiary structure' used to describe the 3D structure of proteins.

A conformation of a glycan can be changed without breaking any covalent bonds, simply by rotation around single bonds. This is in contrast to a configuration, which can only be changed by breaking covalent bonds. Conformations include the chair, boat and half-chair; configurations include the different stereoisomers and anomers.

Of the 500-1000 glycans estimated to exist in human cells, only ~50 have had their conformation (3D structure) determined. This is because most glycans do not adopt a single constrained conformation and can interconvert between several. The most preferred ring conformation is the 4C1 chair, where the ring is puckered, C4 and C1 are furthest away from each other, and C6 is equatorial. There is also the 1C4 chair, but this is disfavoured because all the hydroxyl groups are axial. The least favourable is the boat conformation, because of the steric hindrance between C1 and C4. The ring is fairly rigid and so adopts only one of these conformations at a time, although it can change from one to another without breaking covalent bonds. The substituents around the ring are staggered because this gives the least steric hindrance between the groups. While the positions of C1 to C5 are defined by the ring conformation, C6 is free to rotate; it is still subject to steric constraints, however, and needs to have an ω angle of ~60° or ~180°.

The ω angle is the torsion angle between the O6-C6 bond and the C5-C4 bond, measured when looking down the C6-C5 bond. There are also two more torsion angles, φ and Ψ. φ is the angle between O-C1 and O*-C4' when viewed down the C1-O* bond, and Ψ is the angle between C1-O* and C4'-C3' when viewed down the O*-C4' bond (here O is the ring oxygen, O* the glycosidic oxygen, and primed atoms belong to the next residue). There are alternative ways to define the torsion angles based on hydrogens, since most NMR measurements are made on hydrogens.
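Each of these torsion angles is defined by four atoms, so all three can be computed with the same routine once atomic coordinates are available. The following is a minimal sketch using only the standard library; the function name `torsion` and the coordinates in the example are made up for illustration and do not come from a real glycan structure. The sign is positive when, looking down the central bond, the front bond must be rotated clockwise to eclipse the rear bond.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion(p1, p2, p3, p4):
    """Signed torsion angle in degrees for atoms p1-p2-p3-p4,
    viewed down the central p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane containing p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane containing p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a gauche arrangement around the z axis.
gauche = torsion((1, 0, 0), (0, 0, 0), (0, 0, 1),
                 (math.cos(math.radians(60)), math.sin(math.radians(60)), 1))
print(f"{gauche:.1f}")
```

For ω, the four points would be O6, C6, C5 and C4 as defined above; for φ and Ψ, the corresponding atoms across the glycosidic linkage.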

Observed torsion angles for each linkage can be displayed on a plot analogous to a Ramachandran plot for polypeptides. This is usually done with Ψ on the y-axis and φ on the x-axis, but for 2-6 linkages, a 3D plot is required to take the ω angle into account. Such plots show that a φ angle of ~60° is most favourable, as it minimizes the close approach of orbitals and steric crowding.

The faces of the ring can be assigned as the A face or the B face. The A face is the face on which the carbon atoms are numbered in a clockwise manner, and the B face is the face on which the carbons are numbered in an anti-clockwise manner.

The conformations of glycans are almost always determined by local interactions, such as hydrogen bonds between hydroxyl groups or the van der Waals packing of planar faces, and not by long-range interactions as in proteins. Glycans are smaller and so cannot fold back on themselves to influence distant residues; amylose is one exception. The glycosidic linkage also consists of only a single oxygen, which makes carbohydrates more flexible and more susceptible to hydrolysis.

There are three main methods to study glycan conformations (not structure): NMR, X-ray crystallography and computational models, or a combination of these. Because glycans are small and soluble, they are well suited to NMR studies, and NMR can measure flexibility. However, few atoms are in close proximity; only several hydrogens in certain conformations are close enough (<5Å), so although NMR is suitable in principle, it actually gives little information. X-ray crystallography gives the structure of the glycan in relation to the rest of the glycoprotein. However, the crystallization conditions might mean that the glycan is not in its native conformation. Torsion angles can also vary between copies of the glycan, and this heterogeneity means that glycans often do not give a clear signal. There are some common polysaccharide conformations found in nature. Amylose is a repeating polymer with a preferred linkage of (Glcα1-4)n; amylose is also stabilized by local hydrogen bonds between OH2 and OH3. Mannan has a preferred linkage of (Manα1-2)n and is stabilized by hydrogen bonding. Polysialic acid has a preferred linkage of (NeuAcα2-8)n, and this is favourable because the negatively charged COO- groups are far apart.

Cellulose forms fibers by first polymerizing via Glcβ1-4Glc linkages into long linear chains, which are stabilized by intra- and interchain hydrogen bonds to form sheets. The sheets are then packed together by van der Waals interactions to form a cellulose microfibril, and microfibrils twist together to form a cellulose fiber.

Energetics
The formation of a glycosidic bond typically requires 14 kJ of free energy per mole of bonds formed; this is made possible because the formation of the bond is coupled with the hydrolysis of two ATP molecules, which provides a free energy of -61 kJ/mol. The overall reaction is therefore favorable (ΔG = -47 kJ/mol). The energy is not actually used at the time the glycosidic bond is formed between the sugars; rather, the two ATPs are used to synthesize each nucleotide sugar donor, in which the sugar residue is attached to a nucleotide (for proteins) or dolichol (for lipids) via the phosphate groups. This sugar-phosphate bond mimics a glycosidic bond and has a similar energy. Therefore, when the sugar is transferred from the intermediate donor onto the acceptor by a specific glycosyltransferase, no further energy is required.
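The energy bookkeeping above is simple addition of the free energies of the coupled steps. A minimal sketch, using only the two values quoted in the text (the constant and function names are hypothetical):

```python
# Free energy changes in kJ/mol, as quoted in the text.
DG_GLYCOSIDIC_BOND = 14.0   # forming one mole of glycosidic bonds
DG_TWO_ATP = -61.0          # hydrolysis of two ATP per bond formed


def coupled_free_energy(*steps):
    """Overall free energy change of coupled reactions (simple sum)."""
    return sum(steps)


overall = coupled_free_energy(DG_GLYCOSIDIC_BOND, DG_TWO_ATP)
verdict = "favorable" if overall < 0 else "unfavorable"
print(f"Overall: {overall:+.0f} kJ/mol ({verdict})")
```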

Variety
There are potentially hundreds of thousands of combinations of monosaccharides that can make up a 4-unit long glycan, with the variety coming from the 7-10 common constituents (Glc, Gal, GlcNAc, GalNAc, Man, Fuc, NeuAc, GlcA etc.), the different linkages and the different anomeric configurations. However, only a much more limited number of combinations is seen. This is because glycosyltransferases are specific, so the combinations are limited by the number of glycosyltransferases a cell has. From genomic data, there are thought to be about 250 different glycosyltransferases in humans; from glycomic data, there are thought to be about 500-1000 glycans in human cells.
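The order of magnitude of this estimate can be reproduced with a rough count. The sketch below makes several simplifying assumptions that are not in the text: the glycan is linear (no branching), any monosaccharide can follow any other, and every acceptor offers the same number of hydroxyl positions. The function name and the choice of 3 positions are hypothetical, and the result is very sensitive to these choices.

```python
def linear_glycan_count(length, n_monosaccharides, n_positions, n_anomers=2):
    """Count linear glycans of `length` residues, assuming free choice of
    residue type, linkage position and anomeric configuration."""
    if length < 1:
        raise ValueError("length must be at least 1")
    residue_choices = n_monosaccharides ** length
    # Each of the (length - 1) linkages has a position and an anomer.
    linkage_choices = (n_positions * n_anomers) ** (length - 1)
    return residue_choices * linkage_choices


# 8 monosaccharide types, 3 assumed linkage positions, alpha or beta.
print(linear_glycan_count(4, 8, 3))
```

Allowing branching or more linkage positions raises the count further, which only strengthens the contrast with the 500-1000 glycans actually thought to occur.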

In order for the processing to occur in a stepwise manner, each compartment (ER; cis-, medial- and trans-Golgi) must have its own set of glycosyltransferases. There are two hypotheses which try to explain this localization of glycosyltransferases, and both have accumulated supporting evidence. In the stationary cisternae model, the glycosyltransferases stay within their own compartment, and it is the glycoconjugate being processed which is transported, via vesicles, from one compartment to the next. In the cisternal progression model, the whole compartment moves spatially, and the glycosyltransferases are transported retrogradely back to the previous compartment in vesicles. Both models require extensive sorting, but the cisternal progression model also explains how enzymes from the ER, which arrived with the precursor, can be transported back to the ER.

These different glycosyltransferases must also be grouped together so that the glycoconjugate can be progressively processed, passing from one enzyme to the next. Glycosyltransferases, and maybe other processing enzymes, can be grouped together by forming a complex via kin recognition; they may also be grouped according to bilayer thickness, as the bilayer of the trans-Golgi is thought to be thicker than the bilayer of the ER/cis-Golgi.

When looking at membrane glycoconjugates, it was traditionally assumed that high mannose glycoconjugates were artifacts: incompletely processed glycans from inside the cell that only became apparent when the membranes were disrupted. It was later confirmed that these are not artifacts, and that high-mannose glycoconjugates are indeed found on the cell surface.

In fact, the processing of the precursor in the Golgi can give rise to hundreds of glycans at any given time; the outcome depends on the availability of the different glycosyltransferases, which in turn depends on their level of transcription. High mannose structures can exist on membranes because they somehow skip the subsequent processing steps.

Most glycans are linked to proteins or lipids to perform their function. The two most common types of linkages of glycans to proteins are the asparagine-linked (N-linked) glycans and the serine- or threonine-linked (O-linked) glycans.

Other factors
Glycosylation is species-specific. Humans lack NeuGc that is present in mice; and so when using mice to study glycosylation, often a transgenic lacking NeuGc needs to be used. Mice also tend to have core fucose on more N-glycans.

Glycosylation can also be organism-specific due to polymorphism. These polymorphisms commonly alter the terminal elaborations; this is most prominently exemplified by the ABO blood group.

Glycosylation patterns often depend on the cell type and developmental stage, and also differ between diseased and healthy states.

Structure
Many glycosyltransferases contain a conserved Asp-X-Asp (DXD) motif.

Mechanism
Different monosaccharide types are first attached to different nucleotides in the cytoplasm to form nucleotide sugars. Glucose and its derivatives (Gal, GalNAc and GlcNAc) are all attached to UDP; more precisely, UDP-Gal, UDP-GalNAc and UDP-GlcNAc are all first made as UDP-Glc, and then later modified. Likewise, mannose is attached to GDP to make GDP-Man, which can later be modified into GDP-Fuc. NeuAc is attached to CMP.
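The pairing of each sugar with its nucleotide carrier, as listed above, amounts to a small lookup table. A minimal sketch (the names `NUCLEOTIDE_CARRIER` and `donor` are hypothetical; only the sugars named in the text are included):

```python
# Nucleotide carrier for each monosaccharide, as listed in the text.
NUCLEOTIDE_CARRIER = {
    "Glc": "UDP",
    "Gal": "UDP",
    "GlcNAc": "UDP",
    "GalNAc": "UDP",
    "Man": "GDP",
    "Fuc": "GDP",
    "NeuAc": "CMP",
}


def donor(sugar):
    """Name of the nucleotide sugar donor for a monosaccharide."""
    try:
        return f"{NUCLEOTIDE_CARRIER[sugar]}-{sugar}"
    except KeyError:
        raise ValueError(f"no carrier listed for {sugar!r}") from None


for sugar in ("Gal", "Fuc", "NeuAc"):
    print(sugar, "->", donor(sugar))
```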

The sugar from the nucleotide sugar is then transferred onto the phosphate group of an intermediate donor, dolichol phosphate, located in the endoplasmic reticulum membrane. Dolichol is a lipid with 15-19 repeating units of isoprene; it terminates at one end with an α-saturated isoprenoid group, while the other end attaches to the sugar via a single phosphate group (because the nucleotide sugar also contributes a phosphate group, the dolichol-sugar now has two phosphate groups). Its role is to anchor the sugar on the membrane before the sugar is transferred. The two GlcNAc residues that start every N-linked glycan chain, as well as 5 mannoses (branched into two in the formation 1-(3)(1)), are added on the cytoplasmic side by specific glycosyltransferases; each sugar requires a different enzyme because the positions are not equivalent.

The sugar-dolichol conjugate is then flipped so that the sugar group is in the lumen. The mechanism of this process is unknown, but the enzyme which mediates the flipping is termed the flippase. After flipping, further monosaccharides are added onto the existing glycan chain by specific glycosyltransferases, with the new monosaccharides being supplied by dolichol-P monosaccharides, which were formed in the cytoplasm and then flipped over. The final structure of the pre-processed precursor glycan is always two GlcNAc, followed by 9 branched mannoses (two primary branches, with one branch then further branching into two secondary branches), and three glucoses attached to the primary branch.

In N-linked glycan synthesis, the whole untrimmed glycan is transferred, en bloc, to the protein. This is mediated by an oligosaccharyltransferase (OST), which specifically transfers Glc3Man9GlcNAc2 onto an asparagine residue in the consensus sequence Asn-X-Ser/Thr. This process may occur co- or post-translationally; in the former case, the glycan is added before the protein is folded. The same protein can have different glycosylation patterns; these different versions are termed glycoforms. Almost all glycoproteins have more than one glycoform, although there are exceptions.
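Candidate N-glycosylation sites can be located in a protein sequence by searching for the Asn-X-Ser/Thr consensus. The sketch below uses a regular expression with a lookahead so that overlapping sequons are all reported. It also excludes proline at the X position, a commonly applied refinement that is an assumption here and is not stated in the text above. The function name and the example sequence are hypothetical, and a match only indicates a potential site, not that the site is actually glycosylated.

```python
import re

# N, followed by any residue except P, followed by S or T.
# The lookahead keeps the match to the N itself, so overlapping
# sequons (e.g. in "NNST") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Made-up peptide: N2 and N9 are in sequons, N6 is followed by Pro.
print(find_sequons("MNGSANPTNVT"))
```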

After the precursor Glc3Man9GlcNAc2 is added, the two outermost glucose residues are removed sequentially in the ER, first by ER glucosidase I and then by glucosidase II. Monoglucosylated proteins then selectively bind to the membrane protein calnexin (CNX) and its soluble homolog calreticulin (CRT) to enter the calnexin/calreticulin cycle. Mannoses must then be stripped back to the base of the primary branch by ER or Golgi mannosidases (one in the ER; and in the Golgi, three - IA, IB and IC - for the first unbranched primary branch, and II for the second branched branch), before other residues are added. The hydrolysis of glycosidic bonds is energetically favorable and does not require energy input; in fact, β-N-acetylhexosaminidase is a non-specific glycosidase that is mostly used by bacteria to degrade all types of glycans for energy. Many glycosidases are thus quite general, but the glucosidases and mannosidases in the N-glycan pathway are much more specific.
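The first trimming steps can be followed by tracking only the composition of the glycan. The sketch below ignores linkages and branching entirely and simply counts residues; the helper names `trim` and `formula` are hypothetical.

```python
from collections import Counter

# Composition of the precursor transferred en bloc by the OST.
PRECURSOR = Counter({"Glc": 3, "Man": 9, "GlcNAc": 2})


def trim(glycan, sugar, n=1):
    """Return a new composition with `n` residues of `sugar` removed."""
    if glycan[sugar] < n:
        raise ValueError(f"cannot remove {n} {sugar} from {formula(glycan)}")
    trimmed = Counter(glycan)
    trimmed[sugar] -= n
    return +trimmed  # unary plus drops entries that reached zero


def formula(glycan):
    """Compact composition string such as 'Glc3Man9GlcNAc2'."""
    order = ("Glc", "Man", "GlcNAc")
    return "".join(f"{s}{glycan[s]}" for s in order if glycan[s])


after_glucosidase_i = trim(PRECURSOR, "Glc")
after_glucosidase_ii = trim(after_glucosidase_i, "Glc")
for step in (PRECURSOR, after_glucosidase_i, after_glucosidase_ii):
    print(formula(step))
```

The last composition printed is the monoglucosylated form that binds calnexin and calreticulin.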

N-linked glycans are grouped into three categories: high mannose, complex and hybrid. All three groups have a common core consisting of 3 mannoses attached to a chitobiose core. This core has two GlcNAc residues stemming from the asparagine, followed by a mannose attached through a non-conventional β1-4 linkage, from which two mannoses branch through α1-6 and α1-3 linkages. High mannose glycans are defined as any glycans which contain only mannoses (usually up to 9 mannoses per glycan) in addition to the chitobiose core; a complex N-glycan has monosaccharides of other types attached to each of the branched mannoses; and a hybrid is where one branch remains mannose-exclusive, whereas the other branch contains other sugars. These different structures are created in the Golgi apparatus.
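The three categories can be expressed as a simple rule on the two arms that extend from the branching core mannoses. In this sketch each arm is given as a list of residue names beyond the common core; this representation and the function name `classify_n_glycan` are hypothetical, and an arm with no extension at all is treated as mannose-exclusive.

```python
def classify_n_glycan(arm_3, arm_6):
    """Classify an N-glycan from the residues on its alpha1-3 and
    alpha1-6 arms (everything beyond the common Man3GlcNAc2 core)."""
    def mannose_only(arm):
        return all(residue == "Man" for residue in arm)

    if mannose_only(arm_3) and mannose_only(arm_6):
        return "high mannose"
    if not mannose_only(arm_3) and not mannose_only(arm_6):
        return "complex"
    return "hybrid"  # one arm mannose-exclusive, the other not


print(classify_n_glycan(["Man", "Man"], ["Man", "Man", "Man", "Man"]))
print(classify_n_glycan(["GlcNAc", "Gal", "NeuAc"], ["GlcNAc", "Gal"]))
print(classify_n_glycan(["GlcNAc", "Gal"], ["Man", "Man"]))
```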

After the initial trimming of the sugar residues, new monosaccharides can be added in a step-wise manner in the Golgi apparatus. Nucleotide sugars are transported into the Golgi in an electroneutral manner; usually, the transporter is an antiporter, in which the nucleotide sugar enters in exchange for a nucleoside monophosphate leaving. There is one transporter for each sugar. After the nucleotide sugar enters the Golgi, the sugar is transferred onto the existing glycan, releasing a nucleoside diphosphate. This is converted into a nucleoside monophosphate (exported via the antiporter) and phosphate, which leaves via a phosphate transporter in the Golgi membrane.

The addition of new sugars can give rise to branches and long repeats, each of which must be added by a specific transferase. Glycans can have branching structures stemming from the mannoses in the core. A glycan with two branches (one from each of the mannoses) is termed bi-antennary, one with 3 branches is termed tri-antennary, and so on. In theory there can be even more antennae, but these are rarely observed.

Some glycans, such as those carrying a recognition domain, need to have their branches projected away from the protein/cell so that they can be recognized. The most common type of extension is the type II polylactosamine extension, in which the Galβ1-4GlcNAcβ1- sequence is repeated many times before being capped by NeuAcα2-. An alternative uses Galβ1-3GlcNAcβ1- repeats instead.

Because there are many different enzymes in the Golgi competing with each other, other modifications on top of branches and repeats may be seen. One such modification is the modification of the core glycan. The most common modification to the core is the addition of a fucose to the GlcNAc that is attached to the asparagine, through a Fucα1-6GlcNAc linkage. Because this linkage is close to the protein, it often has an effect on glycan-protein interactions. Another modification is the attachment of a GlcNAc monosaccharide (termed a bisecting GlcNAc) by GlcNAc transferase III to the stem mannose in the core through a GlcNAcβ1-4Man linkage. This is not a branch, because no further monosaccharides can be added to it. It also disrupts the structure of the glycan and so stops tri- or tetra-antennary structures from forming.

Often multiple enzymes compete to modify the same substrate; in these cases, the enzyme with the highest concentration often succeeds, and this, in turn, depends on the level of expression of these enzymes. However, these 'non-essential' enzymes (non-essential in the sense that they are not required to make every glycan, unlike ER glucosidases I & II and ER mannosidase) are present in very small numbers, and so this rule is not as clear-cut as it first appears.

Most diversity occurs at the non-reducing end because it is the least hidden and most likely to interact with other macromolecules. But although there is a vast number of possible combinations of monosaccharides and linkages at the ends, only a few are commonly found. The sequence NeuAcα2-6Galβ1-4GlcNAcβ1- is commonly found at the end of glycan chains. Most terminal elaborations are either this sequence or some derivative of it, such as 3'-sialyllactosamine (NeuAcα2-3Galβ1-4GlcNAc), 6'-sialyllactosamine (NeuAcα2-6Galβ1-4GlcNAc), Sialyl LewisX (NeuAcα2-3Galβ1-4(Fucα1-3)GlcNAc), and variants in which the NeuAc group is replaced by a sulfate group. Although recognition does not usually require the sugars or the linkages to be exact, in a significant number of cases the exact structure does make a difference.

Other post-translational modifications exist in the Golgi, but glycosylation is by far the most prominent. After modification, the processed glycoconjugate buds off in vesicles and is delivered to the plasma membrane or other membranes, to be secreted or incorporated into the membrane.

O-linked glycans
O-linked glycosylation tends to occur on the extended regions of proteins.

O-glycosylation is restricted to the cis to trans Golgi compartments, does not occur in the ER, and takes place after the protein has folded. Most O-linked glycans are initiated by peptidyl GalNAc-transferases transferring a reducing terminal N-acetylgalactosamine (GalNAc) onto the peptide. These are the most common type of O-glycan and are known as 'mucin-type' glycans; they are found in all animals and some higher plants. Other monosaccharides such as mannose (Man), fucose (Fuc), glucose (Glc), Gal or xylose (Xyl) have also been observed at the reducing terminal.

There is no en bloc transfer; each monosaccharide is added individually onto the core, in a similar manner to the extension of N-glycans, using nucleotide sugars which are imported into the Golgi by antiport nucleotide sugar transporters. The transferases have a transferase domain, which carries out the actual transfer, and an R-type carbohydrate recognition domain (CRD), which recognizes the sugar. There are ~20 redundant but different GalNAc transferases; these are not specific for a single sequence, but can each recognize a range of sequences. Because of this, it is hard to produce transgenics lacking these GalNAc-transferases, because one would have to mutate over 20 different sites. In contrast, there is only one Gal-transferase, and its knockout is embryonic lethal.

There are also no defined sequences for the attachment of O-linked glycans, but the amino acids around the attachment site tend to form clusters rich in serine, threonine, alanine and proline (and to a lesser extent hydroxyproline and hydroxylysine). This supports the observation that O-glycans are often found at positions with no particular structure. The actual glycosylation site (P0), however, must always be either serine or threonine. The reducing terminal GalNAc residue can be further extended with galactose (Gal), N-acetylglucosamine (GlcNAc), or both GlcNAc and Gal, resulting in 8 common core structures, which are often further decorated with up to three sialic acid residues. The complexity and variety seen in O-glycans is much greater than that seen in N-glycans. For example, Core 1 is the reducing-end GalNAc with a β1-3 linked Gal; Core 2 is Core 1 with a GlcNAc β1-6 linked to the GalNAc.

The secretory mucins MUC1 and MUC2 are heavily O-glycosylated. Because glycans in general are much more hydrophilic than peptides, they are able to retain water well, and these secreted water-retaining mucins help to protect epithelial surfaces. On the molecular level, O-linked glycans tend to be found in small clusters at hinge regions, where they protect the flexible region from proteolysis. As such, O-linked glycans are found at the hinge regions of immunoglobulins.

Glycans take up physical space, so if a membrane polypeptide chain is heavily glycosylated, the peptide chain is forced to stand upright and project its end structures away from the membrane. The terminal structures of a glycan are often the part that carries out the glycan's function. In P-selectin glycoprotein ligand 1 (PSGL-1), the N-terminal tyrosines (46, 48, 51) are sulfated, while Thr57 bears a glycan with the sialyl LewisX structure, which is responsible for binding to P-selectin. This binding is needed for the rolling of cells on the endothelium during the extravasation of leukocytes, but also in the metastasis of cancer cells.

O-GlcNAcylation
O-GlcNAcylation is an exception among glycosylation events because it occurs in the cytoplasm and nucleus. O-GlcNAcylation involves an N-acetylglucosaminyl group being O-linked to a serine or threonine. A diverse range of phosphoproteins, such as RNA polymerase II and its associated transcription factors, cytoskeletal proteins, nucleoporins, viral proteins, heat shock proteins, tumor suppressors and oncogenes, undergo O-GlcNAcylation. These are all phosphoprotein complexes consisting of many different subunits, hinting that O-GlcNAcylation may play a role in protein-protein interaction. On these phosphoproteins, O-GlcNAcylation and phosphorylation are mutually exclusive, and the sites for both are very similar; this suggests that O-GlcNAcylation is an antagonist to phosphorylation, regulating phosphorylation-dependent functions.

ER-Golgi
L-type lectins (so named because they resemble the soluble legume lectins) mediate the trafficking and sorting of glycoproteins through the biosynthetic pathway, namely the ER and Golgi. ERGIC-53 is a type I integral membrane protein with a large luminal domain and a short (12 residues) cytosolic domain; its cytoplasmic motif carries a sorting signal that directs it to budding vesicles, and it recruits correctly folded glycoproteins to be transported via vesicles to the Golgi. The compartment intermediate between the ER and Golgi is known as the ER-Golgi intermediate compartment (hence the name ERGIC). Here the glycoprotein dissociates from ERGIC-53 and binds to VIP36, which delivers it to the cis Golgi and accompanies the glycoprotein as it matures; the ERGIC-53 receptor is recycled back to the ER to carry out another round of transport.

Haemophilia is a disorder that impairs the ability of an organism to form blood clots, which are required to stop bleeding. There are different types of haemophilia: type A is associated with clotting factor VIII deficiency, type B with clotting factor IX deficiency and type C with factor XI deficiency. Although haemophilia is usually an X-linked hereditary disorder, there is also an uncommon autosomal recessive disease caused by combined deficiency of factors V and VIII; patients usually have about 5-30% of normal levels of these factors, and present symptoms similar to mild factor VIII deficiency. The genes for these factors are not mutated or defective; instead, the ERGIC-53 gene is not expressed, so the clotting factors are not secreted, or are secreted only at low levels.

Not all glycoproteins associate with ERGIC-53; only the most heavily glycosylated proteins tend to.

Lysosomal enzymes
Mannose 6-phosphate is a key targeting signal for the lysosome; this is convenient because a single common tag can be used to target all lysosomal enzymes. It is synthesized by first adding a GlcNAc-phosphate onto a mannose at OH6. This step is mediated by N-acetylglucosamine-phosphotransferase, an enzyme which must be highly specific for lysosomal enzymes. An N-acetylglucosaminidase then removes the GlcNAc, leaving only the phosphate group attached. An α1,2-mannosidase then removes any mannoses linked to the phosphorylated mannose; this enzyme is commonly found in the Golgi.

The mannose 6-phosphate receptor is required for targeting glycoproteins to the lysosome. Defects in the mannose 6-phosphate receptor cause a lack of functional lysosomes and also degradation of the extracellular matrix, because the breakdown enzymes are not delivered to the lysosome. There are two types of M6P receptors: cation-independent (CI) and cation-dependent (CD). Both the CI and CD receptors cycle between the trans-Golgi and endosomes, but only the CI receptors cycle from the endosomes to the cell surface.

The cation-independent (CI) M6P receptors have a large lumenal domain with 15 P-type-like subdomains. The subdomains near the membrane bind IGF-II (to down-regulate its activity), and only two subdomains near the top (3 and 9, and weakly 5) bind M6P. The cation-dependent (CD) M6P receptors are smaller and have only one extracellular domain, which binds M6P; CD M6P receptors exist as a dimer, with the binding sites facing opposite ends. Thus, both M6P receptors bind M6P in a 1:2 ratio.

The two M6P receptors are able to bind M6P at the pH of the lumen or extracellular space, but at pH 5 in the lysosome, the ligands are released. The binding requires Mn2+ to help in the coordination of different groups, but the cation does not itself coordinate the sugar, unlike in C-type lectins. Two arginine residues (Arg135 and Arg111) are critical (and conserved) for binding M6P. Knockout of the CI receptor led to a lack of lysosomal targeting, with 70% of the lysosomal proteins being secreted instead; this is also true for the CD receptor.

I-cell disease
N-acetylglucosamine-phosphotransferase is required for the initial step in the phosphorylation of mannoses to mannose 6-phosphate (M6P). A deficiency in this enzyme means that lysosomal enzymes are not targeted to the lysosome, so the lysosome can no longer degrade unwanted macromolecules; these accumulate in the lysosome and lead to swelling. This is particularly detrimental in the brain.

Nomenclature
Glycosyltransferases are named after the monosaccharide of the donor sugar; so galactosyltransferase adds galactose and sialyltransferase adds NeuAc. Glycosyltransferases which catalyse specific linkages are given prefixes - first the anomer and the acceptor sugar, followed by the linkage and the primary name. Because many different enzymes carry out this function, the same name can be used for enzymes of different structures.

Exoglycosidases are named after the monosaccharide which they release; this is usually at the non-reducing end. Some exoglycosidases have specificity for a particular linkage, while others are more general.

Symbol nomenclature
Drawing out the complete chemical structure, or even writing the IUPAC nomenclature, is time-consuming. Instead, symbols can be used to denote the structures of glycans, as first done in 1978 by Kornfeld. The symbol nomenclature has since been updated, and a standardised set of symbols and linkage designations was agreed so that most of the information can be carried in the symbol nomenclature.

Monosaccharides of the same type (e.g. hexoses) have the same shape, and isomers are differentiated by color/black/white/shading. Hexoses are circles, N-acetylhexosamines are squares, and hexosamines (which are rare) are squares divided diagonally. Fucose is a red upward-pointing triangle, acidic sugars are diamonds, and xylose is an orange five-pointed star.
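The shape rules above can be captured as two lookup tables: one from monosaccharide to class, and one from class to shape. The sketch below includes only the classes and shapes named in the text; the table and function names are hypothetical, the assignment of individual sugars to classes follows the abbreviations used elsewhere on this page, and colors are omitted except where the text states them.

```python
# Shape for each monosaccharide class, as described in the text.
SHAPE_BY_CLASS = {
    "hexose": "circle",
    "N-acetylhexosamine": "square",
    "hexosamine": "diagonally divided square",
    "fucose": "red upward-pointing triangle",
    "acidic sugar": "diamond",
    "xylose": "orange five-pointed star",
}

# Class of each monosaccharide abbreviation used on this page.
CLASS_BY_SUGAR = {
    "Glc": "hexose", "Gal": "hexose", "Man": "hexose",
    "GlcNAc": "N-acetylhexosamine", "GalNAc": "N-acetylhexosamine",
    "GlcN": "hexosamine", "GalN": "hexosamine",
    "Fuc": "fucose",
    "NeuAc": "acidic sugar", "GlcA": "acidic sugar",
    "Xyl": "xylose",
}


def shape_of(sugar):
    """Symbol shape used to draw the given monosaccharide."""
    return SHAPE_BY_CLASS[CLASS_BY_SUGAR[sugar]]


for sugar in ("Man", "GalNAc", "NeuAc"):
    print(f"{sugar}: {shape_of(sugar)}")
```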

The same shading/color is used for different monosaccharides that have the same stereochemical designation (e.g. Gal, GalNAc and GalA all have the same shading/color). Modifications of monosaccharides (e.g., sulfation, O-acetylation) are indicated by associated small letters, with numbers indicating linkage positions, if they are known.

All sugars are assumed to be in the D-configuration, apart from L-fucose and L-iduronic acid, which are in the L-configuration. All glycosidically-linked monosaccharides are assumed to be in the pyranose form. All monosaccharide glycosidic linkages are assumed to originate from the 1-position, except for the sialic acids, which are linked from the 2-position. The most common linkages (i.e. Galβ, GalNAcα, GlcNAcβ, Manα, Fucα, sialic acid α) are assumed unless otherwise indicated. Linkages may also be indicated by the position from which the different symbols are drawn.
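The default rules above can be written as a small lookup. The following is a minimal sketch and not part of any established glycoinformatics library; the function name `expand_linkage` and the dictionary names are hypothetical, and only the defaults listed in the text are encoded.

```python
# Default anomers listed in the text; any other sugar needs an explicit anomer.
DEFAULT_ANOMER = {
    "Gal": "β",
    "GalNAc": "α",
    "GlcNAc": "β",
    "Man": "α",
    "Fuc": "α",
    "NeuAc": "α",  # a sialic acid
}

# Sialic acids link from position 2; every other sugar links from position 1.
SIALIC_ACIDS = {"NeuAc", "NeuGc"}


def expand_linkage(sugar, acceptor_position, anomer=None):
    """Return the explicit linkage, e.g. 'Galβ1-4', filling in defaults."""
    if anomer is None:
        if sugar not in DEFAULT_ANOMER:
            raise ValueError(f"no default anomer for {sugar}; state it explicitly")
        anomer = DEFAULT_ANOMER[sugar]
    origin = 2 if sugar in SIALIC_ACIDS else 1
    return f"{sugar}{anomer}{origin}-{acceptor_position}"


print(expand_linkage("Gal", 4))    # Galβ1-4
print(expand_linkage("NeuAc", 3))  # NeuAcα2-3
```

A sugar without a listed default, such as glucose, has to be given its anomer explicitly, for example `expand_linkage("Glc", 4, anomer="β")`.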

Alternative nomenclature systems have been proposed, albeit with less acceptance.

Prokaryotic
There are many different classes of glycans, and they are found at different locations within a cell as well as at the systemic level. The peptidoglycan cell wall of Gram-positive bacteria and the lipopolysaccharide layer on the outer membrane of Gram-negative bacteria both contain glycans. Different bacteria tend to display different sugars, and in this way different types of bacteria can be distinguished from each other.

Eukaryotic
Glycosylation almost always occurs in the lumenal compartments of the biosynthetic-secretory pathway; this mainly includes the endoplasmic reticulum (ER) and the Golgi apparatus. For this reason, glycoconjugates tend to be secreted entities or are present on membranes (including the cell plasma membrane). There is one exception to this, where O-GlcNAcylation occurs in the cytoplasm and nucleus.

Regulation
Sugars and glycans are not encoded in the genome, so the effects of glycosylation are regulated indirectly, by regulating the proteins and enzymes which perform the glycosylation; if the glycan functions by interacting with protein receptors (lectins), then the expression of the lectin(s) can also be regulated to control the glycobiology. These enzymes are expressed differentially according to cell type and developmental stage.

??
Glycans are made up of monosaccharides linked up together.

Human fertilization
Human fertilization begins when spermatozoa bind to the extracellular matrix coating of the oocyte, known as the zona pellucida (ZP). This binding is known to be facilitated by carbohydrate sequences on the ZP, primarily the sialyl-Lewisx ligand. Sperm-ZP binding was largely inhibited by glycoconjugates terminated with sialyl-Lewisx sequences or by antibodies directed against this sequence.

Definition of midline
The Notch protein is a receptor that is activated by binding to its ligand. After binding, the extracellular region of the Notch protein is cleaved by a metalloprotease (the S2 cleavage); the product of the S2 cleavage then undergoes a further intramembrane cleavage by γ-secretase (the S3 cleavage). After S3 cleavage, the intracellular domain is free and moves to the nucleus, where it acts as a transcriptional activator.

The Notch receptor has been shown to undergo extensive post-translational modifications, such as proteolytic processing, glycosylation, ubiquitination, and endocytic trafficking.

In Drosophila, the rumi gene encodes an O-glucosyltransferase that attaches glucose to serine residues in the multiple EGF domains of the extracellular region of Notch. The level of Rumi did not alter Notch signalling activity in cells with a truncated form of Notch in which most of the extracellular region is missing (NECN, which mimics the product after the S2 cleavage). Thus, Rumi is likely to act at or upstream of the S2 cleavage. Rumi has been located in the endoplasmic reticulum, and has been speculated to act either as a glycosyltransferase or directly as a chaperone. O-fucosyltransferase 1 (Ofut1) is another glycosyltransferase found in the ER; it transfers fucose onto EGF domains, and has been found to act as a chaperone that ensures correct folding of Notch.

Point mutations in the rumi gene lead to impairment of Notch signalling, so it is likely that O-glucosylation by Rumi is required for Notch function. Despite the mutation of rumi, the Notch receptor is still able to reach the cell surface, and the binding affinity of the Notch receptor in these mutants is also not reduced. The factor which leads to its loss of function is the missing O-glycosylation of the EGF domains of Notch. There are usually 36 EGF domains in Notch, and they are extensively glycosylated; in Drosophila Notch, only 5 of the 36 domains are devoid of both O-fucose and O-glucose sites.

fringe connection (frc) is a gene encoding a UDP-sugar transporter, most significantly transporting UDP-N-acetylglucosamine (UDP-GlcNAc); its mutation leads to phenotypes similar to Notch processing defects. Fringe is a different protein, a glycosyltransferase that attaches GlcNAc to the O-fucose originally added by Ofut1.

The dorsal (back) and ventral (front) sides of an organism are separated by a midline. The definition of the midline, and thus the regulation of the dorsal-ventral axis, is mediated by Notch signalling. For Notch signalling to be activated, Notch must bind its ligands Delta and Serrate, and the glycosyltransferase Fringe must be expressed. Because Serrate and Fringe are only expressed on the dorsal side, and Delta is only expressed on the ventral side, only at the midline are all of these present together. Thus, Notch is activated only at or around the midline.
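The logic of activation at the boundary can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions that are not from the text: the tissue is reduced to a one-dimensional row of cells, expression is all-or-nothing, and a cell counts as being at the midline when it touches a neighbour from the other side. The function name `notch_active` and the 'D'/'V' encoding are hypothetical.

```python
def notch_active(cells):
    """cells is a list of 'D' (dorsal: Serrate + Fringe) or 'V' (ventral: Delta).

    Returns a list of booleans: True where a cell touches a neighbour of the
    other type, so that Serrate, Fringe and Delta are all present together.
    """
    active = []
    for i, cell in enumerate(cells):
        # Immediate left and right neighbours, if they exist.
        neighbours = cells[max(i - 1, 0):i] + cells[i + 1:i + 2]
        active.append(any(n != cell for n in neighbours))
    return active


row = ["D", "D", "D", "V", "V", "V"]
print(notch_active(row))  # [False, False, True, True, False, False]
```

Only the two cells flanking the dorsal-ventral boundary are marked active, which is the stripe of Notch activation described above.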

Protein glycosylation
There are three types of oligosaccharides in mammalian glycoproteins - N-linked, O-linked and glycosylphosphatidylinositol (GPI) lipid anchors. N-linked glycans are attached to the protein via an amide bond to an asparagine residue within the consensus sequence Asn-X-Ser/Thr (X cannot be Pro). O-linked glycans are attached via the hydroxyl group of serine or threonine, although there is no strict consensus sequence. A protein usually has multiple glycosylation sites for both N- and O-linked glycans, often in defined domains. Not all sequences which satisfy the consensus sequence are glycosylation sites, and not all glycosylation sites are glycosylated at any one time; this variation gives rise to heterogeneity in the mass and charge of the glycoprotein.
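The N-linked consensus sequence lends itself to a simple pattern search. The sketch below uses one-letter amino acid codes and a hypothetical helper, `find_sequons`; the example sequence is made up. A match only marks a potential site, since, as noted above, not every consensus sequence is actually glycosylated.

```python
import re

# Asn-X-Ser/Thr where X is any residue except Pro. The lookahead lets
# overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Made-up sequence: NGT matches, NPS is excluded, NNT and NTS overlap.
print(find_sequons("MKNGTLLNPSANNTS"))  # [3, 12, 13]
```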

The functions of glycosylation can be split into intrinsic and extrinsic functions. Intrinsic functions are those which require only the glycan, such as providing structural architecture in cell walls and extracellular matrices, or modifying the physical/chemical properties (e.g. solubility, stability, half-life) of a protein. Extrinsic functions are those which require the glycan to interact with a receptor (a lectin), such as cell signalling, cell-cell/matrix adhesion, or directing the trafficking of glycoconjugates, for example in endo/exo-cytosis.

Some glycans cannot be made in certain organisms; for example, humans lack NeuGc, which is present in mice. When using mice to produce human glycans, a transgenic line lacking NeuGc therefore often needs to be used.

Trypanosomes
The surface of trypanosomes is lined with many copies of a variant surface glycoprotein. These are made with different sequences and in different numbers of copies to prevent antibodies from recognising them all and mounting a large immune response against them. There are more than 1000 alternative genes giving rise to this huge variety. If the surface glycoprotein is recognized, the GPI anchor can be hydrolysed to release the glycoprotein, which is replaced with one of a different sequence.

CD2
CD2 is an immunoglobulin-like molecule found on the surface of killer T cells. It recognizes CD48/58 (also Ig-like) on the surface of target cells to induce killing. The interaction between CD2 and CD48/58 is mediated by glycans on the variable (V) domains. PNGase F digestion of N-glycans abolishes binding to CD58, and the mutations Asn65Gln and Thr67Ala, which alter residues required for N- and O-glycan attachment, also led to lack of binding. Structural studies show that the glycosylation site is separate from the site that binds the glycan, which it does via hydrogen bonds. Binding of the glycan stabilizes CD2 by allowing it to take on a more folded structure; the nature of the sugar it binds does not make a large difference.

Immunoglobulins
Glycans are present at the hinge region to prevent the hinge from being hydrolysed and to limit its flexibility. Another glycan is present between the two constant heavy chain 2 (CH2) domains, where it is required for keeping the domains apart, a feature needed for binding Fc receptors and for fixing complement. These glycans are attached to Asn297 and are bi-antennary, terminating in galactose. They lie 'within' the protein and are not exposed to the outside, so they are not recognized by asialoglycoprotein receptors unless the Ig is denatured and the glycans become exposed.

Immunoglobulins are intensively studied and are used in many therapies, such as directly targeting cancer cells by binding to cancer-specific markers. Glycosylation complicates the production of immunoglobulins (e.g. antibodies) in bacteria, because bacteria do not glycosylate in the same manner as humans.

PMP-C
PMP-C is an insect protease inhibitor consisting of 3 β-strands forming a β-sheet. It can be synthesized chemically owing to its small size and simplicity. Attachment of a fucose at Thr9 has been shown by proton exchange, NMR, circular dichroism and calorimetry to stabilize the whole protein.

The best evidence comes from proton exchange. When the protons on the NH groups of peptide bonds are exchanged for deuterium (D), they no longer appear in the proton NMR spectrum, because deuterium does not resonate at the proton frequency. The exchange is carried out by placing the sample in D2O for different lengths of time. The rate of exchange is determined by how exposed the hydrogen is to the solvent. When fucosylated PMP-C is tested, a lower rate of exchange is observed, meaning the β-sheet is more tightly folded and the peptide bonds are less exposed. Without fucosylation, the peptide spends more time in the unfolded state.
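How a lower exchange rate shows up in the data can be sketched numerically. The example below assumes that the signal of each amide proton decays with simple first-order kinetics, which is an assumption rather than something stated above; the rate constants are invented for illustration and are not measurements of PMP-C.

```python
import math


def remaining_signal(rate_per_min, minutes):
    """Fraction of amide proton signal left, assuming first-order exchange."""
    return math.exp(-rate_per_min * minutes)


# Hypothetical rate constants (per minute), chosen only for illustration.
k_unfucosylated = 0.10
k_fucosylated = 0.01

for t in (10, 60):
    unfuc = remaining_signal(k_unfucosylated, t)
    fuc = remaining_signal(k_fucosylated, t)
    print(f"{t} min: unfucosylated {unfuc:.3f}, fucosylated {fuc:.3f}")
```

With these made-up values the fucosylated form retains most of its signal at time points where the unfucosylated form has lost nearly all of it, which is the qualitative pattern the experiment looks for.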

Fucosylated structures also require a higher temperature to denature.

Plasmin
Glycans also protect proteins from proteases, and so can be used as a mechanism to modulate proteolysis. Plasmin is a serine protease found in blood that degrades many blood plasma proteins, especially fibrin clots. It is released from the liver into the systemic circulation as a zymogen called plasminogen (PLG). There are two major glycoforms of plasminogen, each with apparently different target preferences. In circulation, plasminogen adopts a closed, activation-resistant conformation. Upon binding to clots, or to the cell surface, plasminogen adopts an open form that can be converted into active plasmin by a variety of enzymes, including tissue plasminogen activator (tPA), urokinase plasminogen activator (uPA), kallikrein, and factor XII (Hageman factor).

Tissue plasminogen activator (tPA) itself needs to be cleaved at Arg275 to perform its function. This cleavage is mediated partly by plasmin itself, so there is a positive feedback mechanism. There are two major glycoforms of tPA: type 1 (N-glycosylated at Asn184) and type 2 (not glycosylated at this site). Type 1 is cleaved more slowly than type 2, and so activates plasminogen more slowly.

Follicle-stimulating hormone
N-glycosylation of human follicle-stimulating hormone (FSH) is required for its function. Mutagenesis studies show that when the glycan is removed, the affinity for the receptor (which signals through adenylyl cyclase) increases, but binding does not cause activation. It is suggested that the glycan ensures the hormone binds to the receptor in the correct orientation to allow activation. The glycan is not itself in the binding site.

General
Experimental evidence has shown that attachment of sugars, such as lactose or 10 kDa dextran, to chymotrypsin increases its thermal stability (denaturation temperature) in proportion to the number attached. Computational analysis of the effect of attaching N-glycans to the SH3 domain also shows increased thermal stability (both kinetic and thermodynamic).

It appears irrespective of what the sugar is, the introduction of non-native glycans leads to increase in stability. This is likely due to the fact that the overall fold is stabilized by the presence of local interaction with the glycan.

Glycolipid
There are two different types of lipid portions found in glycolipids - glycosphingolipids and phosphatidyl lipids. Phosphatidyl lipids contain phosphatidic acid and include phosphatidyl ethanolamine (PE)/serine (PS)/choline (PC) - major components of the plasma membrane - and phosphatidylinositol (PI), a minor component on the cytosolic side of eukaryotic cell membranes that is used for signalling through its differential phosphorylation. PI can also be attached to proteins as a GPI anchor to anchor the protein onto the membrane. Glycosphingolipids contain the ceramide group (consisting of a sphingosine and a fatty acid). Sphingolipids are found in essentially all eukaryotes and in some prokaryotes and viruses, and contribute to cell structure and signalling. There is huge diversity in sphingolipids, arising from differences in their backbones as well as in their glycosylation patterns.

There are two classes of glycosphingolipids - glucosylceramides, linked to the glycan through a Glcβ1 linkage; and galactosylceramides, short glycolipids containing a Galβ1 linkage. Within the glucosylceramides, lipids can have gangliosphingolipid cores, neolactosphingolipid cores, or globosphingolipid cores.

Glycolipids are produced in the smooth endoplasmic reticulum (SER), where glucose from UDP-Glc is transferred onto the ceramide on the cytoplasmic side. The whole entity then flips over so that the carbohydrate faces the lumen, and is transported to the Golgi, where more monosaccharides are added, before eventually moving to the plasma membrane. A similar process occurs for galactose, but the ceramide is flipped first and Gal is then transferred onto it from UDP-Gal; furthermore, it can undergo only one further modification, the addition of a sulphate group on the 3-OH group of Gal, donated by 3'-phosphoadenosine 5'-phosphosulphate (PAPS).

The addition of glucose onto the ceramide, and thus the glycan part of the glycolipid, is essential: ceramide glucosyltransferase knockout is embryonic lethal, and causes neurodegeneration when the gene is knocked out at a later stage. Preventing the glycan chain from lengthening, such as by knockout of β1,4-GalNAc transferase, leads to a loss of GM2/GD2 lipids and causes myelin degradation, male sterility and dysmyelination with accompanying motor behavioral deficits, although the mice develop normally and do not show many signs of disease until they have aged. It is thought that this is due to a reduction in the protein (but not mRNA) levels of myelin-associated glycoprotein (MAG, or siglec-4) on the surface of myelin, meaning the myelin sheath is not held together and becomes detached.

All glucosylceramides with a gangliosphingolipid core are sialylated. The sialylated substrate can be found on the axolemma, the cell membrane surrounding the axon. These sialylated glucosylceramides (GD1a and GT1b) can bind to myelin-associated glycoprotein (MAG, or siglec-4), which is preferentially expressed on the innermost myelin wrap adjacent to the axon. The binding of MAG prevents nerve regeneration. Knockout of galactosylceramide synthase, which transfers galactose onto ceramide to form galactosylceramide, leads to gradual myelin breakdown, presenting as limb tremor followed by paralysis.

Glycolipid anchors
Glycolipid anchors attach proteins to the membrane. An ethanolamine group is attached to the protein and is in turn linked to a glycan structure through a phosphodiester bond; the glycan structure is attached to an inositol moiety, which in turn is attached to a phosphatidic acid. The biosynthesis of the GPI anchor begins with the transfer of GlcNAc onto the 6-position of inositol. The acetyl group is then removed to give a GlcN residue, making the overall entity positively charged. The GlcN-inositol-phosphatidic acid is then flipped over into the lumen of the endoplasmic reticulum, where mannoses and ethanolamine are added to the glycan chain, and a fatty acid, such as palmitate, is added to the inositol. The protein to be attached is synthesized by ribosomes on the rough endoplasmic reticulum (RER) and remains attached to the membrane through a C-terminal tether sequence; this is then cleaved and the protein is transamidated onto the ethanolamine on the third mannose; no energy is required.

The fatty acid attached to the inositol, as well as the ethanolamine on the second mannose (from the core), are then removed. It is unclear what their function is in the first place, but their addition and removal are necessary steps. The GPI-anchored protein is then transported to the Golgi, where the unsaturated fatty acid of the phosphatidic acid is replaced with a saturated fatty acid, which has more affinity for cholesterol and thus also for lipid rafts. The GPI-anchored protein is first transported to the plasma membrane and is subsequently sorted into lipid rafts and caveolae, a special type of lipid raft forming 50-100 nm flask-shaped invaginations that are distinct from clathrin-coated vesicles and are required for many cell functions. Caveolae carry many filamentous structures on their surface, including caveolins, and may be responsible for the transport of small molecules, such as folates, across the plasma membrane directly into the cytoplasm, a process known as potocytosis.

Trypanosomes
The GPI anchors in trypanosomes are different from those in humans. They include lipophosphoglycan (LPG) and glycosylinositol phospholipid (GIPL), and are attached to variant surface proteins that shield and hide conserved proteins on the plasma membrane from recognition by the immune system. If these variant glycoproteins are recognized, they can easily be shed by expressing phospholipase C, which releases the glycoprotein.

The biosynthesis of the GPI anchor in trypanosomes uses different enzymes and occurs in a slightly different order, so drugs can be developed which are specific against trypanosomes and not humans.

Membrane glycosylation
Because of the way glycans extend away from the cell surface, they are the most prominent structures on cell surface - when anything approaches the cell surface, they are likely to encounter glycans first. Almost all membrane proteins are glycosylated. However, glycans are hard to characterize because non-globular glycans cannot easily form crystals.

For monotopic proteins, and type I (C-terminus in cytoplasm) and type II (N-terminus in cytoplasm) single-pass transmembrane proteins, glycosylation typically occurs near the membrane, in the stalk or neck regions; polytopic membrane proteins are often glycosylated in a single loop.

Calnexin/Calreticulin chaperone system
After Glc3Man9GlcNAc2 is added onto an asparagine residue, the outermost two glucose residues are removed. The resulting GlcMan9GlcNAc2 is then bound by the ER-resident, membrane-bound calnexin or its soluble homolog calreticulin, which retains the glycoprotein in the endoplasmic reticulum and encourages the correct folding of the protein. Calnexin is membrane-bound, which favours interaction with membrane-bound glycoproteins, whereas the soluble calreticulin favours more soluble glycoproteins.

Calnexin/calreticulin have an ER localization motif on their tails which keeps them in the ER. Each has a globular lectin domain which binds to the glycan portion of the glycoprotein and provides specificity, and a P domain, a proline-rich arm, which folds around the glycoprotein while also interacting with folding factors such as the thiol oxidoreductase ERp57, a disulphide bond isomerase that catalyzes disulfide bond formation, reduction and isomerization; together these encourage correct folding. Calnexin/calreticulin require binding of ATP and non-native polypeptides to function.

The lectin domain of CNX/CRT has a fold similar to that of leguminous lectins and mainly consists of two β-sheets curved to form a β-sandwich. The lectin domain also contains a high-affinity Ca2+ binding site, which is required for stabilization of the protein but not for determining specificity.

The proline-rich P domain of calnexin is made up of 8 units: the first four are termed type 1 repeat motifs, and the next four are copies of a second, type 2 repeat motif. This P domain arm interrupts the lectin domain between residues Pro270 and Phe415 and interacts with the thiol oxidoreductase ERp57. The P domain of calreticulin is similar to that of calnexin, but consists of only 3 repeats of each type.

After being bound to calnexin/calreticulin, the glycoprotein is passed on to the next enzyme in the processing machinery, glucosidase II, which takes off the remaining glucose residue. The removal of the last glucose residue by glucosidase II releases the protein from CNX/CRT; alternative models suggest that the glycoprotein first dissociates before the last glucose residue is removed. Correctly-folded proteins are then further trimmed and processed, eventually exiting the endoplasmic reticulum. Unfolded or misfolded proteins are recognized by the enzyme UDP-glucose-glycoprotein glucosyltransferase, which recognizes the hydrophobic patches on misfolded proteins and does not recognize correctly-folded proteins. The glucosyltransferase adds a single glucose back onto the glycan, allowing it to bind calnexin/calreticulin once again, remain in the ER and have another attempt at folding properly.
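The cycle described above can be followed step by step in a toy simulation. This is a sketch only: the per-round folding probability, the cap on the number of rounds and the function name `calnexin_cycle` are all hypothetical, and the only thing tracked is the number of glucose residues on the glycan.

```python
import random


def calnexin_cycle(fold_probability, max_rounds=10, seed=0):
    """Toy model of the calnexin/calreticulin cycle for one glycoprotein.

    Returns (outcome, rounds) where outcome is 'exit ER' or 'still cycling'.
    """
    rng = random.Random(seed)
    glucoses = 3   # Glc3Man9GlcNAc2 is transferred onto Asn
    glucoses -= 2  # the outermost two glucoses are removed
    for round_number in range(1, max_rounds + 1):
        # glucoses == 1 here: the glycan is bound by calnexin/calreticulin
        folded = rng.random() < fold_probability
        glucoses -= 1  # glucosidase II removes the last glucose
        if folded:
            return "exit ER", round_number
        glucoses += 1  # the glucosyltransferase re-adds one glucose
    return "still cycling", max_rounds


print(calnexin_cycle(1.0))  # ('exit ER', 1)
print(calnexin_cycle(0.0))  # ('still cycling', 10)
```

A protein that always folds leaves after a single round, while one that never folds keeps being re-glucosylated until the arbitrary round limit is reached.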

In cell lines where calnexin and/or calreticulin are inhibited or knocked out, the cells remain viable but the secretion of proteins is affected. Owing to the similarity between calnexin and calreticulin, they are able to partially compensate for each other. This is not observed in vivo, where calreticulin KO is embryonic lethal in mice, and calnexin KO mice can be born but are quite sick and develop motor defects within a few days.

Endoplasmic reticulum-associated degradation (ERAD)
A protein that fails to fold will go through several cycles of the calnexin/calreticulin system before being degraded by the endoplasmic reticulum-associated degradation (ERAD) pathway.

The M-type lectins (EDEM1,2,3) are homologous to the mannosidase that takes off the 1-mannose, but have less mannosidase activity. EDEMs trim terminal mannose from misfolded proteins, and this commits the glycoprotein to the ERAD pathway. The trimmed glycans are recognized and bound by OS-9, which in turn binds to a complex of SEL1-HRD1 on the ER membrane, and the glycoprotein is translocated across the membrane into the cytoplasm. The mechanism of translocation is unknown. After translocation, the glycoprotein is bound by the E3 ubiquitin ligase, in complex with the E2 ubiquitin-conjugating enzyme and E1 ubiquitin-activating enzyme, which add ubiquitin to the glycoprotein. The SCF complex is an E3 ligase which has Skp1, cdc53 and an F-box protein that allows it to recognize proteins. Furthermore, it has Fbs1 (Fbx2) and Fbs (Fbx6a), which specifically bind to the chitobiose core.

After being tagged with ubiquitin, a glycanase takes off the sugar and the ubiquitin-tagged protein is delivered to the proteasome for degradation.

During conditions of stress, both EDEM and OS-9 are highly-expressed, possibly caused by misfolded proteins.

Lectins
Lectins are glycan-binding protein receptors with high specificity. The major advantage of oligosaccharide-dependent recognition is that it can potentially encode a great deal of information, because there is a large variety of oligosaccharides which can be joined by many different linkages; compare this to the 20 standard amino acids, which are all joined by the same linkage. Furthermore, these oligosaccharides are added post-translationally, which adds to the variety already present through differences in amino acid sequence. Glycans attached to proteins or lipids usually do not change the intrinsic functions of the protein or lipid.

The high-specificity binding to glycans is mediated by different carbohydrate-recognition domains (CRDs), which define the different lectin families.

Selectins
Selectins are involved in adhesion. Adhesion mediated by carbohydrates is usually quite weak. All selectins are type 1 transmembrane proteins with multiple complement control repeats, or Sushi domains, followed by one EGF-like domain and finally a C-type carbohydrate-recognition domain. The C-type carbohydrate-recognition domain is responsible for recognition of the glycan, and it requires calcium for function. Here, the calcium is directly involved in ligation, unlike in calnexin/calreticulin, where the calcium stabilizes the protein structure but is not directly involved in the binding site itself. E- and P-selectins have many Sushi domains and can thus project the carbohydrate-recognition domain further, whereas L-selectins contain only 2. Deletion mutagenesis studies show that only the C-type carbohydrate-recognition domain is required for binding of glycans.

E- and P-selectins
When tissues are injured, they may secrete chemokines which attract immune cells such as neutrophils or monocytes to the site of injury to fight off potential pathogens. Neutrophils or monocytes in the circulation express glycans which adhere weakly to E- and P-selectins on the lumenal wall of the endothelium; this allows the leukocytes to roll along the wall of the endothelium, eventually adhering tightly through interaction with integrins, leading to diapedesis or extravasation.

During inflammation, E- and P-selectins are expressed on the lumenal side of endothelial cells, and these selectins interact with glycan ligands on neutrophils and monocytes to mediate the adhesion. E-selectins are found strictly on endothelial surfaces, but P-selectins are also found on platelets, possibly to aid clotting and wound healing.

Normally, P-selectins are already made and stored in Weibel-Palade bodies, and are transported to the lumenal surface in response to inflammatory signals; this ensures selectins are present to mediate adhesion rapidly. The inflammatory cytokines also induce transcription of E- and P-selectins, which move straight to the cell surface after translation.

The E-, P- and L-selectins are conserved in mice. KO of E-selectin does not produce any detectable phenotype, probably because its loss is compensated by P-selectin. KO of P-selectin, which acts before E-selectin, results in reduced emigration of neutrophils into tissues, and leads to an increase in blood neutrophil count. KO of both E- and P-selectins results in complete loss of neutrophil emigration into tissues, leading to a very high neutrophil count in blood; these mice are susceptible to infection of tissues because the pathogens are not cleared.

L-selectins
Haematopoiesis of leukocytes begins in the bone marrow, from where the precursors migrate to the places where they will perform their function, namely the peripheral lymph nodes. This homing is specific to lymphocytes and not other leukocytes. These lymphocytes must periodically leave the lymph nodes and enter the circulation to scavenge for antigens, but must return to the lymph nodes afterwards. After many studies, this homing mechanism is deemed to depend on L-selectins and their glycan ligand, 6-sulfo sialyl Lewisx, on O- and N-glycans on high endothelial venules (HEV). The sulfate is required for binding: mice lacking two N-acetylglucosamine-6-O-sulfotransferases (GlcNAc6ST-1 and GlcNAc6ST-2) lost addressin and 6-sulfo sialyl Lewisx from the surface of HEV and showed reduced adhesion of lymphocytes.

L-selectins are constitutively expressed on the surface of lymphocytes, and these bind to glycan ligands on the lumenal side of the high endothelial venules of lymph nodes.

The KO of L-selectins reduced homing of lymphocytes to lymph nodes, but also reduced neutrophil emigration; this is because although L-selectins are found predominantly on lymphocytes, they are also found on neutrophils.

Methods
One way to elucidate what the selectins bind is to systematically 'take out' different sugars and see the effect on binding using a binding assay. First, endothelial cells are grown on a slide; leukocytes are then added and incubated on the endothelial cells for a certain time under different conditions, after which the slide is fixed with glutaraldehyde and binding is measured, for example using fluorescence microscopy if the leukocytes express a fluorescent tag.

Treatments include sugar inhibitors such as F-fucoidan, a sulfated polysaccharide containing mostly L-fucose and sulfated esters of fucose, which lead to a decrease in binding. This is because the selectins on the leukocytes and endothelial cells bind to these sugars instead of the ones on the surfaces of cells. Fucoidan gave a large effect, suggesting that selectins bind to glycans containing fucose residues. Modifying specific sugar residues gives information on which residues are essential. Sialidase digestion reduced the amount of binding, so sialic acids are known to be involved in binding. Monoclonal antibodies against the sugars can show which sugar is required for binding. Each sugar on a glycan is added by a specific glycosyltransferase; if the expression of one of these transferases is altered and the level of adhesion changes, then that sugar is involved in binding. For example, transfection of α1,3-fucosyltransferase gave Chinese hamster ovary (CHO) and COS cells the ability to bind to E-selectin, so α1,3-linked fucose is required for E-selectin binding.

Another approach is to use affinity chromatography with the selectin on the beads of the column. A mixture of different purified glycans can be passed through the column and eluted using different conditions. The structure of the glycans can then be elucidated using mass spectrometry.

Ligands
From the binding assays and affinity chromatography data, the ligands for selectins are known to have NeuAc and fucose as terminal sugars. Sialyl-Lewisx and sialyl-Lewisa are two such ligands. However, glycans with terminal NeuAc and fucose are not uncommon, even in situations where binding is not appropriate; so binding requires these glycans to be presented in context, possibly with other neighbouring groups, often sulfate groups.

E-selectin ligand 1 (ESL-1) and P-selectin glycoprotein ligand 1 (PSGL-1) bind E- and P-selectins, respectively; L-selectin can bind to mucin-like glycoproteins on high endothelial venules, and binds ligands such as MAdCAM-1, CD34, podocalyxin, endoglycan and GlyCAM.

P-selectin glycoprotein ligand 1 (PSGL-1) is a dimer of 120 kDa subunits, with each subunit having 3 N-glycosylation sites and 70 mucin-like O-glycosylation sites; these glycans make PSGL-1 stiff so that it projects outwards. A sialyl-Lewisx group is attached to the glycan stemming from the threonine residue near the N-terminus; it is this sialyl-Lewisx group which is responsible for binding P-selectin. Although PSGL-1 is found as a dimer, only the sialyl-Lewisx group is required for recognition of P-selectin, so even a monomeric fragment containing the sialyl-Lewisx group is sufficient for recognition. Sulfated tyrosine residues further towards the N-terminus of the subunit are required for high-affinity binding. There may also be multiple copies of Lewisx before the terminal sialyl-Lewisx.

The sugars responsible for binding are on O-glycans, because PNGase F, a glycosidase that works only on N-glycans, did not affect binding. The sugars are also known to be near the N-terminus, because proteolytic removal of the N-terminus inhibited binding, and when the N-terminal peptide was fused to other proteins as a recombinant chimera, the other proteins were able to bind to P-selectin. To pinpoint which N-terminal residue is involved, site-directed mutagenesis was used and identified Thr57 as being responsible. The sugars responsible for binding P-selectin were elucidated from experiments showing that PSGL-1 binding is only observed in CHO cells which are cotransfected with core 2 β-GlcNAc transferase and α1-3 Fuc transferase. Sialidase treatment decreases binding.

It has been shown that three tyrosines are always sulfated, and this is required for high-affinity binding to P-selectin, as sodium chlorate treatment, which inhibits sulfation, inhibited binding to P-selectin. Furthermore, mutations in individual sites reduced binding to P-selectin. Sulfation did not, however, affect E-selectin binding.

PSGL-1 also induces signals that activate the β2 integrin LFA-1.

E-selectin ligand-1 (ESL-1) is a 150 kDa C-type lectin found on leukocytes, with a sequence similar (91% identity) to that of a fibroblast growth factor receptor. It contains only N-linked glycans and no O-linked glycans. Sialyl Lewisx is attached to a polylactosamine chain and projects from the leukocyte. Fucosylation is essential for binding to E-selectin. ESL-1 also induces signals that activate the β2 integrin Mac-1.

L-selectin ligands. Binding of glycosylated cell adhesion molecule 1 (GlyCAM-1), CD34, podocalyxin and endomucin to L-selectin depends on sialic acids, fucose and sulfate groups, especially those attached to position 6 of GlcNAc. Cells at the HEV must express two sulphotransferases, GlcNAc6ST-1 and GlcNAc6ST-2, for binding to occur; their knockout led to a lack of homing of lymphocytes to the lymph nodes. Therefore, these ligands are thought to express 6-sulpho-sialyl-Lewisx.

Endoglycan, a CD34 homologue, binds through tyrosine sulphate in a similar manner to PSGL-1.

Glycoprotein clearance
Glycoproteins are highly varied and can take on different glycoforms. To maintain a steady-state concentration of these glycoproteins, as well as to remove dysfunctional (e.g. denatured) proteins, immunocomplexes, aggressive enzymes etc. from the circulation, the liver contains many specialized receptors that recognize these glycoproteins and induce endocytosis and transport to the lysosome, Golgi and/or bile canaliculi.

Asialoglycoprotein receptor
Caeruloplasmin is a serum glycoprotein which carries Cu to the liver. It was found that caeruloplasmin with its terminal NeuAc residues removed and galactose exposed is rapidly cleared from the circulation. Caeruloplasmin with the terminal NeuAc attached has a half-life of one to a few days, whereas asialocaeruloplasmin has a half-life of less than 5 minutes, because it binds to the asialoglycoprotein receptor (ASGP-R). ASGP-R is found on liver parenchymal cells and can bind to glycoproteins with terminal Gal and GalNAc in vitro, and to Siaα2,6GalNAcβ1,4GlcNAcβ1,2Man in vivo in rat. Bovine serum albumin with a 10-15 carbohydrate structure containing the Siaα2,6GalNAcβ1,4GlcNAcβ1,2Man sequence chemically attached to it was injected into the rat circulation and was cleared in less than a minute by binding to subunit 1 of ASGP-R. Of note, in rat in vivo and in isolated perfused rat liver, asialoglycoproteins with terminal galactose groups are cleared from the circulation by the hepatocytes (e.g. centrolobular hepatocytes) and excreted in bile.
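To get a feel for what these half-lives mean, the sketch below assumes simple first-order (exponential) clearance, which the text does not state. The function name is hypothetical, and 2 days is an arbitrary stand-in for "one to a few days".

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction still in circulation, ASSUMING first-order clearance."""
    return 0.5 ** (elapsed_min / half_life_min)

# Illustrative values only: 2 days stands in for "one to a few days";
# 5 minutes is the upper bound quoted for asialocaeruloplasmin.
SIALYLATED_HALF_LIFE_MIN = 2 * 24 * 60
ASIALO_HALF_LIFE_MIN = 5

for label, t_half in [("caeruloplasmin", SIALYLATED_HALF_LIFE_MIN),
                      ("asialocaeruloplasmin", ASIALO_HALF_LIFE_MIN)]:
    left = fraction_remaining(30, t_half)
    print(f"{label}: {left:.1%} left after 30 min")
```

Under these assumptions, more than 99% of the sialylated protein but less than 2% of the desialylated protein would still be circulating after half an hour.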

The asialoglycoprotein receptor (ASGP-R) contains multiple C-type carbohydrate-recognition domains forming a cluster (a trimer). These require Ca2+ binding to function. ASGP-R can bind to asialylated glycoproteins, and also to glycans with terminal sialic acid linked α2-6, but not α2-3, to Gal or GalNAc. This suggests that it is the 3-OH and 4-OH of Gal or GalNAc which contribute to binding to the asialoglycoprotein receptor.

After binding to ASGP-R, the protein is processed through the classical endocytic pathway: it is taken up into a clathrin-coated vesicle, which then fuses with the endosome, where the low pH (5.5) decreases the affinity of the receptor for Ca2+ and thus the receptor dissociates from the glycoprotein. The receptor is recycled and returns to the cell surface, where the pH (7.2) is again suitable for binding. The released glycoprotein is then transported into the lysosome, where it is hydrolysed and degraded.

At the moment, this is the only pathway known for the clearance of asialylated glycoproteins. People with liver diseases tend to have fewer receptors for asialoglycoproteins, and many desialylated glycoproteins remain in the circulation. As to how these asialylated glycoproteins are generated, sialidases may be involved in regulating the degradation of many serum proteins; however, no specific sialidase activity has yet been defined.

The charge of the protein, the terminal sugars of the glycans, and other factors may affect this uptake.

Mannose receptor
When the galactose residues are removed from the asialylated glycans, exposing the GlcNAc residues, these proteins are also taken up by the endothelial cells and Kupffer cells of the liver and degraded. However, the asialoglycoprotein receptor does not bind to GlcNAc, and so these cells must express a different receptor that mediates the uptake of these glycoproteins - the mannose receptor.

High mannose sugars are not common in secreted glycoproteins. However, lysosomal enzymes, which are released by macrophages and neutrophils for the respiratory burst, do display high mannose glycans. After they have fulfilled their function, these enzymes must be cleared from the circulation, and that is the job of the mannose receptor.

The mannose receptor is found on all macrophages and can bind to mannose-, GlcNAc- and fucose-terminating glycans. It consists of 8 C-type carbohydrate recognition domains, followed by a fibronectin type II repeat and terminated with an R-type CRD. The 5 CRDs closest to the membrane are responsible for specifically binding to mannose; the outermost 3 do not bind to sugars.

The mannose receptor is quite special in the sense that it serves two functions. Apart from binding to mannoses, the R-type CRD can bind to sulphated GalNAc (SO4-4GalNAcβ1), which is present on pituitary hormones and rarely found on other molecules. The pituitary gland is the only place which has the two enzymes required to sulphate GalNAc.

The mannose receptor is conserved from mice to humans. Its ligands are often in need of degradation, and so the main role of the mannose receptor can be viewed as clearing unwanted proteins, such as lysosomal enzymes, in order to maintain homeostasis.

Follicle stimulating hormone (follitropin, FSH) is a pituitary hormone that stimulates the development of the egg from the primordial follicle stage to the mature follicle stage, and also promotes production of estrogen; luteinizing hormone (lutropin, LH), on the other hand, promotes the release of the egg and generation of the corpus luteum at ovulation, and also promotes production of progesterone.

LH is a ligand of a signalling receptor found on the target cells of the ovary. The receptor is a 7-pass transmembrane G protein-coupled receptor. When LH binds, the receptor generates cAMP, and this leads to the release of the egg. Too high an LH level will lead to desensitization of the receptor; furthermore, LH must be cleared to start the next menstrual cycle. LH is cleared by mannose receptors found on liver endothelial cells; once bound, it is endocytosed and degraded.

Innate immunity
The mannose receptor may serve a third function, in innate immunity. Escherichia coli displays lipopolysaccharides on its surface, Candida albicans displays mannan, Leishmania donovani displays lipophosphoglycan, and Mycobacterium tuberculosis displays lipoarabinomannan; the fungus Pneumocystis jiroveci, which causes Pneumocystis pneumonia, is a further example. These high mannose structures can be recognized by the mannose receptor on macrophages, and the whole microorganism is endocytosed and degraded. Therefore, the mannose receptor should aid in innate immunity. However, knockout of the mannose receptor in mice did not lead to a shorter lifespan. These mixed results might be due to the fact that certain microorganisms (Mycobacterium tuberculosis, Mycobacterium leprae) have used this pathway of endocytosis to hijack the macrophage: they use mannose receptor recognition to gain entry into the cell, where they remain in the phagolysosome and multiply.

Mannose-binding protein (MBP)
Deficiency in MBP is linked to recurrent and severe infections. People with deficiencies in MBP tend to see symptoms after 6 months of age, when the maternal antibodies have waned; these symptoms decrease as the adaptive immune response matures, through the production of a bigger repertoire of immunoglobulins.

The MBP can initiate the innate immune response through the lectin complement pathway. Unlike the classical complement pathway, this does not require antibodies. Normally, MBP forms a complex with mannan-binding lectin serine peptidase 1/2 (MASP1/2), and this complex is activated when sugar binds; it in turn activates C4 and C2, which cleave C3 into C3a and C3b. C3b then leads to the activation of C5 through to C9, which form the membrane attack complex ({C5b-C6-C7-C8-C9}n) that lyses the pathogen. At the same time, C3b is also recognized by C3b receptors on macrophages, which take up the pathogen and degrade it.

The mannose binding protein (MBP) is a member of the collectins (collagen-containing C-type lectins). At the base of the protein are cysteines which can form disulphide bonds with other MBPs and mediate the multimerization of the MBP. Extending from the base is a collagen-like domain, which forms the stem and neck. At the end of the collagen-like domain is a C-type CRD. The smallest unit is a trimer; this means that the C-type CRDs will always be found as a cluster of three. The smallest functional unit is a dimer of trimers; however, assemblies as large as a hexamer of trimers can form. The trimers associate through the disulphide bonds and the stem region, with the necks spreading away from each other. The more subunits the complex has, the better it is at fixing complement.

The collagen-like domain is not involved in binding of sugars, but certain mutations in it have been associated with immunodeficiency. Mutations of glycine to glutamic acid or aspartic acid disrupt the helical structure of collagen, meaning the subunits can no longer oligomerize, and thus become ineffective in binding sugars. Mutations of arginine to cysteine tend to occur further up and affect MASP binding, so that the complement pathway is not activated and there is no immune response.

MBP defects are more commonly observed in Africa, where cerebral malaria over-activates complement; damping down complement might thus actually give a survival advantage, as an overactive complement system can damage host tissues.

The C-type CRD of the MBP does not have much regular structure, but has three calcium ions - numbered 1, 2 and 3. Calcium 1 stabilizes the loops on the domain, and calcium 2 ligates the oxygens on the hydroxyl groups of terminal mannoses and GlcNAcs. However, this binding is weak, and so multiple CRDs from different MBPs are required to bind to different glycans in the cluster. These clusters of CRDs face the same direction and have a fixed orientation, which ensures they are at least 50Å apart. This ensures that binding occurs only when mannoses are spread out and ubiquitous on a surface, and not merely because a cluster of endogenous high-mannose sugars made it to the cell surface. Because the CRDs are held rigidly and face the same direction, MBP might be more selective for cell surface glycans than for free glycoproteins.

High mannose N-glycans often have very similar core structures, but divergent structures at branches further away from the protein. So how does the same receptor recognize these divergent glycan structures? Since MBP only needs to bind to one branch, it will pick the one that binds most favourably. Furthermore, the glycan can adapt to the conformation of the receptor recognition site, although this incurs an entropic penalty.

The different branches of a glycan are spaced about 11Å apart, and so the different CRDs bind not to different branches of the same glycan but to different glycans. This also ensures that complement is activated only when high-mannose entities are abundant, preventing inappropriate activation.
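The geometric argument can be written out as a small sketch. The numbers are the ones quoted in this section; the function names, and the simplification that a CRD pair can only share a glycan if its spacing does not exceed the branch spacing, are assumptions made for illustration.

```python
CRDS_PER_TRIMER = 3        # each MBP trimer carries a cluster of three CRDs
BRANCH_SPACING_A = 11.0    # approximate spacing between branches of one glycan (Å)
MIN_CRD_SPACING_A = 50.0   # minimum spacing between CRD clusters (Å)

def crd_count(n_trimers: int) -> int:
    """Number of CRDs in an MBP oligomer built from n trimers."""
    return n_trimers * CRDS_PER_TRIMER

def can_share_one_glycan(crd_spacing: float, branch_spacing: float) -> bool:
    """Simplified test: two CRDs can reach two branches of the same glycan
    only if they sit no further apart than the branches do."""
    return crd_spacing <= branch_spacing

for name, n in [("dimer of trimers", 2), ("hexamer of trimers", 6)]:
    print(f"{name}: {crd_count(n)} CRDs")
print("CRDs can share one glycan:",
      can_share_one_glycan(MIN_CRD_SPACING_A, BRANCH_SPACING_A))
```

With 50Å between clusters and only about 11Å between branches, the check fails, which is the basis for saying that multivalent binding needs several separate glycans.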

DC-SIGN
T cells must recognize MHC class II on dendritic cells and come into close contact with them, so that they can sample the antigen. The interaction must be stabilized to give the T cell sufficient time for its T cell receptor to bind to the antigen-MHC II complex. ICAM-3 is a sialyl Lewisx-containing glycan found on T cells, and is recognized by dendritic cell-ICAM-grabbing non-integrin (DC-SIGN) on dendritic cells. Apart from sialyl Lewisx-containing glycans, DC-SIGN can also bind to mannose- and fucose-containing structures, such as mannoses found on some microorganisms, and aid in the innate immune response in a similar fashion to the mannose receptor.

But again, certain microorganisms and viruses can hijack this pathway. HIV displays gp120, a coat protein extensively glycosylated with high mannose sugars. This binds to DC-SIGN on dendritic cells, which subsequently bind to T cells. The gp120 can complex with CD4, and this leads to fusion and the injection of viral genetic material into the T cell, where it will be used to replicate the virus. DC-SIGN has been shown to enhance the infection of T cells by HIV.

DC-SIGN has flexible regions before the CRD, meaning it can accommodate different orientations of glycans, such as are found on glycoproteins.

Galectin
Galectins are soluble, two-subunit proteins that have no membrane anchors and are not membrane bound; they are a family of lectins which bind beta-galactoside. The family is defined by having at least one characteristic carbohydrate recognition domain (CRD) with an affinity for beta-galactosides and sharing certain sequence elements. Galectins can take on a dimeric, tandem (chimera) or N-terminal extended organisation, depending on which type of galectin it is.

Galectins have intracellular and extracellular functions, and can be translocated into the nucleus, binding to the ribonucleoprotein particle (a complex of RNA and RNA-binding proteins), or secreted to crosslink extracellular glycoconjugates. Unlike C-type lectins and other lectins, galectins do not require a metal ion for their function.

Galectins bind best to poly-N-acetyllactosamine (-Galβ1-4GlcNAcβ1-), followed by single units of Galβ1-4GlcNAc, and then lactose, but can also bind, although much less favourably, galactose. The exact specificity depends on the individual galectin.

Immune system
For T lymphocytes to mature, they migrate from the bone marrow to the thymus, where they will be tested for self-reactivity. Any T lymphocytes that recognize self-antigens are apoptosed, a process mediated by galectins 1 and 9. Mature T cells then migrate into the spleen and lymph nodes and wait for antigens to present themselves. Upon binding to foreign antigens, the T cells are activated in response to these antigens and try to fight off potential infections. To end the response, the T cells are apoptosed, and this is mediated by galectins 1 and 9, in a similar manner to the process in the thymus. KO of galectin 1 leads to immunodeficiencies. Overexpression of core 2 GlcNAc transferase, which leads to more polylactosamine repeats, increases susceptibility to apoptosis mediated by galectin 1.

T cell receptors (TCRs) need to be associated into a cluster to induce downstream signalling. They associate through binding of galectins to the poly-N-acetyllactosamine repeats on the TCRs. Prevention of galectin binding using GlcNAc transferase V KO leads to a lack of signalling and lymphocyte activation.

Apart from roles in apoptosis and receptor activation, galectins have also been postulated to be involved in adhesion, where they are thought to crosslink glycans found on different surfaces and enhance adhesion. However, they are also postulated to cross-link glycans on the same surface and inhibit adhesion.

Siglecs
Sialic acid-binding immunoglobulin-type lectins (Siglecs) are a subset of I-type lectins, categorized this way because of their many immunoglobulin folds. All Siglecs bind specifically to sialic acid, but their specificity differs with the type of linkage (most bind 2,3 linkages); different Siglecs are also found on different cell types. Siglec 1 is found on macrophages, Siglec 2 is found on B lymphocytes, Siglec 3 is found on different cell populations of the immune system, Siglec 4a is myelin-associated, and Siglec 15 is found on macrophages and dendritic cells. Sialoadhesin (Siglec 1) has a long extension which reaches out and binds to sialic acids on the glycans on the surface of pathogens, directing them to phagocytosis. The CD22 and CD33 family might be involved in self-recognition signalling. Because sialic acids are common on human cells and not on pathogens, such a Siglec can bind to 'self' sialic acids, which activates its immunoreceptor tyrosine-based inhibitory motifs (ITIMs); these recruit SHP phosphatases, which dephosphorylate signalling proteins and so inhibit signalling and the response to the self-antigen. CD22 knockouts have exhibited increased sensitivity to bacterial lipopolysaccharides but also an increase in the production of antibodies against the self.

Siglec 4a is myelin-associated and can be found on the myelin sheath that is directly adjacent to the neuron. Sialylated glycans are expressed on the surface of the neuron, and myelin-associated glycoproteins (MAGs) bind to these glycans and attach the myelin sheath to the neuron. Siglecs are required for the stabilization of myelin, and KO of GalNAc transferases and MAGs leads to deterioration of the myelin-axon interaction with age. The myelin sheath separates from the neuron and no longer protects it, leading to axon degeneration and neurological defects.

Paroxysmal nocturnal haemoglobinuria
Paroxysmal nocturnal haemoglobinuria (PNH) is a rare, acquired disease caused by defects in the cell membrane. Defects in four subunits of the GlcNAc transferase responsible for addition onto phosphatidylinositol mean that CD59, a glycolipid-anchored protein that inhibits lysis via complement, is not placed on the membrane, leading to episodes of erythrocyte lysis occurring at night (hence nocturnal). Defects in the transferase are embryonic lethal in mice. The cells which acquire this mutation have a proliferative advantage and thus will eventually constitute the whole bone marrow.

Inflammation blocker
During surgery, tissue damage will occur either through trauma or hypoxia; when surgery is complete, neutrophils will rush in and secrete factors that induce inflammation. Excessive inflammation can cause disease. By adding sialyl Lewisx into the circulation, selectins on the neutrophils will bind to this free sialyl Lewisx and are prevented from binding to the endothelial linings and entering tissues to induce inflammation. In practice, however, this treatment is not effective, because carbohydrates do not survive for long in the circulation.

ABO blood group antigens
The H antigen is synthesized when a fucose is transferred onto a non-reducing end Galβ1-4GlcNAcβ1- sequence by H transferase (α1,2-fucosyltransferase); the resulting Fucα1-2Galβ1- terminal structure is the H antigen. The H antigen can subsequently be modified to give the A antigen and B antigen. A transferase (α1,3-GalNAc-transferase) attaches a GalNAc via an α1-3 linkage to the galactose of the H antigen, giving the A antigen. B transferase (α1,3-galactosyltransferase) is encoded at the same locus as the A transferase, and differs by a few single-base substitutions, giving a difference of 4 amino acids; this difference leads to the addition of galactose onto the galactose of the H antigen via an α1-3 linkage, giving rise to the B antigen. A single-base loss-of-function mutation is seen in O individuals, whose gene product is unable to process the H antigen further.
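The biosynthetic steps above can be sketched as a small function that maps the transferases a person has onto the terminal structures produced. It is a simplification: conversion of H antigen is treated as complete, and the function name and the two-letter allele strings are illustrative assumptions rather than anything defined in these notes.

```python
PRECURSOR = "Galβ1-4GlcNAcβ1-"
H_ANTIGEN = "Fucα1-2Galβ1-4GlcNAcβ1-"
A_ANTIGEN = "GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-"
B_ANTIGEN = "Galα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-"

def terminal_structures(has_h_transferase: bool, abo_alleles: str) -> set:
    """Terminal structures expected for a genotype such as 'AO', 'AB' or 'OO'.

    Simplification: every H antigen is assumed to be converted when an
    A or B transferase is present.
    """
    if not has_h_transferase:
        # Without the H antigen, A and B transferases have no substrate.
        return {PRECURSOR}
    structures = set()
    if "A" in abo_alleles:
        structures.add(A_ANTIGEN)
    if "B" in abo_alleles:
        structures.add(B_ANTIGEN)
    # O alleles encode an inactive enzyme, so the H antigen is left as it is.
    return structures or {H_ANTIGEN}

for genotype in ("AO", "BB", "AB", "OO"):
    print(genotype, sorted(terminal_structures(True, genotype)))
```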

If an individual is of the A blood group, the A antigen will be present on their erythrocytes and they will have antibodies in their serum against antigen B; and vice versa. An individual with blood group AB has both antigens present and their serum will contain no antibodies against these antigens; therefore, AB individuals can receive blood transfusions from any blood group, and are known as universal acceptors. In contrast, an individual with blood group O has neither A nor B antigen on the red blood cells, and will have antibodies against both A and B antigens; therefore, O individuals can donate their blood to any blood group but cannot receive blood from any group other than O, and they are known as universal donors.
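These compatibility rules reduce to one set operation: a transfusion is acceptable when the donor's red cells carry no antigen that the recipient's serum has antibodies against. The sketch below covers only the ABO antigens described here and ignores every other blood group system; the names are illustrative.

```python
# Antigens carried on the erythrocytes of each ABO group.
ANTIGENS = {"A": {"A"}, "B": {"B"}, "AB": {"A", "B"}, "O": set()}

def serum_antibodies(group: str) -> set:
    """Serum contains antibodies against whichever of A and B is absent."""
    return {"A", "B"} - ANTIGENS[group]

def can_receive(recipient: str, donor: str) -> bool:
    """True if no donor antigen is targeted by the recipient's antibodies."""
    return not (ANTIGENS[donor] & serum_antibodies(recipient))

for recipient in ANTIGENS:
    donors = [d for d in ANTIGENS if can_receive(recipient, d)]
    print(f"{recipient} can receive from: {', '.join(donors)}")
```

Running it shows AB accepting every group and O accepting only O, while O appears in every recipient's donor list.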

h/h, also known as Oh, is a rare blood group whose members lack the H transferase, so their erythrocytes lack the H antigen. The serum of these individuals contains antibodies that react with all erythrocytes irrespective of their ABO group, including group O cells, which carry the H antigen. Technically, therefore, they are the universal donor; however, since this blood group is rare, the term is not often used for them.

CD8/MHC
In brief, the development of a T cell begins in the thymus: haematopoietic stem cells of the bone marrow migrate to the thymus and become immature thymocytes, which express neither of the cell surface glycoproteins CD4 and CD8, and are therefore classed as double-negative (CD4-CD8-) cells. As they progress through their development they become double-positive thymocytes (CD4+CD8+), and finally mature into single-positive (CD4+CD8- or CD4-CD8+) thymocytes that are then released from the thymus to peripheral tissues.

When the thymocytes become double-positive, the TCRβ chain gene is rearranged and is expressed on the cell surface as pre-TCR; the TCRα chain gene is not yet rearranged nor expressed at this immature stage. After DP cells are formed, the TCRα chain gene is rearranged and expressed to form the complete T-cell receptor (TCRαβ). However, thymocytes which recognize self-antigens, or thymocytes that are unable to recognize foreign antigens, must be disposed of through apoptosis via negative selection or positive selection, respectively; about 98% of the thymocytes are apoptosed this way. Only ~2% survive and leave the thymus to become mature immunocompetent T cells.

After selection, the T cells are able to recognize specific foreign antigens and induce a response. They do this via the TCR, which recognizes the antigen presented to it on a major histocompatibility complex class I/II (MHC I/II) molecule. CD4 and CD8 are co-receptors for the TCR that recognize the same MHC molecule. Both the TCR and CD4 or CD8 must bind to trigger a response.

However, CD4 and CD8 by themselves are able to bind to MHC irrespective of whether an antigen is present. This adhesion can be observed at early stages of T-cell development, before a TCR is present: CD8 is able to bind to MHC and adhere to it (a noncognate interaction). As the T cell matures, however, the ability of the CD8 glycoprotein to bind to MHC I decreases, and this process is regulated through glycosylation.

CD8 is glycosylated at an early stage with Core 2 O-linked glycans (derived from Core 1); as the T cell matures, Core 1 is instead sialylated, and this decreases the avidity of CD8 for MHC I.

Muscular dystrophies
Contracting muscles must be attached to the basement membrane in order to transform the force generated into meaningful mechanical motion. This attachment is mediated by a complex of actin-dystrophin-dystroglycan-laminin. Actin is a major cytoskeleton component that is present throughout the cell, and particularly at the cell periphery. Dystrophin binds to these actin filaments as well as to dystroglycan. Dystroglycan is a basement membrane receptor with α- and β-subunits. The β-subunit is transmembrane and binds to dystrophin; the α-subunit is extracellular and is heavily glycosylated, with N-glycosylation, mucin-type O-glycosylation, O-mannosylation, and a type of phosphorylated O-mannosyl glycan modification. It is the glycans on the α-subunit of dystroglycan (α-DG) which are required for binding to laminin. Because laminin is part of the extracellular matrix (ECM) of muscles, it is immobile; when the muscle pulls on it, it is able to move the whole ECM to generate motion.

If any part of the actin-dystrophin-dystroglycan-laminin complex is disrupted, there will no longer be attachment, and the muscle has nothing to pull on. These muscles are no longer able to produce mechanical motion, leading to a group of diseases collectively known as muscular dystrophies. Most muscular dystrophies are caused by sporadic genetic mutations in the dystrophin gene, but there are also a significant number of cases in which mutations in dystroglycan or defects in its glycosylation lead to muscular dystrophy, and these are more specifically termed dystroglycanopathies.

Dystroglycanopathies can be caused by mutations in the dystroglycan gene itself, by mutations in the glycosyltransferases which mediate the glycosylation, or by mutations affecting the synthesis of the sugar-nucleotides.

Defects in glycosylation mean the muscle cells can no longer attach to the ECM and generate meaningful motion. In the normal skeletal muscle cell, dystroglycan is O-mannosylated with a unique sequence of NeuAc-Gal-GlcNAc-Man. This O-glycosylation is initiated by protein O-mannosyl-transferase 1/2 (POMT1/2), which transfers the initial mannose, followed by protein O-mannosyl β-1,2-N-acetylglucosaminyltransferase 1 (POMGnT1), which adds the next GlcNAc onto the mannose.

Normally, aside from the NeuAc-Gal-GlcNAc-Man heterotetrasaccharide, the mannose also has a heteropolysaccharide attached to it. This heteropolysaccharide is made up of repeating units of glucuronic acid and xylose, and is added onto the mannose through a phosphodiester bond between the reducing end of glucuronic acid and mannose. The addition and polymerization of this heteropolysaccharide is mediated by like-N-acetylglucosaminyltransferase (LARGE) or its homolog LARGE2.

This heteropolysaccharide of glucuronic acid and xylose is able to bind to any extracellular matrix (ECM) component containing the laminin G domain. In muscles, these are predominantly laminin, perlecan and agrin, but they also include neurexin (brain) and pikachurin (retina). This binding is required for the function of dystroglycan and for the attachment of cells of different tissues to the basement membrane; thus, congenital mutations in these glycosyltransferase genes affect not only muscles, but also the brain and retina.

LARGE
To determine the laminin-binding sugar moiety, recombinant α-DG expressed in human embryonic kidney (HEK) 293 cells was treated with glycosidases to eliminate all unassociated glycans. The remaining sugars show high levels of the O-mannosyl glycan components, namely N-acetylglucosamine (GlcNAc), galactose (Gal), N-acetylgalactosamine (GalNAc), mannose (Man) and glucose (Glc), but also a large amount of glucuronic acid (GlcA) and xylose (Xyl); and so the laminin-binding moiety will consist of these sugars. LARGE has been found to be active in CHO cells deficient in the synthesis of CMP-sialic acid, GDP-fucose (Fuc), UDP-galactose (Gal) and UDP-N-acetylgalactosamine (GalNAc), and so these sugars can be disregarded. Therefore, only Xyl, GlcA and Glc remain. Glc was disregarded as a contaminant, leaving only two sugars - Xyl and GlcA.

LARGE has a Xyl-T domain, and since functional LARGE was implicated as required for laminin binding, xylose was first investigated.

KO of UXS1, which makes the xylose nucleotide, leads to lack of a laminin-binding domain, shown as negative staining with IIH6, an antibody for laminin-binding domains. Ectopic expression of UXS1 rescued IIH6 reactivity. Thus, xylose is part of the laminin-binding domain.

Xylose attached to hydrophobic moieties can be introduced into the cell. If laminin binds xylose, then dystroglycan binding would decrease, because these exogenous xylosides compete with the xyloses on dystroglycan; more importantly, they inhibit the glycosyltransferases that transfer xylose. p-Nitrophenyl-α-D-xyloside (Xyl-α-pNP) mimics xylose and, when introduced, led to reduced α-DG glycosylation altogether. And so xylose is not only a constituent of the glycan; it is required for subsequent glycosylation.

To examine what is attached to xylose, purified mobile LARGE (LARGE without the N-terminal transmembrane domain, LARGEdTM; purified using both the N-terminal FLAG and C-terminal myc + Hisx6 tags) was incubated with Xyl-α-pNP and different donor substrates (xylose is not used to ensure it is not added onto other sugars), and the products were separated by high-performance liquid chromatography (HPLC). Only when Xyl-α-pNP was incubated with UDP-GlcA was a peak corresponding to a conjugate observed. This is specific for α-linked xylose.

Because LARGE is known to have two glycosyltransferase domains, and one is known to transfer GlcA to the non-reducing end of α-linked xylose (confirmed by mass spectrometry), the other one is likely to transfer another sugar onto the non-reducing end of GlcA. When the above experiments were repeated with 4-methylumbelliferyl-β-D-glucuronide (GlcA-β-MU), a peak was observed only upon incubation with UDP-Xyl. Thus, xylose is added specifically onto β-linked GlcA, and the product was confirmed by mass spectrometry.

Furthermore, when the two acceptors (Xyl-α-pNP and GlcA-β-MU) were individually incubated with both UDP-Xyl and UDP-GlcA, a variety of polymers were observed. And so LARGE is able to alternately add xylose and glucuronic acid residues to form polysaccharides. This was confirmed by MS following gel filtration purification. NMR was used to determine the linkages, and the results show polymers of (-3-Xylα1-3GlcAβ1-)n.
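The alternating addition can be illustrated with a short sketch that builds the chain one sugar at a time from either acceptor. The linkages are taken from the repeat unit (-3-Xylα1-3GlcAβ1-)n; the function name and the string representation (non-reducing end written first) are choices made for this illustration, not part of the experiments described.

```python
# Which sugar is added onto a given terminal sugar, and with which linkage.
ADDS = {
    "Xyl": ("GlcA", "β1-3"),   # GlcA-T acts on a terminal xylose
    "GlcA": ("Xyl", "α1-3"),   # Xyl-T acts on a terminal glucuronic acid
}

def extend(acceptor_sugar: str, aglycone: str, n_additions: int) -> str:
    """Return the chain, non-reducing end first, after n alternating additions."""
    chain = f"{acceptor_sugar}-{aglycone}"
    terminal = acceptor_sugar
    for _ in range(n_additions):
        added, linkage = ADDS[terminal]
        chain = f"{added}{linkage}{chain}"   # growth is at the non-reducing end
        terminal = added
    return chain

print(extend("Xyl", "α-pNP", 3))
print(extend("GlcA", "β-MU", 2))
```

Starting from Xyl-α-pNP the first sugar added is GlcA, and starting from GlcA-β-MU the first is Xyl, matching the single-donor experiments; with both donors present the chain keeps alternating.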

To identify the approximate locations of the two domains, the site-directed mutants D242N/D244N and D563N/D565N were constructed; these positions lie within the conserved DXD motifs, which are characteristic of many glycosyltransferases. The data show loss of the function associated with the mutated motif, suggesting that the Xyl-T domain lies over residues 242-244 and the GlcA-T domain over residues 563-565. All three DXD motifs found in LARGE and LARGE2 are required for function.

The same sets of experiments were conducted with LARGE2, a homolog of LARGE, and it was found to contain the same DXD motifs, have a similar sequence, and transfer the same two sugars and nothing else (into alternating repeats). Every aspect of LARGE2 appears to be the same as LARGE, apart from the pH optima: the pH optimum of Xyl-T of LARGE was pH 5.0, whereas that of LARGE2 was pH 5.5-9.0; the pH optimum for GlcA-T of LARGE ranged from pH 5.5-8.0, whereas that of LARGE2 is pH 5.5.

A different group used protein A and IgG sepharose for purification of LARGE2dTM, and used ethanol as the acceptor for the reaction of LARGE2. UDP-Xyl was shown to be added to ethanol, and is thus a substrate of LARGE.

The catalytic function of LARGE2 is shown to be dependent on manganese ion(s), as no products were formed in the presence of EDTA, which chelates the manganese ions, and products were few when manganese was removed.

HNK-1 sulfotransferase can transfer sulfates onto GlcA residues, which can lead to the synthesis of the HNK-1 epitope (SO4–GlcA–Gal). It has been shown that HNK-1 sulfotransferase inhibits the generation of the Xyl-GlcA polysaccharide. This is probably because it transfers a sulfate onto the GlcA and prevents further addition of Xyl.

The glycan synthesized by LARGE and LARGE2 is similar to heparin and heparan sulfate, containing alternating α- and β- linkages. Laminin can bind to both entities, and heparin can compete with dystroglycans in binding to laminin.

Summary
In summary, like-N-acetylglucosaminyltransferase (LARGE) contains two glycosyltransferase-like domains - xylosyltransferase (Xyl-T) in the middle of the gene and glucuronyltransferase (GlcA-T) at the C-terminus - which belongs to the glycosyltransferase family 8 (GT8) proteins and uridine diphosphate (UDP)-N-acetylglucosamine (GlcNAc):Galβ1,3-N-acetylglucosaminyltransferase 1 (part of GT49), respectively. LARGE alternatively transfer xylose and glucuronic acid onto the O-mannosyl chain. LARGE binds to the N-terminal domain of α-dystroglycan and synthesize these polysaccharides of (-3-Xylα1-3GlcAβ1-). In the absence of mannose, LARGE can still attach the -(Xylα1-3GlcAα1-3)- polysaccharides to N-linked and mucin-type O-linked glycans. In fact, LARGE can also glycosylate on proteins other than dystroglycan.

LARGE also has a homolog, LARGE2 (a.k.a. glycosyltransferase-like 1B, GYLTL1B), which functions very similarly to LARGE, but its Xyl-T and GlcA-T domains have pH optima that differ from each other as well as from those of LARGE. LARGE is widely expressed, most prominently in the brain, heart and skeletal muscle, whereas LARGE2 is expressed prominently in the kidney and placenta, but not in the brain.

The negatively-charged (-3-Xylα1-3GlcAβ1-) polysaccharide on α-DG most probably binds laminin by interacting with the three basic patches present on the LG4 and LG5 domains of the laminin α chain.

Other glycans are also beneficial to binding, but their absence can be compensated by the overexpression of LARGE and/or LARGE2.

Speculation
This redundancy is best observed in the kidneys of Largemyd mice, which do not express LARGE but do express LARGE2; the high expression of LARGE2 compensates and glycosylates α-DG.

The difference in pH optima may allow glycosylation to occur effectively across the whole Golgi, as the pH in the Golgi ranges from 6.7 (cis) to 6.0 (trans). However, O-glycosylation tends to occur near the trans-Golgi, so this speculation might not be valid at all. On a separate note, alterations in pH, as occur when proteins pass through the Golgi (a decrease in pH) or in cancer cells (an increase in pH), can cause mislocalization of glycosyltransferases and lead to abnormal glycosylation. The two LARGE homologs, with their different pH optima, might therefore ensure glycosylation of α-DG even in cancer, preventing the cancer from metastasizing.

However, a more attractive hypothesis is that LARGE and LARGE2 complement each other. EXT1 and EXT2 are homologs that both contain the GlcA-T and N-acetylglucosaminyltransferase (GlcNAc-T) domains required for synthesis of heparan sulfate. Their co-expression leads to the formation of a heterodimer which has a different glycosyltransferase activity. If this is also true for LARGE/LARGE2, they could form dimers in which one specializes in one transferase reaction because it is more efficient at it, and the other specializes in the other reaction. There is no evidence for this hypothesis, but if proven true, it could be explained as a duplication event after which the labour of one protein was divided among its duplicates, a process known as subfunctionalization.
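As a rough check on the pH argument, the quoted optima can be compared with the quoted Golgi pH range. The sketch below treats each optimum as a closed interval (a single value becomes a zero-width interval) and reports where it overlaps the Golgi range. This is a deliberately crude assumption: enzymes keep some activity outside their optimum, and the dictionary layout and function names are made up for illustration.

```python
# Golgi pH range quoted in the text: 6.0 (trans) to 6.7 (cis).
GOLGI_PH = (6.0, 6.7)

# pH optima quoted in the text, stored as (low, high) intervals.
PH_OPTIMA = {
    ("LARGE", "Xyl-T"): (5.0, 5.0),
    ("LARGE", "GlcA-T"): (5.5, 8.0),
    ("LARGE2", "Xyl-T"): (5.5, 9.0),
    ("LARGE2", "GlcA-T"): (5.5, 5.5),
}


def overlap(a, b):
    """Return the intersection of two closed intervals, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


def golgi_coverage():
    """Map each (enzyme, activity) pair to its overlap with the Golgi pH range."""
    return {key: overlap(rng, GOLGI_PH) for key, rng in PH_OPTIMA.items()}


if __name__ == "__main__":
    for (enzyme, activity), hit in golgi_coverage().items():
        print(f"{enzyme} {activity}: {hit}")
```

With these figures, only the GlcA-T optimum of LARGE and the Xyl-T optimum of LARGE2 overlap the Golgi range, while the two single-value optima (pH 5.0 and pH 5.5) fall below it. This says nothing about residual activity away from the optimum.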

Mutations in sugar-nucleotide synthesis and/or transport
Uridine 5′-diphosphate (UDP)-xylose synthase 1 (UXS1) is required to generate UDP-Xyl from UDP-GlcA. Chinese hamster ovary cell lines that lack UXS1 fail to xylosylate the O-mannose glycan and are unable to bind to the ECM; UDP-GlcA also accumulates in the ER, where UDP-Xyl is synthesized. Because UDP-Xyl is absent, even overexpression of LARGE or LARGE2 cannot rescue the defect, as LARGE has no xylose donor to transfer from. (N.B. Knockout of UXS1 also leads to the absence of heparan sulfate (HS) and chondroitin-dermatan sulfate (CS-DS), which require xylose for initiation.)

UDP-Xyl is synthesized from UDP-GlcA in the ER lumen; LARGE, on the other hand, resides in the Golgi. Thus, transporters must exist to carry the sugar nucleotides into the Golgi. SLC35B4 transports UDP-Xyl and UDP-GlcNAc into the Golgi, so its mutation is likely to cause muscular dystrophy as well: laminin binding would be lost because the nucleotide sugar is not at the correct location for LARGE/LARGE2 to act on.
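The dependency chain described above (synthesis, transport, transfer) can be written as a small Boolean model. This is only a sketch of the reasoning in the text, not a validated pathway model: the SLC35B4 requirement is the inference made above rather than an observed result, UDP-GlcA supply and transport are assumed to be intact, and the function name is hypothetical.

```python
def alpha_dg_glycan_made(functional):
    """Predict whether the Xyl-GlcA polysaccharide can be built on alpha-DG.

    `functional` is the set of gene names assumed to be working in the cell.
    """
    # UXS1 converts UDP-GlcA to UDP-Xyl in the ER lumen.
    udp_xyl_made = "UXS1" in functional
    # SLC35B4 is assumed (as inferred in the text) to be needed to bring UDP-Xyl into the Golgi.
    udp_xyl_in_golgi = udp_xyl_made and "SLC35B4" in functional
    # Either homolog can act as the transferase.
    transferase_present = bool({"LARGE", "LARGE2"} & set(functional))
    return udp_xyl_in_golgi and transferase_present


if __name__ == "__main__":
    print(alpha_dg_glycan_made({"UXS1", "SLC35B4", "LARGE"}))    # wild type
    print(alpha_dg_glycan_made({"SLC35B4", "LARGE", "LARGE2"}))  # UXS1 knockout
```

In this model, overexpressing LARGE or LARGE2 cannot rescue a UXS1 knockout, because the transferase term is already true and the donor term stays false.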

LAD-II/CDG-IIc
Leukocyte adhesion deficiency II (LAD-II), or congenital disorder of glycosylation IIc (CDG-IIc), is a very rare (affecting fewer than 20 children worldwide) autosomal recessive syndrome caused by defect(s) in the transporter that carries GDP-fucose (made in the cytoplasm) into the ER and Golgi; defects in the biosynthesis of GDP-fucose are not observed. Leukocytes therefore lack sialyl-Lewis x and other fucosylated structures, so they fail to roll along the endothelial wall and fail to home to sites of infection. The adaptive immune system is also highly compromised because leukocytes do not take up residence in lymph nodes. Children with LAD-II/CDG-IIc present with developmental defects, severe growth and mental retardation, the rare Bombay blood group type (Hh; the H antigen requires addition of fucose), susceptibility to infections, high neutrophil levels, chronic severe periodontitis (inflammation and infection of the ligaments and bones that support the teeth) and a short lifespan.

Other
Limb-girdle muscular dystrophy is caused by mutations at the threonine to which the glycan would normally be attached.

Attachment of the muscle cell to the basal lamina may also be mediated through integrin alpha7beta1D (integrin beta1D is expressed throughout the body, whereas alpha7 is more prominent in muscle), and patients with mutations in the integrin alpha7 gene (ITGA7) show signs of congenital myopathy.

Analysis of glycoproteins
To determine the mass of the protein portion of a glycoprotein, all the glycans must be removed to eliminate heterogeneity in mass. The mass of the deglycosylated protein is then analyzed using mass spectrometry (MS), most commonly using matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) MS.

Hydrazine
Hydrazine releases unreduced O- and N-linked oligosaccharides. Treatment with hydrazine destroys the protein component of the glycoprotein, while the glycans remain intact. Anhydrous hydrazine is added to salt-free, freshly lyophilized glycoprotein (at a concentration of 5 to 25 mg/ml). O-linked glycans are more easily released, requiring incubation at 60°C for 5 hours; N-linked glycans are released after incubation at 95°C for 4 hours.

Alkaline β-elimination
The O-glycosidic linkages between O-glycans and the β-hydroxyl groups of serine or threonine are easily cleaved by dilute alkaline solutions. Incubation with 0.05 to 0.1 M sodium hydroxide or potassium hydroxide at 45-60°C for 8-16 hours is sufficient to break this bond. To prevent isomerization or degradation of the carbohydrate, a reducing agent, usually 0.8 to 2 M sodium borohydride, is added. The products are the reduced (alditol) forms of the glycans. Alkaline β-elimination using sodium/potassium hydroxide with sodium borohydride releases all O-glycans except those attached to tyrosine (seen in prokaryotes), hydroxyproline or hydroxylysine, and those whose serine/threonine is at the carboxy-terminus; N-glycans are left untouched.

To liberate N-glycans using alkaline β-elimination, harsher conditions are required: 1 M sodium hydroxide at 100°C for 6 to 12 hours, with 1 to 2 M sodium borohydride as the reducing agent. N-Acetylglucosamine (GlcNAc) is deacetylated during this reaction and must be re-N-acetylated using acetic anhydride in methanol.

An alternative to sodium hydroxide is ammonium hydroxide/carbonate.
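The incubation conditions quoted for the two chemical release methods can be collected into a small lookup table, which makes them easier to compare side by side. The sketch below records only the temperatures and times given above. Sorting by maximum temperature (then maximum time) is an arbitrary choice for this example and not a recommendation, since it ignores reagent concentrations and the fact that hydrazine destroys the protein. All names are hypothetical.

```python
from typing import NamedTuple


class Release(NamedTuple):
    method: str
    glycan_class: str  # "O" or "N"
    temp_c: tuple      # (min, max) incubation temperature in degrees C
    hours: tuple       # (min, max) incubation time in hours


# Conditions as quoted in the text for each method and glycan class.
PROTOCOLS = [
    Release("hydrazine", "O", (60, 60), (5, 5)),
    Release("hydrazine", "N", (95, 95), (4, 4)),
    Release("alkaline beta-elimination", "O", (45, 60), (8, 16)),
    Release("alkaline beta-elimination", "N", (100, 100), (6, 12)),
]


def options_for(glycan_class):
    """List protocols for one glycan class, lowest maximum temperature first."""
    hits = [p for p in PROTOCOLS if p.glycan_class == glycan_class]
    return sorted(hits, key=lambda p: (p.temp_c[1], p.hours[1]))


if __name__ == "__main__":
    for p in options_for("N"):
        print(p.method, p.temp_c, p.hours)
```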

Trifluoromethanesulfonic Acid
Trifluoromethanesulfonic acid (TFMS) destroys all glycans and leaves only the protein portion, although the Asn-linked GlcNAc residue of N-glycans remains attached.

Enzymatic
Enzymatic approaches have the advantage that enzymes can hydrolyze individual monosaccharides from glycans sequentially and specifically; this allows the structure of the glycan to be elucidated.

N-glycans
First, glycoproteins are reduced and alkylated (using tributylphosphine and iodoacetamide) to denature the protein and prevent the formation of disulfide bridges. The glycoprotein is then separated using SDS-PAGE, and Coomassie® stain is used to identify the band or spot (on a 2D gel) of interest. This area of the gel is cut out and sliced into sections. The sections are destained, dried and treated with glycosidases.

Peptide-N-glycosidase F (PNGase F) is one of the most widely used enzymes for releasing N-glycans. It leaves the glycan intact; however, it deamidates the asparagine residue to which the glycan was attached, converting it into aspartic acid. PNGase F requires one residue on each side of the glycan-linked asparagine in order to break the glycan-protein bond. Treatment with high temperature and denaturing agents such as SDS and 2-mercaptoethanol prevents steric hindrance of PNGase F and increases deglycosylation. PNGase F is able to release N-glycans of all types (high mannose, hybrid or complex), apart from glycans which have a fucose residue α1-3 linked to the Asn-bound N-acetylglucosamine, often found in plants and parasitic worms. PNGase A, derived from almond meal, must be used to release glycans with a fucose α1-3 linked to the Asn-linked GlcNAc, but this enzyme is in turn ineffective with sialylated glycans. Other endoglycosidases, such as Endoglycosidase H and Endoglycosidases F1, F2 and F3, are substrate-specific and do not cleave all N-glycans (some cleave only high-mannose, or hybrid, or complex glycans); for those glycans they do release, a GlcNAc residue is left attached to the asparagine.
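The asparagine-to-aspartic-acid conversion leaves a small mass signature on the peptide that can be used to locate former glycosylation sites. The sketch below applies the two rules stated above (the site must be an asparagine, and it needs a residue on each side) and computes the monoisotopic mass shift of turning a side-chain amide into an acid. The function name, the 0-based indexing and the example peptide are assumptions made for illustration; the model ignores glycan type and steric effects.

```python
# Monoisotopic atomic masses (Da).
H, N, O = 1.00782503, 14.0030740, 15.99491462

# Side-chain amide (-CONH2) becomes acid (-COOH): lose N and H, gain O.
ASN_TO_ASP_SHIFT = O - N - H


def pngase_f_product(peptide, site):
    """Return the peptide sequence after glycan release at 0-based index `site`."""
    if not 0 < site < len(peptide) - 1:
        raise ValueError("asparagine needs a residue on each side")
    if peptide[site] != "N":
        raise ValueError("site is not an asparagine")
    # The formerly glycosylated Asn (N) is left as Asp (D).
    return peptide[:site] + "D" + peptide[site + 1:]


if __name__ == "__main__":
    print(pngase_f_product("ANGTK", 1))
    print(f"mass shift: {ASN_TO_ASP_SHIFT:+.4f} Da")
```

The shift is a little under +1 Da, so it is distinguishable from the unmodified peptide on instruments with sufficient resolution.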

After release, the glycans are separated from the peptide using affinity chromatography based on hydrophobicity, exploiting the fact that sugars are more hydrophilic than peptides.

O-glycans
Enzymatic deglycosylation of O-glycans requires a series of exoglycosidases to sequentially remove monosaccharides until only the Gal-β(1-3)-GalNAc core remains. The core can subsequently be removed by O-Glycosidase, although sialylation of the core prevents this.

Mass spectrometry
Relevant developments in glycan mass spectrometry include tandem mass spectrometry sequencing methodologies, greater sensitivity in analyzing sulfated and polysialylated glycans, better definition of the sites of O-glycosylation, and databases of glycomic information.

The use of matrix-assisted laser desorption/ionization tandem time-of-flight (MALDI-TOF/TOF) instruments increases performance in terms of upper mass range, resolution, sensitivity and signal-to-noise ratio.

Proteoglycans
In mammals, xylose is found as the first sugar residue of the tetrasaccharide GlcAβ1-3Galβ1-3Galβ1-4Xylβ1-O-Ser, whose linkage to proteoglycan core proteins initiates the biosynthesis of the glycosaminoglycans (GAGs) heparin/heparan sulfate (HS) and chondroitin/dermatan sulfate (CS-DS). GlcA and Xyl are therefore essential components of these GAGs.
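Linear notation such as the linker tetrasaccharide above can be split into residues and linkages mechanically, which helps when comparing structures. The sketch below is a minimal parser that assumes an unbranched chain written with Greek anomer symbols, single-digit carbon positions and a final "-O-" link to the amino acid. The regular expression and function name are illustrative and do not follow any standard glycan format.

```python
import re

# One residue: sugar name, anomer, donor carbon, then acceptor position (digit or O).
RESIDUE = re.compile(r"([A-Za-z]+)([αβ])(\d)-(\d|O)")


def parse_linear_glycan(text):
    """Return ([(sugar, anomer, donor_carbon, acceptor), ...], aglycone)."""
    residues = [
        (sugar, anomer, int(donor), acceptor)
        for sugar, anomer, donor, acceptor in RESIDUE.findall(text)
    ]
    # Whatever follows the last hyphen is taken to be the attachment site.
    aglycone = text.rsplit("-", 1)[-1]
    return residues, aglycone


if __name__ == "__main__":
    linker = "GlcAβ1-3Galβ1-3Galβ1-4Xylβ1-O-Ser"
    residues, aglycone = parse_linear_glycan(linker)
    for sugar, anomer, donor, acceptor in residues:
        print(f"{sugar} ({anomer}) C{donor} -> {acceptor}")
    print("attached to", aglycone)
```

Residues are listed from the non-reducing end towards the protein, in the order they are written.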

Epidermal growth factor repeats
Xylose is found in the trisaccharide Xylα1-3Xylα1-3Glcβ1-O-Ser on the epidermal growth factor repeats of proteins such as Notch.

Xylose
UDP-xylose synthase (UXS) catalyzes the formation of the UDP-xylose substrate through decarboxylation of UDP-glucuronic acid. A loss-of-function mutation in UXS leads to an absence of UDP-xylose and an accumulation of UDP-glucuronic acid.
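The decarboxylation can be checked by simple atom bookkeeping: UDP-glucuronic acid should equal UDP-xylose plus one CO2. The sketch below does this with element counts. The molecular formulas (neutral free-acid forms) are supplied here as assumptions and are not taken from the text above, and the helper name is hypothetical.

```python
from collections import Counter

# Assumed molecular formulas of the neutral free-acid forms.
UDP_GLCA = Counter({"C": 15, "H": 22, "N": 2, "O": 18, "P": 2})
UDP_XYL = Counter({"C": 14, "H": 22, "N": 2, "O": 16, "P": 2})
CO2 = Counter({"C": 1, "O": 2})


def is_balanced(reactants, products):
    """True if both sides of a reaction contain the same atoms."""
    return sum(reactants, Counter()) == sum(products, Counter())


if __name__ == "__main__":
    # UDP-GlcA -> UDP-Xyl + CO2
    print(is_balanced([UDP_GLCA], [UDP_XYL, CO2]))
```

The only difference between the two sugar nucleotides is the carboxyl carbon and its two oxygens, which is why loss of UXS leaves the cell with UDP-glucuronic acid but no UDP-xylose.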

UXS is found in the lumen of the ER and Golgi.

Glossary
A glycan is an oligosaccharide attached to a protein or lipid, and a glycoconjugate is the glycan along with whatever it is bound to.