Arabinogalactan protein

Arabinogalactan-proteins (AGPs) are highly glycosylated proteins (glycoproteins) found in the cell walls of plants. Each one consists of a protein with sugar molecules attached (which can account for more than 90% of the total mass). They are members of the wider class of hydroxyproline (Hyp)-rich cell wall glycoproteins, a large and diverse group of glycosylated wall proteins.

AGPs have been reported in a wide range of higher plants in seeds, roots, stems, leaves and inflorescences. AGPs account for only a small portion of the cell wall, usually no more than 1% of the dry mass of the primary wall. They have also been reported in secretions of cell culture medium of root, leaf, endosperm and embryo tissues, and some exudate-producing cell types such as stylar canal cells are capable of producing large amounts of AGPs. They are implicated in various aspects of plant growth and development, including root elongation, somatic embryogenesis, hormone responses, xylem differentiation, pollen tube growth and guidance, programmed cell death, cell expansion, salt tolerance, host-pathogen interactions, and cellular signaling.

AGPs have attracted considerable attention due to their highly complex structures and potential roles in signalling. In addition, they have industrial and health applications due to their chemical/physical properties (water-holding, adhesion and emulsification).

Sequence and classification


The protein component of AGPs is rich in the amino acids proline (P), alanine (A), serine (S) and threonine (T), collectively known as ‘PAST’, and this amino acid bias is one of the features used to identify them. AGPs are intrinsically disordered proteins, as they contain a high proportion of disorder-promoting amino acids such as proline that disrupt the formation of stable folded structures. As is characteristic of intrinsically disordered proteins, AGPs also contain repeat motifs and post-translational modifications. Proline residues in the protein backbone can be hydroxylated to hydroxyproline (O), depending on the surrounding amino acids. The ‘Hyp contiguity hypothesis’ predicts that non-contiguous O residues, as in the sequence 'SOTO' and as found in AGPs, act as a signal for O-linked glycosylation with large branched type II arabinogalactan (AG) polysaccharides. Sequences that direct AG glycosylation (SO, TO, AO, VO) are called AGP glycomotifs.
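The two sequence features described above, PAST bias and non-contiguous glycomotifs, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the rules are simplifying assumptions rather than a published prediction method: O is counted as P when computing composition, every P following A, S, T or V is treated as a candidate hydroxylation site, and a candidate is skipped when the next residue is also P or O, as a rough stand-in for the contiguity rule.

```python
import re

PAST = set("PAST")

def past_fraction(seq: str) -> float:
    """Fraction of residues that are P, A, S or T (hydroxyproline O counts as P)."""
    seq = seq.upper().replace("O", "P")
    if not seq:
        return 0.0
    return sum(aa in PAST for aa in seq) / len(seq)

def count_glycomotifs(seq: str) -> int:
    """Count candidate AGP glycomotifs (SO, TO, AO, VO).

    Assumption: an unhydroxylated P after A/S/T/V is treated as a potential O.
    The negative lookahead drops sites where the next residue is also P or O,
    a crude approximation of the 'non-contiguous' condition in the text.
    """
    return len(re.findall(r"[ASTV][PO](?![PO])", seq.upper()))

if __name__ == "__main__":
    for s in ("SOTO", "SPPPP", "MKAPSPTPGG"):
        print(s, round(past_fraction(s), 2), count_glycomotifs(s))
```

With these rules, 'SOTO' yields two glycomotifs, whereas a contiguous run such as 'SPPPP' yields none. Real identification pipelines combine such counts with further criteria, for example signal peptide prediction.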

All AGP protein backbones contain a minimum of 3 clustered AGP glycomotifs and an N-terminal signal peptide that directs the protein into the endoplasmic reticulum (ER), where post-translational modifications begin. Prolyl hydroxylation of P to O is carried out by prolyl 4-hydroxylases (P4Hs) belonging to the 2-oxoglutarate-dependent dioxygenase family. P4H has been identified in both the ER and Golgi apparatus. The addition of the glycosylphosphatidylinositol (GPI)-anchor occurs in most but not all AGPs.

Families
AGPs belong to large multigene families and are divided into several sub-groups depending on the predicted protein sequence. "Classical" AGPs include the GPI-AGPs, which consist of a signal peptide at the N-terminus, a PAST-rich sequence of 100-150 aa and a hydrophobic region at the C-terminus that directs addition of a GPI-anchor; non-GPI-AGPs, which lack the C-terminal GPI signal sequence; lysine (K)-rich AGPs, which contain a K-rich region within the PAST-rich backbone; and AG-peptides, which have a short PAST-rich backbone of 10-15 aa (Figure 2). Chimeric AGPs consist of proteins that have an AGP region and an additional region with a recognised protein family (Pfam) domain. They include fasciclin-like AGPs (FLAs), phytocyanin-like AGPs (PAGs/PLAs, also known as early-nodulin-like proteins, ENODLs) and xylogen-like AGPs (XYLPs) that contain lipid-transfer-like domains. Several other putative chimeric AGP classes have been identified in which AG glycomotifs are associated with protein kinase, leucine-rich repeat, X8, FH2 and other protein family domains. Other non-classical AGPs exist, such as those containing a cysteine (C)-rich domain, also called a PAC domain, and/or a histidine (H)-rich domain, as well as many hybrid hydroxyproline-rich glycoproteins (HRGPs) that have motifs characteristic of AGPs and of other HRGP members, usually extensin and Tyr motifs.

AGPs are evolutionarily ancient and have been identified in green algae as well as Chromista and Glaucophyta. Because AGPs are found throughout the entire plant lineage, land plants are suggested to have inherited and diversified the AGP protein backbone genes already present in algae, generating an enormous number of AGP glycoforms.
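The sub-group definitions given above for classical and chimeric AGPs can be written as a small decision procedure. The following Python sketch is only an illustration: the `BackboneFeatures` record, its field names, the order of the checks and the 15 aa cut-off for AG-peptides are assumptions made here, and the features themselves (GPI signal, K-rich region, Pfam domains) would have to come from separate prediction tools.

```python
from dataclasses import dataclass

@dataclass
class BackboneFeatures:
    """Hypothetical summary of a predicted AGP protein backbone."""
    past_region_length: int          # length of the PAST-rich region, in aa
    has_gpi_signal: bool = False     # C-terminal GPI signal sequence predicted
    has_lys_rich_region: bool = False
    pfam_domains: tuple = ()         # names of any recognised Pfam domains

def classify_agp(f: BackboneFeatures) -> str:
    """Assign a sub-group following the descriptions in the text.

    The check order and the <= 15 aa threshold are illustrative assumptions,
    not a published classification scheme.
    """
    if f.pfam_domains:
        return "chimeric AGP"
    if f.has_lys_rich_region:
        return "Lys-rich AGP"
    if f.past_region_length <= 15:
        return "AG-peptide"
    if f.has_gpi_signal:
        return "classical GPI-AGP"
    return "classical non-GPI-AGP"

if __name__ == "__main__":
    print(classify_agp(BackboneFeatures(120, has_gpi_signal=True)))
    print(classify_agp(BackboneFeatures(12)))
    print(classify_agp(BackboneFeatures(130, pfam_domains=("Fasciclin",))))
```

Hybrid HRGPs and the other non-classical groups mentioned in the text are deliberately left out of this sketch.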

Structure
The carbohydrate moieties of AGPs are rich in arabinose and galactose, but other sugars may also be found, such as L-rhamnopyranose (L-Rhap), D-mannopyranose (Manp), D-xylopyranose (Xylp), L-fucose (Fuc), D-glucopyranose (Glcp), D-glucuronic acid (GlcA) and its 4-O-methyl derivative, and D-galacturonic acid (GalA) and its 4-O-methyl derivative. The AG found in AGPs is of type II (type II AGs), that is, a galactan backbone of (1,3)-linked β-D-galactopyranose (Galp) residues, with branches (between one and three residues long) of (1,6)-linked β-D-Galp. In most cases, the Gal residues terminate with α-L-arabinofuranose (Araf) residues. Some AGPs are rich in uronic acids (GlcA), resulting in a charged polysaccharide moiety, and others have short oligosaccharides of Araf. Specific sets of hydroxyproline O-β-galactosyltransferases, β-1,3-galactosyltransferases, β-1,6-galactosyltransferases, α-arabinosyltransferases, β-glucuronosyltransferases, α-rhamnosyltransferases, and α-fucosyltransferases are responsible for the synthesis of these complex structures.

One of the features of type II AGs, particularly the (1,3)-linked β-D-Galp residues, is their ability to bind to the Yariv phenylglycosides. Yariv phenylglycosides are widely used as cytochemical reagents to perturb the molecular functions of AGPs as well as for the detection, quantification, purification, and staining of AGPs. It has been reported that no interaction with Yariv reagent was detected for β-1,6-galacto-oligosaccharides of any length. Yariv phenylglycosides were concluded to be specific binding reagents for β-1,3-galactan chains longer than five residues. Chains of seven or more residues are sufficient for cross-linking, which leads to precipitation of the glycans with the Yariv phenylglycosides, as observed when classical AGPs bind to β-Yariv dyes. Consistent with this, AGPs appear to need at least 5–7 β-1,3-linked Gal units to form aggregates with the Yariv reagent.

Biosynthesis
After translation, the AGP protein backbones are highly decorated with complex carbohydrates, primarily type II AG polysaccharides. The biosynthesis of the mature AGP involves cleavage of the signal peptide at the N-terminus, hydroxylation of the P residues, subsequent glycosylation and, in many cases, addition of a GPI-anchor.

Processing and transport
Glycosylation of the AGP backbone is suggested to initiate in the ER with the addition of the first Gal by O-galactosyltransferase, which is predominantly located in ER fractions. Chain extension then occurs primarily in the Golgi apparatus (GA). For those AGPs that include a GPI anchor, the anchor is added while the protein co-translationally migrates into the ER.

Arabinogalactan sidechains
The AG glycans consist of a backbone of β-1,3-linked galactose (Gal) with sidechains of β-1,6-linked Gal, and have terminal residues of arabinose (Ara), rhamnose (Rha), Gal, fucose (Fuc), and glucuronic acid (GlcA). These AG glycan moieties are assembled by glycosyltransferases (GTs). O-glycosylation of AGPs is initiated by the action of Hyp-O-galactosyltransferases (Hyp-O-GalTs) that add the first Gal onto the protein. The complex glycan structures are then elaborated by a suite of glycosyltransferases, the majority of which are biochemically uncharacterized. The GT31 family is one of the families involved in AGP glycan backbone biosynthesis. Numerous members of the GT31 family have been identified with Hyp-O-GalT activity, and the core β-(1,3)-galactan backbone is also likely to be synthesized by the GT31 family. Members of the GT14 family are implicated in adding β-(1,6)- and β-(1,3)-galactans to AGPs. In Arabidopsis, terminal sugars such as fucose are proposed to be added by AtFUT4 (a fucosyltransferase) and AtFUT6 in the GT37 family, and terminal GlcA incorporation can be catalysed by the GT14 family. A number of GTs remain to be identified, for example those responsible for terminal Rha.

GPI-anchor
Bioinformatic analysis predicts the addition of a GPI-anchor on many AGPs. The early synthesis of the GPI moiety occurs on the cytoplasmic surface of the ER and subsequent assembly takes place in the ER lumen. This includes the assembly of tri-mannose (Man), galactose, non-N-acetylated glucosamine (GlcN) and ethanolamine phosphate to form the mature GPI moiety. AGPs undergo GPI-anchor addition while co-translationally migrating into the ER, where the two processes converge. A transamidase complex then recognizes the ω cleavage site, cleaves the core protein at the C-terminus and, at the same time, transfers the fully assembled GPI-anchor onto the new C-terminal amino acid residue. These events occur prior to prolyl hydroxylation and glycosylation. The core glycan structure of GPI anchors is Man-α-1,2-Man-α-1,6-Man-α-1,4-GlcN-inositol (Man: mannose, GlcN: glucosaminyl), which is conserved in many eukaryotes. The only plant GPI anchor structure characterized to date is that of a GPI-anchored AGP from Pyrus communis suspension-cultured cells. It showed a partially modified glycan moiety compared with previously characterized GPI anchors, as it contained β-1,4-Gal. The GPI anchor synthesis and protein assembly pathway is proposed to be conserved between mammals and plants. The GPI-anchor attaches the protein to the ER membrane; the protein then transits through the Golgi apparatus and is secreted to the outer leaflet of the plasma membrane, facing the wall. As proposed by Oxley and Bacic, GPI-anchored AGPs are likely released through cleavage by phospholipases (PLs) C or D and secreted into the extracellular compartment.

Functionally characterized genes involved in AGP glycosylation
Bioinformatics analysis using mammalian β-1,3-galactosyltransferase (GalT) sequences as templates suggested involvement of the Carbohydrate-Active enZYmes (CAZy) glycosyltransferase (GT) 31 family in the synthesis of the galactan chains of the AG backbone. Members of the GT31 family have been grouped into 11 clades, four of which are plant-specific: Clades 1, 7, 10, and 11. The domains and motifs of Clades 1 and 11 are not well defined, while Clades 7 and 10 have domain similarities with proteins of known GalT function in mammalian systems. Clade 7 proteins contain both GalT and galectin domains, while Clade 10 proteins contain a GalT-specific domain. The galectin domain is proposed to allow the GalT to bind to the first Gal residue on the polypeptide backbone of AGPs, thus determining the position of subsequent Gal residues on the protein backbone, similar to the activity of human galectin domain-containing proteins.

Eight enzymes belonging to the GT31 family have demonstrated the ability to place the first Gal residue onto Hyp residues in AGP core proteins. These enzymes are GALT2, GALT3, GALT4, GALT5 and GALT6, which are Clade 7 members, and HPGT1, HPGT2 and HPGT3, which are Clade 10 members. Preliminary enzyme substrate specificity studies demonstrated that another GT31 Clade 10 enzyme, At1g77810, had β-1,3-GalT activity. A GT31 Clade 10 gene, KNS4/UPEX1, encodes a β-1,3-GalT capable of synthesizing the β-1,3-Gal linkages found in type II AGs present in AGPs and/or pectic rhamnogalacturonan I (RG-I). Another GT31 Clade 10 member, GALT31A, showed β-1,6-GalT activity when heterologously expressed in E. coli and Nicotiana benthamiana, elongating the β-1,6-galactan side chains of AGP glycans. GALT29A, a member of the GT29 family, was identified as being co-expressed with GALT31A; the two enzymes act co-operatively and form complexes.

Three members of the GT14 family, named GlcAT14A, GlcAT14B, and GlcAT14C, were reported to add GlcA to both β-1,6- and β-1,3-Gal chains in an in vitro enzyme assay following heterologous expression in Pichia pastoris. Two α-fucosyltransferase genes, FUT4 and FUT6, both belonging to the GT37 family, encode enzymes that add α-1,2-fucose residues to AGPs. They appear to be only partially redundant, as they display somewhat different AGP substrate specificities. A GT77 family member, REDUCED ARABINOSE YARIV (RAY1), was found to be a β-arabinosyltransferase that adds a β-Araf to methyl β-Gal of a Yariv-precipitable wall polymer. More research is expected to functionally identify other genes involved in AGP glycosylation and their interactions with other plant cell wall components.
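The functionally characterized enzymes named in this section can be collected into a small lookup table, which makes it easier to query them by CAZy family or by activity. The Python sketch below only restates what the text says. The `Enzyme` record and helper names are illustrative, the activity labels are shorthand chosen here, and a clade of `None` means that no clade is given in the text.

```python
from collections import namedtuple

Enzyme = namedtuple("Enzyme", "name family clade activity")

ENZYMES = [
    # GT31 enzymes that add the first Gal onto Hyp
    *(Enzyme(f"GALT{i}", "GT31", 7, "Hyp-O-GalT") for i in range(2, 7)),
    *(Enzyme(f"HPGT{i}", "GT31", 10, "Hyp-O-GalT") for i in range(1, 4)),
    # GT31 enzymes acting on the galactan backbone and side chains
    Enzyme("At1g77810", "GT31", 10, "beta-1,3-GalT"),
    Enzyme("KNS4/UPEX1", "GT31", 10, "beta-1,3-GalT"),
    Enzyme("GALT31A", "GT31", 10, "beta-1,6-GalT"),
    # Other families
    Enzyme("GALT29A", "GT29", None, "acts co-operatively with GALT31A"),
    *(Enzyme(f"GlcAT14{s}", "GT14", None, "GlcA transferase") for s in "ABC"),
    Enzyme("FUT4", "GT37", None, "alpha-1,2-fucosyltransferase"),
    Enzyme("FUT6", "GT37", None, "alpha-1,2-fucosyltransferase"),
    Enzyme("RAY1", "GT77", None, "beta-arabinosyltransferase"),
]

def by_family(family):
    """Names of all listed enzymes in a given CAZy GT family."""
    return [e.name for e in ENZYMES if e.family == family]

def by_activity(activity):
    """Names of all listed enzymes with a given activity label."""
    return [e.name for e in ENZYMES if e.activity == activity]

if __name__ == "__main__":
    print(by_activity("Hyp-O-GalT"))
    print(by_family("GT37"))
```

Querying `by_activity("Hyp-O-GalT")` returns the eight GT31 enzymes that place the first Gal residue onto Hyp.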

Biological roles
AGPs are found in a wide range of plant tissues, in secretions of cell culture medium of root, leaf, endosperm and embryo tissues, and in some exudate-producing cell types such as stylar canal cells. AGPs have been shown to regulate many aspects of plant growth and development, including male-female recognition in reproductive organs, cell division and differentiation in embryonic and post-embryonic development, seed mucilage cell wall development, root salt tolerance and root-microbe interactions. These studies suggest that they are multifunctional, similar to mammalian proteoglycans/glycoproteins. Conventional methods to study the functions of AGPs include the use of β-glycosyl (usually glucosyl) Yariv reagents and monoclonal antibodies (mAbs). β-Glycosyl Yariv reagents are synthetic phenylazo glycoside probes that bind to AGPs specifically, but not covalently, and can be used to precipitate AGPs from solution. They are also commonly used as histochemical stains to probe the location and distribution of AGPs. A number of studies have shown that adding β-Yariv reagents to plant growth medium can inhibit seedling growth, cell elongation, somatic embryogenesis and the accumulation of fresh cell wall mass. mAbs that specifically bind to carbohydrate epitopes of AGPs have also been employed to infer functions based on the location and pattern of the AGP epitopes. Commonly used mAbs against AGPs include CCRC-M7, LM2, JIM8, JIM13 and JIM14.

The function of individual AGPs has largely been inferred through studies of mutants. For example, the Arabidopsis root-specific AtAGP30 was shown to be required for in vitro root regeneration, suggesting that it functions in root regeneration by modulating phytohormone activity. Studies of agp6 and agp11 mutants in Arabidopsis have demonstrated the importance of these AGPs in preventing uncontrolled germination of the pollen grain and in normal growth of the pollen tube. The functional mechanisms of AGPs in cell signalling are not well understood. One proposed model suggests that AGPs bind calcium via the GlcA residues of their AG glycans and control its release to trigger downstream calcium-mediated signalling pathways. Another possible mechanism, largely based on the study of FLAs, suggests that the combination of the fasciclin domain and AG glycans can mediate cell-cell adhesion.

The functions of AGPs in plant growth and development processes rely heavily on the diversity of their glycan and protein backbone moieties. In particular, it is the AG polysaccharides that are most likely to be involved in development. Most of the biological roles of AGPs have been identified by characterizing T-DNA insertional mutants of genes encoding enzymes involved in AGP glycosylation, primarily in Arabidopsis thaliana. The galt2 to galt6 single mutants revealed some physiological phenotypes under normal growth conditions, including reduced root hair length and density, reduced seed set, reduced adherent seed coat mucilage, and premature senescence. However, galt2galt5 double mutants showed more severe and pleiotropic physiological phenotypes than the single mutants with respect to root hair length and density and seed coat mucilage. Similarly, hpgt1hpgt2hpgt3 triple mutants showed several pleiotropic phenotypes including longer lateral roots, increased root hair length and density, thicker roots, smaller rosette leaves, shorter petioles, shorter inflorescence stems, reduced fertility, and shorter siliques. GALT31A has been found to be essential for embryo development in Arabidopsis: a T-DNA insertion in the 9th exon of GALT31A resulted in embryo lethality. Meanwhile, knockout mutants for KNS4/UPEX1 have collapsed pollen grains and abnormal pollen exine structure and morphology. In addition, kns4 single mutants exhibited reduced fertility, confirming that KNS4/UPEX1 is critical for pollen viability and development. Knockout mutants for FUT4 and FUT6 showed severe inhibition of root growth under salt conditions, while knockout mutants for GlcAT14A, GlcAT14B, and GlcAT14C showed enhanced cell elongation rates in dark-grown hypocotyls and light-grown roots during seedling growth. In ray1 mutant seedlings grown on vertical plates, the length of the primary root was affected, and the primary root grew at a slower rate than in wild-type Arabidopsis. Taken together, these studies provide evidence that proper glycosylation of AGPs is important to AGP function in plant growth and development.

Human uses
Human uses of AGPs include the use of gum arabic in the food and pharmaceutical industries because of its natural thickening and emulsifying properties. AGPs in cereal grains have potential applications in biofortification, as sources of dietary fibre to support gut bacteria, and as protective agents against ethanol toxicity.