Aldehyde tag

An aldehyde tag is a short peptide tag that can be modified to attach fluorophores, glycans, PEG (polyethylene glycol) chains, or reactive groups for further synthesis. A short, genetically encoded peptide with the consensus sequence LCxPxR is introduced into fusion proteins, and subsequent treatment with the formylglycine-generating enzyme (FGE) converts the cysteine of the tag into a reactive aldehyde group. This electrophilic group can be targeted by an array of aldehyde-specific reagents, such as aminooxy- or hydrazide-functionalized compounds.
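As a minimal sketch, the consensus LCxPxR can be written as a regular expression to locate candidate aldehyde-tag sites in a protein sequence. The function name `find_aldehyde_tags`, the example sequences and the reading of "x" as any of the 20 standard amino acids are illustrative assumptions. The sketch matches only the 6-mer core, not the longer full motif, and a sequence match alone does not guarantee that FGE will convert the site.

```python
import re

# "x" in LCxPxR is read here as any of the 20 standard amino acids (assumption).
# The lookahead lets overlapping occurrences be reported as well.
ALDEHYDE_TAG = re.compile(r"(?=(LC[ACDEFGHIKLMNPQRSTVWY]P[ACDEFGHIKLMNPQRSTVWY]R))")


def find_aldehyde_tags(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the tag cysteine, matched 6-mer) for each hit."""
    hits = []
    for match in ALDEHYDE_TAG.finditer(sequence.upper()):
        leu_index = match.start()      # 0-based index of the leading Leu
        cys_position = leu_index + 2   # Cys follows Leu (+1), +1 more for 1-based numbering
        hits.append((cys_position, match.group(1)))
    return hits


# Hypothetical constructs: a tagged sequence and its Cys-to-Ala control.
print(find_aldehyde_tags("MGSLCTPSRGS"))  # [(5, 'LCTPSR')]
print(find_aldehyde_tags("MGSLATPSRGS"))  # [] (no cysteine, no tag)
```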

Development
The aldehyde tag is an artificial peptide tag recognized by the formylglycine-generating enzyme (FGE). Formylglycine is a glycine bearing a formyl group (-CHO) at the α-carbon. The tag sequence is based on the sulfatase motif, which directs the site-specific conversion of a cysteine to a formylglycine residue. The tag was engineered after studies of FGE-recognized sequences in sulfatases from different organisms revealed that the sulfatase motif is highly conserved in bacteria, archaea and eukaryotes. Aldehydes and ketones are used as chemical reporters because of their electrophilicity, which allows them to react under mild conditions with strong nucleophilic coupling partners. In bioconjugation, hydrazides and aminooxy probes are typically used because they form addition products with carbonyl groups that are favored under physiological reaction conditions. At neutral pH, the equilibrium of simple Schiff base (imine) formation lies far to the reactant side; hydrazide and aminooxy derivatives, which form the more stable hydrazones and oximes, are therefore used to shift the equilibrium toward product. The reaction has a pH optimum of 4 to 6, which cannot be imposed on live cells, and adding a catalyst to compensate is limited by the catalyst's toxicity, so the reaction is slow in live cells. Typical second-order rate constants are 10⁻⁴ to 10⁻³ M⁻¹ s⁻¹. Carbonyl groups can be introduced into proteins as chemical reporters by various techniques, including stop codon suppression and aldehyde tagging. However, aldehydes and ketones are not fully bioorthogonal in some cellular environments. Their limitations as chemical reporters include:
 * Competition with endogenous aldehydes or ketones in metabolites and cofactors, resulting in low yields and impaired specificity.
 * Side reactions, such as oxidation or unwanted addition of endogenous nucleophiles.
 * A limited set of probes that form sufficiently stable products.
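What a rate constant of 10⁻⁴ to 10⁻³ M⁻¹ s⁻¹ means in practice can be made concrete with a short worked calculation. The sketch assumes the probe is in large excess, so the ligation is pseudo-first order with k_obs = k2 × [probe] and half-life t1/2 = ln 2 / k_obs. The 1 mM probe concentration is an illustrative assumption, and reversibility of the reaction is ignored.

```python
import math


def pseudo_first_order_half_life(k2: float, probe_molar: float) -> float:
    """Half-life (s) of the carbonyl when the probe is in large excess.

    k2: second-order rate constant (M^-1 s^-1)
    probe_molar: probe concentration (M), assumed constant during the reaction
    The ligation is treated as irreversible (a simplification).
    """
    k_obs = k2 * probe_molar  # pseudo-first-order rate constant (s^-1)
    return math.log(2) / k_obs


PROBE_MOLAR = 1e-3  # 1 mM probe: an illustrative assumption, not a value from the text

for k2 in (1e-4, 1e-3):
    t_half_h = pseudo_first_order_half_life(k2, PROBE_MOLAR) / 3600
    print(f"k2 = {k2:g} M^-1 s^-1 -> t1/2 = {t_half_h:.0f} h")
```

Even at the upper end of the range, the half-life under these assumptions is roughly 190 hours (about 8 days). This illustrates why the reaction is described as slow in live cells. Raising the probe concentration shortens the half-life proportionally, but only up to what cells tolerate.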

Because of these limitations, aldehydes and ketones are best used in compartments where such unwanted side reactions are minimized; in experiments with live cells, typical targets are the cell surface and the extracellular space. On the other hand, carbonyl groups take part as electrophiles in a vast number of organic reactions, some of which can readily be converted into ligations for probing aldehydes. One such reaction, recently adapted for bioconjugation by Agarwal et al., is the Pictet-Spengler reaction, known from natural product biosynthetic pathways. Its major advantage as a ligation is that it forms a new carbon-carbon bond, which confers greater long-term stability than the carbon-heteroatom bonds formed by ligations with similar reaction kinetics.

The conversion of cysteine (or, more rarely, serine) to formylglycine is an uncommon posttranslational modification that was discovered in the 1990s. FGE deficiency causes an overall loss of functional sulfatases, because formylglycine is essential for sulfatase activity. Since this essential modification must proceed with high specificity and high conversion in its native setting, the FGE reaction is well suited to applications in chemical and synthetic biology. Aldehyde tags based on a modified sulfatase motif peptide were first inserted into proteins of interest in 2007. Since then, aldehydes and ketones have been used as chemical reporters in bioorthogonal applications such as the self-assembly of cell-lysing drugs, the targeting of proteins and glycans, and the preparation of heterobifunctional fusion proteins.

Genetically encoding the aldehyde tag
The formylglycine tag, or aldehyde tag, is a convenient 6- or 13-amino-acid tag fused to a protein of interest. The 6-mer corresponds to the short core consensus sequence and the 13-mer to the longer full motif. Experiments with the genetically encoded aldehyde tag clearly showed high conversion efficiency with only the core consensus sequence present: for four proteins produced recombinantly in E. coli, mass spectrometry showed 86% conversion with the full-length motif and >90% with the 6-mer. The tag is comparable in size to the commonly used 6x His tag and, like it, can be genetically encoded. In eukaryotic cells, FGE recognizes the motif in the endoplasmic reticulum (ER) solely on the basis of its primary sequence. Notably, for recombinant expression in E. coli, coexpression of an exogenous FGE helps achieve full conversion, even though E. coli has endogenous FGE activity. The workflow for introducing an aldehyde tag consists of three steps: (A) expression of the fusion protein carrying the peptide tag derived from the sulfatase motif, (B) enzymatic conversion of Cys to fGly, and (C) bioorthogonal probing with hydrazides or alkoxyamines (Fig. 1).
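Conversion efficiencies like those reported above are read from mass spectra, where the Cys-to-fGly conversion appears as a mass loss of about 18 Da. The following is a minimal sketch, assuming monoisotopic element masses and the residue formulas C3H5NOS (Cys) and C3H3NO2 (fGly). The helper `conversion_efficiency` and its input intensities are hypothetical. It assumes both forms ionize equally well, which a real quantification would need to verify.

```python
# Monoisotopic element masses in Da.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "S": 31.97207117}

# Residue formulas (amino acid minus water, as found within a peptide chain).
CYS_RESIDUE = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}
FGLY_RESIDUE = {"C": 3, "H": 3, "N": 1, "O": 2}


def monoisotopic_mass(formula: dict[str, int]) -> float:
    return sum(MONO[element] * count for element, count in formula.items())


# Expected mass difference between the converted and the unconverted tag.
MASS_SHIFT = monoisotopic_mass(FGLY_RESIDUE) - monoisotopic_mass(CYS_RESIDUE)


def conversion_efficiency(intensity_fgly: float, intensity_cys: float) -> float:
    """Fraction of tag converted, from peak intensities of the two species.

    Assumes equal ionization efficiency for both forms (a simplification).
    """
    return intensity_fgly / (intensity_fgly + intensity_cys)


print(f"Cys -> fGly mass shift: {MASS_SHIFT:.4f} Da")
# Hypothetical peak intensities, for illustration only.
print(f"conversion: {conversion_efficiency(9.2e5, 0.8e5):.0%}")
```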



As seen in Fig. 1, the engineered aldehyde tag consists of six amino acids. To design it, the sulfatase motifs of a set of organisms from all domains of life were compared for sequence homology; the sequence used is the best consensus of the motifs found in bacteria, archaea, worms and higher vertebrates.
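The idea of deriving a tag from the best consensus of aligned motifs can be sketched as a column-wise majority vote. This illustration rests on assumptions and is not the procedure used to design the tag. The alignment is taken as given, the 0.8 threshold is arbitrary, and the input is a made-up toy set rather than real sulfatase motifs.

```python
from collections import Counter


def consensus(aligned: list[str], min_fraction: float = 0.8) -> str:
    """Column-wise consensus of pre-aligned, equal-length motifs.

    A column keeps its most frequent residue if that residue occurs in at least
    min_fraction of the sequences; otherwise it is written as "x" (any residue).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned motifs")
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(column) >= min_fraction else "x")
    return "".join(result)


# Made-up toy motifs for illustration; these are NOT real sulfatase sequences.
toy_motifs = ["LCTPSR", "LCSPAR", "LCAPTR", "LCTPGR", "LCMPSR"]
print(consensus(toy_motifs))  # LCxPxR
```

A strict threshold marks variable positions as "x", which is how a consensus such as LCxPxR can tolerate diversity at the third and fifth positions while fixing the residues that are conserved in every sequence.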

FGE mechanism of cysteine-to-formylglycine conversion
The catalytic mechanism of FGE has been studied extensively. A multistep redox reaction with a covalent enzyme-substrate intermediate has been proposed. The role of the tag's cysteine in the conversion was tested by mutating it to alanine: mass spectrometry detected no conversion of the mutated peptide tag. This points to the central role of the redox-active thiol group of cysteine in the formation of fGly (Fig. 2). The key step of the proposed catalytic cycle is the monooxidation of a cysteine residue of the enzyme, forming a reactive sulfenic acid intermediate. The hydroxyl group is then transferred to the cysteine of the substrate, and a hetero-analogous β-elimination of H2O yields a thioaldehyde. This compound is highly reactive and readily hydrolyzed, releasing the aldehyde and a molecule of H2S.
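The proposed mechanism can be checked by simple atom bookkeeping. Starting from the substrate cysteine residue, the three steps described above should lead to the formylglycine residue: a net gain of one oxygen when the hydroxyl is transferred, β-elimination of water, and hydrolysis with release of H2S. The sketch tracks only elemental composition using Python's `collections.Counter`. It says nothing about kinetics or enzyme structure, and the helper names are illustrative.

```python
from collections import Counter


def combine(*formulas: Counter) -> Counter:
    """Add elemental compositions."""
    total = Counter()
    for formula in formulas:
        total.update(formula)
    return total


def remove(formula: Counter, part: Counter) -> Counter:
    """Subtract a fragment, refusing to create negative atom counts."""
    result = Counter(formula)
    result.subtract(part)
    if any(count < 0 for count in result.values()):
        raise ValueError("fragment is not contained in the formula")
    return +result  # unary plus drops elements whose count reached zero


def formula_str(formula: Counter) -> str:
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(formula.items()))


CYS_RESIDUE = Counter(C=3, H=5, N=1, O=1, S=1)
FGLY_RESIDUE = Counter(C=3, H=3, N=1, O=2)
OXYGEN = Counter(O=1)
H2O = Counter(H=2, O=1)
H2S = Counter(H=2, S=1)

# 1) Hydroxyl transfer from the enzyme's sulfenic acid: substrate S-H becomes S-OH (net +O).
substrate_sulfenic = combine(CYS_RESIDUE, OXYGEN)
# 2) beta-Elimination of water gives the thioaldehyde (C=S).
thioaldehyde = remove(substrate_sulfenic, H2O)
# 3) Hydrolysis: add water, release H2S, leaving the aldehyde.
product = remove(combine(thioaldehyde, H2O), H2S)

for name, f in [("sulfenic acid", substrate_sulfenic), ("thioaldehyde", thioaldehyde), ("product", product)]:
    print(f"{name}: {formula_str(f)}")
print("matches fGly:", product == FGLY_RESIDUE)
```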



Applications
The aldehyde tag has recently found increasing application owing to the introduction of bioorthogonal chemical reporters. Bioorthogonal agents carry functional groups for coupling, such as azides or cyclooctynes, that are not naturally found in the cell. Because they are foreign, they are largely inert and do not disrupt native metabolism. Fig. 3 gives an overview of possible labeling methods for formylglycine. For example, formylglycine can be coupled to probes such as biotin or to a protein tag such as FLAG, which are useful for purification and detection. Furthermore, fluorophores can be conjugated directly for live-cell imaging. Conjugating polyethylene glycol (PEG) chains to potential drug candidates increases their stability against proteases in body fluids while reducing renal clearance and immunogenicity. The first application described here is the formation of protein-protein conjugates through bioorthogonal probes. Because aldehydes also occur in various metabolites, the aldehyde tag is, strictly speaking, not a truly bioorthogonal group and can cause cross-reactions during protein labeling. This obstacle can be overcome by coupling bioorthogonal probes such as azides or cyclooctynes to the tag. The second application is the coupling of glycans to proteins, which can be used to introduce glycosylation patterns chemically.



Forming protein-protein conjugates via Cu-free click chemistry
Studies have explored the use of the aldehyde tag to produce protein-protein conjugates, with the aim of connecting full-length human IgG (hIgG) to human growth hormone (hGH). Such protein-protein conjugates can be superior to monomeric proteins in terms of serum half-life in protein therapeutics and, additionally, have appealing dual binding properties. To achieve protein fusion, the five-residue aldehyde tag (CxPxR) was incorporated into hIgG and hGH. In hIgG, the tag was introduced at the C termini of the two heavy chains, providing two possible conjugation sites. FGE then oxidizes the cysteine residue to formylglycine (fGly) during protein expression. For the subsequent conjugation steps, copper-free click chemistry was chosen: a strain-promoted 1,3-dipolar cycloaddition between a cyclooctyne and an azide forms a covalent linkage (also termed the Cu-free azide-alkyne cycloaddition). To install these groups, the aldehyde-bearing proteins were reacted, via oxime formation, with heterobifunctional linkers carrying an aminooxy group at one end and either an azide or a cyclooctyne at the other. This attached hIgG to a linker containing a cyclooctyne (here dibenzoazacyclooctyne, DIBAC) and hGH to a linker bearing an azide (Fig. 2A and B). To validate oxime formation, the modified hGH and hIgG were also treated with DIBAC-488 and azide-Alexa Fluor 647, respectively, and analyzed by SDS-PAGE and Western blot. Next, the DIBAC-hIgG and azide-hGH derivatives were joined by Cu-free click chemistry (Fig. 2C). The resulting fusion proteins were purified and analyzed by immunoblot (see Hudak et al. 2012).
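The logic of this two-step strategy can be sketched as a small compatibility check. Each protein first receives a different click handle via oxime formation, and the two handles then react only with each other. This is a toy model, not chemistry software. The reactivity table contains only the pairings named in this article, and the function names are hypothetical.

```python
from typing import Optional

# Only the reactive pairings mentioned in the text (an intentionally small table).
REACTIONS = {
    frozenset({"aldehyde", "aminooxy"}): "oxime",
    frozenset({"aldehyde", "hydrazide"}): "hydrazone",
    frozenset({"azide", "cyclooctyne"}): "triazole (Cu-free click)",
}


def product_of(group_a: str, group_b: str) -> Optional[str]:
    """Linkage formed by two groups, or None if they do not react."""
    return REACTIONS.get(frozenset({group_a, group_b}))


def install_handle(protein_group: str, linker: tuple[str, str]) -> str:
    """React one linker end with the protein; return the handle left exposed."""
    near_end, far_end = linker
    if product_of(protein_group, near_end) is None:
        raise ValueError(f"{near_end} does not react with {protein_group}")
    if product_of(protein_group, far_end) is not None:
        raise ValueError(f"{far_end} would also react with {protein_group}")
    return far_end


igg_handle = install_handle("aldehyde", ("aminooxy", "cyclooctyne"))  # DIBAC-hIgG
hgh_handle = install_handle("aldehyde", ("aminooxy", "azide"))        # azide-hGH

print("hIgG + hGH:", product_of(igg_handle, hgh_handle))
print("hIgG + hIgG:", product_of(igg_handle, igg_handle))  # no homodimerization in this table
```

Both proteins use an aminooxy group as the near end, and complementary click groups serve as the far ends. This makes the final coupling selective: the exposed handles no longer react with aldehydes, and in this table neither handle reacts with itself.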



The Western blots were first stained with Ponceau, then incubated with IgG antibodies against hGH, and finally treated with α-mIgG HRP and α-hIgG 647 for visualization. In the Western blot of the hIgG-hGH conjugate (nonreducing conditions), two bands of different molecular weight are visible after immunodetection. These can be attributed to hIgG conjugated to one or to two hGH molecules.
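A back-of-the-envelope calculation shows why two bands are expected. The calculation assumes an intact IgG of about 150 kDa under nonreducing conditions, about 22 kDa per hGH, and negligible linker mass. It also assumes that each of the two heavy-chain tags is conjugated independently with the same hypothetical probability `p_site`. Under these assumptions, the anti-hGH antibody can only detect the species carrying one or two hGH, and these two species differ by roughly one hGH mass.

```python
from math import comb

IGG_KDA = 150.0  # approximate intact IgG mass (assumption)
HGH_KDA = 22.0   # approximate hGH mass (assumption)
SITES = 2        # one aldehyde tag at each heavy-chain C terminus


def conjugate_species(p_site: float) -> list[tuple[int, float, float]]:
    """(hGH copies, approximate mass in kDa, expected fraction) per species.

    Assumes independent conjugation at each site with probability p_site
    (binomial model) and ignores the mass of the linkers.
    """
    return [
        (n, IGG_KDA + n * HGH_KDA, comb(SITES, n) * p_site**n * (1 - p_site) ** (SITES - n))
        for n in range(SITES + 1)
    ]


for copies, mass, fraction in conjugate_species(p_site=0.5):  # p_site is hypothetical
    detectable = "yes" if copies > 0 else "no"
    print(f"{copies} x hGH: ~{mass:.0f} kDa, fraction {fraction:.2f}, anti-hGH signal: {detectable}")
```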

Chemical glycosylation of the IgG Fc fragment
Over the course of evolution, nature has refined protein glycosylation through a complex interplay of enzymes and carbohydrates. Chemical glycosylation, however, remains challenging because glycans are difficult to synthesize, and the synthesis of carbohydrate derivatives can be slow and tedious. Nonetheless, technologies that structurally mimic protein glycosylation are attractive, because some protein functions depend critically on the pattern of the attached glycans. The Fc fragment of the IgG antibody, for example, is a homodimer with a highly conserved N-glycosylation site. The attached sugar moieties modulate binding to specific immunoreceptors and thereby modify the function of the whole antibody.

Smith et al. demonstrated the use of the aldehyde tag as a chemical conjugation site for glycans. The aldehyde tag sequence was incorporated into the Fc construct, which was then introduced into CHO (Chinese hamster ovary) cells. As controls, gene constructs in which the tag's cysteine was mutated to alanine were used. After expression, the Fc proteins were purified on a protein A/G agarose column. Conversion of cysteine to formylglycine in the CHO cells was examined by labeling with aminooxy-Alexa Fluor 488 followed by SDS-PAGE. However, fluorescence scanning showed no labeling, indicating that endogenous FGE in CHO cells had not produced formylglycine. The unconverted proteins were therefore treated in vitro with recombinant FGE from Mycobacterium tuberculosis, which successfully installed the aldehyde group at the glycosylation site of Fc (Fig. 3A).

Next, N-acetylglucosamine (GlcNAc) was attached to the aldehyde-tagged proteins via oxime formation by treatment with aminooxy-GlcNAc (AO-GlcNAc) (Fig. 3B). The conjugation was confirmed by liquid chromatography-electrospray ionisation-mass spectrometry (LC-ESI-MS) and by lectin blot with the GlcNAc-binding wheat germ agglutinin conjugated to Alexa Fluor 647. The attached GlcNAc was then extended to a glycan structure containing GlcNAc, mannose (Man) and galactose (Gal) (Fig. 3C). For this step, a mutant of the endoglycosidase EndoS (EndoS-D233Q) was used, because it is highly specific for N-linked GlcNAc residues on IgG Fc and does not elongate Asn-GlcNAc sites on other proteins or on denatured IgGs. Product formation was again monitored by LC-ESI-MS and by lectin blot probing with the sialic acid-binding Sambucus nigra agglutinin conjugated to fluorescein isothiocyanate.
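Mass-spectrometric confirmation of an oxime ligation rests on a simple mass balance. The aminooxy probe condenses with the aldehyde and releases one molecule of water, so the conjugate should appear at the protein mass plus the probe mass minus about 18.011 Da (monoisotopic). The following is a minimal sketch. The function names and example masses are hypothetical, and a real LC-ESI-MS analysis works from deconvoluted spectra of the actual species.

```python
# Monoisotopic mass of water in Da (2 x H + O).
WATER_DA = 2 * 1.00782503 + 15.99491462


def oxime_conjugate_mass(protein_da: float, aminooxy_probe_da: float) -> float:
    """Expected mass after oxime formation: protein + probe - H2O."""
    return protein_da + aminooxy_probe_da - WATER_DA


def expected_shift(aminooxy_probe_da: float) -> float:
    """Mass increase of the protein upon ligation with one probe molecule."""
    return aminooxy_probe_da - WATER_DA


# Hypothetical masses, for illustration only.
print(f"expected conjugate mass: {oxime_conjugate_mass(25000.0, 300.0):.2f} Da")
print(f"expected shift: +{expected_shift(300.0):.2f} Da")
```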

Thus, a chemical glycosylation of the IgG Fc fragment was achieved that resembles the naturally occurring glycosylation pattern. Although this study focused on the IgG antibody, the use of the aldehyde tag for glycan conjugation could potentially be extended to other proteins.