Ribosomally synthesized and post-translationally modified peptides

Ribosomally synthesized and post-translationally modified peptides (RiPPs), also known as ribosomal natural products, are a diverse class of natural products of ribosomal origin. Consisting of more than 20 sub-classes, RiPPs are produced by a variety of organisms, including bacteria, archaea, and eukaryotes, and they possess a wide range of biological functions.

As a consequence of the falling cost of genome sequencing and the accompanying rise in available genomic data, scientific interest in RiPPs has increased in the last few decades. Because the chemical structures of RiPPs are more closely predictable from genomic data than are other natural products (e.g. alkaloids, terpenoids), their presence in sequenced organisms can, in theory, be identified rapidly. This makes RiPPs an attractive target of modern natural product discovery efforts.

Definition
RiPPs consist of any peptides (i.e. molecular weight below 10 kDa) that are ribosomally-produced and undergo some degree of enzymatic post-translational modification. This combination of peptide translation and modification is referred to as "post-ribosomal peptide synthesis" (PRPS) in analogy with nonribosomal peptide synthesis (NRPS).

Historically, the current sub-classes of RiPPs were studied individually, and common practices in nomenclature varied accordingly in the literature. More recently, with the advent of broad genome sequencing, it has been realized that these natural products share a common biosynthetic origin. In 2013, a set of uniform nomenclature guidelines was agreed upon and published by a large group of researchers in the field. Prior to this report, RiPPs were referred to by a variety of designations, including post-ribosomal peptides, ribosomal natural products, and ribosomal peptides.

The acronym "RiPP" stands for "ribosomally synthesized and post-translationally modified peptide".

Prevalence and applications
RiPPs constitute one of the major superfamilies of natural products, like alkaloids, terpenoids, and nonribosomal peptides, although they tend to be large, with molecular weights commonly in excess of 1000 Da. The advent of next-generation sequencing methods has made genome mining of RiPPs a common strategy. In part due to their increased discovery and hypothesized ease of engineering, the use of RiPPs as drugs is increasing. Although they are ribosomal peptides in origin, RiPPs are typically categorized as small molecules rather than biologics due to their chemical properties, such as moderate molecular weight and relatively high hydrophobicity.

The uses and biological activities of RiPPs are diverse.

RiPPs in commercial use include nisin, a food preservative, thiostrepton, a veterinary topical antibiotic, and nosiheptide and duramycin, which are animal feed additives. Phalloidin functionalized with a fluorophore is used in microscopy as a stain due to its high affinity for actin. Anantin is a RiPP used in cell biology as an atrial natriuretic peptide receptor inhibitor.

As of 2012-2013, the derivatized RiPP LFF571 was in clinical trials. Phase II clinical trials of LFF571, a derivative of the thiopeptide GE2270-A, for the treatment of Clostridium difficile infections were terminated early as the results were unfavorable, although the compound had been described as having safety and efficacy comparable to vancomycin. Also in clinical trials at that time was NVB302, a derivative of the lantibiotic actagardine, for the treatment of Clostridium difficile infection. Duramycin has completed phase II clinical trials for the treatment of cystic fibrosis.

Other bioactive RiPPs include the antibiotics cyclothiazomycin and bottromycin, the ultra-narrow spectrum antibiotic plantazolicin, and the cytotoxin patellamide A. Streptolysin S, the toxic virulence factor of Streptococcus pyogenes, is also a RiPP. Additionally, human thyroid hormone itself is a RiPP due to its biosynthetic origin as thyroglobulin.

Amatoxins and phallotoxins
Amatoxins and phallotoxins are 8- and 7-membered natural products, respectively, characterized by N-to-C cyclization in addition to a tryptathionine motif derived from the crosslinking of Cys and Trp. The amatoxins and phallotoxins also differ from other RiPPs based on the presence of a C-terminal recognition sequence in addition to the N-terminal leader peptide. α-Amanitin, an amatoxin, has a number of posttranslational modifications in addition to macrocyclization and formation of the tryptathionine bridge: oxidation of the tryptathionine leads to the presence of a sulfoxide, and numerous hydroxylations decorate the natural product. As an amatoxin, α-amanitin is an inhibitor of RNA polymerase II.

Bottromycins
Bottromycins contain a C-terminal decarboxylated thiazole in addition to a macrocyclic amidine.

There are currently six known bottromycin compounds, which differ in the extent of side chain methylation, an additional characteristic of the bottromycin class. The total synthesis of bottromycin A2 was required to definitively determine the structure of the first bottromycin.

Thus far, gene clusters predicted to produce bottromycins have been identified in the genus Streptomyces. Bottromycins differ from other RiPPs in that there is no N-terminal leader peptide. Rather, the precursor peptide has a C-terminal extension of 35-37 amino acids, hypothesized to act as a recognition sequence for posttranslational machinery.

Cyanobactins
Cyanobactins are diverse metabolites from cyanobacteria with N-to-C macrocyclization of a 6–20 amino acid chain. Close to 30% of all cyanobacterial strains are thought to contain cyanobactin gene clusters. However, while all cyanobactins known thus far are credited to cyanobacteria, other organisms could possibly produce similar natural products.

The precursor peptide of the cyanobactin family is traditionally designated the "E" gene, whereas precursor peptides are designated gene "A" in most RiPP gene clusters. In cyanobactin clusters, "A" is a serine protease involved in cleavage of the leader peptide and subsequent macrocyclization of the peptide natural product, in combination with an additional serine protease homologue encoded by gene "G". Members of the cyanobactin family may bear thiazolines/oxazolines, thiazoles/oxazoles, and methylations depending on additional modification enzymes. For example, perhaps the most famous cyanobactin is patellamide A, a macrocycle derived from 8 amino acids that contains two thiazoles, a methyloxazoline, and an oxazoline in its final state.

Lanthipeptides
Lanthipeptides are one of the most well-studied families of RiPPs. The family is characterized by the presence of lanthionine (Lan) and 3-methyllanthionine (MeLan) residues in the final natural product. There are four major classes of lanthipeptides, delineated by the enzymes responsible for installation of Lan and MeLan. The dehydratase and cyclase can be two separate proteins or one multifunctional enzyme. Previously, lanthipeptides were known as "lantipeptides" before a consensus was reached in the field.

Lantibiotics are lanthipeptides that have known antimicrobial activity. The founding member of the lanthipeptide family, nisin, is a lantibiotic that has been used to prevent the growth of food-borne pathogens for over 40 years.

Lasso peptides
Lasso peptides are short peptides containing an N-terminal macrolactam macrocycle "ring" through which a linear C-terminal "tail" is threaded. Because of this threaded-loop topology, these peptides resemble lassos, giving rise to their name. They are a member of a larger class of amino-acid-based lasso structures. Additionally, lasso peptides are formally rotaxanes.

The N-terminal "ring" can be from 7 to 9 amino acids long and is formed by an isopeptide bond between the N-terminal amine of the first amino acid of the peptide and the carboxylate side chain of an aspartate or glutamate residue. The C-terminal "tail" ranges from 7 to 15 amino acids in length.

The first amino acid of lasso peptides is almost invariably glycine or cysteine, with mutations at this site not being tolerated by known enzymes. Bioinformatics-based approaches to lasso peptide discovery have thus used this as a constraint. However, some lasso peptides were recently discovered that contain serine or alanine as their first residue.
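The sequence constraints described above (ring of 7 to 9 residues closed on an aspartate or glutamate, tail of 7 to 15 residues, restricted first residue) can be sketched as a simple filter over putative core peptides. This is a minimal illustration, not a published discovery tool: the function name, the constants, and the example sequences are hypothetical, and real genome-mining pipelines also rely on genomic context such as neighbouring biosynthetic genes.

```python
# Heuristic constraints taken from the text above; all names are illustrative.
RING_SIZES = (7, 8, 9)      # residues in the macrolactam ring
TAIL_RANGE = (7, 15)        # residues following the ring-closing Asp/Glu
FIRST_RESIDUES = "GCSA"     # Gly/Cys, plus the rarer Ser/Ala


def lasso_ring_candidates(core: str) -> list[tuple[int, str]]:
    """Return (ring_size, ring_closing_residue) options for a putative core peptide."""
    core = core.upper()
    if not core or core[0] not in FIRST_RESIDUES:
        return []
    hits = []
    for size in RING_SIZES:
        if len(core) < size:
            continue
        closer = core[size - 1]          # residue that would close the ring
        tail_len = len(core) - size      # residues left for the threaded tail
        if closer in "DE" and TAIL_RANGE[0] <= tail_len <= TAIL_RANGE[1]:
            hits.append((size, closer))
    return hits


# Made-up 19-residue sequence: both a 7-membered (Glu) and an
# 8-membered (Asp) ring are compatible with the constraints.
print(lasso_ring_candidates("GLSQGVEDKRTFIPAWHNY"))
```

A sequence can satisfy the constraints in more than one way, so a filter of this kind only narrows the candidates; the actual ring size has to be determined experimentally.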

The threading of the lasso tail is trapped either by disulfide bonds between ring and tail cysteine residues (class I lasso peptides), by steric effects due to bulky residues on the tail (class II lasso peptides), or both (class III lasso peptides). The compact structure makes lasso peptides frequently resistant to proteases or thermal unfolding.

Linear azol(in)e-containing peptides
Linear azol(in)e-containing peptides (LAPs) contain thiazoles and oxazoles, or their reduced thiazoline and oxazoline forms. Thiazol(in)es are the result of cyclization of Cys residues in the precursor peptide, while (methyl)oxazol(in)es are formed from Thr and Ser. Azole and azoline formation also modifies the residue in the -1 position, that is, directly N-terminal to the Cys, Ser, or Thr, because the backbone carbonyl of that residue is incorporated into the heterocycle. A dehydrogenase in the LAP gene cluster is required for oxidation of azolines to azoles.

Plantazolicin is a LAP with extensive cyclization. Two sets of five heterocycles endow the natural product with structural rigidity and unusually selective antibacterial activity. Streptolysin S (SLS) is perhaps the best-studied and most famous LAP, in part because its structure remains unknown despite the discovery of SLS in 1901. Thus, while the biosynthetic gene cluster suggests SLS is a LAP, structural confirmation is lacking.

Microcins
Microcins are all RiPPs produced by Enterobacteriaceae with a molecular weight <10 kDa. Many members of other RiPP families, such as microcin E492, microcin B17 (a LAP), and microcin J25 (a lasso peptide), are also considered microcins. Rather than being classified by posttranslational modifications or modifying enzymes, microcins are identified by molecular weight, native producer, and antibacterial activity. Microcins are either plasmid- or chromosome-encoded, and are specifically active against Enterobacteriaceae. Because these organisms are also often producers of microcins, the gene cluster contains not only a precursor peptide and modification enzymes, but also a self-immunity gene to protect the producing strain, and genes encoding export of the natural product.

Microcins have bioactivity against Gram-negative bacteria but usually display narrow-spectrum activity due to hijacking of specific receptors involved in the transport of essential nutrients.

Thiopeptides
Most of the characterized thiopeptides have been isolated from Actinobacteria. General structural features of thiopeptide macrocycles are dehydrated amino acids and thiazole rings, formed from dehydrated serine/threonine and cyclized cysteine residues, respectively.

The thiopeptide macrocycle is closed with a six-membered nitrogen-bearing ring. The oxidation state and substitution pattern of the nitrogenous ring determine the series of the thiopeptide natural product. While the mechanism of macrocyclization is not known, the nitrogenous ring can exist in thiopeptides as a piperidine, dehydropiperidine, or a fully oxidized pyridine. Additionally, some thiopeptides bear a second macrocycle, which contains a quinaldic acid or indolic acid residue derived from tryptophan. Perhaps the most well-characterized thiopeptide, thiostrepton A, contains a dehydropiperidine ring and a second, quinaldic acid-containing macrocycle. Four residues are dehydrated during posttranslational modification, and the final natural product also bears four thiazoles and one azoline.

Other RiPPs
Autoinducing peptides (AIPs) and quorum sensing peptides are used as signaling molecules in the process called quorum sensing. AIPs are characterized by the presence of a cyclic ester or thioester, unlike other regulatory peptides, which are linear. In pathogens, exported AIPs bind to extracellular receptors that trigger the production of virulence factors. In Staphylococcus aureus, AIPs are biosynthesized from a precursor peptide composed of an N-terminal leader region, the core region, and a negatively charged C-terminal tail region that is, along with the leader peptide, cleaved before AIP export.

The term "bacterial head-to-tail cyclized peptides" refers exclusively to ribosomally synthesized peptides with 35-70 residues and a peptide bond between the N- and C-termini; these are sometimes referred to as bacteriocins, although that term is used more broadly. The distinctive nature of this class is not only the relatively large size of the natural products but also the modifying enzymes responsible for macrocyclization. Other N-to-C cyclized RiPPs, such as the cyanobactins and orbitides, have specialized biosynthetic machinery for macrocyclization of much smaller core peptides. Thus far, these bacteriocins have been identified only in Gram-positive bacteria. Enterocin AS-48 was isolated from Enterococcus and, like other bacteriocins, is relatively resistant to high temperature, pH changes, and many proteases as a result of macrocyclization. Based on solution structures and sequence alignments, bacteriocins appear to take on similar 3D structures despite little sequence homology, contributing to stability and resistance to degradation.

Conopeptides and other toxoglossan peptides are the components of the venom of predatory marine snails, such as the cone snails (genus Conus). Venom peptides from cone snails are generally smaller than those found in other animal venoms (10-30 amino acids vs. 30-90 amino acids) and have more disulfide crosslinks. A single species may have 50-200 conopeptides encoded in its genome, recognizable by a well-conserved signal sequence.

Cyclotides are RiPPs with a head-to-tail cyclization and three conserved disulfide bonds that form a knotted structure called a cyclic cysteine knot motif. No other posttranslational modifications have been observed on the characterized cyclotides, which are between 28 and 37 amino acids in size. Cyclotides are plant natural products, and the different cyclotides appear to be species-specific. While many activities have been reported for cyclotides, it has been hypothesized that all are united by a common mechanism of binding to and disrupting the cell membrane.

Glycocins are RiPPs that are glycosylated antimicrobial peptides. Only two members have been fully characterized, making this a small RiPP class. Sublancin 168 and glycocin F are both Cys-glycosylated and, in addition, have disulfide bonds between non-glycosylated Cys residues. While both members bear S-glycosyl groups, RiPPs bearing O- or N-linked carbohydrates will also be included in this family as they are discovered.

Linaridins are characterized by C-terminal aminovinyl cysteine residues. While this posttranslational modification is also seen in the lanthipeptides epidermin and mersacidin, linaridins do not have Lan or MeLan residues. In addition, the linaridin moiety is formed from modification of two Cys residues, whereas lanthipeptide aminovinyl cysteines are formed from Cys and dehydroalanine (Dha). The first linaridin to be characterized was cypemycin.

Microviridins are cyclic N-acetylated trideca- and tetradecapeptides with ω-ester and/or ω-amide bonds. These linkages form the macrocycles in the final natural product: the ω-carboxy groups of glutamate or aspartate form lactones (ω-esters) with side-chain hydroxyl groups, or lactams (ω-amides) with the lysine ε-amino group.

Orbitides are plant-derived N-to-C cyclized peptides with no disulfide bonds. Also referred to as Caryophyllaceae-like homomonocyclopeptides, orbitides are 5-12 amino acids in length and are composed of mainly hydrophobic residues. Similar to the amatoxins and phallotoxins, the gene sequences of orbitides suggest the presence of a C-terminal recognition sequence. In flax (Linum usitatissimum), a BLAST search identified a precursor peptide that potentially contains five core peptides separated by putative recognition sequences.

Proteusins are named after "Proteus", a Greek shape-shifting sea god. Until now, the only known members in the family of Proteusins are called polytheonamides. They were originally presumed to be nonribosomal natural products due to the presence of many D-amino acids and other non-proteinogenic amino acids. However, a metagenomic study revealed the natural products as the most extensively modified class of RiPPs known to date. Six enzymes are responsible for installing a total of 48 posttranslational modifications onto the polytheonamide A and B precursor peptides, including 18 epimerizations. Polytheonamides are exceptionally large, as a single molecule is able to span a cell membrane and form an ion channel.

Sactipeptides contain intramolecular linkages between the sulfur of Cys residues and the α-carbon of another residue in the peptide. A number of nonribosomal peptides bear the same modification. In 2003, the first RiPP with a sulfur-to-α-carbon linkage was reported when the structure of subtilosin A was determined using isotopically enriched media and NMR spectroscopy. In the case of subtilosin A, isolated from Bacillus subtilis 168, the Cα crosslinks between Cys4 and Phe31, Cys7 and Thr28, and Cys13 and Phe22 are not the only posttranslational modifications; the C- and N-termini form an amide bond, resulting in a circular structure that is conformationally restricted by the Cα bonds. Sactipeptides with antimicrobial activity are commonly referred to as sactibiotics (sulfur to alpha-carbon antibiotic).

Biosynthesis
RiPPs are characterized by a common biosynthetic strategy wherein genetically-encoded peptides undergo translation and subsequent chemical modification by biosynthetic enzymes.

Common features
All RiPPs are synthesized first at the ribosome as a precursor peptide. This peptide is typically ~20-110 residues long and consists of a core peptide segment that is typically preceded (and occasionally followed) by a leader peptide segment. The leader peptide is usually important for enzymatic processing of the precursor peptide, because it aids recognition of the core peptide by biosynthetic enzymes, and for cellular export. Some RiPPs also contain a recognition sequence C-terminal to the core peptide; these sequences are involved in excision and cyclization. Additionally, eukaryotic RiPP precursor peptides may contain a signal segment that helps direct the peptide to cellular compartments.

During RiPP biosynthesis, the unmodified precursor peptide (containing an unmodified core peptide, UCP) is recognized and chemically modified sequentially by biosynthetic enzymes (PRPS). Examples of modifications include dehydration (e.g. lanthipeptides, thiopeptides), cyclodehydration (e.g. thiopeptides), prenylation (e.g. cyanobactins), and cyclization (e.g. lasso peptides), among others. The resulting modified precursor peptide (containing a modified core peptide, MCP) then undergoes proteolysis, wherein the non-core regions of the precursor peptide are removed. This results in the mature RiPP.
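The segment organization described above can be represented as a small data structure. The sketch below is only illustrative: the class name, the field name "follower" (used here for a C-terminal recognition sequence), the coordinate convention, and the example sequence are all assumptions rather than established conventions. It models the final proteolysis step simply as discarding the non-core segments and does not model the chemical modifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecursorPeptide:
    leader: str          # N-terminal leader, recognized by biosynthetic enzymes
    core: str            # segment that becomes the mature RiPP
    follower: str = ""   # optional C-terminal recognition sequence

    @property
    def sequence(self) -> str:
        """Full-length precursor as translated at the ribosome."""
        return self.leader + self.core + self.follower


def split_precursor(sequence: str, core_start: int, core_end: int) -> PrecursorPeptide:
    """Split a precursor at 0-based, end-exclusive core coordinates."""
    if not 0 <= core_start < core_end <= len(sequence):
        raise ValueError("core coordinates must lie within the sequence")
    return PrecursorPeptide(
        leader=sequence[:core_start],
        core=sequence[core_start:core_end],
        follower=sequence[core_end:],
    )


# Made-up 17-residue precursor with an 8-residue leader and a 6-residue core.
precursor = split_precursor("MKTLEELGGCSTCAAPK", 8, 14)
print(precursor.leader, precursor.core, precursor.follower)
```

In practice the core boundaries are rarely known in advance; they are usually inferred from the structure of the mature product or from conserved cleavage motifs.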

Nomenclature
Papers published prior to a recent community consensus employ differing sets of nomenclature. The precursor peptide has been referred to previously as prepeptide, prepropeptide, or structural peptide. The leader peptide has been referred to as a propeptide, pro-region, or intervening region. Historical alternate terms for core peptide included propeptide, structural peptide, and toxin region (for conopeptides, specifically).

Lanthipeptides
Lanthipeptides are characterized by the presence of lanthionine (Lan) and 3-methyllanthionine (MeLan) residues. Lan residues are formed from a thioether bridge between Cys and Ser, while MeLan residues are formed from the linkage of Cys to a Thr residue. The biosynthetic enzymes responsible for Lan and MeLan installation first dehydrate Ser and Thr to dehydroalanine (Dha) and dehydrobutyrine (Dhb), respectively. Subsequent thioether crosslinking occurs through a Michael-type addition by Cys onto Dha or Dhb.
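The residue bookkeeping described above (Ser to Dha, Thr to Dhb, then Cys addition to give Lan or MeLan) can be illustrated with a short sketch. The function names, the 0-based index convention, and the example core sequence are hypothetical; the code only labels residues and does not predict which Ser/Thr residues an enzyme would actually dehydrate or which rings would form.

```python
DEHYDRATED = {"S": "Dha", "T": "Dhb"}     # products of Ser/Thr dehydration
RING_NAME = {"S": "Lan", "T": "MeLan"}    # thioether formed after Cys addition


def dehydrate(core: str, positions: list[int]) -> list[str]:
    """Return residue labels after dehydrating Ser/Thr at the given 0-based positions."""
    labels = list(core)
    for pos in positions:
        if core[pos] not in DEHYDRATED:
            raise ValueError(f"residue {core[pos]}{pos} is not Ser or Thr")
        labels[pos] = DEHYDRATED[core[pos]]
    return labels


def ring_types(core: str, rings: list[tuple[int, int]]) -> list[str]:
    """Label each (Ser/Thr index, Cys index) pair as Lan or MeLan."""
    names = []
    for donor, cys in rings:
        if core[cys] != "C":
            raise ValueError(f"position {cys} is not Cys")
        if core[donor] not in RING_NAME:
            raise ValueError(f"position {donor} is not Ser or Thr")
        names.append(RING_NAME[core[donor]])
    return names


# Made-up core peptide: Ser1 bridges to Cys4, Thr5 bridges to Cys8.
core = "ISAGCTPLC"
print(dehydrate(core, [1, 5]))
print(ring_types(core, [(1, 4), (5, 8)]))
```

Each dehydration removes one water molecule, whereas the subsequent Michael-type addition is mass-neutral, so the number of dehydrations is what is typically visible in mass spectra of lanthipeptides.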

Four classes of lanthipeptide biosynthetic enzymes have been designated. Class I lanthipeptides have dedicated lanthipeptide dehydratases, called LanB enzymes, though more specific designations are used for particular lanthipeptides (e.g. NisB is the nisin dehydratase). A separate cyclase, LanC, is responsible for the second step in Lan and MeLan biosynthesis. However, class II, III, and IV lanthipeptides have bifunctional lanthionine synthetases in their gene clusters, meaning a single enzyme carries out both dehydration and cyclization steps. Class II synthetases, designated LanM synthetases, have N-terminal dehydration domains with no sequence homology to other lanthipeptide biosynthetic enzymes; the cyclase domain has homology to LanC. Class III (LanKC) and IV (LanL) enzymes have similar N-terminal lyase and central kinase domains, but diverge in C-terminal cyclization domains: the LanL cyclase domain is homologous to LanC, but the class III enzymes lack Zn-ligand binding domains.

Linear azol(in)e-containing peptides
The hallmark of linear azol(in)e-containing peptide (LAP) biosynthesis is the formation of azol(in)e heterocycles from the nucleophilic amino acids serine, threonine, or cysteine. This is accomplished by three enzymes referred to as the B, C, and D proteins; the precursor peptide is referred to as the A protein, as in other classes.

The C protein is mainly involved in leader peptide recognition and binding and is sometimes called a scaffolding protein. The D protein is an ATP-dependent cyclodehydratase that catalyzes the cyclodehydration reaction, resulting in formation of an azoline ring. This occurs by direct activation of the amide backbone carbonyl with ATP, resulting in stoichiometric ATP consumption. The C and D proteins are occasionally present as a single, fused protein, as is the case for trunkamide biosynthesis. The B protein is a flavin mononucleotide (FMN)-dependent dehydrogenase which oxidizes certain azoline rings into azoles.
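The net chemistry of these two steps lends itself to a simple calculation: each cyclodehydration removes one water molecule, and each subsequent dehydrogenation removes two hydrogen atoms. The sketch below is a rough illustration with hypothetical names; it assumes one ATP per cyclodehydration, reading "stoichiometric" as a 1:1 ratio, and it uses standard monoisotopic masses for H2O and H2.

```python
H2O = 18.010565   # monoisotopic mass of water, Da
H2 = 2.015650     # monoisotopic mass of two hydrogen atoms, Da


def heterocycle_summary(n_azolines: int, n_azoles: int) -> dict[str, float]:
    """Estimate ATP use and mass shift for a peptide bearing the given rings.

    n_azolines: rings left at the azoline oxidation state (D protein only)
    n_azoles:   rings oxidized further to azoles (D protein, then B protein)
    """
    if n_azolines < 0 or n_azoles < 0:
        raise ValueError("ring counts must be non-negative")
    cyclodehydrations = n_azolines + n_azoles   # every ring starts as an azoline
    return {
        "atp_consumed": cyclodehydrations,      # assumption: 1 ATP per ring
        "mass_shift_da": -(cyclodehydrations * H2O + n_azoles * H2),
    }


# One azoline (about -18.01 Da) versus one azole (about -20.03 Da).
print(heterocycle_summary(1, 0))
print(heterocycle_summary(0, 1))
```

Mass shifts of this kind are a common first check when confirming that a precursor peptide has been heterocyclized, although they do not reveal which residues were modified.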

The B protein is typically referred to as the dehydrogenase; the C and D proteins together form the cyclodehydratase, although the D protein alone performs the cyclodehydration reaction. Early work on microcin B17 adopted a different nomenclature for these proteins, but a recent consensus has been adopted by the field as described above.

Cyanobactins
Cyanobactin biosynthesis requires proteolytic cleavage of both N-terminal and C-terminal portions of the precursor peptide. The defining proteins are thus an N-terminal protease, referred to as the A protein, and a C-terminal protease, referred to as the G protein. The G protein is also responsible for macrocyclization.

For cyanobactins, the precursor peptide is referred to as the E peptide. Minimally, the E peptide requires a leader peptide region, a core (structural) region, and both N-terminal and C-terminal protease recognition sequences. In contrast to most RiPPs, for which a single precursor peptide encodes a single natural product via a lone core peptide, cyanobactin E peptides can contain multiple core regions; multiple E peptides can even be present in a single gene cluster.
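Because each core region in an E peptide is flanked by protease recognition sequences, the cores can in principle be located by searching for those flanking sequences. The sketch below assumes the caller already knows the N-terminal and C-terminal recognition sequences; the function name, the motif strings, and the example E peptide are illustrative placeholders and are not taken from a specific gene cluster.

```python
import re


def extract_cores(e_peptide: str, n_site: str, c_site: str) -> list[str]:
    """Return every segment found between an N-terminal and a C-terminal recognition site."""
    # Non-greedy match so that each core ends at the nearest following C-terminal site.
    pattern = re.compile(re.escape(n_site) + r"(.+?)" + re.escape(c_site))
    return pattern.findall(e_peptide)


# Hypothetical E peptide: leader, then two cassettes of
# (N-terminal site, 8-residue core, C-terminal site).
N_SITE, C_SITE = "GLEAS", "AYDG"
e_peptide = (
    "MKQLSEEL"
    + N_SITE + "VSACLTFC" + C_SITE
    + N_SITE + "ITPCGSVC" + C_SITE
)
print(extract_cores(e_peptide, N_SITE, C_SITE))
```

Exact string matching is a simplification; recognition sequences in real clusters can vary between cassettes, so a position-specific or regular-expression motif would usually be needed.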

Many cyanobactins also undergo heterocyclization by a heterocyclase (referred to as the D protein), installing oxazoline or thiazoline moieties from Ser/Thr/Cys residues prior to the action of the A and G proteases. The heterocyclase is an ATP-dependent YcaO homologue that behaves biochemically in the same manner as YcaO-domain cyclodehydratases in thiopeptide and linear azol(in)e-containing peptide (LAP) biosynthesis (described above).

A common modification is prenylation of hydroxyl groups by an F protein prenyltransferase. Oxidation of azoline heterocycles to azoles can also be accomplished by an oxidase domain located on the G protein. Unusual for ribosomal peptides, cyanobactins can include D-amino acids; these can occur adjacent to azole or azoline residues. The functions of some proteins found commonly in cyanobactin biosynthetic gene clusters, the B and C proteins, are unknown.

Thiopeptides
Thiopeptide biosynthesis involves particularly extensive modification of the core peptide scaffold. Indeed, due to the highly complex structures of thiopeptides, it was commonly thought that these natural products were nonribosomal peptides. Recognition of the ribosomal origin of these molecules came in 2009 with the independent discovery of the gene clusters for several thiopeptides.

The standard nomenclature for thiopeptide biosynthetic proteins follows that of the thiomuracin gene cluster. In addition to the precursor peptide, referred to as the A peptide, thiopeptide biosynthesis requires at least six genes. These include lanthipeptide-like dehydratases, designated the B and C proteins, which install dehydroalanine and dehydrobutyrine moieties by dehydrating Ser/Thr precursor residues. Azole and azoline synthesis is effected by the E protein, the dehydrogenase, and the G protein, the cyclodehydratase. The nitrogen-containing heterocycle is installed by the D protein cyclase via a putative [4+2] cycloaddition of dehydroalanine moieties to form the characteristic macrocycle. The F protein is responsible for binding of the leader peptide.

Thiopeptide biosynthesis is biochemically similar to that of cyanobactins, lanthipeptides, and linear azol(in)e-containing peptides (LAPs). As with cyanobactins and LAPs, azole and azoline synthesis occurs via the action of an ATP-dependent YcaO-domain cyclodehydratase. In contrast to LAPs, where cyclodehydration occurs via the action of two distinct proteins responsible for leader peptide binding and cyclodehydrative catalysis, these are fused into a single protein (G protein) in cyanobactin and thiopeptide biosynthesis. However, in thiopeptides, an additional protein, designated the Ocin-ThiF-like protein (F protein) is necessary for leader peptide recognition and potentially recruiting other biosynthetic enzymes.

Lasso peptides
Lasso peptide biosynthesis requires at least three genes, whose products are referred to as the A, B, and C proteins. The A gene encodes the precursor peptide, which is modified by the B and C proteins into the mature natural product. The B protein is an adenosine triphosphate-dependent cysteine protease that cleaves the leader region from the precursor peptide. The C protein displays homology to asparagine synthetase and is thought to activate the carboxylic acid side chain of a glutamate or aspartate residue via adenylylation. The N-terminal amine formed by the B protein (protease) then reacts with this activated side chain to form the macrocycle-forming isopeptide bond. The exact steps and reaction intermediates in lasso peptide biosynthesis remain unknown due to experimental difficulties associated with the proteins. Commonly, the B protein is referred to as the lasso protease, and the C protein is referred to as the lasso cyclase.

Some lasso peptide biosynthetic gene clusters also require an additional protein of unknown function for biosynthesis. Additionally, lasso peptide gene clusters usually include an ABC transporter (D protein) or an isopeptidase, although these are not strictly required for lasso peptide biosynthesis and are sometimes absent. No X-ray crystal structure is yet known for any lasso peptide biosynthetic protein.

The biosynthesis of lasso peptides is particularly interesting due to the inaccessibility of the threaded-lasso topology to chemical peptide synthesis.