Ubiquitin-like protein

Ubiquitin-like proteins (UBLs) are a family of small proteins involved in post-translational modification of other proteins in a cell, usually with a regulatory function. The UBL protein family derives its name from the first member of the class to be discovered, ubiquitin (Ub), best known for its role in regulating protein degradation through covalent modification of other proteins. Following the discovery of ubiquitin, many additional evolutionarily related members of the group were described, involving parallel regulatory processes and similar chemistry. UBLs are involved in a widely varying array of cellular functions including autophagy, protein trafficking, inflammation and immune responses, transcription, DNA repair, RNA splicing, and cellular differentiation.

Discovery
Ubiquitin itself was first discovered in the 1970s and was originally named "ubiquitous immunopoietic polypeptide". Other proteins with sequence similarity to ubiquitin were occasionally reported in the literature over the following years. However, the first shown to share the key feature of covalent protein modification was ISG15, discovered in 1987. A succession of reports in the mid-1990s is recognized as a turning point in the field. SUMO (small ubiquitin-like modifier, also known as Sentrin) was reported independently by several groups in 1996, NEDD8 in 1997, and Apg12 in 1998. A systematic survey has since identified over 10,000 distinct genes for ubiquitin or ubiquitin-like proteins in eukaryotic genomes.

Structure and classification
Members of the UBL family are small, non-enzymatic proteins that share a common structure exemplified by ubiquitin, which has 76 amino acid residues arranged into a "beta-grasp" protein fold consisting of a five-strand antiparallel beta sheet surrounding an alpha helix. The beta-grasp fold is widely distributed in other proteins of both eukaryotic and prokaryotic origin. Collectively, ubiquitin and ubiquitin-like proteins are sometimes referred to as "ubiquitons".

UBLs can be divided into two categories depending on whether they can be covalently conjugated to other molecules. UBLs that are capable of conjugation (sometimes known as Type I) have a characteristic sequence motif of one to two glycine residues at the C-terminus, through which covalent conjugation occurs. Typically, these UBLs are expressed as inactive precursors and must be activated by proteolytic cleavage at the C-terminus to expose the reactive glycine. Almost all such UBLs are ultimately linked to another protein, with at least one known exception: ATG8 is linked to the lipid phosphatidylethanolamine. UBLs that do not undergo covalent conjugation (Type II) often occur as protein domains genetically fused to other domains in a single larger polypeptide chain. These domains may be proteolytically processed to release the UBL domain, or they may function as protein-protein interaction domains. UBL domains of larger proteins are sometimes known as UBX domains.

Distribution
Ubiquitin is, as its name suggests, ubiquitous in eukaryotes; it is traditionally considered to be absent in bacteria and archaea, though a few examples have been described in archaea. UBLs are also widely distributed in eukaryotes, but their distribution varies among lineages; for example, ISG15, involved in the regulation of the immune system, is not present in lower eukaryotes. Other families exhibit diversification in some lineages; a single member of the SUMO family is found in the yeast genome, but there are at least four in vertebrate genomes, which show some functional redundancy, and there are at least eight in the genome of the model plant Arabidopsis thaliana.

In humans
The human genome encodes at least eight families of UBLs, not including ubiquitin itself, that are considered Type I UBLs and are known to covalently modify other proteins: SUMO, NEDD8, ATG8, ATG12, URM1, UFM1, FAT10, and ISG15. One additional protein, known as FUBI, is encoded as a fusion protein in the FAU gene, and is proteolytically processed to generate a free glycine C-terminus, but has not been experimentally demonstrated to form covalent protein modifications.

In plants
Plant genomes are known to encode at least seven families of UBLs in addition to ubiquitin: SUMO, RUB (the plant homolog of NEDD8), ATG8, ATG12, MUB, UFM1, and HUB1, as well as a number of Type II UBLs. Some UBL families and their associated regulatory proteins in plants have undergone dramatic expansion, likely due to both whole genome duplication and other forms of gene duplication; the ubiquitin, SUMO, ATG8, and MUB families have been estimated to account for almost 90% of plants' UBL genes. Proteins associated with ubiquitin and SUMO signaling are highly enriched in the genomes of embryophytes.

In prokaryotes
In comparison to eukaryotes, prokaryotic proteins related to UBLs are phylogenetically restricted. Prokaryotic ubiquitin-like protein (Pup) occurs in some actinobacteria and has functions closely analogous to those of ubiquitin in labeling proteins for proteasomal degradation. However, it is intrinsically disordered, and its evolutionary relationship to UBLs is unclear. A related protein, UBact, has recently been described in some Gram-negative lineages. By contrast, the protein TtuB in bacteria of the genus Thermus does share the beta-grasp fold with eukaryotic UBLs. It is reported to have dual functions, acting both as a sulfur carrier protein and as a covalently conjugated protein modifier. In archaea, the small archaeal modifier proteins (SAMPs) share the beta-grasp fold and have been shown to play a ubiquitin-like role in protein degradation. In 2011, a seemingly complete set of genes corresponding to a eukaryote-like ubiquitin pathway was identified in an uncultured archaeon. At least three lineages of archaea ("Euryarchaeota", Thermoproteota (formerly Crenarchaeota), and "Aigarchaeota") are now believed to possess such systems. In addition, some pathogenic bacteria have evolved proteins that mimic components of eukaryotic UBL pathways and interact with UBLs in the host cell, interfering with their signaling function.

Regulation
In eukaryotes, the regulation of UBLs that are capable of covalent conjugation is elaborate but follows a broadly parallel pattern across family members, and it is best characterized for ubiquitin itself. Ubiquitination is a tightly regulated three-step sequence: activation, performed by ubiquitin-activating enzymes (E1); conjugation, performed by ubiquitin-conjugating enzymes (E2); and ligation, performed by ubiquitin ligases (E3). The result of this process is a covalent bond between the C-terminus of ubiquitin and a residue (typically a lysine) on the target protein. Many UBL families use a similar three-step process catalyzed by a distinct set of enzymes specific to that family. Deubiquitination or deconjugation, that is, removal of ubiquitin from a protein substrate, is performed by deubiquitinating enzymes (DUBs). UBLs can also be removed from their substrates by UBL-specific proteases (ULPs). The range of UBLs on which these enzymes can act varies and can be difficult to predict. Some UBLs, such as SUMO and NEDD8, have family-specific DUBs and ULPs.

Ubiquitin is capable of forming polymeric chains, in which additional ubiquitin molecules are covalently attached to the first, which in turn is attached to the protein substrate. These chains may be linear or branched, and differences in chain length and branching can convey different regulatory signals. Although not all UBL families are known to form chains, SUMO, NEDD8, and URM1 chains have all been experimentally detected. Ubiquitin can also itself be modified by UBLs, as has been shown for SUMO and NEDD8. The best-characterized intersections between distinct UBL families involve ubiquitin and SUMO.

Cellular functions
UBLs as a class are involved in a very large variety of cellular processes. Furthermore, individual UBL families vary in the scope of their activities and the diversity of the proteins to which they are conjugated. The best known function of ubiquitin is identifying proteins to be degraded by the proteasome, but ubiquitination can play a role in other processes such as endocytosis and other forms of protein trafficking, transcription and transcription factor regulation, cell signaling, histone modification, and DNA repair. Most other UBLs have similar roles in regulating cellular processes, usually with a more restricted known range than that of ubiquitin itself. SUMO proteins have the widest variety of cellular protein targets after ubiquitin and are involved in processes including transcription, DNA repair, and the cellular stress response. NEDD8 is best known for its role in regulating cullin proteins, which in turn regulate ubiquitin-mediated protein degradation, though it likely also has other functions. Two UBLs, ATG8 and ATG12, are involved in the process of autophagy; both are unusual in that ATG12 has only two known protein substrates and ATG8 is conjugated not to a protein but to a phospholipid, phosphatidylethanolamine.

Evolution
The evolution of UBLs and their associated suites of regulatory proteins has been of interest since shortly after they were recognized as a family. Phylogenetic studies of the beta-grasp protein fold superfamily suggest that eukaryotic UBLs are monophyletic, indicating a shared evolutionary origin. UBL regulatory systems, including the UBLs themselves and the cascade of enzymes that act on them, are believed to share a common evolutionary origin with prokaryotic biosynthesis pathways for the cofactors thiamine and molybdopterin. The bacterial sulfur transfer proteins ThiS and MoaD from these pathways share the beta-grasp fold with UBLs, while sequence similarity and a common catalytic mechanism link the pathway enzymes ThiF and MoeB to ubiquitin-activating enzymes. The eukaryotic protein URM1 functions as both a UBL and a sulfur carrier protein, and has been described as a molecular fossil that reflects this evolutionary link.

Comparative genomics surveys of UBL families and related proteins suggest that UBL signaling was already well developed in the last eukaryotic common ancestor and ultimately originates from archaeal ancestors. This view is supported by the observation that some archaeal genomes possess the genes necessary for a fully functioning ubiquitination pathway. Two distinct diversification events within the UBL family have been identified in eukaryotic lineages, corresponding to the origin of multicellularity in the animal and plant lineages.