Papain-like protease

Papain-like proteases (or papain-like (cysteine) peptidases; abbreviated PLP or PLCP) are a large protein family of cysteine protease enzymes that share structural and enzymatic properties with the group's namesake member, papain. They are found in all domains of life. In animals, the group is often known as cysteine cathepsins or, in older literature, lysosomal peptidases. In the MEROPS protease enzyme classification system, papain-like proteases form Clan CA. Papain-like proteases share a common catalytic dyad active site featuring a cysteine amino acid residue that acts as a nucleophile.

The human genome encodes eleven cysteine cathepsins, which have a broad range of physiological functions. In some parasites, papain-like proteases such as cruzipain from Trypanosoma cruzi have roles in host invasion. In plants, they are involved in host defense and in development. Studies of papain-like proteases from prokaryotes have lagged behind those of their eukaryotic counterparts. In cellular organisms they are synthesized as preproenzymes that are not enzymatically active until mature, and their activities are tightly regulated, often by endogenous protease inhibitors such as cystatins. In many RNA viruses, including significant human pathogens such as the coronaviruses SARS-CoV and SARS-CoV-2, papain-like protease domains have roles in processing polyproteins into mature viral nonstructural proteins. Many papain-like proteases are considered potential drug targets.

Classification
The MEROPS system of protease enzyme classification defines clan CA as containing the papain-like proteases. Members of the clan are thought to share an evolutionary origin. As of 2021, the clan contained 45 families.

Structure
The structure of papain was among the earliest protein structures experimentally determined by X-ray crystallography. Many papain-like protease enzymes function as monomers, though a few, such as cathepsin C (dipeptidyl-peptidase I), are homotetramers. The mature monomer structure is characteristically divided into two lobes or subdomains, known as the L-domain (N-terminal) and the R-domain (C-terminal), with the active site located between them. The L-domain is primarily helical, while the R-domain contains beta-sheets in a beta-barrel-like shape, surrounded by a helix. The enzyme substrate binds in an extended conformation and interacts with both domains.

Papain-like proteases are often synthesized as preproenzymes, which are enzymatically inactive precursors. A signal peptide at the N-terminus, which serves as a subcellular localization signal, is cleaved by signal peptidase to form a zymogen. Post-translational modification in the form of N-linked glycosylation occurs in parallel. The zymogen is still inactive because it retains a propeptide, which functions as an inhibitor by blocking access to the active site. The propeptide is removed by proteolysis to form the mature enzyme.
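The maturation sequence described above (preproenzyme, then zymogen, then mature enzyme) can be sketched as a small state model. This is only an illustrative sketch: the class `Protease` and the functions `cleave_signal_peptide` and `remove_propeptide` are hypothetical names introduced here, glycosylation is not modelled, and the strict ordering of the two cleavage steps is a simplifying assumption that follows the order in which the text presents them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Protease:
    """Toy model: tracks which precursor segments are still attached."""
    has_signal_peptide: bool = True
    has_propeptide: bool = True

    @property
    def stage(self) -> str:
        if self.has_signal_peptide:
            return "preproenzyme"
        if self.has_propeptide:
            return "zymogen"
        return "mature enzyme"

    @property
    def is_active(self) -> bool:
        # The propeptide blocks the active site, so only the mature form is active.
        return self.stage == "mature enzyme"


def cleave_signal_peptide(p: Protease) -> Protease:
    """Signal peptidase step: preproenzyme -> zymogen."""
    return replace(p, has_signal_peptide=False)


def remove_propeptide(p: Protease) -> Protease:
    """Proteolytic activation step: zymogen -> mature enzyme."""
    if p.has_signal_peptide:
        # Modelling assumption: enforce the order in which the text gives the steps.
        raise ValueError("signal peptide must be cleaved before the propeptide")
    return replace(p, has_propeptide=False)


if __name__ == "__main__":
    p = Protease()
    for step in (cleave_signal_peptide, remove_propeptide):
        print(p.stage, "| active:", p.is_active)
        p = step(p)
    print(p.stage, "| active:", p.is_active)
```

In this sketch the enzyme reports itself as active only after both segments have been removed, mirroring the statement that the zymogen remains inactive while the propeptide is present.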

Catalytic mechanism
Papain-like proteases have a catalytic dyad consisting of a cysteine and a histidine residue, which form an ion pair through their charged thiolate and imidazolium side chains. The negatively charged cysteine thiolate functions as a nucleophile. Additional neighboring residues (aspartate, asparagine, or glutamine) position the catalytic residues; in papain, the cysteine, histidine, and asparagine residues required for catalysis are sometimes called the catalytic triad, by analogy with serine proteases. Papain-like proteases are usually endopeptidases, but some members of the group are also, or even exclusively, exopeptidases. Some viral papain-like proteases, including those of coronaviruses, can also cleave isopeptide bonds and can function as deubiquitinases.

Mammals
In animals, especially in mammalian biology, members of the papain-like protease family are usually referred to as cysteine cathepsins, that is, the cysteine protease members of the group of proteases known as cathepsins (which includes cysteine, serine, and aspartic proteases). In humans, there are 11 cysteine cathepsins: B, C, F, H, K, L, O, S, V, X, and W. Most cathepsins are expressed throughout the body, but some have a narrower tissue distribution.

Although historically known as lysosomal proteases and studied mainly for their role in protein catabolism, cysteine cathepsins have since been found to play major roles in a number of physiological processes and disease states. In normal physiology, they are involved in key steps of antigen presentation in the adaptive immune system, remodeling of the extracellular matrix, differentiation of keratinocytes, and processing of peptide hormones. Cysteine cathepsins have been associated with cancer and tumor progression, cardiovascular disease, autoimmune disease, and other human health conditions. Cathepsin K has a role in bone resorption and has been studied as a drug target for osteoporosis.

Parasites
A number of parasites, including helminths (parasitic worms), use papain-like proteases to invade their hosts. Protozoan examples include Toxoplasma gondii and Giardia lamblia. Many flatworms express cysteine cathepsins at very high levels; in the liver fluke Fasciola hepatica, gene duplications have produced over 20 paralogs of a cathepsin L-like enzyme. Cysteine cathepsins are also part of the normal life cycle of the unicellular parasite Leishmania, where they function as virulence factors. The enzyme and potential drug target cruzipain is important for the life cycle of the parasite Trypanosoma cruzi, which causes Chagas' disease.

Plants
Members of the papain-like protease family play a number of important roles in plant development, including seed germination, leaf senescence, and responses to abiotic stress. Papain-like proteases are involved in the regulation of programmed cell death in plants, for example in the tapetum during pollen development. They are also important in plant immunity, providing defense against pests and pathogens. The relationship between plant papain-like proteases and pathogen responses, such as cystatin inhibitors, has been described as an evolutionary arms race.

Some PLP family members in plants have culinary and commercial applications. The family's namesake member, papain, is a protease derived from papaya and used as a meat tenderizer. Similar but less widely used plant products include bromelain from pineapple and ficin from figs.

Prokaryotes
Although papain-like proteases are found in all domains of life, they have been less well studied in prokaryotes than in eukaryotes. Only a few prokaryotic PLP enzymes have been characterized by X-ray crystallography or enzymatic studies, mostly from pathogenic bacteria. These include streptopain from Streptococcus pyogenes; xylellain from the plant pathogen Xylella fastidiosa; Cwp84 from Clostridium difficile; and Lpg2622 from Legionella pneumophila.

Viruses
The papain-like protease family includes a number of protein domains that are found in large polyproteins expressed by RNA viruses. Among the best studied viral PLPs are the papain-like protease domains of nidoviruses, particularly those of coronaviruses. These PLPs are responsible for several cleavage events that process a large polyprotein into viral nonstructural proteins, although they perform fewer cleavages than the 3C-like protease (also known as the main protease). Coronavirus PLPs are multifunctional enzymes that can also act as deubiquitinases (cleaving the isopeptide bond to ubiquitin) and as "deISGylating enzymes" with analogous activity against the ubiquitin-like protein ISG15. In human pathogens including SARS-CoV, MERS-CoV, and SARS-CoV-2, the PLP domain is essential for viral replication and is therefore considered a target for the development of antiviral drugs.
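Polyprotein processing of this kind can be illustrated with a short sketch that splits a sequence at protease recognition sites. The recognition motif is not given in the text above; the sketch assumes the Leu-X-Gly-Gly (LXGG) consensus commonly reported for coronavirus PLPs, with cleavage after the second glycine. The function name `cleave_after_lxgg` and the example sequence are hypothetical and do not correspond to any real viral polyprotein.

```python
import re

# Assumed recognition motif: Leu, any residue, Gly, Gly (one-letter amino acid codes).
LXGG = re.compile(r"L[A-Z]GG")


def cleave_after_lxgg(polyprotein: str) -> list[str]:
    """Split a one-letter amino acid sequence after every LXGG motif."""
    fragments = []
    start = 0
    for match in LXGG.finditer(polyprotein):
        # Cleavage is modelled as occurring immediately after the motif.
        fragments.append(polyprotein[start:match.end()])
        start = match.end()
    if start < len(polyprotein):
        # Whatever follows the last cleavage site is the final fragment.
        fragments.append(polyprotein[start:])
    return fragments


if __name__ == "__main__":
    toy_polyprotein = "MESLKGGAAAPLNGGTTTT"  # invented sequence, for illustration only
    print(cleave_after_lxgg(toy_polyprotein))
```

Real substrate recognition depends on more than a four-residue motif, so a pattern match like this only lists candidate sites; it does not predict cleavage.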