Degradomics



Degradomics is a sub-discipline of biology encompassing all the genomic and proteomic approaches devoted to the study of proteases, their inhibitors, and their substrates on a system-wide scale. This includes the analysis of the protease and protease-substrate repertoires, also called "protease degradomes". The scope of these degradomes can range from the cellular to the tissue to the organism-wide scale.

Background
As the second largest class of enzymes after ubiquitin ligases, accounting for ~2% of the genes in any organism, proteases have drawn enough attention from biologists to warrant a field aimed at identifying and quantifying their roles in biology. The term was coined in 2000 by the Overall Lab in McQuibban et al., where degradomics was described as linking proteases to substrates on a proteome-wide basis. The discoveries of novel roles for proteases and breakthroughs in protease-substrate discovery were later summarized by Dr. Carlos Lopez-Otin and Dr. Chris Overall, who introduced degradomics on a system-wide scale and collated the then current and emerging techniques available to describe proteolysis. By drawing attention to how proteolysis serves as an additional, irreversible mechanism by which cells can achieve control over biological processes, they outlined the necessity of studying proteases for their functional relevance in processing bioactive molecules. These bioactive molecules play roles in coagulation, complement activation, DNA replication, cell-cycle control, cellular proliferation and migration, hemostasis, immunity, and apoptosis. The degradome was broken down into two concepts. The first refers to the entire profile of proteases expressed by a cell, tissue, or organism under defined circumstances. The second applies specifically to the full substrate repertoire of a given protease in a cell, tissue, or organism.

Dr. Overall's group went on to annotate the complete human and mouse protease and inhibitor degradomes in 2003. As the complexity of proteolytic networks was uncovered, more thorough descriptions of the degradome and of the ways to study it became necessary. In 2007, Dr. Overall updated the field with a review co-authored by Dr. Carl Blobel detailing how advanced methods were revolutionizing protease-substrate discovery. They described linking proteases to their substrates as a step-by-step process, beginning with biochemical and proteomic discovery, continuing with validation in cell-based assays, and progressing to the whole-organism level using animal models. More recently, as technology and techniques have advanced, the Overall Lab and others have continued to direct the field using more powerful and quantitative techniques.

Gene Microarrays
Traditionally, DNA microarrays use complementary DNA or oligonucleotide probes to analyze messenger RNA (mRNA) from genes of interest. Extracted total RNA serves as a template for complementary DNA (cDNA) that is tagged with fluorescent probes before being allowed to hybridize to the microarray for visualization. For proteases, specific probes for protease genes and their inhibitors have been developed to view expression patterns at the mRNA transcript level. The two platforms currently available for this purpose come from corporate and academic sources. Affymetrix's Hu/Mu ProtIn Microarray uses 516 and 456 probe sets to evaluate human and murine proteases, inhibitors, and interactors, respectively. CLIP-CHIP™, developed by the Overall Lab, is a complete protease and inhibitor DNA microarray for all 1561 human and murine proteases, non-proteolytic homologues, and their inhibitors. Both of these tools allow comparison of expression patterns between normal and diseased samples and tissues. Unfortunately, because transcript levels often fail to reflect protein expression levels, gene microarrays are limited in how well they represent the protein content of samples. In addition, proteases recruited from remote sources such as nearby tissues are missed by these DNA-based arrays, reiterating the need for protein-based methods to confirm the presence and activity of functional enzymes whenever transcriptome analysis is performed.

Quantitative real-time polymerase chain reaction (qRT-PCR)
A more sensitive approach to transcript analysis of a gene is quantitative real-time polymerase chain reaction (qRT-PCR), which has also been used to quantify protease mRNA levels. Again, total RNA is extracted and used to generate cDNA for PCR amplification. As long as a specific primer exists for a protease, protease inhibitor, or interacting molecule, qRT-PCR can serve as a highly sensitive method that detects very low mRNA copy numbers per cell. A drawback that separates it from microarray analysis is its limited scope: a microarray can analyze multiple genes in parallel, while qRT-PCR must amplify one mRNA for analysis at a time. It also shares the limitation of microarray analysis regarding the lack of correlation between transcript and protein levels. However, its sensitivity makes it a useful tool for validating microarray findings and quantifying specific protease transcripts of interest.

RNA sequencing (RNA-seq)
Whole transcriptome shotgun sequencing (WTSS) is the latest approach in gene expression studies, using next generation sequencing (NGS) to quantify RNA in samples on a high-throughput scale. As biology trends toward using RNA-seq over microarray analysis in evaluating the transcriptome, so does degradomics. The field adapts the approach to analyzing the presence and quantity of transcripts of proteases, their substrates, and their inhibitors. While the microarrays already developed remain a major workhorse for studying gene expression in degradomics, their limitations in cross-hybridization and dynamic range suggest RNA-seq will take a larger role as costs decrease and analysis improves.

Yeast Two-hybrid Screens
Yeast two-hybrid analyses have been adapted for protease-substrate discovery. As protease exosites play roles in protein-protein recognition and interaction, biologists have used exosites as tools to screen for protease interactors and potential substrates. These protease exosite scanning assays use protease exosites as bait to scan a cDNA library for possible interacting partners.

Another early adaptation of yeast two-hybrid screening in protease-substrate discovery is inactive-catalytic-domain capture (ICDC). This approach attempts to avoid the limitation of protease exosite scanning, which fails to account for any substrates that do not require exosites for recognition before cleavage. The baits for these assays are immobilized, catalytically inactive mutant protease domains that cannot cleave and release their substrates once bound.

While useful in early degradomic studies, adapted yeast two-hybrid screens have limitations that forced the field to move on to higher-throughput approaches for protease-substrate discovery. Their high rates of false positives and false negatives, inability to recognize complex interactions, lack of biological compartmentalization, and failure to account for the post-translational modifications necessary for protein-protein interactions hamper their usefulness. Thus they have been largely replaced by proteomic methods as technology has improved.

Protease-specific Arrays
A protease-specific protein array, based on immobilized antibodies designed to capture specific proteases from biological samples, offers a step up from transcript expression to the analysis of protein levels. Capture antibodies spotted onto nitrocellulose membranes bind proteases from complex mixtures that have been pre-incubated with detection antibodies, allowing for parallel analysis of relative protease levels. These arrays therefore measure many proteases in parallel, which a traditional western blot cannot. Unfortunately, these assays fail to provide insight into the enzymatic function of proteases and suffer from drawbacks similar to those of western blots regarding reliable quantification.

Immunohistochemistry Approaches
As an antibody technique, immunohistochemistry (IHC) allows protein presence to be validated. IHC and immunocytochemistry allow the localization of proteases to be surveyed on a tissue and a cellular scale, respectively. IHC can also evaluate the localization of cleavage products using monoclonal antibodies raised against the neo-epitopes that protease processing produces at cleavage sites. Unfortunately, in addition to providing little functional information, IHC is non-quantitative, making it an unappealing option for describing degradomics on system-wide scales.

Gel-based Proteomic Methods
Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) was historically used to compare spot intensities between protease-treated and untreated samples in order to identify candidate substrates. A more recent improvement of this technique, fluorescent 2D difference gel electrophoresis (2D-DIGE), attempts to control standardization between gels for relative quantification. Differentially labelling protease-treated and untreated samples with either Cy3 or Cy5, pooling the samples, and analyzing them together by 2D-PAGE allows substrates and cleavage products to be studied from the fluorescent gel. The spots corresponding to potential substrates and cleavage products can later be identified using mass spectrometry or Edman sequencing. The biggest drawbacks of these techniques relate to the chemistry of the gels themselves and their lack of sensitivity. As they rely on PAGE gels, extremely large, small, highly hydrophobic, acidic, or basic molecules will not be visualized.

Mass Spectrometry-based Proteomic Methods
Identification of low-abundance proteins by conventional shotgun proteomics remains limited despite advances in mass spectrometry (MS) technology. While abundant proteins can be easily detected, protease substrates of biological significance, such as cytokines, can be easily overlooked due to their low abundance. Most pre-clearing strategies designed to correct this also risk losing low-abundance proteins, so techniques designed specifically to target protease substrates for identification have been developed. These techniques have coalesced into a new field of positional proteomics, or terminomics, aimed at identifying protein N- or C-terminal modifications of protease substrates. Terminomic approaches, including Terminal Amine Isotopic Labeling of Substrates (TAILS) N-terminomics, COmbined FRActional DIagonal Chromatography (COFRADIC), and C-terminomics, add the level of stringency to conventional shotgun proteomics necessary to make them the workhorses of degradomics.

TAILS, or “N-Terminomics,” was designed and developed by the Overall Lab to overcome the functional limitations of conventional proteomics by enriching both mature N-terminal peptides and the neo-N-terminal peptides newly generated by protease activity. Formaldehyde or isobaric tags, including Isotope-coded Affinity Tags (ICAT), 4- or 8-plex isobaric tags for relative and absolute quantification (iTRAQ), or 10-plex tandem mass tags (TMT), block primary amines prior to trypsin digestion of proteome samples. The main step of the process is the negative selection of newly generated tryptic peptides using a specialized polymer. The polymer binds the free primary amines of trypsin-generated peptides but cannot react with the amines already blocked by tags, so the blocked N-terminal peptides can be separated from the polymer-bound tryptic peptides by ultrafiltration and analyzed by Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS). The mature and neo-N-termini differ in their ratios between protease-treated and untreated samples, and together they make up the proteolytic fingerprint of a protease. TAILS is also compatible with stable isotope labeling by amino acids in cell culture (SILAC).
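The ratio comparison between protease-treated and untreated samples can be illustrated with a minimal Python sketch. It is not the Overall Lab's analysis pipeline: the function name, the peptide labels, the intensity values, and the fixed two-fold cutoff are all assumptions made for illustration, and real analyses typically derive cutoffs statistically from the data.

```python
import math


def classify_n_termini(intensities, fold_cutoff=2.0):
    """Label each N-terminal peptide by its treated/control abundance ratio.

    intensities maps a peptide label to (treated, control) reporter
    intensities, both assumed to be positive numbers.
    fold_cutoff is a hypothetical threshold chosen for illustration.
    """
    log_cutoff = math.log2(fold_cutoff)
    labels = {}
    for peptide, (treated, control) in intensities.items():
        log_ratio = math.log2(treated / control)
        if log_ratio >= log_cutoff:
            # Enriched after protease treatment: candidate neo-N-terminus.
            labels[peptide] = "neo"
        elif log_ratio <= -log_cutoff:
            # Lost after protease treatment.
            labels[peptide] = "depleted"
        else:
            # Similar in both samples, as expected for mature N-termini.
            labels[peptide] = "unchanged"
    return labels


# Hypothetical reporter intensities for three made-up peptides.
example = {
    "peptide_A": (800.0, 100.0),
    "peptide_B": (100.0, 100.0),
    "peptide_C": (50.0, 400.0),
}
print(classify_n_termini(example))
```

Peptides labelled "neo" in such a comparison would be the candidates for cleavage products of the test protease, subject to validation.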

COFRADIC was the earliest technique to capitalize on negative selection to enrich for protein N-termini. Sample proteins are first blocked by reduction and alkylation at their primary amines before endopeptidase treatment. Its negative selection method relies on strong cation exchange chromatography (SCX) to enrich for peptides representing N- and C-termini of proteins based on differences in peptide charge and pH. Additional orthogonal chromatography treatments change the biochemical character of the peptides for further enrichment before final LC-MS/MS analysis. Groups have continued to adapt and improve this technology for protease-substrate discovery.

C-terminomics has always been complicated by the chemical nature of its targets. Carboxyl groups are less reactive than primary amines, making C-terminomic techniques more complex than established N-terminomic approaches. However, adapted TAILS and COFRADIC workflows have been developed specifically to study the C-termini of proteins. Recently, the Overall Lab tackled another difficulty of C-terminomics by using the endopeptidase LysargiNase™ to generate C-terminal peptides that carry an N-terminal lysine or arginine residue. Previously, C-terminal peptides lacked basic residues after endopeptidase digestion and could be missed in LC-MS/MS workflows.
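The difference can be sketched with a simplified in silico digestion in Python. The sketch assumes that trypsin cleaves after every lysine or arginine and that LysargiNase cleaves before them, with no missed cleavages and no sequence-context rules; the function name and the example sequence are hypothetical.

```python
def digest(sequence, cleave_before_basic=False):
    """Split a protein sequence at every K or R.

    cleave_before_basic=False mimics trypsin (cut after K/R);
    cleave_before_basic=True mimics LysargiNase (cut before K/R).
    Simplification: every site is cut, regardless of neighbouring residues.
    """
    cuts = []
    for i, residue in enumerate(sequence):
        if residue in "KR":
            cut = i if cleave_before_basic else i + 1
            # Ignore cuts at the very ends, which would give empty peptides.
            if 0 < cut < len(sequence):
                cuts.append(cut)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


protein = "MASKTLLRGDEFAW"  # made-up sequence for illustration
print(digest(protein))                            # trypsin-like digestion
print(digest(protein, cleave_before_basic=True))  # LysargiNase-like digestion
```

In this toy example the last peptide of the trypsin-like digest contains no basic residue, while the last peptide of the LysargiNase-like digest begins with arginine, which is the property the paragraph above describes.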

Another approach aimed at further elucidating protease activity is Proteomic Identification of protease Cleavage Sites (PICS). Beginning with a peptide library generated by endopeptidase digestion of a proteome, this technique allows for screening and characterizing the prime- and non-prime-side specificity of proteases. After library generation, primary amines and sulfhydryls are chemically blocked before the library is digested again with the protease of interest. The protease-generated primary amines, which mark the prime side of each cleavage, can then be biotinylated and isolated owing to their reactivity and analyzed by LC-MS/MS. The non-prime-side sequences left behind must be determined by bioinformatic analysis of the extracted N-termini and the full-length protein sequences. Together, the prime and non-prime sites give a full picture of protease cleavage site specificity.
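The bioinformatic reconstruction of the non-prime side can be sketched in Python as a lookup of each identified prime-side peptide in the full-length protein sequences. This is only an illustration under simplifying assumptions, not the published PICS software: the function name, the protein identifier, the sequences, and the fixed window of four non-prime residues are hypothetical.

```python
def infer_non_prime(prime_peptide, proteins, n_residues=4):
    """Find the residues that precede a prime-side peptide in each protein.

    proteins maps a protein identifier to its full-length sequence.
    Returns (protein_id, non_prime_residues, prime_peptide) tuples, one per
    match. A peptide found at the very start of a protein is skipped because
    nothing precedes it.
    """
    matches = []
    for protein_id, sequence in proteins.items():
        pos = sequence.find(prime_peptide)
        while pos != -1:
            if pos > 0:
                # Residues immediately upstream of the cleaved bond.
                non_prime = sequence[max(0, pos - n_residues):pos]
                matches.append((protein_id, non_prime, prime_peptide))
            pos = sequence.find(prime_peptide, pos + 1)
    return matches


# Hypothetical database entry and identified prime-side peptide.
database = {"protein_1": "MKTAYIAKQRQISFVK"}
print(infer_non_prime("QISF", database))
```

A peptide that matches more than one position or protein yields several candidates, so an ambiguous match would need to be resolved or discarded before the specificity profile is compiled.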

Activity-based Profiling
To achieve functional degradomics, the enzymatic activity of proteases must be analyzed. Methods have been developed to distinguish the proteolytic activity of different enzymes in biological samples and separate active proteases from their inactive forms, namely zymogen precursors and those proteases bound by inhibitors. Two techniques are activity-based probes (ABPs) and Proteolytic Signature Peptides (PSPs).

ABP molecules serve as probes that irreversibly bind only to active proteases and ignore zymogen precursors and inhibited proteases. An ABP consists of a reactive group and a recognizable tag joined on the same molecule by a linker moiety. The reactive group, modeled on protease inhibitor mechanisms, gives ABPs their specificity for active proteases. Once bound, the reactive group acts much like an irreversible inhibitor of the protease. Depending on the nature of the tag moiety, the ABP-protease complex can then be visualized or retrieved from biological samples for further studies of localization and quantification. Limitations including difficult production, specificity, stability, and toxicity hamper ABP development, but these probes have proved useful in revealing the biological activity of proteases and remain a promising avenue in degradomic technology.

PSPs do not depend on targeting active proteases with tagged compounds but rather on quantitative proteomics using stable isotope labeled standard peptides. Standard peptides synthesized from amino acids labeled with stable isotopes serve as internal standards for serial dilutions of a sample. These allow for later absolute quantification of proteins and post-translational modifications by mass spectrometry. This technique has been adapted to the absolute quantification of proteases, deciphering both activity states and total amounts in biological samples. This is possible because trypsin treatment for mass spectrometry generates peptides that are specific to inactive zymogen precursors, specific to active proteases, or common to both forms. PSPs are one form of standard peptide for absolute quantification, and Standard of the Expressed Protease (STEP) peptides are the other. The major difference between the two is that PSP sequences are designed to mimic tryptic peptides spanning the junction between the zymogen's pro-domain and the protease's final form, whereas STEP sequences match tryptic peptide sequences found in both forms of the protease. This capitalizes on protease activation, in which the zymogen's pro-domain is cleaved off so that the final form lacks that domain. Experiments using iTRAQ labeling and LC-MS/MS with STEP and PSP peptide internal standards have successfully quantified total and active protease levels in biological samples. One major drawback of this approach is its inability to account for inhibitor-bound enzymes. It is also difficult to ensure that standard peptides can be generated for every protease under study.

However, PSPs hold large potential for translating degradomics into clinical applications: once a PSP is established, it could aid in quantifying proteolytic signature biomarkers in Selected Reaction Monitoring (SRM) and Multiple Reaction Monitoring (MRM) type clinical assays.
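The arithmetic behind combining STEP and PSP standards can be sketched in Python. The sketch assumes, as an interpretation of the description above rather than a published protocol, that a STEP peptide reports the total protease, a PSP peptide spanning the pro-domain junction reports only the zymogen, and the activated form is estimated as the difference. The function names, signal values, and spiked amounts are hypothetical.

```python
def absolute_amount(endogenous_signal, standard_signal, standard_fmol):
    """Estimate the endogenous peptide amount from a spiked labeled standard.

    The endogenous-to-standard signal ratio is scaled by the known amount
    of standard peptide added to the sample.
    """
    return endogenous_signal / standard_signal * standard_fmol


def protease_states(step, psp):
    """Combine STEP and PSP measurements for one protease.

    step and psp are (endogenous_signal, standard_signal, standard_fmol).
    Assumption: STEP reports zymogen plus activated protease, PSP reports
    zymogen only, so the activated amount is their difference. Inhibitor-bound
    enzyme cannot be distinguished and is counted as activated here.
    """
    total = absolute_amount(*step)
    zymogen = absolute_amount(*psp)
    activated = max(total - zymogen, 0.0)  # clamp small negative noise
    return {"total": total, "zymogen": zymogen, "activated": activated}


# Hypothetical signals with 50 fmol of each standard spiked in.
print(protease_states(step=(3000.0, 1000.0, 50.0), psp=(500.0, 1000.0, 50.0)))
```

Because inhibitor-bound enzyme is indistinguishable in this calculation, the "activated" value is an upper bound on the catalytically active protease.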

Bioinformatics
Owing to the increasing complexity of the regulation of cellular processes and the roles proteases play in them, bioinformatics continues to be an invaluable tool for degradomics. Software, databases, and projects developed for this purpose have accompanied the advancement in technology. CLIPPER, software developed in the Overall Lab, statistically evaluates cleavage site candidates determined by degradomic approaches. One web-based data site, WebPICS, integrates cleavage site analysis from PICS experiments into MEROPS, the protease database. Another database, the Termini oriented protein Function Inferred Database (TopFIND), serves as a knowledge base that integrates protein termini formed by protease processing with functional interpretations. By combining research literature and other biological databases including UniProt, MEROPS, Ensembl, and TisDB, the database makes protein termini modifications comprehensively accessible to a broad scientific community. Using TopFIND, terminal modifications can be identified and visualized across proteins on the basis of all available in silico, in vitro, and in vivo findings. Using TopFINDer and PathFINDer software, research findings can be mathematically modelled into a network of pathways regulated by proteases, further contributing to the “protease web”.

Protease Biology
Rapid advances in proteomics, genomics, and bioinformatics have revolutionized protease research. Degradomics emerged with the concept that proteolysis is a specific mechanism for achieving cellular control over vital processes, beyond the control afforded by gene expression and translation, and it continues to produce the research necessary to understand the complex regulation of biology. Where it was once thought that extracellular proteases simply degraded extracellular matrix (ECM), these proteases are now known to target and process a vast array of substrates with diverse roles, redefining protease functions and shifting interest towards roles previously unknown to biology. Degradomic studies of human tissue have also contributed to the Human Proteome Project (HPP) of the Human Proteome Organization (HUPO).

Precision Medicine
The protease modulatory web represents opportunities to identify novel biomarkers for disease and targets for drug design. Proteolytically processed N-termini have been proposed as potential biomarkers, as disease-specific proteolysis has been well studied in pathologies such as inflammation and cancer. Contributions to degradomics have identified numerous characterized and novel protease substrates and continue to prompt speculation about previously unknown protease targets. More recently, proteolytic signatures of cell death have been found using N-terminomic techniques on plasma samples from chemotherapy patients. Advancements in SRM and MRM clinical assays also allow proteolytic signature biomarkers to be analyzed in patient samples, and these assays can be complemented by PSP quantification. Deciphering these networks will aid drug design by clarifying which substrates perform useful roles and which perform harmful ones, and therefore which should be targeted by drugs.