User:Kinkreet/Protein Science/Protein Folding

Christian Anfinsen's experiment
2-mercaptoethanol (HS-CH2-CH2-OH, also known as β-mercaptoethanol or βME) can be used to disrupt disulphide bonds. It does this by thiol-disulphide exchange, in which its own thiol displaces one of the sulphurs involved in the bond; two 2-mercaptoethanol molecules are therefore required to break each disulphide bond. 2-mercaptoethanol (2ME) is now a standard reducing agent used when denaturing proteins, such as in SDS-PAGE.

Christian Anfinsen denatured purified ribonuclease A using 8M urea (a chaotrope, a substance that disrupts or denatures a structure) and 2ME; however, upon removal of the urea and 2ME, the protein refolds. Essentially 100% of the activity was recovered when the urea and 2ME were removed together, so that the disulphide bonds re-formed only as the protein refolded. Only about 1% of the activity was recovered if the 2ME was removed first, so that the disulphide bonds re-formed by oxidation while the protein was still denatured in urea. This is because urea denatures ribonuclease A, but allows the disulphide bonds to form; we now know there are 8 cysteines forming 4 disulphide bonds in the ribonuclease, and thus there are 105 (7×5×3×1) possible pairings, of which only one (1/105≃1%) results in the correct, active conformation. When a small amount of reducing agent (such as 2ME or dithiothreitol, DTT) is added to this scrambled protein in the absence of urea, the disulphide bonds can rearrange and the activity is recovered. Because the sample was pure, only ribonuclease A was present and so nothing else could have helped in its folding; Anfinsen thereby showed that protein folding is determined entirely by the protein sequence. He also showed that scrambled ribonuclease A can only reach the correct fold under conditions that allow its disulphide bonds to be broken and re-formed.
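The 7×5×3×1 count can be checked with a short sketch. The function name is made up for illustration; it assumes every cysteine ends up in exactly one disulphide bond and that all pairings are equally likely.

```python
def disulphide_pairings(n_cysteines: int) -> int:
    """Number of ways to pair all cysteines into disulphide bonds."""
    if n_cysteines <= 0 or n_cysteines % 2 != 0:
        raise ValueError("need a positive, even number of cysteines")
    count = 1
    # The first cysteine can choose from (n-1) partners, the next unpaired
    # cysteine from (n-3), and so on down to 1.
    for partners in range(n_cysteines - 1, 0, -2):
        count *= partners
    return count


pairings = disulphide_pairings(8)
print(pairings)                   # 105
print(round(100 / pairings, 2))   # 0.95 (% expected to be correct by chance)
```

The same function gives 3 pairings for 4 cysteines and 15 for 6, which shows how quickly random oxidation becomes unproductive as the number of cysteines grows.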

Furthermore, when protein disulfide isomerase (PDI), a ubiquitous enzyme in the endoplasmic reticulum (ER) of eukaryotes that catalyzes the formation and breakage of disulphide bonds between cysteine residues, is added, the refolding of ribonuclease A is sped up. This suggests that forming the correct disulphide bonds is the rate-limiting step in the folding of ribonuclease A.

In vivo, PDI helps incorrectly-folded proteins to unfold by breaking the disulphide bond and allowing the protein to refold correctly. The active site of PDI contains a pair of cysteines (a CXXC motif) which, in the reduced form, act very much like 2ME. It recognizes misfolded proteins because their disulphide bonds are exposed on the surface, whereas most correctly-folded proteins have their disulphide bonds buried inside the hydrophobic core.

Determinants of folding
Proteins fold and remain in their conformation using stabilizing interactions such as electrostatic salt-bridges, van der Waals dipole-dipole interactions, hydrogen bonds, hydrophobic interactions and disulphide bonds.

Proteins of thermophiles and hyperthermophiles are generally larger and have more stabilizing interactions, especially disulphide bridges, not because they are more stable than human proteins (at 37°C), but because at the temperature they are found at, more of these stabilizing interactions are required to give the protein the same level of flexibility. These proteins (e.g. the Taq and Pfu DNA polymerases) are used in research to provide heat-stable enzymes for protocols such as PCR, which requires heating to 95°C.

Features of folding
Folding is spontaneous, meaning the folded state is thermodynamically more stable than the unfolded state; therefore, most proteins are compact, forming secondary and tertiary structures.

Most proteins are quite flexible and 'breathe' even in their folded form, and will undergo induced fit around the substrate when it binds. FMDV 3C protease is a good example.

Generally, many of the characteristics of a protein are attributed to apolar, hydrophobic residues that aggregate to form the central core of a globular protein. The inner residues affect structure much more than residues on the surface; this is demonstrated by mutating lysine to alanine on the outer surface of RNase A, which had little effect on the folding. Hence, it is generally the case that surface mutations are accommodated without affecting the fold, such as in virus capsids. This is because the folding of proteins is hierarchical and ordered in time (and not random): the sub-domains fold independently to form larger domains, and these domains associate to form the final protein. Because the sub-domains fold independently, a change in one residue will only affect folding locally, and if the mutation is not in the core of the protein, or at the interfaces between domains, then it is unlikely to affect the overall structure of the protein. Therefore, surface residues are more susceptible to mutations that go unnoticed and can accumulate.

However, conservative mutations - mutations between chemically similar residues - are tolerated even in the core of the protein, and thus may not affect function. This is because the change in conformation is small and local, and the protein is quite flexible anyway. This has been demonstrated in bacteriophage T4 lysozyme, where Matsumura mutated the isoleucine residue at position 3. He found that the conformational stability of the enzyme depends on the hydrophobicity of the substituted residue; therefore, conservative mutations were tolerated.

The level of sequence identity is not an accurate indication of folding. Murine Ebp1 and human MAP2 have only 20% amino acid sequence identity, yet they have the same general fold. Conversely, high sequence identity does not guarantee structural similarity. G. D. Rose and T. Creamer formulated the Paracelsus challenge - "transform the conformation of one globular protein into that of another by changing no more than half the sequence". Dalal took on the challenge and mutated the B1 domain of Streptococcal IgG-binding protein G, which has a four-stranded beta-sheet and one alpha-helix, into the fold of Rop, a homodimeric four-helix bundle protein. They did this by retaining residues which have a high propensity for alpha-helices while replacing those with a low propensity with residues of high propensity. This highlights that only a subset of the amino acid sequence is required to determine structure, while other residues contribute little. Homology modelling assumes that sequence identity is proportional to structural similarity; hence any homology-based structural model must be treated with caution.

Sequence-structure relationship
The structure of a protein can be obtained from sequence-based prediction, structural determination and folding experiments. However, these are often hard to do and involve a wide range of variables. A more manageable method is to mutate one or a few residues and observe the effect.

The amino acid most commonly mutated to is alanine, because it is the smallest amino acid which still has a Cβ. Mutating each and every residue in a protein to alanine, one at a time, is known as alanine scanning, and is used to study the importance of each residue to the protein's stability. This has been demonstrated in bovine pancreatic trypsin inhibitor (BPTI).
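As a minimal sketch of how an alanine scan is enumerated, the snippet below lists one point mutant per residue. The function name and the five-residue peptide are hypothetical (the peptide is not a BPTI fragment), and mutants are named in the usual wild-type/position/mutant style.

```python
def alanine_scan(sequence: str) -> list[str]:
    """Return point-mutant names (e.g. 'K2A') for every non-alanine residue."""
    mutants = []
    for position, residue in enumerate(sequence, start=1):
        # Residues that are already alanine cannot be mutated to alanine.
        if residue != "A":
            mutants.append(f"{residue}{position}A")
    return mutants


# Hypothetical short peptide used only to show the output format.
print(alanine_scan("MKAGL"))  # ['M1A', 'K2A', 'G4A', 'L5A']
```

Each name in the list corresponds to one mutant protein whose stability would then be measured and compared with the wild type.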

Unfolded proteins
Some proteins are natively unfolded, and may fold only when bound to a partner. These proteins have a larger interacting interface, which allows them to find their target proteins more easily; this also means they are more easily degraded. They are more flexible and thus easier to transport, as they can fit through narrow openings.

The pKID domain of rat cyclic AMP response element binding protein (CREB) has no standard secondary structure until it binds to the CREB-binding protein (CBP), at which point it adopts a kinked alpha-helical conformation.

Protein Folding Determination
Several techniques are used to monitor protein folding - circular dichroism, NMR and FRET.

Circular Dichroism
Electromagnetic (EM) radiation consists of an oscillating electric field and an oscillating magnetic field, perpendicular to each other and to the direction of propagation. The electric field of one photon can be in a different orientation to that of another, and in an unpolarised beam of light there are many photons exhibiting many orientations. In plane-polarised (linearly polarised) light, the electric field oscillates in a single plane; by convention, this plane defines the polarisation of the wave.

If we superpose two such plane-polarised waves of equal amplitude, orthogonal to each other and with a ±90° phase difference, we create a circularly polarised wave. Depending on which way the circularly polarised wave rotates, it is called left (L) or right (R) circularly polarised. Because the two are non-superimposable mirror images of each other, they are termed chiral.

The superposition of an R and an L circularly polarised wave of equal amplitude results in a linearly polarised wave. So any linearly polarised wave can be thought of as two superposed circularly polarised waves of equal amplitude, and any circularly polarised wave can be modelled as two plane-polarised waves of equal amplitude, phase shifted by ±90°. Elliptically polarised light arises when two linearly polarised waves of unequal amplitudes are phase shifted by any angle other than 0° or 180°; or when two linearly polarised waves of equal amplitudes are phase shifted by any angle other than 0°, ±90° or 180°; or when two circularly polarised waves have unequal amplitudes.
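These superposition rules can be checked numerically with Jones vectors, which represent a wave by the complex amplitudes of its x and y electric-field components. The following is only a sketch: the names `LEFT`, `RIGHT`, `superpose` and `classify` are made up for illustration, and which vector is called "left" is a sign convention assumed here.

```python
import cmath
import math

SQRT2 = math.sqrt(2)
# Jones vectors (Ex, Ey). The handedness labels are an assumed convention.
LEFT = (1 / SQRT2, 1j / SQRT2)
RIGHT = (1 / SQRT2, -1j / SQRT2)


def superpose(wave_a, wave_b, amp_a=1.0, amp_b=1.0):
    """Add two waves component by component, each scaled by an amplitude."""
    return (amp_a * wave_a[0] + amp_b * wave_b[0],
            amp_a * wave_a[1] + amp_b * wave_b[1])


def classify(wave, tol=1e-9):
    """Classify a wave by the sizes of its two circular components."""
    ex, ey = wave
    left = abs(ex - 1j * ey) / SQRT2    # projection onto LEFT
    right = abs(ex + 1j * ey) / SQRT2   # projection onto RIGHT
    if abs(left - right) < tol:
        return "linear"        # equal circular components
    if min(left, right) < tol:
        return "circular"      # only one circular component present
    return "elliptical"        # unequal, non-zero circular components


print(classify(superpose(LEFT, RIGHT)))             # linear
print(classify(superpose(LEFT, RIGHT, 1.0, 0.5)))   # elliptical
# Two equal linear waves along x and y, with y shifted by 90 degrees:
shifted_y = (0, cmath.exp(1j * math.pi / 2))
print(classify(superpose((1, 0), shifted_y)))       # circular
```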

When a polarised wave is absorbed, its amplitude decreases, but its frequency and phase are unchanged. When a polarised wave is refracted, its speed of propagation changes, and so a phase change is induced. If the two orthogonal linear components of linearly polarised light are retarded by different amounts, the result is linearly polarised light in a different plane (a relative shift of 180°), circularly polarised light (±90°) or elliptically polarised light (any other angle). If instead the two circular components of linearly polarised light are retarded by different amounts, the plane of the resulting linearly polarised light is rotated.

Chiral molecules, including all L-amino acids and secondary structures (α-helices are right-handed, and β-sheets also have a slight right-handed twist), exhibit optical activity: they have different extinction coefficients and refractive indices for L and R circularly polarised light.

Passing linearly polarized light through a medium which has different absorbances for left and right circularly polarized light leads to elliptically polarized light (circular if one component is wholly absorbed). This ellipticity is hard to measure directly, making it time-consuming and expensive. Instead, one can use a piezoelectric material and apply an alternating voltage to it so that it vibrates (producing a high-pitched noise); this essentially provides a source of light that alternates between R and L circular polarisation. Pockels cells or other photoelastic modulators can also be used to alternate between LCP and RCP. The absorbance of each circularly polarised component is measured at each wavelength, and the difference between the absorbance of the LCP light and that of the RCP light gives the CD spectrum.
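The final subtraction step can be sketched as follows. The absorbance readings are invented for illustration, and the function names are hypothetical. The conversion to ellipticity uses the standard small-signal relation θ (degrees) = ΔA × (ln 10 / 4) × (180 / π), which comes to roughly 33 degrees per absorbance unit.

```python
import math

# Millidegrees of ellipticity per unit of delta-absorbance (about 32982).
MDEG_PER_DELTA_A = (math.log(10) / 4) * (180 / math.pi) * 1000


def cd_spectrum(abs_left, abs_right):
    """Return delta-A = A_L - A_R for each wavelength."""
    if len(abs_left) != len(abs_right):
        raise ValueError("need one L and one R reading per wavelength")
    return [a_l - a_r for a_l, a_r in zip(abs_left, abs_right)]


def ellipticity_mdeg(delta_a):
    """Convert a delta-absorbance into ellipticity in millidegrees."""
    return MDEG_PER_DELTA_A * delta_a


# Invented readings at two wavelengths (e.g. 208 nm and 222 nm).
wavelengths_nm = [208, 222]
a_left = [0.5000, 0.4000]
a_right = [0.5003, 0.4002]

for wl, d_a in zip(wavelengths_nm, cd_spectrum(a_left, a_right)):
    print(wl, round(d_a, 4), round(ellipticity_mdeg(d_a), 1))
```

The differences are tiny compared with the absorbances themselves, which is why the rapid alternation between LCP and RCP described above is used rather than two separate measurements.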

Proteins have different extinction coefficients for left and right circularly polarized light in different conformations, and the profile of absorbances can be monitored to determine the level of folding. When a protein is in the unfolded state, it has a CD spectrum close to that of a random coil, but as it folds, its spectrum moves towards that of an α-helix and/or β-sheet.

CD is good for studying the peptide bond, which absorbs at ~200 nm, and also for determining alpha-helical content. Far-UV CD is used to investigate the secondary structure of proteins. UV/Vis CD is used to investigate charge-transfer transitions. Near-infrared CD is used to investigate geometric and electronic structure by probing metal d→d transitions. CD can be used in real time to monitor conformational changes (due to substrate binding, denaturation, or protein folding).

A disadvantage of CD is that it lacks resolution at the amino acid level; it gives only a rough indication of folding.

NMR
Nuclear magnetic resonance (NMR) gives a much higher resolution, by monitoring the number of exposed amide hydrogens with time using hydrogen-deuterium exchange. As the protein folds, more residues are buried and thus no longer exchangeable with deuterium.

Hydrogen-deuterium exchange NMR has been used to follow the folding of α-lactalbumin from an unfolded structure to a molten globule, an intermediate where the secondary structure has formed but the tertiary structure has not yet formed. Schulman incubated the protein at pD 2 (where the molten globule is formed) for 5 minutes, 1 hour, 10 hours and 8 days; during this time, exchangeable hydrogens can be replaced by deuterium. The rate at which this exchange occurs depends on the accessibility of the amide group (of the peptide backbone): exchange is slower if the amide is buried and faster if it is exposed. At the end of each incubation time, the protein is freeze-dried, so that all the samples can be processed at the same time. When all the samples have been collected, the freeze-dried proteins are dissolved in native buffer to allow them to fold to the native form, and the 15N-1H HSQC spectra are measured. The HSQC detects hydrogen and not deuterium, so peaks representing exposed amide hydrogens will be present at the 5 minute time point but not at the 1 hour time point, because by then they will have exchanged with deuterium, which does not give a signal. Likewise, less exposed residues will remain even at the 10 hour time point; only the most hidden (core) hydrogens will still produce a signal at the 8 day time point. Provided we have a previously assigned spectrum, we can obtain residue-specific information about how exposed each residue is at each time point, which helps to determine the mechanism of folding. In this case, it was found that the helical domain is folded before the beta domain.
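The logic of reading protection from the four time points can be sketched with a simple first-order model, in which the fraction of amide hydrogen remaining after time t is exp(-k·t). This is an illustration only: the exchange rates, the 10% detection threshold and the function names are assumptions, not values from the experiment.

```python
import math

# Incubation times from the experiment, converted to hours.
TIME_POINTS_H = {"5 min": 5 / 60, "1 h": 1.0, "10 h": 10.0, "8 days": 192.0}


def remaining_signal(k_per_hour, t_hours):
    """Fraction of amide 1H (and so HSQC peak intensity) left after t hours."""
    return math.exp(-k_per_hour * t_hours)


def last_visible(k_per_hour, threshold=0.1):
    """Latest time point at which the peak is still above the threshold."""
    last = None
    for label, t_hours in TIME_POINTS_H.items():
        if remaining_signal(k_per_hour, t_hours) >= threshold:
            last = label
    return last


# Hypothetical exchange rates (per hour) for three kinds of amide.
for name, k in [("exposed", 5.0), ("partly buried", 0.1), ("core", 0.005)]:
    print(name, last_visible(k))
# exposed 5 min
# partly buried 10 h
# core 8 days
```

Residues whose peaks survive to the later time points are the ones protected in the molten globule, which is how the helical domain was identified as forming first.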

FRET
Fluorescence resonance energy transfer (FRET) is a method sensitive to the distance between a donor and an acceptor fluorophore. The donor absorbs photons of a specific wavelength and, on its own, emits at a different (longer) wavelength. The acceptor's absorbance profile overlaps with the donor's emission profile, so when the two are close together the excited donor transfers its energy to the acceptor without emitting a photon, and the acceptor then emits at an even longer wavelength (lower frequency).

The acceptor and donor can be intrinsic, though this is rare; more commonly, fluorescent tags are attached to different residues on the protein, the protein is allowed to fold, and the emission from the acceptor is monitored over time as folding brings the two tags together.
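The distance sensitivity of FRET follows the Förster relation E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The sketch below assumes a hypothetical R0 of 5 nm; real values depend on the dye pair.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Foerster transfer efficiency for a donor-acceptor distance r."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Hypothetical distances: tags far apart (unfolded) to close together (folded).
for r in (10.0, 5.0, 2.5):
    print(r, round(fret_efficiency(r), 3))
# 10.0 0.015
# 5.0 0.5
# 2.5 0.985
```

Because of the sixth-power dependence, halving the distance around R0 takes the efficiency from 50% to nearly 100%, which is what makes FRET a useful reporter of compaction during folding.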

Folding Pathways
The secondary structure is formed within milliseconds; the tertiary structure is formed much more slowly, because water must be expelled in a slow process called hydrophobic collapse. A molten globule is an intermediate between secondary and tertiary structure which is undergoing hydrophobic collapse. The final tertiary structure is formed when the last water molecule is expelled.

There are two views of folding pathways. The classical view is that a protein folds via a single series of well-defined intermediates; this view is not consistent with experimental data. The currently accepted view is the landscape theory. In the landscape theory, the initial and final conformations are set, but the protein can take many paths from initial to final, and thus there may be many intermediates involved. The energies of the intermediates can be viewed as a landscape of free energies, where less favourable intermediates (high Gibbs free energy) are peaks and more favourable intermediates (low Gibbs free energy) are valleys. If the folding protein is pictured as water flowing from the initial conformation to the final conformation, it flows from high energy to low energy. It favours paths towards lower energy and more favourable intermediates, but can also fold via a less favourable intermediate. Mis-folded proteins often require chaperones to unfold them or, if too damaged, are degraded.

Folding is hierarchical: the subdomains fold before the domains. This makes sense because, as the polypeptide emerges from the ribosome, only the section that has emerged is able to fold. β-sheet proteins fold relatively slowly because, to fold correctly, they require hydrogen bonding between residues that are distant in the primary sequence.

Thermodynamics of Folding
Protein folding is entropically highly unfavourable, because the chain changes from a random (high entropy) conformation to a more ordered (low entropy) structure. Although this entropic penalty is partly offset by the elimination of water from hydrophobic surfaces, the overall entropy change is still a decrease, and thus unfavourable. Therefore, the folding process is driven by a large favourable enthalpy contribution: the peptide must form enough intramolecular sidechain interactions to offset the entropic penalty, and peptides which do not satisfy this will remain unfolded.
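The balance described here is the Gibbs relation ΔG = ΔH − TΔS: folding is spontaneous only when ΔG is negative. The sketch below uses invented values of ΔH and ΔS, chosen only to show how a favourable enthalpy can outweigh an unfavourable entropy; they are not measurements for any real protein.

```python
def folding_free_energy(delta_h, delta_s, temperature_k):
    """Gibbs free energy of folding: dG = dH - T*dS (kJ/mol, kJ/mol/K, K)."""
    return delta_h - temperature_k * delta_s


def melting_temperature(delta_h, delta_s):
    """Temperature at which dG = 0, taking dH and dS as constant."""
    return delta_h / delta_s


# Invented values: folding loses entropy (dS < 0) but gains enthalpy (dH < 0).
T_BODY = 310.0  # K, i.e. 37 degrees C
strong = folding_free_energy(-200.0, -0.6, T_BODY)
weak = folding_free_energy(-150.0, -0.6, T_BODY)

print(round(strong, 1))  # -14.0 -> negative, so the folded state is favoured
print(round(weak, 1))    # 36.0  -> positive, so the peptide stays unfolded
print(round(melting_temperature(-200.0, -0.6), 1))  # 333.3 (K)
```

Note how small the net stabilisation is compared with the two large opposing terms; a modest loss of interactions is enough to tip the balance towards the unfolded state.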

Chaperones
The cell contains many proteins and is very crowded; there is also a lot of variation in local environments, and so many proteins can potentially become mis-folded if they are in an unfamiliar environment. Chaperone proteins are present to prevent inappropriate interactions between incorrect complementary surfaces and, when misfolding occurs, to disrupt it. Most chaperones use ATP.

Protein disulphide isomerase is a chaperone protein which catalyzes the formation of disulphide bonds. It is active in its reduced form; if it encounters a mislocated disulphide bond, it will form a disulphide bond with one of the cysteines using its own cysteine. This frees up the other cysteine to bond correctly. PDI then becomes oxidized and releases the protein cysteine to allow the protein to finish folding. PDI does not dictate which cysteine residues bond with which, but simply gives a mis-folded protein more chances to fold correctly.

Other chaperones include peptidyl prolyl cis-trans isomerase, HSP70, trigger factor, chaperonins, HSP90 and nucleoplasmins. HSP70 is a heat shock protein which reverses denaturation and aggregation; trigger factor does not require ATP; chaperonins are large, multi-subunit, barrel-like chaperones; HSP90 is another heat shock protein, found in eukaryotes, that facilitates late-stage folding of signalling proteins; nucleoplasmins are decameric, acidic nuclear proteins which assemble nucleosomes.

GroEL/GroES
There are two types of chaperonins: type I is found in bacteria, mitochondria and chloroplasts, and type II is found in archaea and eukaryotes. GroEL/GroES (HSP60/10) is a type I chaperonin which helps to refold partially-folded proteins. Each GroEL subunit has 3 domains: apical, intermediate and equatorial.

7 GroEL subunits form a symmetrical ring, and two of these rings associate equatorial-to-equatorial to form a cylindrical chamber. 7 GroES subunits can also form a symmetrical unit. The GroES complex can associate with one end of the GroEL chamber (made up of 14 GroEL subunits) through a conserved sequence, GGIVLTGSA, and makes the chamber slightly wider at the ring where it binds. (If GroES binds at one end, it will not bind at the other, due to the existing conformational change.) The ring where GroES binds is called the cis ring, and the other the trans ring.

First, the partially-folded protein binds to at least two hydrophobic patches on the apical domains of GroEL through a rough sequence of PXHHHXPXP (P = polar; X = anything; H = hydrophobic). One ATP binds to the equatorial domain of each subunit of the GroEL ring, after which GroES can bind to the same patches, effectively displacing the partially-folded protein into the hydrophilic chamber. GroES binding also increases the diameter of the chamber from 25 Å to 33 Å, creating a volume of 175,000 Å³. After the chamber is closed, the newly-bound ATP in the equatorial domains is hydrolysed; this creates an environment in the hydrophilic chamber which promotes unfolding. This might be done by physically stretching the polypeptide before compacting it; the hydrophilic walls also promote hydrophobic collapse. The shape of the inside of the chamber might also guide the folding and prevent the formation of any structures which do not fit into the chamber.

Meanwhile, on the opposite side of the complex, the other ring can also bind mis-folded proteins. Once a protein is bound, ATP associates with that GroEL ring. Binding of another misfolded protein on the other side induces the release of GroES and the native (hopefully correctly-folded) protein on this side. The cycle then repeats. The complex works by a two-stroke mechanism, where only after ATP hydrolysis in the cis ring can a misfolded protein bind to the trans ring.

The GroEL/GroES mechanism is not efficient, with a success rate of only ~5% per cycle. It uses 7 ATP per cycle.
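Taken together, these two figures imply a substantial energy cost per folded protein. As a rough sketch, assume each cycle is an independent attempt with the same chance of success (an assumption made here for illustration, not something the figures themselves establish); the number of cycles needed then follows a geometric distribution with mean 1/p.

```python
def expected_cycles(success_rate: float) -> float:
    """Mean number of cycles until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success rate must be between 0 and 1")
    return 1.0 / success_rate


def expected_atp(success_rate: float, atp_per_cycle: int = 7) -> float:
    """Mean ATP consumed per successfully folded protein."""
    return atp_per_cycle * expected_cycles(success_rate)


print(round(expected_cycles(0.05)))  # 20 cycles on average
print(round(expected_atp(0.05)))     # 140 ATP per folded protein
```

Under this assumption a substrate would need about 20 rounds of encapsulation, costing about 140 ATP, before it is released correctly folded.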

GroEL/GroES works on ~10% (~250) of all cytosolic proteins in E. coli; of these, 85 depend solely on GroEL/GroES for folding. It can distinguish mis-folded proteins from proteins that are correctly but only partially folded, because they have different polarities.

Secondary structure prediction
We can often predict the secondary structure of a protein by looking at its primary sequence. However, the prediction must be treated with care because proteins of similar structure can share as little as 20-25% sequence identity.

The Chou-Fasman method takes a set of known structures and looks at the frequencies with which each amino acid is found in an α-helix, a β-sheet or a random coil. The frequencies are converted to probabilities; the probability of one amino acid being in an α-helix, say, is divided by the average of the probabilities for all the amino acids, to give the propensity (or score) of that amino acid for being in an α-helix. The propensity of each amino acid for each secondary structure is calculated in the same way.

A propensity of 1 for α-helices means the residue is neither likely nor unlikely to be in an α-helix; values above 1 indicate a preference for helices, and values below 1 a preference against them. We can now generate a propensity table for each amino acid and each possible (assumed) secondary structure. A set of rules laid out by Chou and Fasman then determines which secondary structure each segment of the polypeptide sequence is assigned to.
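The propensity calculation can be sketched as follows, following the description above (each amino acid's helix probability divided by the average of the helix probabilities of all amino acids seen). The function name, the input format (a sequence string paired with a structure string in which 'H' marks helix) and the toy data are assumptions for illustration only.

```python
from collections import Counter


def helix_propensities(sequences, structures):
    """Propensity of each amino acid for helix ('H') from known structures."""
    total = Counter()
    in_helix = Counter()
    for seq, ss in zip(sequences, structures):
        for residue, state in zip(seq, ss):
            total[residue] += 1
            if state == "H":
                in_helix[residue] += 1
    # Probability that each amino acid type is found in a helix.
    prob = {aa: in_helix[aa] / total[aa] for aa in total}
    mean_prob = sum(prob.values()) / len(prob)
    if mean_prob == 0:
        raise ValueError("no helical residues in the data set")
    # Propensity: probability relative to the average over amino acids.
    return {aa: p / mean_prob for aa, p in prob.items()}


# Toy data set: one four-residue 'protein' with three helical residues.
table = helix_propensities(["AAGG"], ["HHHC"])
print({aa: round(p, 2) for aa, p in table.items()})  # {'A': 1.33, 'G': 0.67}
```

In this toy example alanine scores above 1 (helix-favouring) and glycine below 1 (helix-disfavouring); a real table would be built from many known structures.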

The rules state that an alpha helix is nucleated by a stretch of 6 residues of which at least 4 have a high propensity for alpha helices. From this start, the helix is estimated to continue in both directions until the average propensity of a tetrapeptide falls below 1. Similarly, a beta sheet is nucleated by a stretch of 5 residues of which at least 3 have a high propensity for beta sheets, and is then estimated to continue until the average propensity of a tetrapeptide falls below 1. Reverse turns (loops linking beta strands together) are predicted where the hydropathy is at a minimum.
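The helix rule can be sketched in a few lines. This is a simplified illustration, not the full published procedure: it takes nucleation to mean at least four helix-favouring residues (propensity above 1) within a window of six, extends in both directions while the tetrapeptide at the growing end averages at least 1, and ignores competition with β-sheet assignment. The propensity values are approximate figures for a few residues and should be treated as assumptions.

```python
# Approximate helix propensities for a few residues (assumed values).
P_HELIX = {"E": 1.51, "A": 1.42, "L": 1.21, "K": 1.16,
           "S": 0.77, "N": 0.67, "G": 0.57, "P": 0.57}


def predict_helix(seq, table=P_HELIX):
    """Mark predicted helical residues with 'H' and all others with '-'."""
    p = [table[aa] for aa in seq]
    n = len(p)
    helix = [False] * n
    for start in range(n - 5):
        # Nucleation: at least 4 of 6 residues favour helix.
        if sum(v > 1.0 for v in p[start:start + 6]) < 4:
            continue
        left, right = start, start + 6  # half-open interval [left, right)
        # Extend to the right while the last four residues average >= 1.
        while right < n and sum(p[right - 3:right + 1]) / 4 >= 1.0:
            right += 1
        # Extend to the left while the first four residues average >= 1.
        while left > 0 and sum(p[left - 1:left + 3]) / 4 >= 1.0:
            left -= 1
        for i in range(left, right):
            helix[i] = True
    return "".join("H" if h else "-" for h in helix)


# Hypothetical peptide: a helix-favouring stretch flanked by G/P/S residues.
print(predict_helix("GPSGAELKALEGPSG"))  # --HHHHHHHHHHH--
```

The prediction overruns the helix-favouring stretch by a few residues at each end, which illustrates how crude the rule is on its own.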

These rules provide a very crude estimate of the structure, and are reliable only 50-80% of the time. Multiple sequence alignment using homologous proteins can improve the accuracy of the predictions. Jpred3 is a program which uses PSI-BLAST to find homologous sequences and build such an alignment, from which the secondary structure is then predicted.

Tertiary structure prediction
The Protein Structure Prediction Center holds a biennial competition called CASP (Critical Assessment of protein Structure Prediction), where different groups use their structure prediction tools to predict the tertiary structure of a protein. The predictions are then compared with the structure derived from experimental data. One of the winners of CASP is Rosetta.

Extracellular Misfolds
If proteins are misfolded inside the cell, then chaperone proteins can unfold them and promote refolding to the correct conformation. However, this mechanism is not available for mis-folded proteins outside the cell.

These misfolded proteins, if not degraded, can aggregate into fibrous, insoluble deposits known as amyloids (a misnomer). Depending on where these amyloids are located, they can cause neurodegenerative diseases such as Alzheimer's, Parkinson's, and transmissible spongiform encephalopathies. They can also affect the heart, liver, kidneys and pancreas.

The amyloid can be made up of a range of normally-soluble proteins, such as transthyretin, lysozyme, fibrinogens etc. These form fibrils ~10 nm in diameter. The fibril contains many beta-strands, which run perpendicular to the axis of the fibril. The beta sheets are stabilized by stacking.

Islet Amyloid Polypeptide (IAPP) is a 37 residue peptide secreted by the β cells of the pancreatic islets. It can form aggregates, which are found in 95% of all type II diabetes patients. Lysozyme is a 130 residue polypeptide which has a native role in innate immunity; however, two identified mutants (D67H and I56T) disturb the β-sheet domain and make the protein more flexible, lowering the melting temperature (Tm) by ~10°C. The disorganized lysozyme molecules can aggregate together in order to form more stabilizing β-sheets; this aggregation causes deposits.

These, and similar deposits, if found in nerve tissue, can tangle neurons together and cause the death of nearby neurons. In the brain, amyloid-β peptides of ~40-42 residues, cleaved from the amyloid-β precursor protein (AβPP), aggregate to form plaques, which cause Alzheimer's. Some individuals carry the ApoE4 allele of the ApoE gene (which encodes apolipoprotein E, a cholesterol transporter); this variant promotes aggregation, and so people with ApoE4 are more likely to develop Alzheimer's, and to do so at an earlier age.

Prion diseases include Scrapie in sheep, Bovine Spongiform Encephalopathy (BSE) in cattle, Creutzfeldt-Jakob Disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and Fatal Familial Insomnia (FFI). The cause of prion disease is thought to be the prion protein (PrP). PrP is thought to be a 280 residue, membrane-bound protein with no known function. In its normal form, it is a stable cellular protein (PrPC), but in its misfolded form (PrPSc), it forms amyloid fibrils.

The misfolded PrPSc form is self-propagating and induces the conversion of PrPC to PrPSc; because of this, PrPSc can enter a healthy organism and cause its PrPC to change to PrPSc. Thus, prion disease is infectious. PrPSc has a 45% beta-sheet composition (as opposed to just 3% in the normal form), which means it can more easily form amyloid fibrils. Prion infection between different species is often slower than within a species, as different strains of PrP have different pathologies and structures.