User:Kinkreet/Protein Science/Protein Engineering

Proteins can be engineered to create new functions, to test the function of individual residues in an existing protein, to improve the activity of an enzyme, to alter ligand specificity, or to improve stability.

For a protein to be engineered successfully, it must be possible to produce it at high purity and in large quantities (for testing, structural determination etc.). There must also be a way of introducing mutations, either specifically or systematically.

Cloning and Expression
Different expression systems exist to express inserts of different lengths; however, the principles and mechanisms of gene expression for these inserts are the same across species and expression systems.

The vector must carry a selection marker, such as an antibiotic resistance gene, which indicates that the vector has successfully entered the replicative host cell (such as Escherichia coli). It also requires an origin of replication.

The gene must be flanked by elements that ensure its correct transcription, translation and termination. The 5' end (upstream) begins with a transcriptional promoter, which is bound by RNA polymerase (RNAP) and drives transcription. Moving towards the 3' end, one might find an operator, a DNA sequence that can be bound by negative regulators (repressors) and so regulates expression. Downstream of the operator is the ribosome binding site (RBS), a sequence on the transcript that is bound by the ribosome when it initiates translation. A few bases downstream of the RBS is the AUG start codon, which marks the start of the gene. Downstream of the gene is an optional stop codon that stops translation; without the stop codon, the ribosome will simply translate until the transcript runs out. Downstream of the stop codon is the transcriptional terminator.

The promoters used must be inducible and must drive transcription strongly. Common strong E. coli promoters include lac (lactose operon), tac (a hybrid of the tryptophan operon -35 region and the lacUV5 -10 region), trc (also a hybrid of the trp operon -35 region and the lac operon -10 region), ara (arabinose operon, araBAD) and rha (rhamnose operon). Common phage promoters include T7, T3, SP6, T5 and PL.

Note that, in practice, the lac promoter is not induced with lactose, because lactose is broken down by the cell too quickly to maintain induction; IPTG (isopropyl β-D-1-thiogalactopyranoside), a lactose analogue that the cell does not metabolise, is used instead.

Expression is usually most effective during the exponential phase. Therefore, the expression strain is grown in culture, and expression is induced when the optical density (typically measured at 600 nm) reaches 0.6-0.8.

A newer form of cloning, which requires neither restriction enzymes nor ligation, uses recombination. Two different end sequences are added to the gene of interest using PCR primers that carry an extension. After amplification, the product is added to the vector. The vector carries the same sequences at its ends, and the gene is inserted into the vector by recombination between the matching ends.

Common E. coli strains
Expressing a large amount of foreign protein can be cytotoxic. The T7 promoter requires T7 RNA polymerase to be present for expression. T7 RNA polymerase is not normally present in E. coli, so the gene can be cloned in E. coli without being expressed, and the plasmid can then be isolated and expressed elsewhere. This ensures that the E. coli remains viable and is not killed by high levels of foreign protein. However, special E. coli strains (DE3) have been generated that carry a chromosomal copy of the T7 RNA polymerase gene on a λ prophage (the DE3 lysogen), which allows them to express genes under the control of the T7 promoter. BL21-AI has the T7 RNA polymerase gene under the control of the inducible araBAD promoter, which allows for tight regulation.

Many strains for expression are available, the most common being the BL21 E. coli strain. BL21 is deficient in proteases such as Lon and OmpT, which reduces proteolysis of the expression products. STAR strains are deficient in RNase E; this improves yield because the mRNA transcripts are more stable.

CodonPlus-RIL strains enhance the expression of eukaryotic proteins in E. coli, as they are able to read codons that are rarely used in E. coli, such as AGG, AGA, AUA and CUA. This is especially useful for expressing foreign genes.
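Whether a rare-codon strain is worth using depends on how often such codons occur in the gene. A minimal sketch of that check is given below; the codon set is limited to the four codons named above, and the function name and the toy sequence are hypothetical.

```python
# Rare codons named in the text, written as DNA (U becomes T).
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA"}


def count_rare_codons(cds: str) -> dict[str, int]:
    """Count each rare codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = {codon: 0 for codon in sorted(RARE_CODONS)}
    # Step through the sequence one codon at a time, in frame.
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return counts


# Hypothetical toy gene: ATG AGG AGA CTA ATA AGG TAA
print(count_rare_codons("ATGAGGAGACTAATAAGGTAA"))
```

A gene with many hits, particularly clustered ones, is a plausible candidate for a CodonPlus-type strain or for codon optimisation via gene synthesis.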

trxB strains are deficient in thioredoxin reductase (trxB). This allows disulphide bonds to form even in the normally reducing environment of the cytoplasm, which helps the protein to fold into its native state.

Different DNA polymerases are available: some have 5' → 3' exonuclease activity, some have 3' → 5' exonuclease activity, and some have neither. They also differ in processivity and error rate.

The ends of the PCR product also differ between polymerases; Taq polymerase adds a single A overhang to the 3' ends of the product, which means the product can be subcloned into a vector carrying complementary single T overhangs without further processing. In TOPO TA cloning, a topoisomerase recognises a specific sequence in the vector DNA and cleaves it to leave an end with a T overhang. The topoisomerase remains bound to the cleaved end, which prevents the vector from rejoining. The gene of interest is amplified by PCR using Taq polymerase, after which the product is added to the vector. Only in the presence of the PCR product will the topoisomerase dissociate, joining the insert to the vector.

Traditional cloning uses restriction enzymes on the vector and on the PCR product to allow for ligation using T4 DNA ligase.

Synthetic gene
If the sequence is short, having the DNA synthesized and cloned into a vector might be the best option.

Site-directed mutagenesis
Primers can be used to introduce site-directed mutations. A primer containing the mutation is added; it anneals to the denatured plasmid DNA, and DNA polymerase extends from it. The newly synthesized strand will contain the mutation whereas the template strand will not. Another difference is that the template strand is methylated, whereas the newly synthesized strand is not, so DpnI can be used to digest the methylated parental (template) strands, leaving only plasmids that contain the mutation. The mutated plasmid can then be amplified using PCR with the same primers and transformed into E. coli to be expressed.
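The sequence side of this procedure can be sketched in a few lines: substitute one codon in the coding sequence, then take a window centred on the change as the mutagenic primer. The function names, the flank length and the toy sequence are assumptions made for illustration, not part of any kit protocol.

```python
def mutate_codon(cds: str, residue_number: int, new_codon: str) -> str:
    """Return cds with the codon of the 1-based residue replaced."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three nucleotides")
    start = (residue_number - 1) * 3
    if residue_number < 1 or start + 3 > len(cds):
        raise ValueError("residue_number is outside the coding sequence")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


def mutagenic_primer(mutant_cds: str, residue_number: int, flank: int = 15) -> str:
    """Return the mutated codon plus up to `flank` bases on each side."""
    start = (residue_number - 1) * 3
    left = max(0, start - flank)
    right = min(len(mutant_cds), start + 3 + flank)
    return mutant_cds[left:right]


# Hypothetical example: change residue 2 (AAA, Lys) to GCT (Ala).
wild_type = "ATGAAAGGTGAACTGTTCACCGGTGTTGTT"
mutant = mutate_codon(wild_type, 2, "GCT")
print(mutant)
print(mutagenic_primer(mutant, 2, flank=12))
```

The sketch returns only the top-strand primer; the second primer used for plasmid mutagenesis would be its reverse complement.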

Overlap extension PCR
The single-primer approach above works well with a plasmid, or if the mutation is intended to be at the end of a linear DNA. However, when the mutation needs to be in the middle of a gene, two separate PCR reactions are required. The first reaction uses primer A to introduce the mutation at the mutation site; DNA polymerase extends from this primer towards one end of the DNA until it runs off or dissociates. The second reaction is exactly the same, except that the primer now points in the opposite direction, and so extension gives 'the other side' of the DNA. The two PCR products are purified, denatured, mixed and allowed to anneal to each other through their shared, mutation-containing ends. DNA polymerase is then added to extend from the annealed DNA. The mutation is now incorporated into the middle of the full-length DNA.
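The final joining step can be illustrated on sequence strings. The sketch below works on top-strand sequences only and assumes the two fragments share an identical overlap that contains the mutation; the function name and the minimum overlap are hypothetical choices.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Join two fragments where the end of `left` matches the start of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("no overlap of the required length found")


# Hypothetical fragments sharing the mutated region "CCGGG".
upstream_product = "AAACCCGGG"
downstream_product = "CCGGGTTT"
print(join_by_overlap(upstream_product, downstream_product, min_overlap=5))
```

In the real reaction the overlap is set by the two mutagenic primers, so a longer overlap generally gives more reliable annealing between the two products.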

The primers must be designed and may be made synthetically, using the synthetic gene route. They are typically 25-45 nucleotides in length, with the mutation located somewhere in the middle. The sequence should contain few repeats, especially inverted repeats, to minimize the chances of self-annealing. As a G≡C interaction is stronger than an A=T interaction, having a G at the 3' end of the primer ensures strong annealing at the end that is to be extended; this increases the efficiency of replication. The primer should also have a high melting temperature (usually at least 78°C).
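These design rules are easy to check automatically. The sketch below tests length, the 3' base and melting temperature. The Tm estimate used, 81.5 + 0.41(%GC) - 675/N - %mismatch, is a commonly quoted approximation for mutagenic primers and is an assumption here rather than something stated in these notes; the function names are hypothetical, and a 3' C is accepted as well as a G since both form three hydrogen bonds.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def estimate_tm(primer: str, mismatches: int) -> float:
    """Approximate Tm (deg C): 81.5 + 0.41*%GC - 675/N - %mismatch."""
    n = len(primer)
    percent_mismatch = 100.0 * mismatches / n
    return 81.5 + 0.41 * gc_percent(primer) - 675.0 / n - percent_mismatch


def check_primer(primer: str, mismatches: int) -> dict[str, bool]:
    """Apply the length, 3' end and Tm rules described in the text."""
    primer = primer.upper()
    return {
        "length_ok": 25 <= len(primer) <= 45,
        "ends_in_g_or_c": primer[-1] in "GC",
        "tm_ok": estimate_tm(primer, mismatches) >= 78.0,
    }


# Hypothetical 33-nt primer carrying a three-base change.
primer = "GGTGAACTGTTCACCGCTGTTGTTCCGATCCTG"
print(round(estimate_tm(primer, mismatches=3), 1), check_primer(primer, 3))
```

The sketch does not look for inverted repeats; a self-complementarity check would need to be added separately.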

Systematic mutagenesis
Systematic mutagenesis does not make specific mutations at one locus; instead, it mutates every individual residue, or combinations of residues, in a systematic manner.

Alanine scanning mutagenesis
In alanine scanning mutagenesis, every residue is mutated in turn to alanine (residues that are already alanine are mutated to leucine). The enzymatic activity and stability of each mutant are then tested using a functional assay, comparing the different mutants at different temperatures. A mutation might also alter the flexibility of some part of the protein, which, if the protein is an enzyme, means the enzyme is less able to perform induced-fit catalysis.
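The list of mutants to build for a scan can be generated directly from the protein sequence. This is a minimal sketch: the function name and the toy peptide are hypothetical, and the substitute used for native alanines defaults to leucine only because that is what these notes describe.

```python
def alanine_scan(protein: str, alanine_substitute: str = "L") -> list[str]:
    """List single mutants in the usual notation, e.g. 'K2A'."""
    variants = []
    for position, residue in enumerate(protein.upper(), start=1):
        # Native alanines cannot be mutated to alanine, so use the substitute.
        target = alanine_substitute if residue == "A" else "A"
        variants.append(f"{residue}{position}{target}")
    return variants


# Hypothetical four-residue peptide.
print(alanine_scan("MKAL"))  # ['M1A', 'K2A', 'A3L', 'L4A']
```

Each entry corresponds to one site-directed mutagenesis reaction, so the length of the list gives the number of constructs to make.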

Random Mutagenesis
Random mutagenesis can be useful if little information is available about the protein. Instead of specifically mutating each residue, which takes a long time, the same protein is randomly mutated in different ways to create a library of mutated proteins. These libraries can be assayed to pick out proteins that show a different property, especially in enzymatic activity and stability. A selected mutant can be mutated again to find further mutants with higher activity and better stability. After a satisfactory protein is obtained, the gene or peptide sequence can be analysed to identify the mutation events that have taken place.

Chemical
Random mutagenesis can be achieved by treating the plasmid with chemicals that damage DNA, such as sodium bisulfite, nitrous acid, hydrazine, dimethyl sulphate and ethyl methanesulfonate (EMS). The damaged gene can then be amplified using PCR.

Ethyl methanesulfonate (EMS) alkylates guanine residues, which often leads the DNA replication machinery to incorporate thymine instead of cytosine opposite the modified base. Because EMS covalently alters DNA, it can be applied in vitro as well as in vivo to induce mutations. The mutations are largely limited to point mutations at guanine residues.

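Nitrous acid oxidatively deaminates adenine and cytosine residues. Deaminated cytosine (uracil) pairs with adenine, causing G/C→A/T mutations, while deaminated adenine (hypoxanthine) pairs with cytosine, causing A/T→G/C mutations.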

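UV light or X-rays can also be applied to the DNA to cause damage. Unlike intercalating chemicals, radiation damages the DNA directly: UV light mainly produces pyrimidine dimers, whereas X-rays produce strand breaks.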

PCR-based approaches
Errors can also be introduced during a PCR reaction. For example, nucleotide analogues such as 8-oxo-dGTP can be used for one round of the PCR reaction, after which the normal nucleotides are used in subsequent rounds. In the first round, 8-oxo-dGTP is incorporated opposite adenine; in further rounds, the incorrect nucleotide is incorporated opposite the analogue, and this induces A → C and T → G mutations. Sloppy or error-prone PCR deliberately increases the mutation rate of the PCR reaction by altering its parameters. Increasing the level of Mn2+ (in the form of MnCl2) or introducing an imbalance in nucleotide concentrations gives a higher rate of mutation (0.6 - 2.0%). A more error-prone polymerase can also be used.
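To get a feel for what a per-base mutation rate means for a whole gene, a library can be simulated. The sketch below is a deliberately simplified model: every base mutates independently with the same probability, only substitutions occur, and all three alternative bases are equally likely, none of which is true of a real error-prone PCR. The function names, the template and the seed are hypothetical.

```python
import random


def error_prone_copy(template: str, rate: float, rng: random.Random) -> str:
    """Copy the template, substituting each base with probability `rate`."""
    product = []
    for base in template:
        if rng.random() < rate:
            # Pick one of the three other bases at random.
            product.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            product.append(base)
    return "".join(product)


def build_library(template: str, rate: float, size: int, seed: int = 0) -> list[str]:
    """Generate `size` independently mutated copies of the template."""
    rng = random.Random(seed)
    return [error_prone_copy(template, rate, rng) for _ in range(size)]


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


template = "ATGGCTAAAGGTGAACTGTTCACCGGTGTT" * 10  # hypothetical 300-bp gene
library = build_library(template, rate=0.01, size=200)
mean = sum(count_differences(template, v) for v in library) / len(library)
print(f"expected {0.01 * len(template):.1f} mutations per gene, observed mean {mean:.2f}")
```

Under this model the expected number of mutations per gene is simply the rate multiplied by the gene length, so a 1% rate on a 300-bp gene gives about three changes per clone.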

The mutated genes are then incorporated into a vector to be expressed; this is a major limitation, as the efficiency of incorporation limits the size of the effective library. Rolling circle error-prone PCR induces mutations only after incorporation, and so eliminates the limitation imposed by inefficient insertion into the vector. However, this also makes replication less efficient, and the plasmid may acquire mutations at the origin of replication or at other important sites, such as transcription start sites, AUG start codons etc.

After expression, a high-throughput screen can be used to pick out the proteins with an improved property. Another limitation is that, although point mutations are the most likely, deletions and frame-shifts are also possible, and a single deletion may give rise to a completely different protein.

Mutator strain
The wild-type gene can be cloned into a plasmid and transformed into a mutator strain such as XL1-Red. In these strains, the primary DNA repair mechanisms are mutated (mutS [mutated so that mismatch repair is error-prone], mutD [mutated to be deficient in the 3' → 5' exonuclease of DNA polymerase III] and mutT [mutated so it is unable to hydrolyse 8-oxo-dGTP]). This means more errors are incorporated into the plasmid during replication, some of which will be in the insert. Because the process is based on errors in DNA replication, a variety of mutations are possible: not only point substitutions, but also deletions and frame-shifts. The rate of mutation is ~5000 fold that of the wild-type. However, because the plasmid is mutated along with the insert, after a few rounds the plasmid itself may become non-viable.

The same principle can also be applied by inducing and overexpressing mutation-inducing alleles, such as mutD5 (the dominant negative of mutD). During induction, mutD5 limits the cell's ability to repair DNA, and so gives a higher rate of mutagenesis; when induction is removed, the rate returns to that of the wild-type. This allows mutagenesis to be switched on at a chosen time; the non-induced periods give the cell time to recover from the mutagenic events, and so the cell remains viable for longer.

DNA Shuffling
DNA shuffling digests members of a gene library (the same gene, each with different mutations) using DNase I, and then rejoins the fragments using self-priming PCR. In this way, the mutations of each member of the library are mixed and amplified. The recombination event itself does not introduce many new mutations; it is the shuffling of existing mutations between genes which is the major source of new variants. More errors can be introduced using other methods, such as error-prone PCR.
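The effect of shuffling on a set of variants can be sketched as a simple recombination model. This is an idealised illustration, assuming the parents are already aligned and of equal length and that crossovers fall at random positions; it does not model DNase I fragment sizes or the annealing step. The function name and parent sequences are hypothetical.

```python
import random


def shuffle_variants(parents: list[str], n_crossovers: int, rng: random.Random) -> str:
    """Build one chimera by taking each segment from a randomly chosen parent."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    # Choose distinct internal cut points, then walk the resulting segments.
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)
        pieces.append(donor[start:end])
    return "".join(pieces)


# Hypothetical parents: the same gene with one mutation each (lower case).
parents = [
    "ATGgCTAAAGGTGAACTG",
    "ATGGCTAAAcGTGAACTG",
    "ATGGCTAAAGGTGAAtTG",
]
rng = random.Random(0)
for _ in range(3):
    print(shuffle_variants(parents, n_crossovers=2, rng=rng))
```

Every position in a chimera comes from one of the parents, so the sketch only recombines existing mutations, which matches the point that shuffling itself adds few new ones.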

Other
Insertion mutagenesis is where a nucleotide sequence is inserted at random positions using a transposon-based system. If the number of nucleotides in the inserted sequence is a multiple of 3, then no frame-shift is induced.
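The multiple-of-3 rule can be demonstrated by translating a toy gene before and after insertion. The sketch below uses the standard genetic code; the function names and the example sequences are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def insert_sequence(cds: str, insert: str, position: int) -> str:
    """Insert `insert` before the 0-based nucleotide `position`."""
    return cds[:position] + insert + cds[position:]


def causes_frameshift(insert: str) -> bool:
    """An insertion shifts the reading frame unless its length is a multiple of 3."""
    return len(insert) % 3 != 0


gene = "ATGAAAGGTTAA"                               # M K G stop
print(translate(gene))                              # MKG*
print(translate(insert_sequence(gene, "GCT", 3)))   # MAKG*
print(translate(insert_sequence(gene, "GC", 3)))    # MAKV
```

The three-base insert adds one residue and leaves the rest of the protein intact, whereas the two-base insert changes every downstream codon and loses the stop codon.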