DNA shuffling

DNA shuffling, also known as molecular breeding, is an in vitro random recombination method used to generate mutant genes for directed evolution and to rapidly increase DNA library size. Three procedures for accomplishing DNA shuffling are molecular breeding, which relies on homologous recombination (that is, on the similarity of the DNA sequences); restriction enzymes, which rely on common restriction sites; and nonhomologous random recombination, which requires the use of hairpins. In all of these techniques, the parent genes are fragmented and then recombined.

DNA shuffling utilizes random recombination, as opposed to site-directed mutagenesis, in order to generate proteins with unique attributes or combinations of desirable characteristics encoded in the parent genes, such as thermostability and high activity. The potential for DNA shuffling to produce novel proteins is exemplified by the figure shown on the right, which demonstrates the difference between point mutations, insertions and deletions, and DNA shuffling. Specifically, this figure shows the use of DNA shuffling on two parent genes, which enables the generation of recombinant proteins that have a random combination of sequences from each parent gene. This is distinct from point mutations, in which one nucleotide has been changed, inserted, or deleted, and from insertions or deletions, in which a sequence of nucleotides has been added or removed, respectively. As a result of the random recombination, DNA shuffling is able to produce proteins with new qualities or multiple advantageous features derived from the parent genes.

In 1994, Willem P.C. Stemmer published the first paper on DNA shuffling. Since the introduction of the technique, DNA shuffling has been applied to protein and small molecule pharmaceuticals, bioremediation, vaccines, gene therapy, and evolved viruses. Other techniques which yield similar results to DNA shuffling include random chimeragenesis on transient templates (RACHITT), random priming in vitro recombination (RPR), and the staggered extension process (StEP).

History
DNA shuffling by molecular breeding was first reported in 1994 by Willem P.C. Stemmer. He started by using DNase I, which randomly cleaves DNA, to fragment a β-lactamase gene that had been amplified with the polymerase chain reaction (PCR). He then ran a modified PCR without primers, in which homologous fragments (fragments with similar sequences) annealed to one another and were extended. Finally, the reassembled products were amplified by PCR. Stemmer reported that the use of DNA shuffling in combination with backcrossing resulted in the elimination of non-essential mutations and an increase in resistance to the antibiotic cefotaxime. He also emphasized the potential for molecular evolution with DNA shuffling. Specifically, he indicated the technique could be used to modify proteins.

DNA shuffling has since been applied to generate libraries of hybrid or chimeric genes and has inspired family shuffling, which is defined as the use of related genes in DNA shuffling. Additionally, DNA shuffling has been applied to protein and small molecule pharmaceuticals, bioremediation, gene therapy, vaccines, and evolved viruses.

Molecular breeding
First, DNase I is used to fragment a set of parent genes into segments of double stranded DNA ranging from 10-50 bp to more than 1 kbp. This is followed by a PCR without primers. In the PCR, DNA fragments with sufficiently overlapping sequences will anneal to each other and then be extended by DNA polymerase. The PCR extension will not occur unless there are DNA sequences of high similarity. The important factors influencing the sequences synthesized in DNA shuffling are the DNA polymerase, salt concentrations, and annealing temperature. For example, the use of Taq polymerase for amplification of a 1 kbp fragment in a PCR of 20 cycles results in 33% to 98% of the products containing one or more mutations.
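The 33% to 98% range quoted above can be reproduced with a simple independence model. The following is a minimal sketch, not the source's derivation: it assumes that each PCR cycle counts as one duplication, that errors at different positions are independent, and that Taq polymerase has an error rate between 2 × 10^-5 and 2 × 10^-4 per base pair per duplication. These error rates and the function name `fraction_mutated` are assumptions introduced here for illustration.

```python
def fraction_mutated(error_rate, length_bp, duplications):
    """Probability that a product carries at least one mutation.

    Assumes independent errors at a fixed per-base, per-duplication
    rate (a simplification; real PCR error accumulation is more complex).
    """
    error_free = (1.0 - error_rate) ** (length_bp * duplications)
    return 1.0 - error_free


# Assumed error-rate bounds for Taq polymerase (not given in the text above).
low = fraction_mutated(2e-5, length_bp=1000, duplications=20)
high = fraction_mutated(2e-4, length_bp=1000, duplications=20)
print(f"{low:.0%} to {high:.0%}")  # prints "33% to 98%"
```

Under these assumptions the choice of polymerase alone moves the mutated fraction from roughly a third of products to nearly all of them, which is why the polymerase is listed among the important factors.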

Multiple cycles of PCR extension can be used to amplify the fragments. Primers designed to be complementary to the ends of the extended fragments are then added to further amplify the sequences in another PCR. Primers may be chosen to have additional sequences added to their 5' ends, such as restriction enzyme recognition sites, which are needed for ligation into a cloning vector.

It is possible to recombine portions of the parent genes to generate hybrids or chimeric forms with unique properties, hence the term DNA shuffling. The disadvantage of molecular breeding is the requirement for the similarity between the sequences, which has inspired the development of other procedures for DNA shuffling.

Restriction enzymes
Restriction enzymes are employed to fragment the parent genes. The fragments are then joined together through ligation, which can be accomplished with DNA ligase. For example, if two parent genes share three restriction sites, fourteen different full-length gene hybrids can be created. This number follows from the fact that a gene with three restriction sites can be broken up into four fragments. There are thus two options for each of the four positions, minus the two combinations that would recreate the parent genes, yielding 2^4 - 2 = 14 different full-length hybrid genes.
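The count of fourteen hybrids can be checked by brute-force enumeration. This is a minimal sketch under the assumption that fragments keep their original order and that each position is filled from one of the two parents; the fragment labels and the function name `full_length_hybrids` are hypothetical.

```python
from itertools import product


def full_length_hybrids(parent_a, parent_b):
    """List every full-length hybrid of two parents cut at shared sites.

    parent_a and parent_b are ordered lists of fragments of equal length.
    Combinations that simply regenerate one parent are excluded.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be cut into the same number of fragments")
    parents = (parent_a, parent_b)
    hybrids = []
    for choice in product((0, 1), repeat=len(parent_a)):
        if len(set(choice)) == 1:
            continue  # every fragment from the same parent: not a hybrid
        hybrids.append(tuple(parents[c][i] for i, c in enumerate(choice)))
    return hybrids


# Three shared restriction sites give four fragments per parent.
a = ["A1", "A2", "A3", "A4"]
b = ["B1", "B2", "B3", "B4"]
hybrids = full_length_hybrids(a, b)
print(len(hybrids))  # prints 14
```

More generally, two parents sharing n restriction sites give 2^(n+1) - 2 full-length hybrids under the same assumptions.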

The main difference between DNA shuffling with restriction enzymes and molecular breeding is that molecular breeding relies on sequence homology for the annealing of the strands and on PCR for extension, whereas restriction enzymes create fragment ends that can be ligated directly. The main advantages of using restriction enzymes include control over the number of recombination events and the lack of a PCR amplification requirement. The main disadvantage is the requirement for common restriction enzyme sites.

Nonhomologous random recombination
In order to generate segments ranging from 10-50 bp to more than 1 kbp, DNase I is utilized. The ends of the fragments are made blunt by adding T4 DNA polymerase. Blunting the fragments is important for combining them, as incompatible sticky ends, or overhangs, prevent end joining. Hairpins with a specific restriction site are then added to the mixture of fragments. Next, T4 DNA ligase is employed to ligate the fragments to form extended sequences. The ligation of the hairpins to the fragments limits the length of the extended sequences by preventing the addition of more fragments. Finally, in order to remove the hairpin loops, a restriction enzyme is utilized.
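The procedure can be illustrated with a toy simulation. The sketch below is a deliberately simplified model, not a description of any published protocol: fragment sizes are drawn uniformly, each blunt fragment is joined in a random order and orientation, and a hairpin is assumed to cap the growing chain with a fixed probability after each ligation. All function names and the `hairpin_prob` parameter are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(seq, rng, min_len=10, max_len=50):
    """Cut seq into random pieces (stand-in for DNase I digestion and blunting).

    The final piece may be shorter than min_len.
    """
    pieces, pos = [], 0
    while pos < len(seq):
        size = rng.randint(min_len, max_len)
        pieces.append(seq[pos:pos + size])
        pos += size
    return pieces


def ligate(pool, rng, hairpin_prob=0.2):
    """Join blunt fragments at random until a hairpin caps each chain."""
    pool = pool[:]
    rng.shuffle(pool)  # no homology is needed, so any order is allowed
    products, chain = [], []
    for piece in pool:
        # Blunt ends can join in either orientation.
        chain.append(piece if rng.random() < 0.5 else reverse_complement(piece))
        if rng.random() < hairpin_prob:
            products.append("".join(chain))  # hairpin stops further extension
            chain = []
    if chain:
        products.append("".join(chain))
    return products


rng = random.Random(0)
parent_a = "".join(rng.choice("ACGT") for _ in range(300))
parent_b = "".join(rng.choice("ACGT") for _ in range(300))
pool = fragment(parent_a, rng) + fragment(parent_b, rng)
products = ligate(pool, rng)
print(len(pool), "fragments ->", len(products), "capped products")
```

In this model, raising `hairpin_prob` shortens the products, mirroring the way the hairpins limit the length of the extended sequences.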

Nonhomologous random recombination differs from molecular breeding in that homology of the ligated sequences is not necessary, which is an advantage. However, because this process recombines the fragments randomly, it is probable that a large fraction of the recombined DNA sequences will not have the desired characteristics, which is a disadvantage. Nonhomologous random recombination also differs from the use of restriction enzymes for DNA shuffling. Common restriction enzyme sites on the parent genes are not required, which is an advantage over the use of restriction enzymes, but hairpins are necessary, which is a disadvantage.

Protein and small molecule pharmaceuticals
Since DNA shuffling enables the recombination of genes, protein activities can be enhanced. For example, DNA shuffling has been used to increase the potency of phage-displayed recombinant interferons on murine and human cells. Additionally, green fluorescent protein (GFP) was improved with DNA shuffling by molecular breeding, yielding a whole cell fluorescence signal 45-fold greater than the standard. Furthermore, the synthesis of diverse genes can also result in the production of proteins with novel attributes. Therefore, DNA shuffling has been used to develop proteins to detoxify chemicals. For example, the homologous recombination method of DNA shuffling by molecular breeding has been utilized to enhance the detoxification of atrazine and arsenate.

Bioremediation
DNA shuffling has also been used to improve the degradation of biological pollutants. Specifically, DNA shuffling by molecular breeding has been used to create a recombinant E. coli strain for the bioremediation of trichloroethylene (TCE), a potential carcinogen. The recombinant strain is less susceptible to toxic epoxide intermediates.

Vaccines
The ability to select desirable recombinants with DNA shuffling has been used in combination with screening strategies to enhance vaccine candidates against infections, with an emphasis on improving immunogenicity, vaccine production, stability, and cross-reactivity to multiple strains of pathogens. Some vaccine candidates for Plasmodium falciparum, dengue virus, encephalitic alphaviruses (including VEEV, WEEV, and EEEV), human immunodeficiency virus-1 (HIV-1), and hepatitis B virus (HBV) have been investigated.

Gene therapy and evolved viruses
The requirements for human gene therapies include high purity, high titer, and stability. DNA shuffling allows for the fabrication of retroviral vectors with these attributes. For example, DNA shuffling with molecular breeding was applied to six ecotropic murine leukemia virus (MLV) strains, which resulted in the compilation of an extensive library of recombinant retroviruses and the identification of multiple clones with increased stability. Furthermore, DNA shuffling by molecular breeding was applied to multiple parent adeno-associated virus (AAV) vectors to generate a library of ten million chimeras. The advantageous attributes obtained include increased resistance to human intravenous immunoglobulin (IVIG) and the production of cell tropism in the novel viruses.

Comparison to other techniques
While DNA shuffling has become a useful technique for random recombination, other methods including RACHITT, RPR, and StEP have also been developed for this purpose. Below are some advantages and disadvantages of these other methods for recombination.

RACHITT
In RACHITT, fragments of single stranded (ss) parent genes are annealed onto a ss template, resulting in decreased mismatching, which is an advantage. Additionally, RACHITT enables genes with low sequence similarity to be recombined. However, a major disadvantage is the preparation of the ss fragments of the parent genes and the ss template.

RPR
RPR makes use of random primers. These random primers are annealed to template DNA and are then extended by the Klenow fragment. Next, the templates are removed and the fragments are assembled by homology in a process similar to PCR. Some major benefits include the smaller requirement for parent genes due to the use of ss templates and increased sequence diversity by mispriming and misincorporation. One disadvantage of RPR is the preparation of the template.

StEP
In StEP, brief cycles of primer annealing to a template and extension by polymerase are employed to generate full-length sequences. The main advantages of StEP are the simplicity of the method and the lack of fragment purification. The disadvantages of StEP include that it is time consuming and requires sequence homology.